  1. #1
    Just Joined!
    Join Date
    Mar 2010
    Posts
    7

    [C] Deleting characters in file

    Hi, guys

    Currently, I'm working on a personal project, and I'm kind of stuck.
    What I want to do is open a file and edit it (deleting unwanted characters).

    The problem is that after I delete the unwanted characters, the file still has
    the same length as the original.

    Let's assume that we have a file with "1234" in it.
    I deleted "3" ( I overwrite "\0" ), so now when I check the file, it shows 124.
    But when I check the length, both have the same size, 4.

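    Here is some example source code:

    #include <fcntl.h>
    #include <unistd.h>

    int num;
    off_t length, length2;

    num = open("a.dat", O_RDWR);       // open for reading and writing

    length = lseek(num, 0, SEEK_END);  // Initial length

    lseek(num, 2, SEEK_SET);           // editing: move to the third byte
    write(num, "\0", 1);               // overwrites one byte, does not remove it

    length2 = lseek(num, 0, SEEK_END); // Final length

    close(num);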

    When I print those values, they are exactly the same. length2 should be one less than length, but both are 4.

    What's wrong in my code? Am I supposed to use a different character rather than "\0"?

    Thanks in advance

  2. #2
    Linux Newbie
    Join Date
    Mar 2010
    Posts
    121
    You have, ironically, answered your own question:

    I deleted "3" ( I overwrite "\0" )
    You've overwritten '3' with '\0', not deleted '3'. For example a hexdump of the original file (with the "hd" command) would show:

    Code:
    00000000  31 32 33 34                                       |1234|
    and the modified file would be:

    Code:
    00000000  31 32 00 34                                       |12.4|
    The '\0' is not printed out.
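
The effect shown in the hexdumps can be checked with a small sketch like the one below. The function name `overwrite_with_nul` is hypothetical and only used for illustration; it overwrites one byte with '\0' and reports the file size afterwards, which stays the same because nothing was removed.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: overwrites the byte at `offset` in `path` with '\0'
 * and returns the file size afterwards, or -1 on error. */
off_t overwrite_with_nul(const char *path, off_t offset)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    off_t size = -1;
    /* write() replaces the byte at the current offset; it never shrinks the file. */
    if (lseek(fd, offset, SEEK_SET) == offset && write(fd, "\0", 1) == 1)
        size = lseek(fd, 0, SEEK_END);

    close(fd);
    return size;
}
```

For a file containing "1234", calling `overwrite_with_nul("a.dat", 2)` should leave the bytes 31 32 00 34 on disk and return 4.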

    In order to do what you want, you have to open the file, get its contents, and rewrite the file with only the characters you want.

  3. #3
    Just Joined!
    Join Date
    Mar 2010
    Posts
    7
    Thanks John, now I see what the issue is.

    Your answer leads me to another question, though.
    Is that the only way I can edit the contents of a file?
    I really don't want to make a copy of the file.

    The thing is that the file might be very large. If the file gets larger, the copy might hurt the performance of the whole program.

    So, are there any other options?

  4. #4
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    This is a typical space vs. performance issue. You can move the rest of the file up one character (byte) when you delete one, but that means you are potentially rewriting a lot of data, and disc I/O is expensive. If the file is small enough, you can read it into memory in its entirety and only write back to disc when you have finished deleting all the characters you want to remove. In that case, you can simply mark them in your in-memory buffer with the NUL (\0) byte, and skip over those when you rewrite the data to disc. That requires enough memory to hold the entire file, but if the file is small enough not to need swap space, then that would be the most efficient method.

    So, as I said, this is a trade-off of memory vs. performance. If the file may be so large that you cannot justify copying it into RAM first, then the method already mentioned, copying it to another file, is reasonable. You copy only the data you want to keep to a new file, delete the old file, and rename the new file with the old name.
    Sometimes, real fast is almost as good as real time.
    Just remember, Semper Gumbi - always be flexible!
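
The copy-and-rename approach described in this post might look roughly like the sketch below. The function name `filter_via_temp_file` and the `tmp_path` parameter are hypothetical, chosen only for this example. It assumes a POSIX system, where rename() replaces an existing destination, so the separate "delete the old file" step is folded into the rename.

```c
#include <stdio.h>

/* Hypothetical helper: copies `path` to `tmp_path`, skipping every byte equal
 * to `unwanted`, then renames the temporary file over the original.
 * Returns 0 on success, -1 on error. */
int filter_via_temp_file(const char *path, const char *tmp_path, int unwanted)
{
    FILE *in = fopen(path, "rb");
    if (!in)
        return -1;

    FILE *out = fopen(tmp_path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    /* Stream through the file; stdio buffers the I/O, so only a small
     * amount of memory is needed regardless of file size. */
    int c;
    while ((c = getc(in)) != EOF)
        if (c != unwanted)
            putc(c, out);

    int failed = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) != 0)
        failed = 1;

    /* Only replace the original once the copy is known to be complete. */
    if (failed || rename(tmp_path, path) != 0) {
        remove(tmp_path);
        return -1;
    }
    return 0;
}
```

The temporary file should live on the same filesystem as the original, otherwise rename() is likely to fail. The cost is temporary disk space for a second copy of the data.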

  5. #5
    Linux Newbie
    Join Date
    Apr 2010
    Location
    Novosibirsk, Russia
    Posts
    136
    I wish to tell you that the kernel and filesystem deceive you in some cases. As described in the kernel tutorial, null bytes can appear anywhere in a file, and they are included in the displayed file size, but the filesystem never actually stores them. This means that sequences of null bytes inside a file don't take up any disk space.

  6. #6
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Quote Originally Posted by Schmidt View Post
    I wish to tell you that the kernel and filesystem deceive you in some cases. As described in the kernel tutorial, null bytes can appear anywhere in a file, and they are included in the displayed file size, but the filesystem never actually stores them. This means that sequences of null bytes inside a file don't take up any disk space.
    Wrong. This may be true if the file was opened in Windows as a text file, but that isn't the case in Linux/Unix systems. On Linux/Unix, all file opens are treated alike, and a file is just a sequence of bytes, including null ones.

  7. #7
    Linux Newbie
    Join Date
    Apr 2010
    Location
    Novosibirsk, Russia
    Posts
    136

    Quote Originally Posted by Rubberman View Post
    Wrong. This may be true if the file was opened in Windows as a text file, but that isn't the case in Linux/Unix systems. On Linux/Unix, all file opens are treated alike, and a file is just a sequence of bytes, including null ones.
    Maybe. I can't argue with you, but I just remembered two things about files. The first is the same as you said: files in Linux are interpreted as raw byte sequences. But there was another one in my memory, something about the VFS as a unified interface for accessing hard disks. Can it in any way process and filter data before writing it to disk, or is this just a myth, or my dream? :)

  8. #8
    Linux Newbie
    Join Date
    Apr 2010
    Location
    Novosibirsk, Russia
    Posts
    136

    Quote Originally Posted by avanwz View Post
    Thanks John, now I see what the issue is.

    Your answer leads me to another question, though.
    Is that the only way I can edit the contents of a file?
    I really don't want to make a copy of the file.

    The thing is that the file might be very large. If the file gets larger, the copy might hurt the performance of the whole program.

    So, are there any other options?
    So, you can work with just one file :) For example, you open the file, read it into a buffer, filter the data and pass it to another (result) buffer (everything is done in RAM, so it's fast enough ;) ), then close the file, open it again with the O_TRUNC flag, and simply write the filtered content to it :)
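
A minimal sketch of this read, filter, and rewrite-with-O_TRUNC idea follows. The function name `filter_in_memory` is hypothetical. As a simplification, it compacts the data inside the same buffer instead of using a separate result buffer, and it assumes the whole file fits comfortably in RAM.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: loads `path` into memory, drops every byte equal to
 * `unwanted`, then reopens the file with O_TRUNC and writes the rest back.
 * Returns 0 on success, -1 on error. */
int filter_in_memory(const char *path, char unwanted)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size ? size : 1);
    if (!buf) {
        close(fd);
        return -1;
    }

    /* Read the whole file; read() may return fewer bytes than requested. */
    size_t got = 0;
    while (got < size) {
        ssize_t n = read(fd, buf + got, size - got);
        if (n < 0) {
            free(buf);
            close(fd);
            return -1;
        }
        if (n == 0)
            break;
        got += (size_t)n;
    }
    close(fd);

    /* Compact in place: keep only the bytes that are not `unwanted`. */
    size_t kept = 0;
    for (size_t i = 0; i < got; i++)
        if (buf[i] != unwanted)
            buf[kept++] = buf[i];

    /* O_TRUNC cuts the file to length 0 before the filtered data is written. */
    fd = open(path, O_WRONLY | O_TRUNC);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    int rc = 0;
    size_t done = 0;
    while (done < kept) {
        ssize_t n = write(fd, buf + done, kept - done);
        if (n <= 0) {
            rc = -1;
            break;
        }
        done += (size_t)n;
    }
    if (close(fd) != 0)
        rc = -1;
    free(buf);
    return rc;
}
```

One caveat with this design: if the program is interrupted between the truncation and the final write, the original contents are lost, which the temporary-file approach avoids.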

  9. #9
    Linux Guru Rubberman's Avatar
    Join Date
    Apr 2009
    Location
    I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
    Posts
    8,974
    Since you are actually removing data from the file, it is easiest to do a single pass through the file. Open the file for read/write access. Scan to the first character to remove and set that as your position to write to. Using a reasonably sized buffer, skip over the bytes you are removing, and write the others to the buffer until it is full. Then, after seeking back to the stored position, write the buffer as a chunk back to the file and advance the stored write position by the number of bytes you saved. You also want to keep track of where you last read from, seek to that position, and continue the cycle: seek to last read, read + skip, seek back to last written, write data, and loop back to seek to last read again, until you are done. Finally, seek to just past the end of the last write and truncate the file there. This process is about as efficient as you can get in both space and time. The seeks take a bit of time, but since the disc driver has likely cached those sectors (you just read and wrote them) and it also does some read-ahead, you are not likely to be impacted much by that.
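
The single-pass procedure above could be sketched as follows. The function name `delete_bytes_in_place` and the 4096-byte buffer size are assumptions made for this example. The sketch keeps separate read and write positions, skips the write for leading chunks that contain nothing to remove, and uses POSIX ftruncate() for the final truncation.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: removes every byte equal to `unwanted` from `path`
 * in place, using a fixed-size buffer. Returns 0 on success, -1 on error. */
int delete_bytes_in_place(const char *path, char unwanted)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char buf[4096];
    off_t read_pos = 0;   /* where the next chunk is read from */
    off_t write_pos = 0;  /* where the next kept bytes are written */
    int rc = 0;

    for (;;) {
        if (lseek(fd, read_pos, SEEK_SET) < 0) { rc = -1; break; }
        ssize_t n = read(fd, buf, sizeof buf);
        if (n < 0) { rc = -1; break; }
        if (n == 0)
            break;  /* end of file */
        off_t chunk_start = read_pos;
        read_pos += n;

        /* Compact the chunk: keep only the bytes we are not removing. */
        ssize_t kept = 0;
        for (ssize_t i = 0; i < n; i++)
            if (buf[i] != unwanted)
                buf[kept++] = buf[i];

        /* Nothing removed so far and nothing in this chunk: no write needed. */
        if (write_pos != chunk_start || kept != n) {
            if (lseek(fd, write_pos, SEEK_SET) < 0) { rc = -1; break; }
            ssize_t done = 0;
            while (done < kept) {
                ssize_t w = write(fd, buf + done, (size_t)(kept - done));
                if (w <= 0) { rc = -1; break; }
                done += w;
            }
            if (rc != 0)
                break;
        }
        write_pos += kept;
    }

    /* Cut off the stale tail left behind the last write. */
    if (rc == 0 && ftruncate(fd, write_pos) != 0)
        rc = -1;
    if (close(fd) != 0)
        rc = -1;
    return rc;
}
```

Because the write position never passes the read position, a write cannot clobber data that has not been read yet. Unlike the temporary-file approach, an error partway through leaves the file in a mixed state.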
