- 03-31-2010 #1Just Joined!
- Join Date
- Mar 2010
- Posts
- 7
[C] Deleting characters in file
Hi, guys
Currently I'm working on a personal project, and I'm kinda stuck.
What I want to do is open a file and edit it (deleting unwanted characters).
The problem is that after I delete the unwanted characters, the file still has
the same length as the original one.
Let's assume that we have a file with "1234" in it.
I deleted "3" ( I overwrite "\0" ) so now when I check the file, it shows 124.
But when I check the length, both have the same size of 4.
Here is some example source code:
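#include <fcntl.h>
#include <unistd.h>

off_t length, length2;
int num = open("a.dat", O_RDWR);      // open read/write (was the magic number 2)
length = lseek(num, 0, SEEK_END);     // initial length
lseek(num, 2, SEEK_SET);              // move to the third byte, the '3'
write(num, "\0", 1);                  // overwrite '3' with a NUL byte; nothing is removed
length2 = lseek(num, 0, SEEK_END);    // final length
close(num);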
When I print those values, they are exactly the same. length2 should be one less than length, but both are 4.
What's wrong in my code? Am I supposed to use a different character rather than "\0"?
Thanks in advance
- 03-31-2010 #2Linux Newbie
- Join Date
- Mar 2010
- Posts
- 121
You have, ironically, answered your own question:
Quote:
I deleted "3" ( I overwrite "\0" )
You've overwritten '3' with '\0', not deleted '3'. For example, a hexdump of the original file (with the "hd" command) would show:
Code:
00000000 31 32 33 34 |1234|
and the modified file would be:
Code:
00000000 31 32 00 34 |12.4|
The '\0' is not printed out.
In order to do what you want, you have to open the file, get its contents, and rewrite the file with only the characters you want.
- 04-01-2010 #3Just Joined!
- Join Date
- Mar 2010
- Posts
- 7
Thanks John, now I see what the issue is.
Your answer leads me to another question, though.
Is that the only way I can edit the contents of a file?
I really don't want to make a copy of the file.
The thing is that the file might be very large. If the file gets larger, the copying might hurt the performance of the entire program.
So, are there any other options?
- 04-01-2010 #4Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
This is a typical space vs. performance issue. You can move the rest of the file up one character (byte) each time you delete one, but that means you are potentially rewriting a lot of data, and disc I/O is expensive. If the file is small enough, you can read it into memory in its entirety and only write back to disc when you have finished deleting all the characters you want to remove. In that case, you can simply mark them in your in-memory buffer with the NUL (\0) byte, and skip over those when you rewrite the data to disc. That requires enough memory to hold the entire file, but if the file is small enough not to need swap space, then that would be the most efficient method.
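A minimal sketch of the in-memory filtering step, assuming the whole file has already been read into a buffer. The helper name `strip_byte` and its parameters are hypothetical, not something from this thread. Instead of marking bytes with NUL and skipping them later, it compacts the buffer in a single pass, which also keeps working if the data itself may legitimately contain NUL bytes.

```c
#include <stddef.h>

/* Remove every occurrence of `unwanted` from buf[0..len) in place.
 * Returns the new length; bytes beyond it are left as they were.
 * (Hypothetical helper, for illustration only.) */
size_t strip_byte(char *buf, size_t len, char unwanted)
{
    size_t out = 0;                       /* next slot for a kept byte */
    for (size_t in = 0; in < len; in++) {
        if (buf[in] != unwanted)
            buf[out++] = buf[in];         /* keep: slide the byte down */
    }
    return out;
}
```

For the "1234" example, calling `strip_byte(buf, 4, '3')` leaves "124" in the first three bytes and returns 3, which is the number of bytes to write back.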
So, as I said, this is a trade-off of memory vs. performance. If the file may be so large that you cannot justify copying it into RAM first, then the method already mentioned of copying it to another file is reasonable. You copy only the data you want to keep to a new file, delete the old file, and rename the new file to the old name.
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
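The copy-to-a-new-file approach from the post above could look roughly like the sketch below. The function name `filter_via_copy`, its parameters, and the 4096-byte buffer are assumptions made for illustration. It also assumes a POSIX system, where `rename()` replaces an existing target, so the separate delete step is not needed; both paths should be on the same filesystem.

```c
#include <stdio.h>

/* Copy src_path to tmp_path without the byte `unwanted`, then rename
 * tmp_path over src_path. Returns 0 on success, -1 on failure.
 * (Function and parameter names are hypothetical.) */
int filter_via_copy(const char *src_path, const char *tmp_path, char unwanted)
{
    FILE *in = fopen(src_path, "rb");
    if (!in)
        return -1;
    FILE *out = fopen(tmp_path, "wb");
    if (!out) {
        fclose(in);
        return -1;
    }

    char buf[4096];
    size_t n;
    int ok = 1;
    while (ok && (n = fread(buf, 1, sizeof buf, in)) > 0) {
        size_t keep = 0;
        for (size_t i = 0; i < n; i++)      /* compact the chunk in place */
            if (buf[i] != unwanted)
                buf[keep++] = buf[i];
        if (fwrite(buf, 1, keep, out) != keep)
            ok = 0;
    }
    if (ferror(in))
        ok = 0;
    fclose(in);
    if (fclose(out) != 0)                   /* flush errors show up here */
        ok = 0;

    if (!ok) {
        remove(tmp_path);                   /* leave the original untouched */
        return -1;
    }
    /* On POSIX, rename() replaces an existing target atomically. */
    return rename(tmp_path, src_path);
}
```

Only one buffer's worth of data is in memory at a time, at the cost of temporarily needing disk space for the second copy.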
- 04-06-2010 #5Linux Newbie
- Join Date
- Apr 2010
- Location
- Novosibirsk, Russia
- Posts
- 136
I wish to tell you that the kernel and filesystem deceive you in some cases.
As described in the kernel tutorial, runs of null bytes can appear anywhere in a file and are counted in the displayed file size, but the filesystem does not necessarily store them. That means such runs inside a file (holes) may not take up any disk space.
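The effect described here is usually called a sparse file. As far as I know it applies to holes, that is, ranges that were skipped with `lseek()` and never written; a '\0' written explicitly with `write()` is normally stored like any other byte. The sketch below creates such a hole so the apparent size can be compared with the allocated space. The helper name `make_file_with_hole` is hypothetical, and whether the hole really stays unallocated depends on the filesystem.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create `path` with a hole: seek `gap` bytes past the start without
 * writing anything, then write one byte. Fills *st with the file's stat.
 * Returns 0 on success, -1 on failure. (Hypothetical helper.) */
int make_file_with_hole(const char *path, off_t gap, struct stat *st)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    int ok = lseek(fd, gap, SEEK_SET) == gap   /* skip ahead, write nothing */
          && write(fd, "x", 1) == 1            /* size becomes gap + 1 */
          && fstat(fd, st) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

After a successful call, `st.st_size` is `gap + 1`, while `st.st_blocks` (counted in 512-byte units on Linux) reports what is actually allocated and is typically far smaller on filesystems that support holes. Either way, the reported length of the file does not shrink, which is the original problem in this thread.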
- 04-06-2010 #6Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
Sometimes, real fast is almost as good as real time.
Just remember, Semper Gumbi - always be flexible!
- 04-06-2010 #7Linux Newbie
- Join Date
- Apr 2010
- Location
- Novosibirsk, Russia
- Posts
- 136
Maybe... I can't argue with you, but I've just remembered two things about files. The first is the same as you said: files in Linux are interpreted just as raw byte sequences. But there was another one in my memory, something about the VFS as a unified interface for accessing hard disks. Can it in any way process and filter data before writing it to disk, or is this just a myth, or my dream? :)
- 04-06-2010 #8Linux Newbie
- Join Date
- Apr 2010
- Location
- Novosibirsk, Russia
- Posts
- 136
So, you can process just the one file :) For example, you open the file, read it into a buffer, filter the data and pass it to another (result) buffer (everything being done in RAM, and fast enough ;) ), then close the file, open it again with the O_TRUNC flag, and simply write the filtered content to it :)
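A rough sketch of that read, filter, reopen-with-O_TRUNC sequence, assuming a POSIX system and a file small enough to fit in memory. The function name `rewrite_filtered` is hypothetical, and for brevity it filters in the same buffer rather than using a second result buffer.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read the whole file into memory, drop every `unwanted` byte, then
 * reopen the file with O_TRUNC and write back what is left.
 * Returns the new size, or -1 on failure. (Names are hypothetical.) */
long rewrite_filtered(const char *path, char unwanted)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    struct stat st;
    if (fstat(fd, &st) != 0) { close(fd); return -1; }

    size_t size = (size_t)st.st_size;
    char *buf = malloc(size ? size : 1);
    if (!buf) { close(fd); return -1; }

    size_t got = 0;
    while (got < size) {                    /* read() may return short counts */
        ssize_t n = read(fd, buf + got, size - got);
        if (n <= 0)
            break;
        got += (size_t)n;
    }
    close(fd);
    if (got != size) { free(buf); return -1; }

    size_t keep = 0;
    for (size_t i = 0; i < size; i++)       /* filter in place */
        if (buf[i] != unwanted)
            buf[keep++] = buf[i];

    fd = open(path, O_WRONLY | O_TRUNC);    /* file length is now 0 */
    if (fd < 0) { free(buf); return -1; }
    size_t put = 0;
    while (put < keep) {
        ssize_t n = write(fd, buf + put, keep - put);
        if (n <= 0)
            break;
        put += (size_t)n;
    }
    close(fd);
    free(buf);
    return put == keep ? (long)keep : -1;
}
```

One thing to keep in mind with this design: if the program dies between the truncating `open()` and the final `write()`, the original contents are gone, which is what the temporary-file-and-rename variant avoids.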
- 04-06-2010 #9Linux Guru
- Join Date
- Apr 2009
- Location
- I can be found either 40 miles west of Chicago, or in a galaxy far, far away.
- Posts
- 8,974
Since you are actually removing data from the file, it is easiest to do a single pass through the file. Open the file for read/write access. Scan to the first character to remove and store that as your write position. Using a reasonably sized buffer, read on from there, skipping the bytes you are removing and copying the others into the buffer until it is full. Then seek back to the stored write position, write the buffer to the file as one chunk, and advance the stored write position by the number of bytes you wrote. You also need to keep track of where you last read from, so the cycle is: seek to the last read position, read + skip, seek back to the last write position, write the data, and loop back to seeking to the last read position again... until you are done. Finally, truncate the file at the end of the last write. This process is about as efficient as you can get in both space and time. The seeks take a bit of time, but since the disc driver has likely cached those sectors (you just read and wrote them) and it also does some read-ahead, you are not likely to be impacted much by that.
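The single-pass procedure above can be sketched as follows, assuming a POSIX system. Two things differ from the description and are my own choices: `pread()`/`pwrite()` take an explicit offset, so the separate seek calls disappear, and the pass simply starts at offset 0 instead of first scanning to the first unwanted byte. The function name `compact_in_place` and the 4096-byte buffer are hypothetical.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Remove every `unwanted` byte from the file at `path` in place, using a
 * fixed-size buffer. Returns the new size, or -1 on failure.
 * (Function name and buffer size are assumptions for illustration.) */
off_t compact_in_place(const char *path, char unwanted)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    char buf[4096];
    off_t rpos = 0;   /* where the next read starts */
    off_t wpos = 0;   /* where the next kept byte goes; never ahead of rpos */
    ssize_t n;

    while ((n = pread(fd, buf, sizeof buf, rpos)) > 0) {
        rpos += n;
        ssize_t keep = 0;
        for (ssize_t i = 0; i < n; i++)     /* read + skip */
            if (buf[i] != unwanted)
                buf[keep++] = buf[i];
        ssize_t done = 0;
        while (done < keep) {               /* write the kept bytes back */
            ssize_t w = pwrite(fd, buf + done, (size_t)(keep - done),
                               wpos + done);
            if (w <= 0) { close(fd); return -1; }
            done += w;
        }
        wpos += keep;
    }
    /* Cut off the stale tail that is left behind the last write. */
    if (n < 0 || ftruncate(fd, wpos) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return wpos;
}
```

Because the write position never overtakes the read position, no unread data is overwritten. The `ftruncate()` call at the end is what finally makes the reported length shrink, which was the missing piece in the original question.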