Quickest method to read large ASCII file.
- 08-14-2012 #1 (Join Date: May 2006; Posts: 14)
C/C++ Fast reading large ASCII data file to array or struct
I have large ASCII files. Each line has a fixed length of 80 characters and holds a number of pieces of data (~1…) of varying type (char, int, double) and varying width, but each field's width is the same from line to line.
Example:
Code:
R 1 80541.39366165.3 6.8 2 80552.99366160.7 6.8 3 80564.69366156.1 6.81
R 4 80576.29366151.5 6.8 5 80587.99366146.9 6.8 6 80599.69366142.4 6.81
R 7 80611.29366137.8 6.9 8 80622.99366133.2 6.9 9 80634.69366128.7 6.91
S2604I1-151 11 5229 54323.17S 111244.02E 80284.49366012.1 874.7329 846 7
The first character denotes the type of data on the rest of the line (in this case S or R).
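Because the record type is always the first character and every line has the same length, one cheap pass over the buffer can count the records of each type, for example to allocate the per-field arrays exactly once before extracting anything. A minimal sketch, assuming each 80-character record is followed by a single '\n' (so records start 81 bytes apart); `count_record_types`, `RECORD_LEN` and `LINE_STRIDE` are hypothetical names:

```c
#include <stddef.h>
#include <string.h>

/* Assumption: every record is 80 characters followed by a single '\n',
   so consecutive records start 81 bytes apart. */
#define RECORD_LEN  80
#define LINE_STRIDE (RECORD_LEN + 1)

/* Count how many records of each type (first character) the buffer holds.
   counts must have 256 entries; it is zeroed here before counting. */
void count_record_types(const char *buf, size_t size, size_t counts[256])
{
    memset(counts, 0, 256 * sizeof counts[0]);
    for (size_t p = 0; p < size; p += LINE_STRIDE)
        counts[(unsigned char)buf[p]]++;   /* index by the type character */
}
```

After the pass, `counts['S']` is the number of S records, which is the length each S-field array needs.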
The method I have working at the moment is to read the entire file into a char buffer, which is very fast.
Code:
FILE *ipf = fopen("myfile", "rb");   // binary mode: no newline translation, so fread returns what ftell reported
if (ipf == NULL) { perror("myfile"); exit(EXIT_FAILURE); }   // needs <stdio.h> and <stdlib.h>
//determine the size of the file to allocate memory
fseek(ipf, 0, SEEK_END);
long fileSize = ftell(ipf);
fseek(ipf, 0, SEEK_SET);
//now create array to take the input data from file.
char *buffer = (char*) malloc(fileSize);
if (buffer == NULL) { fclose(ipf); exit(EXIT_FAILURE); }
//read content of file to memory
size_t newFileSize = fread(buffer, 1, fileSize, ipf);
fclose(ipf);
The next step is to work through each line and put the data into the appropriate arrays. In this case, if the first character is an S, I extract the characters for each field, convert them to the right data type (int, double, char) using a helper function (extract...), and advance the buffer pointer; if the line does not begin with S, I skip to the next line in the buffer.
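The whole-file read can be packaged as a reusable function that reports failure to the caller instead of exiting, and that also NUL-terminates the buffer (convenient if strings are later read in place). A minimal sketch; `read_whole_file` is a hypothetical name, and it relies on `fseek(..., SEEK_END)`/`ftell` giving the byte size of a binary stream, which holds on POSIX systems and Windows:

```c
#include <stdio.h>
#include <stdlib.h>

/* Read an entire file into a freshly malloc'd, NUL-terminated buffer.
   Returns NULL on failure; on success stores the byte count in *out_size.
   "rb" avoids text-mode newline translation (on Windows, text mode collapses
   CRLF pairs, so fread would return fewer bytes than ftell reported). */
char *read_whole_file(const char *path, size_t *out_size)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;

    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long len = ftell(f);
    if (len < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }

    char *buf = malloc((size_t)len + 1);   /* +1 for the terminating '\0' */
    if (buf == NULL) { fclose(f); return NULL; }

    size_t got = fread(buf, 1, (size_t)len, f);
    fclose(f);
    buf[got] = '\0';
    *out_size = got;
    return buf;   /* caller frees */
}
```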
Here is an extract of the code.
Code:
// extractChar/extractInt/extractDoub are my own helpers (not shown).
// `line` points at the start of each 80-character record, so every field
// sits at a fixed offset and `buffer` still holds the malloc'd pointer.
size_t p = 0, i = 0;
while (p + 80 <= newFileSize) {
    char *line = buffer + p;
    if (line[0] == 'S') {
        Name[i] = extractChar(line + 1, 1, 12);
        ID1[i]  = extractChar(line + 16, 1, 1);
        ID2[i]  = extractChar(line + 17, 1, 1);
        Fid[i]  = extractChar(line + 19, 1, 6);
        Don[i]  = extractInt(line + 25, 1, 2);
        Mon[i]  = extractInt(line + 27, 1, 2);
        Sos[i]  = extractDoub(line + 29, 1, 5);
        n_s[i]  = extractChar(line + 34, 1, 1);
        // etc...
        i++;
    }
    p += 81;  // 80 characters + '\n': advance on every line, S or not
}
As I said, the initial read of the file into the buffer is very quick; the slow part is iterating over the buffer to extract the data into the individual arrays. Is there a quicker and/or more efficient way to do this?
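If the extract helpers copy each field into a temporary string before calling a library converter (the post does not show them, so this is an assumption), one common speed-up is to parse numeric fields directly from the buffer at their fixed offsets. A minimal sketch for integer fields; `parse_int_field` is a hypothetical helper, not the poster's `extractInt`, and it assumes right-justified, space-padded values:

```c
#include <stddef.h>

/* Parse a right-justified, space-padded integer occupying `width`
   characters starting at p, straight from the buffer: no substring copy
   and no atoi/strtol call. Leading blanks are skipped, an optional '-'
   is honoured, and parsing stops at the first non-digit. */
long parse_int_field(const char *p, size_t width)
{
    size_t k = 0;
    while (k < width && p[k] == ' ')
        k++;
    int neg = 0;
    if (k < width && p[k] == '-') {
        neg = 1;
        k++;
    }
    long v = 0;
    while (k < width && p[k] >= '0' && p[k] <= '9') {
        v = v * 10 + (p[k] - '0');
        k++;
    }
    return neg ? -v : v;
}
```

For example, `Don[i] = parse_int_field(line + 25, 2);` would read the two-character field the loop above passes to `extractInt`.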
As a benchmark, I have an 85 MB file with about 1.1 million lines, which takes about 15 s to process. That is not too bad, but it would be nice to speed it up if possible.
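The benchmark figures are consistent with the fixed 81-byte line stride (80 characters plus a newline) used in the loop, and they give the per-line cost any faster parser has to beat:

```latex
\begin{aligned}
1.1\times10^{6}\ \text{lines}\times 81\ \text{B/line} &\approx 8.91\times10^{7}\ \text{B}\approx 85\ \text{MiB},\\
\frac{15\ \text{s}}{1.1\times10^{6}\ \text{lines}} &\approx 13.6\ \mu\text{s per line}.
\end{aligned}
```

Since reading the file into the buffer is reported to be fast, most of that roughly 13.6 µs per line is spent in field extraction and conversion.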
Thanks,
Pete.
- 08-14-2012 #2
It looks pretty quick as it is. My suggestions would be to avoid copying strings in memory (extractChar may do that), and not to convert the doubles (extractDoub) to double-precision floating point up front: read them as fixed-point numbers, or even keep them as strings, and convert them only when you need to use them.
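One way to read a decimal field as fixed point is to drop the decimal point and keep the digits as a scaled integer; because each column has a fixed number of decimals, the scale is implied by the column. A minimal sketch with a hypothetical `parse_fixed_field`; it assumes right-justified fields, and note that the fields in the example run together (`80541.39366165.3`), so they must be split by width rather than by whitespace. The 9-character split used below is an assumption for illustration:

```c
#include <stddef.h>

/* Read a fixed-width decimal field such as "  80541.3" as a scaled integer
   by skipping the decimal point: "  80541.3" -> 805413 (units of 0.1).
   The scale is fixed per column, so conversion to double can wait until
   the value is actually needed (e.g. 805413 / 10.0). */
long long parse_fixed_field(const char *p, size_t width)
{
    size_t k = 0;
    while (k < width && p[k] == ' ')
        k++;
    int neg = 0;
    if (k < width && p[k] == '-') {
        neg = 1;
        k++;
    }
    long long v = 0;
    for (; k < width; k++) {
        char c = p[k];
        if (c == '.')
            continue;              /* skip the point; the scale is per column */
        if (c < '0' || c > '9')
            break;                 /* stop at the first unexpected character */
        v = v * 10 + (c - '0');
    }
    return neg ? -v : v;
}
```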
You may even find it easier not to convert any of the file up front. Instead, load it into a C++ container class that internally marks the start of every line and reads values off only as they are needed, rather than doing all the donkey work when the file is read in. That way, if the data contains values you never use, you only process the ones you actually want. You could also mark off string values by setting a pointer to the beginning of each string and writing a '\0' into the data stream at its end; the string is then never copied, only read in place.
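The lazy approach can be sketched in C as an index of record start offsets plus non-owning field views. Instead of writing '\0' after each string, this sketch stores a pointer and a length: when fields are packed back to back, a '\0' written at the end of one field overwrites the first character of the next, whereas a pointer/length pair never modifies the buffer. `field_view`, `index_records` and `record_field` are hypothetical names, and a fixed stride between record starts is assumed:

```c
#include <stddef.h>
#include <stdlib.h>

/* Non-owning view of a field inside the file buffer: nothing is copied
   and the buffer is never modified. */
typedef struct {
    const char *ptr;
    size_t len;
} field_view;

/* Collect the start offsets of records whose first character is `type`
   (e.g. 'S'), assuming record starts are `stride` bytes apart.
   Returns a malloc'd array (caller frees) and stores its length in *count. */
size_t *index_records(const char *buf, size_t size, size_t stride,
                      char type, size_t *count)
{
    size_t n = 0;
    for (size_t p = 0; p < size; p += stride)   /* first pass: count */
        if (buf[p] == type)
            n++;
    size_t *idx = malloc((n ? n : 1) * sizeof *idx);
    if (idx == NULL)
        return NULL;
    size_t j = 0;
    for (size_t p = 0; p < size; p += stride)   /* second pass: record offsets */
        if (buf[p] == type)
            idx[j++] = p;
    *count = n;
    return idx;
}

/* View `width` characters at `offset` within the record starting at `start`. */
field_view record_field(const char *buf, size_t start, size_t offset, size_t width)
{
    field_view f = { buf + start + offset, width };
    return f;
}
```

Values are then converted only when a caller asks for them, for instance by passing `f.ptr` and `f.len` to a numeric parser.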
If you house this in a class, you can set up accessors that read off each datum by line and field name, and they can return pointers to const when you need to access the strings inside it. You may also want to consider storing the data in a std::vector<unsigned char>: it is a fast container, guarantees contiguous storage, and gives you some of the quick STL algorithms for processing it for free.
Linux user #126863 - see http://linuxcounter.net/
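The accessor-by-name idea can be sketched in C with a table of field definitions and a lookup that returns a const pointer into the record. The offsets below are taken from the pointer increments in the loop in the first post (1, 16, 17, 19, 25, 27, 29, 34), and the widths assume that the third argument of the extract helpers is the field width; both are inferences, and `field_def`, `S_FIELDS` and `get_field` are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* One column of a fixed-width record: name, 0-based offset, width. */
typedef struct {
    const char *name;
    size_t offset;
    size_t width;
} field_def;

/* S-record layout inferred from the loop in the first post (assumption). */
static const field_def S_FIELDS[] = {
    { "Name", 1, 12 }, { "ID1", 16, 1 }, { "ID2", 17, 1 }, { "Fid", 19, 6 },
    { "Don", 25, 2 },  { "Mon", 27, 2 }, { "Sos", 29, 5 }, { "n_s", 34, 1 },
};

/* Return a const pointer to field `name` of the record starting at `line`
   and store its width in *width; returns NULL for an unknown name.
   The returned characters are NOT NUL-terminated. */
const char *get_field(const char *line, const char *name, size_t *width)
{
    for (size_t i = 0; i < sizeof S_FIELDS / sizeof S_FIELDS[0]; i++) {
        if (strcmp(S_FIELDS[i].name, name) == 0) {
            *width = S_FIELDS[i].width;
            return line + S_FIELDS[i].offset;
        }
    }
    return NULL;
}
```

The linear `strcmp` search keeps the sketch short; in a hot loop you would resolve each name to a table index once and reuse it.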