- 10-10-2011 #1 Just Joined!
- Join Date
- Oct 2011
- Posts
- 3
sort file specifying record length
I've been searching high and low for this, but maybe I'm just missing something. I have a file to be sorted that, unfortunately, contains binary data at the end of each line. As you may guess, this binary data may contain a newline character, which messes up the sort. I think I could resolve this by telling 'sort' the record length (line length), but I can't find out how to do that. Is there a way to do this? If so, could someone point me in the right direction?
Thanks in advance!!
- 10-10-2011 #2
Are you sure this extra data isn't added by writing the file on another platform, such as Windows? The text file line endings are different there, and to fix that you either need to strip the extra character by hand or run the file through a conversion tool like 'dos2unix'.
Linux user #126863 - see http://linuxcounter.net/
- 10-10-2011 #3 Just Joined!
- Join Date
- Oct 2011
- Posts
- 3
Nope, the file is both created and sorted on Linux. The sort is on other text fields in the record; I just need the binary data to be carried along for the ride.
- 10-11-2011 #4 Just Joined!
- Join Date
- Sep 2007
- Posts
- 51
Please send data sample
Can you send a sample of the data that you are referring to?
- 10-11-2011 #5 Just Joined!
- Join Date
- Sep 2007
- Posts
- 2
I've seen many fixed length records containing both binary and text data, usually on mainframes.
Unfortunately, the Linux sort is designed to sort lines, and implicit in that is that a line-end character is always required. Whether you choose the normal Linux line-end character 0x0a or null 0x00, that value could be found in the binary part of your data. Also, text data sometimes contains the line-end character. In either case, that would throw off the Linux sort. If you know that one of those values will never be found in your binary or text data, you may be able to use dd with cbs= to split the file into lines, using that value as a line-end character at the end of each fixed-length record; sort the lines; then convert the sort output back to fixed-length records. It's not very efficient, and you need to really know the data characteristics.
Why don't you explain more about the data?
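If the records really are fixed length, another option is to skip line-based tools entirely and slice the file by byte count. Here is a minimal Python sketch of that idea, not a tested solution for your data: the function name, the 8-byte record length, and the key position are all made up for illustration, and it assumes the whole file fits in memory.

```python
import struct


def sort_fixed_records(data: bytes, record_len: int,
                       key_start: int, key_len: int) -> bytes:
    """Sort fixed-length records by a byte slice used as the key.

    Records are cut purely by length, so newline or NUL bytes inside
    the binary part of a record are carried along untouched.
    """
    if record_len <= 0:
        raise ValueError("record_len must be positive")
    if len(data) % record_len != 0:
        raise ValueError("data length is not a multiple of record_len")

    # Split into records by byte count, never by a delimiter.
    records = [data[i:i + record_len]
               for i in range(0, len(data), record_len)]

    # Compare only the key bytes; list.sort is stable, so records with
    # equal keys keep their original relative order.
    records.sort(key=lambda rec: rec[key_start:key_start + key_len])
    return b"".join(records)


if __name__ == "__main__":
    # Hypothetical layout: 4 bytes of text key + one 4-byte little-endian int.
    # The values 10 and 2570 pack to bytes that contain 0x0a (newline).
    sample = b"".join([
        b"pear" + struct.pack("<i", 10),
        b"appl" + struct.pack("<i", 2570),
        b"kiwi" + struct.pack("<i", 7),
    ])
    result = sort_fixed_records(sample, record_len=8, key_start=0, key_len=4)
    for i in range(0, len(result), 8):
        print(result[i:i + 8])
```

For a real file you would read it with open(path, "rb") and write the result back in binary mode. The key is compared as raw bytes, so the ordering matches a plain byte-wise sort rather than a locale-aware one.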
- 10-11-2011 #6 Just Joined!
- Join Date
- Oct 2011
- Posts
- 3
Thanks for your responses! My data is basically text with binary data at the end of each line, representing integers. I've decided to resolve this by writing a small Perl script to do the job, and it seems to be working like a charm. Again, thanks for the responses!

