  1. #1
    Just Joined!
    Join Date
    Oct 2011
    Posts
    3

    sort file specifying record length


    I've been searching high and low for this...but, maybe I'm just missing something. I have a file to be sorted that, unfortunately, contains binary data at the end of the line. As you may guess, this binary data may contain a newline character, which messes up the sort. I think I could resolve this by letting 'sort' know the record length (line length) of each line, but I can't seem to find out how to do that. Is there a way to do this? If so, can someone point me the way?

    Thanks in advance!!

  2. #2
    Super Moderator Roxoff's Avatar
    Join Date
    Aug 2005
    Location
    Nottingham, England
    Posts
    3,882
    Are you sure this extra data wasn't added by writing the file on another platform, such as Windows? Text file line endings differ between platforms (CR LF on Windows, LF on Linux). To fix it, you either need to strip the extra character by hand or run the file through a conversion tool like 'dos2unix'.
    Linux user #126863 - see http://linuxcounter.net/

  3. #3
    Just Joined!
    Join Date
    Oct 2011
    Posts
    3
    Nope, it's creating it and sorting it all on Linux. It's sorting on other text fields in the record, I just need this binary data to be carried along for the ride.

  4. #4
    Just Joined!
    Join Date
    Sep 2007
    Location
    Silver Spring, MD
    Posts
    95

    Please send data sample

    Can you send a sample of the data that you are referring to?

  5. #5
    Just Joined!
    Join Date
    Sep 2007
    Posts
    3
    I've seen many fixed length records containing both binary and text data, usually on mainframes.

    Unfortunately, the Linux sort is designed to sort lines, and implicit in that is that a line-end character is always required. Whether you choose the normal Linux line-end character 0x0a or null 0x00, that value could be found in the binary part of your data. Also, text data sometimes contains the line-end character. In either case, that would throw off the Linux sort. If you know that one of those values will never be found in your binary or text data, you may be able to use dd with cbs= to split the file into lines using that value as a line-end character at the end of each fixed-length record, sort the lines, and then convert the sort output back to fixed-length records. This is not very efficient, and you need to really know the data characteristics.
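
    Whether a given byte value is safe to use as a line-end character can be checked before trying any delimiter-based approach. The following is a minimal Python sketch, not something from this thread: the function name unused_bytes and the sample data are made up for illustration. It simply lists the byte values that never occur in the data.

```python
def unused_bytes(data: bytes) -> list[int]:
    """Return the byte values (0-255) that never occur in data."""
    present = set(data)  # iterating over bytes yields ints
    return [b for b in range(256) if b not in present]


if __name__ == "__main__":
    # Hypothetical sample: text fields followed by binary integers,
    # one of which happens to contain 0x0a (newline).
    sample = b"alpha     \x00\x00\x00\x0a" + b"bravo     \x00\x00\x01\x00"
    free = unused_bytes(sample)
    print("0x0a usable as delimiter:", 0x0A in free)
    print("0x00 usable as delimiter:", 0x00 in free)
```

    If neither 0x0a nor 0x00 shows up as unused, a delimiter-based sort is not safe for that file and the records have to be handled by length instead.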

    Could you explain more about the data?

  6. #6
    Just Joined!
    Join Date
    Oct 2011
    Posts
    3
    Thanks for your responses! My data is basically text with binary data at the end of each line, representing integers. I've decided to resolve this by writing a small Perl script to do the job, and it seems to be working like a charm. Again, thanks for the responses!
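
    The poster's Perl script was not shared, so the following is not that script. It is only a rough Python sketch of the same general idea, assuming every record has the same known length and the sort key is a text field at a fixed offset. The function name sort_fixed_records, its parameters, and the sample records are all hypothetical.

```python
import struct


def sort_fixed_records(data: bytes, record_len: int,
                       key_start: int, key_len: int) -> bytes:
    """Sort fixed-length records by a byte slice used as the key.

    The records are cut purely by length, so newline or null bytes
    inside the binary part are carried along untouched.
    """
    if record_len <= 0:
        raise ValueError("record_len must be positive")
    if len(data) % record_len != 0:
        raise ValueError("data length is not a multiple of record_len")
    records = [data[i:i + record_len]
               for i in range(0, len(data), record_len)]
    # Byte-wise comparison of the key field; list.sort is stable.
    records.sort(key=lambda r: r[key_start:key_start + key_len])
    return b"".join(records)


if __name__ == "__main__":
    # Hypothetical 8-byte records: 4-byte text key + 4-byte big-endian int.
    # The integer 10 packs to bytes that include 0x0a (newline).
    raw = (b"pear" + struct.pack(">I", 10)
           + b"kiwi" + struct.pack(">I", 2570)
           + b"fig " + struct.pack(">I", 7))
    out = sort_fixed_records(raw, record_len=8, key_start=0, key_len=4)
    for i in range(0, len(out), 8):
        key, value = out[i:i + 4], struct.unpack(">I", out[i + 4:i + 8])[0]
        print(key, value)
```

    Note that this compares keys byte by byte, which is not the same as the locale-aware ordering that sort may apply, and it reads the whole file into memory, so it only suits files that fit there.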
