  1. #1
    Just Joined!
    Join Date
    Jul 2011
    Posts
    7

    Problem sorting


    I am trying to perform an ASCII sort on a file.
    Here is the file:

    $ cat test_file.txt
    HWI-ST141_0332:1:1:10213:593105
    HWI-ST141_0332:1:1:10215:19310#G
    HWI-ST141_0332:1:1:1021:49310#GC
    HWI-ST141_0332:1:1:10213:19310AG
    HWI-ST141_0332:1:1:1021#4193108#
    HWI-ST141_0332:1:1:10213:193201

    $ sort test_file.txt
    HWI-ST141_0332:1:1:10213:19310AG
    HWI-ST141_0332:1:1:10213:193201
    HWI-ST141_0332:1:1:10213:593105
    HWI-ST141_0332:1:1:1021#4193108#
    HWI-ST141_0332:1:1:1021:49310#GC
    HWI-ST141_0332:1:1:10215:19310#G

    It looks like the sort command is ignoring the #s and :s.
    I need it to sort according to the ASCII value of the characters as follows:

    HWI-ST141_0332:1:1:1021#4193108#
    HWI-ST141_0332:1:1:10213:19310AG
    HWI-ST141_0332:1:1:10213:193201
    HWI-ST141_0332:1:1:10213:593105
    HWI-ST141_0332:1:1:10215:19310#G
    HWI-ST141_0332:1:1:1021:49310#GC

    Thanks in advance

  2. #2
    Just Joined!
    Join Date
    Jul 2011
    Posts
    7
    Any response would be appreciated.

  3. #3
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    How is your desired sort ascii-sorted? It doesn't make sense to me. Should all the '#' be treated as colons, blank space, nothing, or what? If you want to sort them numerically, add the '-n' to sort.

  4. #4
    Just Joined!
    Join Date
    Jul 2011
    Posts
    7
    Originally posted by atreyu:
    How is your desired sort ascii-sorted? It doesn't make sense to me. Should all the '#' be treated as colons, blank space, nothing, or what? If you want to sort them numerically, add the '-n' to sort.
    The '#' symbol has an ASCII value of 35. The ':' symbol has an ASCII value of 58. The digits 0 through 9 have ASCII values from 48 through 57.

    Example:
    The value '1021#4' comes before '102131' since the '#' has a lower ASCII value than '3'.
    The value '1021:4' comes after '102131' since the ':' has a higher ASCII value than '3'.
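
    To check these byte values from the shell, here is a minimal sketch. It assumes bash (or any POSIX printf), and the helper name byte_value is made up for illustration. It relies on the printf rule that a numeric argument starting with a quote is replaced by the code of the character that follows.

```shell
#!/usr/bin/env bash
# byte_value CHAR: print the decimal code of a single ASCII character.
# (Hypothetical helper name. A leading single quote in a numeric printf
# argument makes printf use the code of the next character.)
byte_value() {
  printf '%d\n' "'$1"
}

# The three characters compared in the example above.
for ch in '#' '3' ':'; do
  printf '%s = %s\n' "$ch" "$(byte_value "$ch")"
done
# prints:
# # = 35
# 3 = 51
# : = 58
```

    Since 35 < 51 < 58, a byte-wise comparison puts '1021#4' before '102131' and '1021:4' after it.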

  5. #5
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    OH! You literally meant the ASCII value... I should not have assumed you meant alphanumeric - my bad.

    Well, I doubt you'll be able to pull this off in a simple one line or two line sort command. I'd think you'd have to loop thru each line, get the ascii val for the special chars, and substitute them into the line in the proper place, then re-sort the whole list based upon your new lines.

    I know you can do the ascii value stuff in perl pretty easily, though I don't remember the exact command - wait, I think it is 'ord' (perldoc -f ord).

  6. #6
    Just Joined!
    Join Date
    Oct 2011
    Posts
    2
    I've run into a sorting problem (GNU sort on Ubuntu) that sounds at least related to the one that opened this thread, so I'll ask it here.

    First, "info sort" includes the following: "If no key fields are specified, `sort' uses a default key of the entire line."

    The input file contains 619,931 lines, output by the command "ls -al", with Unix line endings (LF only). The following sequence of lines from sort's output shows what are, for me, entirely unexpected results.

    The first three lines don't raise my eyebrows:
    --------- 1 root root 0 2011-10-25 19:03 crond.reboot
    ---------- 1 root root 0 2011-10-25 19:03 /run/crond.reboot
    brw-rw----+ 1 root cdrom 11, 0 2011-10-25 19:03 /dev/sr0


    and so on ...

    Lines 60 and 61 don't raise my eyebrows either:
    brw-rw---- 1 root disk 8, 5 2011-10-25 19:03 sda5
    c--------- 1 root root 5, 2 2011-10-25 19:02 /dev/pts/ptmx


    and so on ...

    Lines 450 and 451 don't raise my eyebrows either:
    crw--w---- 1 root tty 4, 9 2011-10-25 19:03 tty9
    drwx------ 14 ivansoto ivansoto 4096 2011-10-26 22:26 .


    and so on ...

    Lines 84,171 and 84,172 don't raise my eyebrows either:
    dr-x--x--x 2 root root 0 2011-10-29 10:45 ns
    lrwx------ 1 ivansoto ivansoto 64 2011-10-29 10:09 10 -> anon_inode:[eventfd]


    and so on ...

    Lines 196,631 and 196,632 send me into a catatonic near collapse! What is going on here? It seems the "-" characters are being ignored, else they should have shown up at the top along with the first two lines:
    l-wx------ 1 ivansoto ivansoto 64 2011-10-29 10:45 /proc/32665/task/32665/fd/3 -> /home/ivansoto/tallyf
    -r-------- 1 colord colord 0 2011-10-29 10:43 auxv

    and so on without further shocks to the transition from line 617,495 to line 617,496, which is ho-hum too:
    -r-xr-xr-x 2 root root 30076 2011-10-19 07:18 /usr/lib/cups/backend/parallel
    srw------- 1 ivansoto ivansoto 0 2011-10-25 19:04 agent.1410


    and so on to another shock, between lines 617,531 and 617,532; they again show something unexpected, in my view, that shows that hyphens (ASCII decimal 45, octal 55) are not being treated according to their byte value in comparing them to letters:
    srwxrwxr-x 1 ivansoto ivansoto 0 2011-10-28 22:00 /tmp/OSL_PIPE_1000_SingleOfficeIPC_6f65f250bf46953ee78f 296bc84742
    --w------- 1 colord colord 0 2011-10-29 10:43 clear_refs


    Would anybody please shed some light, either by letting me know what's going on here or, better yet, by showing how to sort a plain text file (one field = the full line) so that it sorts according to the ASCII byte values, left to right?

    I've tried many different things without success. The end result should be that all these lines turn up with these prefixes clustered together:

    ---
    -r-
    -r-
    --w
    b
    c
    d
    l
    s


    Any helpful comments would be very much appreciated!

  7. #7
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Slackware, {Free, Open, Net}BSD, Solaris
    Posts
    1,283
    Hi.

    I have found that setting a few environment variables, viz. LC_ALL and LANG, usually makes my character / text / string sorts work correctly. If I understand both problems, I think this is a solution. Here is a sample script with the data provided by the posters:
    Code:
    #!/usr/bin/env bash
    
    # @(#) s1	Demonstrate sort with LC, LANG set to C.
    
    # Utility functions: print-as-echo, print-line-with-visual-space, debug.
    # export PATH="/usr/local/bin:/usr/bin:/bin"
    LC_ALL=C ; LANG=C ; export LC_ALL LANG
    echo "Environment: LC_ALL = $LC_ALL, LANG = $LANG"
    pe() { for _i;do printf "%s" "$_i";done; printf "\n"; }
    pl() { pe;pe "-----" ;pe "$*"; }
    db() { ( printf " db, ";for _i;do printf "%s" "$_i";done;printf "\n" ) >&2 ; }
    db() { : ; }
    
    f1=data1
    f2=data2
    
    for f in "$f1" "$f2"
    do
      pl " Input data file $f:"
      cat "$f"
      pl " Results for sort $f:"
      sort "$f"
    done
    
    exit 0
    producing:
    Code:
    % ./s1
    Environment: LC_ALL = C, LANG = C
    
    -----
     Input data file data1:
    HWI-ST141_0332:1:1:10213:593105
    HWI-ST141_0332:1:1:10215:19310#G
    HWI-ST141_0332:1:1:1021:49310#GC
    HWI-ST141_0332:1:1:10213:19310AG
    HWI-ST141_0332:1:1:1021#4193108#
    HWI-ST141_0332:1:1:10213:193201
    
    -----
     Results for sort data1:
    HWI-ST141_0332:1:1:1021#4193108#
    HWI-ST141_0332:1:1:10213:19310AG
    HWI-ST141_0332:1:1:10213:193201
    HWI-ST141_0332:1:1:10213:593105
    HWI-ST141_0332:1:1:10215:19310#G
    HWI-ST141_0332:1:1:1021:49310#GC
    
    -----
     Input data file data2:
    ---
    c
    -r-
    --w
    s
    b
    d
    -r-
    l
    
    -----
     Results for sort data2:
    ---
    --w
    -r-
    -r-
    b
    c
    d
    l
    s
    See man sort, man locale for details ... cheers, drl
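
    If exporting the variables for a whole script or session is more than you need, the same effect can be scoped to a single command. This is a minimal sketch, not part of the script above, and the wrapper name bytesort is made up for illustration. It relies on LC_ALL taking precedence over LANG and the other LC_* variables, so setting LC_ALL=C alone should be enough for sort to compare lines byte by byte.

```shell
#!/usr/bin/env bash
# bytesort: run sort in the C locale for this one invocation only,
# leaving the caller's locale settings untouched.
# (Hypothetical wrapper name; arguments are passed straight to sort.)
bytesort() {
  LC_ALL=C sort "$@"
}

# Sample prefixes taken from the second poster's listing.
printf '%s\n' '---' 'c' '-r-' '--w' 's' 'b' | bytesort
# prints:
# ---
# --w
# -r-
# b
# c
# s
```

    The one-off form without the wrapper is simply: LC_ALL=C sort test_file.txt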
    Welcome - get the most out of the forum by reading forum basics and guidelines: click here.
    90% of questions can be answered by using man pages, Quick Search, Advanced Search, Google search, Wikipedia.
    We look forward to helping you with the challenge of the other 10%.
    ( Mn, 2.6.n, AMD-64 3000+, ASUS A8V Deluxe, 1 GB, SATA + IDE, Matrox G400 AGP )

  8. #8
    Just Joined!
    Join Date
    Oct 2011
    Posts
    2

    Absolutely stunning, DRL! Much obliged!

    DRL, I thank you very much! I'll satisfy my intellectual curiosity about those two environment variables later, but they sure mend sort's errant ways in a hurry! I guess your reply is what people would call an "ad rem" reply, but the geeky adjective doesn't do justice to the practical effectiveness of it!

  9. #9
    drl
    Linux Engineer
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Slackware, {Free, Open, Net}BSD, Solaris
    Posts
    1,283
    Originally posted by ivansoto:
    ... I guess your reply is what people would call an "ad rem" reply, but the geeky adjective doesn't do justice to the practical effectiveness of it!
    See ad rem - definition of ad rem by the Free Online Dictionary, Thesaurus and Encyclopedia.
    Originally posted by ivansoto:
    DRL, I thank you very much! I'll satisfy my intellectual curiosity about those two environment variables later, but they sure mend sort's errant ways in a hurry!
    You're welcome. Ad rad -> Ad(d) rad -> To(o) rad -> Too radical -> Excellent; wonderful .
    ( rad - definition of rad by the Free Online Dictionary, Thesaurus and Encyclopedia. )
    whimsically, drl