  1. #1
    Just Joined!
    Join Date
    Jul 2008
    Posts
    9

    AWK : Need help to print last bunch of patterns


    Hello

    I need to write a script to parse a string, get 4 tokens from the string, and get the remaining tokens as a single token. I will explain with an example.

    input string = "80 00 00 01 00 09 08 02 80 01 5a 08 02"
    Now I need to get it as below

    "8000,0001,0009,080280015a0802"

    I guess we can do it with AWK, but I am not finding a solution quickly. Could someone please help me with this? If not awk, what would be the best way to do it?

    PS: The input string length may vary; it is not fixed.

    Thanks
    ~S

  2. #2
    Linux Newbie radoulov's Avatar
    Join Date
    Sep 2007
    Posts
    111
    Code:
    % s='80 00 00 01 00 09 08 02 80 01 5a 08 02'
    % awk '(ORS=!(NR%2)&&NR<8?OFS:x)||1' OFS=, RS=\  ORS=\\n<<<"$s"
    8000,0001,0009,080280015a0802
    If your shell doesn't support here-strings:

    Code:
    printf '%s\n' "$s"|awk '(ORS=!(NR%2)&&NR<8?OFS:x)||1' OFS=, RS=\  ORS=\\n
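For comparison, here is a more verbose sketch that leaves RS alone and works on the fields of the line instead. It is not the one-liner from this post, just an alternative under the assumption that the input is a single line with at least seven space-separated tokens; `join_tokens` is a made-up helper name.

```shell
# Hypothetical helper (not from the thread): glue fields 1-6 into three
# comma-separated pairs, then append all remaining fields with no separator.
join_tokens() {
    printf '%s\n' "$1" | awk '{
        # First three groups: two fields each, followed by a comma.
        out = $1 $2 "," $3 $4 "," $5 $6 ","
        # Everything from field 7 onward is concatenated as one token.
        for (i = 7; i <= NF; i++)
            out = out $i
        print out
    }'
}

join_tokens '80 00 00 01 00 09 08 02 80 01 5a 08 02'
# prints 8000,0001,0009,080280015a0802
```

With fewer than seven tokens this sketch still emits the three commas, so a real script would probably want a check on NF first.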

  3. #3
    Just Joined!
    Join Date
    Jul 2008
    Posts
    9
    Thanks a lot for the help. Sorry to say that I am just a novice at awk scripting. Could you please help me understand this script?

  5. #4
    Linux Newbie radoulov's Avatar
    Join Date
    Sep 2007
    Posts
    111
    I'll try. It's somewhat non-trivial, and I'm not sure
    you really want to know all that.

    An awk program consists of a series of rules.
    A rule consists of a pattern followed by an action;
    either the pattern or the action can be omitted, but not both.
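A minimal sketch of the three rule shapes, using made-up input (`one`, `two`, `three`) and a made-up function name `rules_demo`:

```shell
# Hypothetical demo: one rule of each shape.
rules_demo() {
    printf '%s\n' one two three | awk '
        NR == 2                          # pattern only: the default action prints the record
        { count++ }                      # action only: runs for every record
        END { print "records:", count }  # pattern (END) plus action
    '
}

rules_demo
# prints:
# two
# records: 3
```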

    In this small program we have a pattern only and
    in this case the pattern is an expression:

    Code:
    (ORS = !(NR % 2) && NR < 8 ? OFS : x) || 1
    After the program code we set some built-in variables:

    Code:
    OFS=, RS=\  ORS=\\n
    OFS=, - the Output Field Separator is set to a comma ','
    RS=\  - the input Record Separator is set to a single space, so every space-separated
    token becomes a separate record, just like this:

    Code:
    $ s='80 00 00 01 00 09 08 02 80 01 5a 08 02'
    $ awk 1 RS=\  <<<"$s"
    80
    00
    00
    01
    00
    09
    08
    02
    80
    01
    5a
    08
    02
    The ORS=\\n part is unnecessary (I set it at the beginning and forgot to remove it),
    because the expression assigns a new ORS for every record anyway.
    ORS stands for Output Record Separator.
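To see ORS on its own, here is a small sketch with the default RS and made-up input lines `a`, `b`, `c`; `ors_demo` is a hypothetical name, and `-v` is used so the assignment happens before any input is read:

```shell
# Hypothetical demo: every printed record is followed by ORS instead of a newline.
ors_demo() {
    printf '%s\n' a b c | awk -v ORS=, '1'
}

ors_demo
# prints a,b,c, (with no trailing newline)
```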

    So our expression consists of two sub-expressions and the logical OR operator:

    sub-expression 1:

    Code:
    ( ORS = !(NR % 2) && NR < 8 ? OFS : x)
    sub-expression 2:

    Code:
    1
    Binary logical OR operator:

    Code:
    ||
    So the entire expression evaluates to true when either the first
    or the second sub-expression evaluates to true.

    This is actually a shortcut: I'm not really interested in the
    result of the first sub-expression, only in its side effect, so
    I'm artificially forcing the entire result to be true by adding
    the OR operator and the second sub-expression, 1. As far as the
    awk programming language is concerned, the entire expression then
    always evaluates to true (in awk, 0 (for numbers) and the null
    string "" (for strings) evaluate to false; everything else
    evaluates to true). Any value OR true is always true.
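The truth rules above can be checked with a short sketch; `truth_demo` is a made-up name and the values tested are just illustrations:

```shell
# Hypothetical demo: print "true" or "false" for a few values.
truth_demo() {
    awk 'BEGIN {
        print (0 ? "true" : "false")          # numeric zero
        print ("" ? "true" : "false")         # null string
        print (x ? "true" : "false")          # uninitialized variable
        print ("," ? "true" : "false")        # non-empty string
        print (("" || 1) ? "true" : "false")  # any value OR 1
    }'
}

truth_demo
# prints false, false, false, true, true (one per line)
```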

    So now that you understand the:
    Code:
    ||1 ...
    part, let me
    explain the other one (the first sub-expression):

    Code:
    ORS = !(NR % 2) && NR < 8 ? OFS : x
    It sets the ORS (Output Record Separator):
    - if the current record number NR is even (NR % 2 is 0) AND
    less than 8, it sets the ORS to OFS (a comma).
    - otherwise it sets it to the null string (x is an uninitialized variable).
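A sketch that makes the choice visible: instead of assigning ORS, it prints each record next to the separator the same test would pick. The function name `show_separators` and the variable `sep` are made up for illustration.

```shell
# Hypothetical demo: show which separator the ternary selects for each record.
show_separators() {
    printf '%s' "$1" | awk -v RS=' ' -v OFS=, '{
        # Same test as in the one-liner; "" stands in for the uninitialized x.
        sep = (!(NR % 2) && NR < 8) ? OFS : ""
        printf "%d %s [%s]\n", NR, $0, sep
    }'
}

show_separators '80 00 00 01 00 09 08 02'
# prints:
# 1 80 []
# 2 00 [,]
# 3 00 []
# 4 01 [,]
# 5 00 []
# 6 09 [,]
# 7 08 []
# 8 02 []
```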

    In our case NR % 2 is 0 when NR is 2, 4 and 6:

    Code:
    $ awk 'NR < 8 { print NR, "=>", NR%2 }' RS=\ <<<"$s"
    1 => 1
    2 => 0
    3 => 1
    4 => 0
    5 => 1
    6 => 0
    7 => 1
    So we want a comma after the second, the fourth and the sixth records.
    And we have this:

    Code:
    $ awk 'END { print "\n" } ORS=!(NR%2)&&NR<8?OFS:x' OFS=, RS=\  <<<"$s"
    00,01,09,
    What's missing? The records whose ORS was set to x (the uninitialized variable, whose value is the null string).
    Why? Because ORS=x assigns the null string, and the value of an assignment is the assigned value,
    which evaluates to false, so the pattern does not match and those records are not printed.
    We need to output those records as well, so we add the second sub-expression, 1:

    Code:
    awk '(ORS=!(NR%2)&&NR<8?OFS:x)||1' OFS=, RS=\  <<<"$s"
    8000,0001,0009,080280015a0802
    Now we have all we need.
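For readers who find the pattern-only form too dense, the same logic can be spelled out with an explicit action. This is a sketch of an equivalent program, not a replacement for the one-liner; `split_bytes` is a made-up name, and `-v` is used in place of the trailing assignments.

```shell
# Hypothetical helper: same behaviour as the one-liner, with the
# assignment and the print written as ordinary statements.
split_bytes() {
    printf '%s\n' "$1" | awk -v RS=' ' -v OFS=, '{
        # Even-numbered records before the 8th end with a comma, all others with nothing.
        ORS = (NR % 2 == 0 && NR < 8) ? OFS : ""
        print
    }'
}

split_bytes '80 00 00 01 00 09 08 02 80 01 5a 08 02'
# prints 8000,0001,0009,080280015a0802
```

Because the action always runs, there is no need for the `||1` trick here; the final newline comes from the last record, which still carries the newline of the input line.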

    Hope this helps.

  6. #5
    Just Joined!
    Join Date
    Jul 2008
    Posts
    9
    That was a great explanation, thanks a ton for your help. I think I need to dive deep into AWK. I normally use only minimal features of AWK, like print and some simple input pattern matching. This kind of stuff makes life more fun!

  7. #6
    Just Joined!
    Join Date
    Jul 2008
    Posts
    9
    I am confused about the NR and NF concepts. NR is the number of records and NF is the number of fields. So when we feed a single line as input, NR should be 1 and NF should be the number of fields separated by [space] (assuming FS is a space and RS is \n). But here we are doing our processing based on NR: ORS = !(NR % 2) && NR < 8 ? OFS : x. So how does this work?

    Thanks
    Salil

  8. #7
    Linux Newbie radoulov's Avatar
    Join Date
    Sep 2007
    Posts
    111
    Hi Salil,
    remember that we changed RS to a single space, so the records are now separated by spaces rather than newlines. As far as awk is concerned, what would have been fields are now records. We test NR, but we are effectively walking through what would otherwise be the fields of the line.
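A small sketch of the difference, using the made-up input `a b c` and the hypothetical function names `count_default` and `count_space_rs`:

```shell
# Hypothetical demo: report NR and NF for each record.
count_default() {
    # Default RS (newline): the whole line is one record with three fields.
    printf '%s\n' "$1" | awk '{ print "NR=" NR, "NF=" NF }'
}

count_space_rs() {
    # RS set to a space: each token is its own record with a single field.
    printf '%s\n' "$1" | awk -v RS=' ' '{ print "NR=" NR, "NF=" NF }'
}

count_default 'a b c'
# prints NR=1 NF=3

count_space_rs 'a b c'
# prints:
# NR=1 NF=1
# NR=2 NF=1
# NR=3 NF=1
```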

  9. #8
    Just Joined!
    Join Date
    Jul 2008
    Posts
    9
    aah .. thanks for that pointer !!
