  1. #1
    Just Joined!
    Join Date
    Apr 2011
    Posts
    4

    collapsing rows - Awk


    Hi, well, hi, how are you, please, please, etc. etc.

    My problem is that I have a table like this:

        1 2 3
    ab  x
    ab    x
    cd    x
    cd      x

    (
    In case the spacing doesn't survive (the forum won't let me use tabs), the coordinates are:
    ab:1
    ab:2
    cd:2
    cd:3
    )

    and I need to end up with something like this:

        1 2 3
    ab  x x
    cd    x x


    Again, the coordinates should be
    ab:1,2
    cd:2,3
    not in this format, of course, but as a table.

    I think it can be done with awk, but I'm just starting out with it and the looping part is still kind of fuzzy to me.

    thanks!!
    Last edited by alekos; 04-04-2011 at 01:36 AM.

  2. #2
    Linux User
    Join Date
    Nov 2008
    Location
    Tokyo, Japan
    Posts
    260
    I think I understand what you want. It's so easy, I'll just show you how to do it.

    Input file:
    Code:
    a b 1
    a b 2
    c d 3
    c d 4
    AWK Script:
    Code:
    #collapse.awk
    {	i = ($1 " " $2)
    	my_keys[i] = ($1 " " $2)
    	my_values[i] = (my_values[i] " " $3)
    }
    
    END {
    	for (x in my_keys) {
    		print(my_keys[x] " : " my_values[x])
    	}
    }
    Executing on the command line:
    Code:
    % awk -f collapse.awk input.txt
    c d :  3 4
    a b :  1 2

  3. #3
    Just Joined!
    Join Date
    Apr 2011
    Posts
    4
    Well, I think I couldn't make myself understood because of the lack of formatting here (or my lack of knowledge of the forum's text format).
    The initial table is something like this:
    <tab>1<tab>2<tab>3
    ab<tab>x
    ab<tab><tab>x
    cd<tab><tab>x
    cd<tab><tab><tab>x

    and I need to end up with:

    <tab>1<tab>2<tab>3
    ab<tab>x<tab>x
    cd<tab><tab>x<tab>x

    thanks!

  4. #4
    Linux User
    Join Date
    Nov 2008
    Location
    Tokyo, Japan
    Posts
    260
    The best place to learn about AWK is the GNU AWK User's Guide. At this site, they have many examples that are very easy to understand, and an index of important built-in functions you can use, like "length" and "sub".

    An AWK program basically processes its input one line at a time: each line of the input file is passed, one by one, to the program. The program consists of rules of the form pattern { action }. The pattern is usually a "regular expression" written between slashes, but it can be any expression. The action is simply some commands, for example "print", that are executed if the pattern matches.
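    As a small illustration of one pattern and one action, here is a minimal sketch. The function name print_ab_lines and the sample input are made up for this example; NR and $0 are standard awk variables (the current line number and the whole line).

```shell
# Hypothetical helper: the pattern /^ab/ selects lines that start with "ab";
# the action prints the line number (NR) followed by the whole line ($0).
print_ab_lines() {
  awk '/^ab/ { print NR ": " $0 }'
}

# Only the first and third lines match the pattern.
printf 'ab one\ncd two\nab three\n' | print_ab_lines
```

    Lines that do not match the pattern are simply skipped, so the "cd two" line produces no output.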

    AWK has many commands, including "length(x)" to count the length of the string "x", arithmetic (+ - * / %), and "system" to execute a shell command from within the program.
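    To give a feel for "length" and the arithmetic operators, here is a rough sketch. The function name line_stats is invented for the example, and it assumes the input has at least one line (otherwise the modulo would divide by zero).

```shell
# Hypothetical helper: add up the length of every input line, then report
# the number of lines, the total length, and total modulo line count.
line_stats() {
  awk '{ total += length($0) } END { print NR, total, total % NR }'
}

# Two lines with lengths 2 and 4.
printf 'ab\ncdef\n' | line_stats
```

    For this input the lengths are 2 and 4, so it should print "2 6 0".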

    Here is a simple AWK program that uses regular expressions, and I think it will more closely match what you want. But it is more fun to learn it yourself, so please just use this as an example:
    Code:
    /^(..)(\t+)(.*)$/ {
    # ($1)($2 )($3)
    # $1 = any two characters
    # $2 = one or more <tab> characters
    # $3 = all characters after the <tab> characters
    #      until the end of the line
    	#Here is the action
    	my_keys[$1] = $1
    	my_values[$1] = ($2 $3)
    }
    
    END {
    # The "END" pattern's action runs at the end of the input file.
    	for (x in my_keys) { print x "\t" my_values[x] }
    }
    
    #REGEX PATTERNS:
    # the pattern contains special characters inside of parentheses
    # (.) -> this will match any 1 character
    #        for example "a", "?", or <space>
    # (..) -> this will match any 2 characters
    #         for example "ab", "?_", or <space><tab>
    # (A+) -> this will match 1 or more "A" characters
    #         for example "A", "AA", "AAA", ...
    # (A*) -> this matches 0 or more "A" characters
    #         for example "", "A", "AA", "AAA", ...
    # (\t+) -> Matches 1 or more <tab> characters
    # (Hello$) -> Matches "Hello" only if it is at the end of the
    #             line. "I said Hello" matches, "Hello!" does not.
    # (^Hello) -> Matches "Hello" only if it is at the beginning of
    #             the line. "Hello world!" matches, "I said Hello"
    #             does not match.
    
    #ACTIONS:
    # In the action, the parts of the pattern are assigned to $1,
    # $2, $3, etc. That is, the first parentheses matched are placed
    # in $1, the second parentheses matched are placed in $2, etc.
    # $0 is always equal to the whole line.
    # you can assign strings to variable names:
    # a = "Hello"
    # b = "world"
    # c = (a   ", "   b   "!") # ...now c is "Hello, world!"

  5. #5
    Just Joined!
    Join Date
    Apr 2011
    Posts
    4
    Hmm, OK, thank you. In fact, I feel like awk is like grep/sed, so it is difficult for me to think of a way in which it could go through several lines, compare them, etc.

  6. #6
    Linux User
    Join Date
    Nov 2008
    Location
    Tokyo, Japan
    Posts
    260
    Actually, I make that mistake too. In fact, I made a mistake in my previous code that I posted here!

    I confused AWK with Perl! I remember now that in Perl, when you use parentheses in regular expressions, the characters matching the input line inside the parentheses are stored to variables $1, $2, $3, ... , but this is not the case in "awk"!

    Sorry! Let me show you what I got wrong.
    Code:
    /^(..)(\t+)(.*)$/ {
    # ($1)($2 )($3) <- This is true for "Perl" but not for "AWK"
    # $1, $2, $3 are NOT the contents of the parentheses of the above pattern.
    # Actually, $1, $2, and $3 are the "fields" of the input line.
    # Fields are created by breaking-up the input line between white-spaces.
    	#Here is the action
    	# let's say the input line is "ab   1"
    	my_keys[$1] = $1 # my_keys["ab"] = "ab"
    	my_values[$1] = ($2 $3) # my_values["ab"] = ("1" "") = "1"
    }
    I tested the above code to make sure it worked. The comments were wrong, but by coincidence the code still worked correctly, because the input lines happened to be separated by whitespace, so AWK's field splitting gave the values I expected!
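    Here is a quick sketch that shows the difference. The function name show_fields and the sample line are made up; match(), RSTART and RLENGTH are standard awk and are one way to find where a piece of a regular expression matched, since the parentheses do not fill $1, $2, $3.

```shell
# Hypothetical helper: the rule fires for lines matching the pattern, but
# $1 and $2 are whitespace-separated fields, not the parenthesized groups.
# match() sets RSTART and RLENGTH to the position and length of the tab run.
show_fields() {
  awk '/^(..)(\t+)(.*)$/ {
    match($0, /\t+/)
    print "$1=[" $1 "] $2=[" $2 "] NF=" NF " RSTART=" RSTART " RLENGTH=" RLENGTH
  }'
}

# Input line is: ab<tab><tab>x
printf 'ab\t\tx\n' | show_fields
```

    With the default field splitting the two tabs count as one separator, so $2 is "x" rather than the tabs, and the output should be "$1=[ab] $2=[x] NF=2 RSTART=3 RLENGTH=2".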

    Everything else I said about "regular expression" patterns is true.

    Awk is a very simple programming language -- it just executes one action for every line of input that matches a pattern. If there is no pattern, the action is performed on every line. AWK does not take too much time to learn, and it is very useful.
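    To tie this back to the tab-separated table from post #3, here is one possible sketch, not the only way to do it. It assumes that cells are separated by exactly one <tab> each, that the first line is the header, and that the first column is the row key. The function name collapse_rows and the array names are invented for the example.

```shell
# Hypothetical helper: merge rows that share the same key in column 1.
collapse_rows() {
  awk '
    BEGIN { FS = "\t" }                     # split on every single tab
    NR == 1 { print; ncols = NF; next }     # pass the header through
    {
      # remember each key once, in order of first appearance
      if (!($1 in seen)) { seen[$1] = 1; order[++nkeys] = $1 }
      # copy every non-empty cell into the merged row for this key
      for (i = 2; i <= NF; i++)
        if ($i != "") cell[$1, i] = $i
      if (NF > ncols) ncols = NF
    }
    END {
      for (k = 1; k <= nkeys; k++) {
        line = order[k]
        for (i = 2; i <= ncols; i++)
          line = line "\t" cell[order[k], i]
        sub(/\t+$/, "", line)               # drop trailing empty cells
        print line
      }
    }
  '
}

# The table from post #3 (with "x" in every marked cell).
printf '\t1\t2\t3\nab\tx\nab\t\tx\ncd\t\tx\ncd\t\t\tx\n' | collapse_rows
```

    Setting FS to a single tab is what keeps the empty cells: with the default whitespace splitting, a run of tabs would collapse into one separator and the column positions would be lost. If two rows with the same key fill the same column, the later value simply overwrites the earlier one in this sketch.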

    If you have any more questions, let us know. I will make sure not to give you wrong information next time! Sorry!

  7. #7
    Just Joined!
    Join Date
    Apr 2011
    Posts
    4
    hmm, ok, i see
    well, i think i have a lot to learn, thank you for that!
