  1. #1
    Just Joined!
    Join Date
    Nov 2011
    Posts
    12

    SED/AWK Problem


    Hi

    I am trying to figure out a way to parse out log records for a particular user. I have tried using both sed and awk to accomplish this, but I can't seem to figure out how to do it.

    Examples of some of the things I have tried:
    Code:
    cat 1.log|awk '/.2012/,/.2012/' |grep 11111 > 2.log
    cat 1.log|sed -ne '/.2012/,/^)/p' |grep 11111 > 2.log


    The log entries are formatted as follows:

    08.05.2012 10:27:35
    REQUEST:
    <body>
    ....
    <subjectID type="ID" value="11111" />
    ....
    </body>

    RESPONSE:
    <body>
    ....
    blah
    ....
    </body>

    08.05.2012 10:27:46
    REQUEST:
    <body>
    ....
    <subjectID type="ID" value="11111" />
    ....
    </body>

    RESPONSE:
    <body>
    ....
    blah
    ....
    </body>


    Thanks!!
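Why neither pipeline works can be seen on a tiny made-up sample. This is a sketch; `sample_log` is a hypothetical helper that just prints two records in the layout above. In awk, a range whose start and end patterns both match the same line opens and closes on that one line, so `/.2012/,/.2012/` selects only the timestamp lines. And because `grep` filters line by line, the final `grep 11111` can only ever return the matching `subjectID` line, never the rest of the record. (The sed attempt fails differently: no line starts with `)`, so its range runs from the first timestamp to the end of the file.)

```shell
# sample_log is a hypothetical helper printing two made-up records
# in the same layout as the log above
sample_log() {
  printf '%s\n' \
    '08.05.2012 10:27:35' 'REQUEST:' '<subjectID type="ID" value="11111" />' \
    '08.05.2012 10:27:46' 'REQUEST:' '<subjectID type="ID" value="11115" />'
}

# Both range patterns match each timestamp line, so every range opens and
# closes on that one line: prints only the two timestamp lines.
sample_log | awk '/.2012/,/.2012/'

# grep works line by line: prints only the subjectID line for 11111,
# without its timestamp or the rest of the record.
sample_log | grep 11111
```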

  2. #2
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    I can't tell from your awk/sed examples what exactly you are hoping to extract from your log. Can you give an example of that?

  3. #3
    Just Joined!
    Join Date
    Nov 2011
    Posts
    12
    What I am looking to do is get all of the requests from a log file for a given value (e.g. 11111). For example, the log below shows 4 entries, so I need a way to search the log for 11111 and, each time that value is found, parse out all of the request information for that particular request (in this example, the 10:34:26 entry).

    08.05.2012 10:27:35
    REQUEST:
    <body>
    ....
    <subjectID type="ID" value="11115" />
    ....
    </body>
    RESPONSE:
    <body>
    ....
    blah
    ....
    </body>
    08.05.2012 10:27:46
    REQUEST:
    <body>
    ....
    <subjectID type="ID" value="11113" />
    ....
    </body>
    RESPONSE:
    <body>
    ....
    blah
    ....
    </body>
    08.05.2012 10:34:26
    REQUEST:
    <body>
    ....
    <subjectID type="ID" value="11111" />
    ....
    </body>
    RESPONSE:
    <body>
    ....
    blah
    ....
    </body>

    08.05.2012 10:37:46
    REQUEST:
    <body>
    ....
    <subjectID type="ID" value="11121" />
    ....
    </body>

    RESPONSE:
    <body>
    ....
    blah
    ....
    </body>

  5. #4
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    Okay, now I understand.

    Here's a solution in Perl (I didn't want to try this one in awk...). The script takes two command-line arguments: the first is the path to the HTML log that you are parsing, and the second is the ID you are interested in. Once you create the script and make it executable, run it without any arguments and it will explain its usage, e.g.:

    Code:
    [root@localhost /tmp]# ./read-htmllog.pl 
    
      Usage: ./read-htmllog.pl <HTML_LOG> <ID>
       e.g.: ./read-htmllog.pl /tmp/html.log 11111
    
    [root@localhost /tmp]#
    Here is the code:

    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    
    # get required command line arguments
    die "
      Usage: $0 <HTML_LOG> <ID>
       e.g.: $0 /tmp/html.log 11111\n\n" unless($#ARGV==1);
    my $file = $ARGV[0];
    my $id = $ARGV[1];
    
    # make sure the file exists
    die "$file: No such file\n" unless(-f$file);
    
    print "\n\t +++ Looking in file \`$file' for ID \`$id' +++ \n\n";
    
    # a hash to store all lines of HTML, separated by record
    my %hash = ();
    
    # a simple record counter (one for each ID in the html log)
    my $i = 0;
    
    # open the HTML log and read it line by line
    open(FH,'<',$file) or die "can't open '$file': $!\n";
    while(<FH>){
    
      # remove the new line char
      chomp;
    
      # remove trailing spaces
      $_ =~ s/[ \t]+$//;
    
      # match on the date/time string - i.e. use it as the record separator
      if(/^[0-9]{2}\.[0-9]{2}\.[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}/){
    
        # one-up the record counter
        $i += 1;
    
      # save all other lines to the hash, separated by record
      }else{
        push(@{$hash{$i}},$_);
      }
    }
    close(FH);
    
    # if ID to be found is located, this will be set to its record number
    my $found;
    
    # loop thru all records found
    for my $record(sort {$a<=>$b} keys %hash){
    
      # loop thru all lines for each record
      for my $line(@{$hash{$record}}){
    
        # see if the id we asked for is in there
        if($line =~ /subjectID type=\"ID\" value=\"$id\"/){
          $found = $record;
        }
      }
    }
    
    if(defined $found){
      print "ID $id was found in record $found :)\n\n";
      print $_,"\n" for(@{$hash{$found}});
    }else{
      print "ID $id was not found :(\n\n";
    
    }
    
    exit(0);
    BTW, there is probably an easier (or at least shorter) way to do this in awk; I just know Perl better...
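The awk route mentioned above could look something like the following minimal sketch. It has not been run against the real log and assumes, as the Perl script does, that every record begins with a `dd.mm.yyyy hh:mm:ss` line at the start of a line and that the ID appears as `subjectID type="ID" value="..."`. It buffers each record, timestamp included, and prints the buffer only if the record contained the requested ID, so every matching record is printed, not just the last one. The function name `extract_records` is hypothetical.

```shell
# extract_records LOGFILE ID  (hypothetical helper, not from the thread)
# Prints every record whose subjectID value equals ID, timestamp included.
# Assumes each record starts with a "dd.mm.yyyy hh:mm:ss" line at column 0.
extract_records() {
  awk -v id="$2" '
    # print the buffered record only if it contained the requested ID,
    # then start an empty buffer for the next record
    function emit_record() {
      if (hit) printf "%s", buf
      buf = ""
      hit = 0
    }

    # a timestamp line closes the previous record and opens a new one
    /^[0-9][0-9]\.[0-9][0-9]\.[0-9][0-9][0-9][0-9] [0-9][0-9]:[0-9][0-9]:[0-9][0-9]/ {
      emit_record()
    }

    # buffer every line (including the timestamp) and note a matching ID
    {
      buf = buf $0 "\n"
      if (index($0, "subjectID type=\"ID\" value=\"" id "\"") > 0) hit = 1
    }

    # the last record has no timestamp after it, so flush it explicitly
    END { emit_record() }
  ' "$1"
}
```

Used with the file names from the original attempt: `extract_records 1.log 11111 > 2.log`. Buffering the whole record before deciding is exactly what the line-oriented `grep` pipeline could not do; only one record is held in memory at a time.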

  6. #5
    Just Joined!
    Join Date
    Nov 2011
    Posts
    12
    That worked! Thanks so much for the help! I have two quick follow-up questions (I am not very familiar with Perl, so please bear with me):

    1. How can I include the date of the record on the output?
    2. What would be the easiest way to make this recursive so it prints out all of the records for a particular ID?

    Thanks again for your help!

  7. #6
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    Quote Originally Posted by dbo3587
    1. How can I include the date of the record on the output?
    Change this:
    Code:
      # match on the date/time string - i.e. use it as the record separator
      if(/^[0-9]{2}\.[0-9]{2}\.[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2}/){
    
        # one-up the record counter
        $i += 1;
    
      # save all other lines to the hash, separated by record
      }else{
    to this:
    Code:
      # match on the date/time string - i.e. use it as the record separator
      if(/^([0-9]{2}\.[0-9]{2}\.[0-9]{4} [0-9]{2}:[0-9]{2}:[0-9]{2})/){
    
        # one-up the record counter
        $i += 1;
    
        push(@{$hash{$i}},$1);
    
      # save all other lines to the hash, separated by record
      }else{
    The two changes are: parentheses are added around the regex on the line that matches the date/time stamp, and the date/time string (which the parentheses cause Perl to save automatically in the built-in variable $1) is pushed onto the record's array (denoted with an @) for printing later.
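The same capture-group idea exists in sed, the tool the original question started with: `\(` and `\)` capture part of a match and `\1` refers back to it, much like Perl's `$1`. As a sketch, assuming the timestamp sits at the start of its own line as in the sample log, this lists just the timestamp of every record; `list_timestamps` is a hypothetical helper name.

```shell
# list_timestamps LOGFILE  (hypothetical helper)
# \( \) captures the leading date/time stamp and \1 substitutes it back,
# the sed counterpart of Perl's $1; -n plus the p flag prints only the
# lines where the substitution succeeded, i.e. the timestamp lines.
list_timestamps() {
  sed -n 's/^\([0-9]\{2\}\.[0-9]\{2\}\.[0-9]\{4\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\}\).*/\1/p' "$1"
}
```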

    2. What would be the easiest way to make this recursive so it prints out all of the records for a particular ID?
    Are you saying that, in your HTML log, you'll have multiple entries for a given ID, e.g. 11111? That hadn't occurred to me...
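One quick way to check whether an ID really does occur in more than one record is to count the matching lines with `grep -c`. This minimal sketch assumes, as in the sample logs, that each record contains exactly one `subjectID` line, so the line count equals the record count; `count_id_records` is a hypothetical helper name.

```shell
# count_id_records LOGFILE ID  (hypothetical helper)
# Counts lines containing the exact subjectID tag for ID; -F treats the
# pattern as a fixed string. Equals the record count only if each record
# holds exactly one subjectID line, as in the sample logs.
count_id_records() {
  grep -cF "subjectID type=\"ID\" value=\"$2\"" "$1"
}
```

For example, `count_id_records 1.log 11111` prints how many records mention 11111.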

  8. #7
    Just Joined!
    Join Date
    Nov 2011
    Posts
    12
    Quote Originally Posted by atreyu

    Are you saying that, in your HTML log, you'll have multiple entries for a given ID, e.g. 11111? That hadn't occurred to me...

    Correct. I think I figured it out; I changed this:

    Code:
      # loop thru all lines for each record
      for my $line(@{$hash{$record}}){
    
        # see if the id we asked for is in there
        if($line =~ /subjectID type=\"ID\" value=\"$id\"/){
          $found = $record;
    to this:

    Code:
      # loop thru all lines for each record
      for my $line(@{$hash{$record}}){
    
        # see if the id we asked for is in there
        if($line =~ /subjectID type=\"ID\" value=\"$id\"/){
          # print every matching record
          print $_,"\n" for(@{$hash{$record}});
        }
    Thanks again for your help!
