  1. #1
    Just Joined!
    Join Date
    Oct 2007
    Posts
    2

    Extract part of a file: read the following lines once a condition matches the current line (awk)

    Hi all,

    I need some help.
    I have a big file that looks like this:

    Code:
    <page>
    <title>...</title>
    <..../>
    ....
    </page>...
    This is what I want to extract:

    Code:
    <page>
     <title>Talk:Atlas Shrugged</title>
        <id>128</id>
        <revision>
          <id>152717854</id>
          <timestamp>2007-08-21T16:32:33Z</timestamp>
          <contributor>
            <username>Marlith</username>
            <id>4871029</id>
          </contributor>
          <minor />
          <comment>/* The quotes do not belong here */</comment>
          <text
    xml:space="preserve">{{NovelsWikiProject|class=B|importance=High}}
    ......
    </text>
    </revision>
    </page>
    So what I need is:

    Code:
    if('<title>Talk:')
    -> start printing (or storing in another file) until
    Code:
    if(!'<page>')
    appears, then stop reading and search for
    Code:
    if('<title>Talk:')
    again.

    I don't see how to keep reading the lines after
    Code:
    <title>Talk:
    has matched: sed and grep process one line at a time, and next or getline is not enough for me. Is it possible to define a global flag, or are there any other ideas?

    If possible, please include comments with examples.

    Thank you in advance,
    Zina

  2. #2
    Linux Newbie radoulov
    Join Date
    Sep 2007
    Posts
    111
    Code:
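    awk '# remember the most recent opening <page> line
    /<page/ { tag = $0 }
    # on a Talk: title, prepend the saved <page> line and raise the flag
    /<title>Talk:/ { $0 = tag "\n" $0; f=1 }
    # on the closing tag of a flagged page, print it and lower the flag
    f && /<\/page/ { print; f=0 }
    # while the flag is up, print the current record
    f' filename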
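
To see the one-liner from this reply at work, here is a small sketch that runs it on a made-up two-page file. The sample pages, the temporary file, and the variable names sample and result are assumptions for illustration only; they are not from the original poster's data.

```shell
# Hypothetical sample input: two pages, only the second is a Talk: page.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<page>
<title>Atlas Shrugged</title>
<id>127</id>
</page>
<page>
<title>Talk:Atlas Shrugged</title>
<id>128</id>
</page>
EOF

# The awk program from this reply, with its output captured in a variable.
result=$(awk '/<page/ { tag = $0 }
/<title>Talk:/ { $0 = tag "\n" $0; f=1 }
f && /<\/page/ { print; f=0 }
f' "$sample")

# Show what was extracted, then clean up the temporary file.
printf '%s\n' "$result"
rm -f "$sample"
```

Only the second page should be printed, from its opening <page> line through its closing </page> line. To store the pages in another file instead, redirecting the awk output (for example with > talk_pages.xml, a hypothetical name) should be enough.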

  3. #3
    Just Joined!
    Join Date
    Oct 2007
    Posts
    2

    Perfect

    Wow! It works perfectly! Thank you very much!
    Still, I don't want to stay a dummy, so let me check that I understand your code.

    You find <page and save it in tag.
    Next you read a line that is not <page but satisfies the next condition, <title, so you edit the current line by setting it to tag "\n" $0 and set the flag f=1.
    Next you read the third line: the first and second conditions are not fulfilled, but the flag is 1 and the </page condition is not fulfilled, so this line is $1. Reading then goes on, comparing each line against all three conditions (<page, <title, </page), and those lines are $2-${infinity}.

    After we meet
    Code:
    </page
    we print everything starting from $0 and set our flag to zero. Then we look for the condition to be fulfilled again.

    Is that right?
    But really, wow!
    Thank you and respect,
    Zina

  4. #4
    Linux Newbie radoulov
    Join Date
    Sep 2007
    Posts
    111
    Originally posted by pet_ra:
    Wow! It works perfectly! Thank you very much!
    Still, I don't want to stay a dummy, so let me check that I understand your code.

    You find <page and save it in tag.
    Next you read a line that is not <page but satisfies the next condition, <title, so you edit the current line by setting it to tag "\n" $0 and set the flag f=1.
    [...]
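    So,
    when the current record matches the pattern "<title>Talk:",
    we modify the current record $0: we prepend tag (the most recent record that matched "<page", normally the previous one)
    and a newline, then we set f (our flag) to 1.
    Next, if the flag is set to 1 (true) AND the current record matches
    the pattern "</page", we print it and set the flag to 0 (false).
    The final f does the real work: it is a pattern with no action, so it is tested against every record and, whenever the flag is 1 (true), awk prints the record.
    Note that $0 is always the whole current record (line); $1, $2 and so on are the fields within that record, not the lines that follow it.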
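
For anyone who finds the terse pattern-only style hard to follow, here is a spelled-out sketch of the same logic with explicit if statements, compared against the original one-liner. The sample input and the names input, long, short and printing are assumptions for illustration; printing plays the role of the flag f.

```shell
# Hypothetical input: two Talk: pages around one ordinary page.
input='<page>
<title>Talk:A</title>
<text>one</text>
</page>
<page>
<title>B</title>
<text>two</text>
</page>
<page>
<title>Talk:C</title>
</page>'

# Spelled-out form: one action block, explicit tests, flag renamed.
long=$(printf '%s\n' "$input" | awk '{
    if ($0 ~ /<page/) {               # remember the latest opening line
        tag = $0
    }
    if ($0 ~ /<title>Talk:/) {        # wanted page: prepend <page>, raise flag
        $0 = tag "\n" $0
        printing = 1
    }
    if (printing && $0 ~ /<\/page/) { # closing tag: print it, lower flag
        print
        printing = 0
    }
    if (printing) {                   # flag still up: print this record
        print
    }
}')

# The original one-liner, for comparison.
short=$(printf '%s\n' "$input" | awk '/<page/ { tag = $0 }
/<title>Talk:/ { $0 = tag "\n" $0; f=1 }
f && /<\/page/ { print; f=0 }
f')

printf '%s\n' "$long"
[ "$long" = "$short" ] && echo "both forms print the same pages"
```

The flag is an ordinary awk variable, so it keeps its value from one record to the next; that is what lets the program carry on printing after the line that matched. The closing line is not printed twice because the flag is already 0 by the time the last test runs.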
