  1. #1
    Just Joined!
    Join Date
    Jun 2008
    Posts
    4

    Deleting all text before a given HTML tag using sed

    I'd like to pass HTML output from curl to be parsed using sed. The problem I'm having is being able to delete all lines before a <PRE> tag. Right now, the output looks something like the following:

    <HTML>
    LINE OF GARBAGE
    LINE OF GARBAGE
    <PRE>GOOD LINE (1)
    GOOD LINE (2)
    GOOD LINE (3)
    GOOD LINE (4)
    </PRE>
    LINE OF GARBAGE
    </HTML>

    So far, I have the following:
    curl -s site.com/page.html | sed -e '1,/<pre>/d' -e '/<\/pre>/,$d'

    and the output is:
    GOOD LINE (2)
    GOOD LINE (3)
    GOOD LINE (4)

    However, the line containing the PRE tag (GOOD LINE (1)) gets deleted too. :\ If anyone knows how to modify the command so that it deletes only the lines before the PRE tag and keeps the PRE line itself, I would greatly appreciate the help.

  2. #2
    Linux User
    Join Date
    Aug 2006
    Posts
    458
    Code:
    curl ..... | awk '/<PRE>/,/<\/PRE>/'
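For anyone wondering why this one-liner is enough, here is a minimal sketch, assuming the page really uses upper-case tags as in the sample. `printf` stands in for the `curl` call, and `extract_pre_awk` is just an illustrative name, not part of the original answer.

```shell
# Sample input copied from the question; printf stands in for curl here.
page='<HTML>
LINE OF GARBAGE
LINE OF GARBAGE
<PRE>GOOD LINE (1)
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
</PRE>
LINE OF GARBAGE
</HTML>'

# An awk range pattern with no action prints every line from a match of
# the start regex through the next match of the end regex, both included.
# extract_pre_awk is a hypothetical helper name used for this sketch.
extract_pre_awk() {
    awk '/<PRE>/,/<\/PRE>/'
}

printf '%s\n' "$page" | extract_pre_awk
```

Note that the lines carrying the `<PRE>` and `</PRE>` tags are part of the output, so the tags themselves are still there and would need a further step if only the text is wanted.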

  3. #3
    Just Joined!
    Join Date
    Jun 2008
    Posts
    4
    Thank you very much. Your help has been much appreciated.

  4. #4
    scm
    Linux Engineer
    Join Date
    Feb 2005
    Posts
    1,044
    Or if you really want to use sed:
    Code:
    curl ..... | sed -n '/<PRE>/,/</PRE>/p'

  5. #5
    Linux User
    Join Date
    Aug 2006
    Posts
    458
    Originally posted by scm:
    Or if you really want to use sed:
    Code:
    curl ..... | sed -n '/<PRE>/,/</PRE>/p'
    You left out the escape before the slash in the closing tag.
    Code:
    curl ..... | sed -n '/<PRE>/,/<\/PRE>/p'
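A rough sketch of how this differs from the command in the first post, assuming upper-case tags as in the sample: `1,/<PRE>/d` deletes an inclusive range, so the `<PRE>` line goes with it, whereas `-n` plus `p` prints an inclusive range and keeps it. The second function is an optional extra that also strips the tags. Both function names are made up for illustration, and `printf` stands in for `curl`.

```shell
# Sample input copied from the question; printf stands in for curl here.
page='<HTML>
LINE OF GARBAGE
LINE OF GARBAGE
<PRE>GOOD LINE (1)
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
</PRE>
LINE OF GARBAGE
</HTML>'

# -n turns off automatic printing; p prints only the lines in the range,
# which includes the lines matching <PRE> and </PRE>.
extract_pre_sed() {
    sed -n '/<PRE>/,/<\/PRE>/p'
}

# Variant (an assumption about what is wanted): inside the range, remove
# the opening tag, drop the closing-tag line, and print what is left.
extract_pre_text() {
    sed -n -e '/<PRE>/,/<\/PRE>/{' \
           -e 's/<PRE>//' \
           -e '/<\/PRE>/d' \
           -e 'p' \
           -e '}'
}

printf '%s\n' "$page" | extract_pre_sed
printf '%s\n' "$page" | extract_pre_text
```

The variant assumes `</PRE>` sits on a line of its own, as it does in the sample; text sharing a line with the closing tag would be dropped along with it.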

  6. #6
    scm
    Linux Engineer
    Join Date
    Feb 2005
    Posts
    1,044
    Originally posted by ghostdog74:
    you left out the escape.
    Code:
    curl ..... | sed -n '/<PRE>/,/<\/PRE>/p'
    Yeah, that's what sed told me when I ran it - thanks for being so observant. I should have cut'n'pasted the corrected one I ran after that.
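One way to sidestep the escaping problem altogether, offered as a sketch rather than a correction: sed lets an address use a different delimiter when it is introduced with a backslash, so the slash in `</PRE>` no longer needs escaping. The `#` delimiter and the function name `extract_pre_alt` are arbitrary choices for this example, and `printf` stands in for `curl`.

```shell
# Sample input copied from the question; printf stands in for curl here.
page='<HTML>
LINE OF GARBAGE
LINE OF GARBAGE
<PRE>GOOD LINE (1)
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
</PRE>
LINE OF GARBAGE
</HTML>'

# The unescaped script should be rejected as a syntax error, because the
# slash in </PRE> ends the second address early. The exact error text
# varies between sed implementations, so it is discarded here.
if ! printf '%s\n' "$page" | sed -n '/<PRE>/,/</PRE>/p' 2>/dev/null; then
    echo "sed rejected the unescaped script"
fi

# \#regex# uses # as the address delimiter, so / is an ordinary character.
extract_pre_alt() {
    sed -n '\#<PRE>#,\#</PRE>#p'
}

printf '%s\n' "$page" | extract_pre_alt
```

This prints the same range as the escaped version, tag lines included.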
