06-16-2008, post #1 (Just Joined!)
Deleting all text before a given HTML tag using sed
I'd like to pipe the HTML output from curl into sed for parsing. The problem I'm having is deleting all lines before a <PRE> tag. Right now, the output looks something like the following:
<HTML>
LINE OF GARBAGE
LINE OF GARBAGE
<PRE>GOOD LINE (1)
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
</PRE>
LINE OF GARBAGE
</HTML>
So far, I have the following:
curl -s site.com/page.html | sed -e '1,/<PRE>/d' -e '/<\/PRE>/,$d'
and the output is:
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
However, the line containing the PRE tag (GOOD LINE (1)) gets deleted as well. :\ If anyone knows how to modify the command so that it deletes all the lines before the PRE tag, but not the line containing it, I'd greatly appreciate the help.
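A minimal sketch of one way to keep that line, assuming a POSIX sed and using the sample above in place of the curl output; the function name extract_pre is hypothetical. The range 1,/<PRE>/ includes the line that closes it, which is why GOOD LINE (1) disappears. Inverting a range that starts at <PRE> (with !) deletes only the lines above it.

```shell
#!/bin/sh
# Hypothetical helper: delete every line NOT in the range from <PRE> to the
# end of input, then delete from </PRE> to the end of input.
extract_pre() {
    sed -e '/<PRE>/,$!d' -e '/<\/PRE>/,$d'
}

# Sample copied from the question; stands in for the curl output.
sample='<HTML>
LINE OF GARBAGE
LINE OF GARBAGE
<PRE>GOOD LINE (1)
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
</PRE>
LINE OF GARBAGE
</HTML>'

printf '%s\n' "$sample" | extract_pre
```

On the real page this would be curl -s site.com/page.html | extract_pre. Appending -e 's/<PRE>//' to the sed command would also strip the tag text from the first kept line, if that is wanted. If the page might use lowercase <pre>, GNU sed accepts an I modifier on address regexes (/<PRE>/I), but that is a GNU extension, not POSIX.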
06-16-2008, post #2 (Linux User)
Code:
curl ..... | awk '/<PRE>/,/<\/PRE>/'
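For illustration, here is that awk range pattern run against the sample from the question (no curl needed). A range pattern prints both of its boundary lines, so the </PRE> line shows up in the output too. The flag-based variant and its name between_pre are assumptions added here for illustration, not part of the reply; it stops one line earlier.

```shell
#!/bin/sh
# Sample copied from the question; stands in for the curl output.
sample='<HTML>
LINE OF GARBAGE
LINE OF GARBAGE
<PRE>GOOD LINE (1)
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
</PRE>
LINE OF GARBAGE
</HTML>'

# Range pattern from the reply: prints from the <PRE> line through the
# </PRE> line, both included.
printf '%s\n' "$sample" | awk '/<PRE>/,/<\/PRE>/'

# Hypothetical variant: clear the flag on </PRE> before testing for <PRE>,
# then print while the flag is set, so the closing tag line is skipped.
between_pre() {
    awk '/<\/PRE>/ { inside = 0 } /<PRE>/ { inside = 1 } inside'
}
printf '%s\n' "$sample" | between_pre
```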
06-16-2008, post #3 (Just Joined!)
Thank you very much. Your help has been much appreciated.
06-16-2008, post #4 (Linux Engineer)
Or if you really want to use sed:
Code:
curl ..... | sed -n '/<PRE>/,/<\/PRE>/p'
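Inside a /.../ address, the slash in </PRE> has to be escaped as \/; otherwise sed ends the regex at that slash and rejects the rest of the script. A small sketch against the question's sample shows the escaped form and an equivalent one that uses POSIX sed's \cREGEXc syntax (here with % as the delimiter) so no escape is needed. Like the awk range, both print the </PRE> line as well.

```shell
#!/bin/sh
# Sample copied from the question; stands in for the curl output.
sample='<HTML>
LINE OF GARBAGE
LINE OF GARBAGE
<PRE>GOOD LINE (1)
GOOD LINE (2)
GOOD LINE (3)
GOOD LINE (4)
</PRE>
LINE OF GARBAGE
</HTML>'

# -n suppresses automatic printing; p prints only lines in the range.
printf '%s\n' "$sample" | sed -n '/<PRE>/,/<\/PRE>/p'

# Same range, with % as the regex delimiter for the second address.
printf '%s\n' "$sample" | sed -n '/<PRE>/,\%</PRE>%p'
```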
06-17-2008, post #5 (Linux User)
06-17-2008, post #6 (Linux Engineer)
