- 06-10-2008 #1 Just Joined!
- Join Date
- Jun 2008
- Posts
- 2
Question about grep
I have parsed an HTML file to pull an IP address. The grep command I used was:
Code:
egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' lab.html > mailmessage.txt
The output pulls the appropriate line out of the HTML file and looks like this:
Code:
<td class="datasmall">192.168.1.20</td>
I can live with this, but I would like to be able to parse out only the IP address and get rid of the HTML tags on either side.
Is there an easy way to do this?
- 06-11-2008 #2
Hi,
you can pipe it through awk as follows:
Code:
egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' lab.html | awk -F">" '{print $2}' | awk -F"<" '{print $1}'
The first awk splits each line on '>' and keeps the text after the opening tag; the second splits that on '<' and keeps the text before the closing tag.
There should be a far better way to do this, but I'm too tired right now to think of one.
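A possibly simpler route: grep's -o option prints only the part of each line that matched, so the surrounding tags never reach the output at all. This is a minimal sketch, assuming a grep that supports -o (GNU grep and BSD grep do; POSIX does not require it). The function name extract_ips is hypothetical, and the regex is a compact equivalent of the one in the question:

```shell
#!/bin/sh
# Hypothetical helper: print only the IPv4-shaped strings found in the named
# files (or in standard input when no file is given), one per line.
# Assumes a grep that supports -o (GNU and BSD grep do; POSIX does not).
extract_ips() {
    grep -oE '[[:digit:]]{1,3}(\.[[:digit:]]{1,3}){3}' "$@"
}

# Usage with the file from the question:
#   extract_ips lab.html > mailmessage.txt

# Demo on the sample line from the question:
printf '%s\n' '<td class="datasmall">192.168.1.20</td>' | extract_ips
```

Like the original pattern, this only checks the shape of an address, so it would also accept something like 999.1.1.1.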
- 06-12-2008 #3 Linux Engineer
- Join Date
- Feb 2005
- Posts
- 1,044
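Use sed to remove the HTML tags:
Code:
sed 's/<[^>]*>//g'
This deletes every sequence made of a '<', zero or more characters that are not '>', and a closing '>', i.e. every tag on the line. The trailing g applies the substitution to all matches on the line rather than only the first.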
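Appending the sed command to the original filter gives the whole answer in one pipeline. A sketch, assuming each matching line holds only the tagged address as in the sample; the function name ip_lines_without_tags is hypothetical, and grep -E is used as the standard spelling of egrep:

```shell
#!/bin/sh
# Hypothetical helper: keep lines that contain something shaped like an IPv4
# address (same regex as the original egrep), then delete every <...> tag.
ip_lines_without_tags() {
    grep -E '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' "$@" |
        sed 's/<[^>]*>//g'
}

# Usage with the file from the question:
#   ip_lines_without_tags lab.html > mailmessage.txt

# Demo on the sample line from the question:
printf '%s\n' '<td class="datasmall">192.168.1.20</td>' | ip_lines_without_tags
```

Any non-tag text on a matching line survives, since sed only removes what sits between '<' and '>'.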
- 06-18-2008 #4 Just Joined!
Thanks! It worked! How about getting rid of leading space characters?
- 06-18-2008 #5 Linux Engineer
Leading spaces can be removed by:
Code:
sed 's/^ *//'
This matches zero or more spaces (the * character) at the start of the line (the ^ character) and replaces them with nothing. To cope with all leading whitespace, replace the space with a space and a literal tab enclosed in square brackets, or with the POSIX class [[:blank:]], which matches both space and tab.
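Both substitutions can run in a single sed call with two -e options. A minimal sketch, using the POSIX class [[:blank:]] (space or tab) in place of the single space; strip_tags_and_indent is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: first delete every <...> tag, then delete any run of
# spaces and tabs at the start of the line.
strip_tags_and_indent() {
    sed -e 's/<[^>]*>//g' -e 's/^[[:blank:]]*//' "$@"
}

# Demo: an indented line (a tab plus spaces) like those often found in HTML.
printf '\t    <td class="datasmall">192.168.1.20</td>\n' | strip_tags_and_indent
```

The tag substitution runs first, so blanks that end up at the front of the line once the tags are gone are stripped as well.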

