  1. #1
    Just Joined!
    Join Date
    Jun 2008
    Posts
    2

    Question about grep

    I have parsed an HTML file to pull an IP address. The grep command I used was:

    Code:
    egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' lab.html > mailmessage.txt
    The output pulls the appropriate line out of the HTML file and looks like this:

    Code:
    <td class="datasmall">192.168.1.20</td>
    I can live with this, but I would like to be able to parse out only the IP address and get rid of the other HTML tags on either side.

    Is there an easy way to do this?

  2. #2
    Linux Engineer khafa's Avatar
    Join Date
    Apr 2008
    Location
    Tokyo, Japan
    Posts
    858
    Hi,

    You can pipe it through awk as follows:
    Code:
    egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' lab.html | awk -F">" '{print $2}' | awk -F"<" '{print $1}'
    There should be a far better way to do this, but I'm too tired right now to think of one.
    Linux and me it's a love story
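
One candidate for the "better way" mentioned above is grep's -o option, which prints only the matched text instead of the whole line. This is a minimal sketch, not something proposed in the thread: -o is available in GNU and BSD grep but is not required by POSIX, and the sample line and the extract_ip function name are assumptions made for illustration.

```shell
# Hypothetical sample input standing in for one line of lab.html.
sample='<td class="datasmall">192.168.1.20</td>'

# extract_ip is an illustrative helper name, not part of the thread.
# -E selects extended regular expressions (what egrep does);
# -o prints only the part of each line that matched the pattern.
extract_ip() {
    grep -E -o '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}'
}

printf '%s\n' "$sample" | extract_ip
```

With a real file, the same pattern would be applied as `grep -E -o '...' lab.html > mailmessage.txt`. Note that the pattern only checks the shape of the address, so something like 999.999.999.999 would also match.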

  3. #3
    scm
    Linux Engineer
    Join Date
    Feb 2005
    Posts
    1,044
    Use sed to remove the HTML tags:
    Code:
    sed 's/<[^>]*>//g'
    Pipe the egrep output through it. The pattern matches a '<', then zero or more characters that are not '>', then the closing '>', and the g flag removes every such tag on the line.
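
To show how the sed expression fits into the original command, here is a small sketch that feeds it the line from the first post. The strip_tags function name is an assumption made for illustration; the sed expression itself is the one given above.

```shell
# strip_tags is an illustrative helper name wrapping the sed command above.
# It deletes every <...> sequence on each input line.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# The line that egrep pulled out of lab.html in the first post.
printf '%s\n' '<td class="datasmall">192.168.1.20</td>' | strip_tags
```

Against the real file this would be `egrep '...' lab.html | sed 's/<[^>]*>//g' > mailmessage.txt`. The expression assumes each tag opens and closes on the same line.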

  4. #4
    Just Joined!
    Join Date
    Jun 2008
    Posts
    2
    Thanks! It worked! How about getting rid of leading space characters?

  5. #5
    scm
    Linux Engineer
    Join Date
    Feb 2005
    Posts
    1,044
    Leading spaces can be removed with:
    Code:
    sed 's/^ *//'
    This matches zero or more occurrences (indicated by the * character) of a space at the start of the line (indicated by the ^ character) and replaces them with nothing. If you replace the space character with a space and a tab enclosed in square brackets, it will cope with leading tabs as well as leading spaces.
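
The bracketed variant described above can be sketched as follows. Instead of typing a literal space and tab inside the brackets, this version uses the POSIX character class [[:space:]], which is a substitution on my part and matches other whitespace characters too. The trim_leading function name is an assumption made for illustration.

```shell
# trim_leading is an illustrative helper name.
# ^ anchors at the start of the line; [[:space:]]* matches any run of
# whitespace (spaces, tabs, ...), which is replaced with nothing.
trim_leading() {
    sed 's/^[[:space:]]*//'
}

# Sample line with a mix of leading spaces and a tab.
printf ' \t  192.168.1.20\n' | trim_leading
```

Only leading whitespace is removed; whitespace inside or at the end of the line is left alone.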
