  1. #1
    Linux Newbie | Join Date: Aug 2005 | Location: Sterling, VA | Posts: 100

    Regex Matching

    This should be a very simple question but I can't find the answer anywhere.

    I want to perform a regex match but only have the match returned, not the line containing the match. For example, consider the following:

    Code:
    cat webpage.html | awk '/<title>[^<]*<\/title>/'
    That will return the line containing the title element, but I just want the match, not the whole line. How can I accomplish this using awk, or some other standard utility?

    Thank you,

    Dave
    - EndianX -

  2. #2
    hazel | Linux Engineer | Join Date: May 2004 | Location: Harrow, UK | Posts: 955
    It is possible in awk to use a regular expression as a field separator. I can't remember the syntax for defining it, but if you used \<.*title\> it would turn the title into an awk field which you could then print out.
    "I'm just a little old lady; don't try to dazzle me with jargon!"

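Besides the -F command-line option, awk can take its field separator from the FS variable, set in a BEGIN block before any input is read. A minimal sketch of hazel's idea in that form, assuming a POSIX awk and a title element that fits on one line; print_title_field is a hypothetical helper name, not an existing command:

```shell
#!/bin/sh
# Sketch (assumption: POSIX awk, title on a single line).
# FS is a regular expression matching both <title> and </title>, so on a line
# such as "<title>My Page</title>" the fields are "", "My Page" and "".
# print_title_field is a hypothetical helper name used only for illustration.
print_title_field() {
    awk 'BEGIN { FS = "</?title>" }
         NF >= 3 { print $2 }' "$@"
}
```

Usage: print_title_field webpage.html (or pipe the HTML into it). $2 is the text between the two tags; the NF >= 3 test skips lines where the separator does not match at least twice, such as lines with no title tag at all.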
  3. #3
    Just Joined! | Join Date: Feb 2009 | Posts: 45
    Attention: the regular expression in the following answer is simplified for the sake of readability.

    Quote Originally Posted by EndianX
    I want to perform a regex match but only have the match returned, not the line containing the match. For example, consider the following:
    Code:
    cat webpage.html | awk '/<title>[^<]*<\/title>/'
    That will return the line containing the title element, but I just want the match, not the whole line. How can I accomplish this using awk, or some other standard utility?
    hazel already gave the right hint by saying:
    Quote Originally Posted by hazel
    It is possible in awk to use a regular expression as a field separator. I can't remember the syntax for defining it, but if you used \<.*title\> it would turn the title into an awk field which you could then print out.
    The correct syntax would be:
    Code:
    awk -F '</?title>' '/<title>.*<\/title>/ { print $2 }' webpage.html
    Alternative solutions (which totally depend on what you define as a “standard utility”) are:
    Code:
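    # -n plus the p flag: print only lines where the substitution succeeded
    sed -n 's/^.*\(<title>.*<\/title>\).*$/\1/p' webpage.html
    perl -ne 'print "$1\n" if /(<title>.*<\/title>)/' webpage.html
    # -o (print only the matched part) is a GNU/BSD grep extension, not POSIX
    grep -o '<title>.*</title>' webpage.html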
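If you want the match itself (tags included) while staying in awk, the POSIX match() function records where a regex matched in RSTART and RLENGTH, and substr() can cut exactly that piece out of the line. A minimal sketch, assuming the whole title element sits on one line; print_title_match is a hypothetical helper name. It keeps the [^<]* from the original question, so the match stops at the first </title>:

```shell
#!/bin/sh
# Sketch (assumption: POSIX awk, title element on a single line).
# match() returns non-zero on success and sets RSTART (1-based start) and
# RLENGTH (length of the match); substr() then prints only the matched text.
# print_title_match is a hypothetical helper name used only for illustration.
print_title_match() {
    awk 'match($0, /<title>[^<]*<\/title>/) {
             print substr($0, RSTART, RLENGTH)
         }' "$@"
}
```

Usage: print_title_match webpage.html. Only the first match on each line is printed; match() does not loop over further occurrences by itself.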

  4. #4
    Just Joined! | Join Date: Oct 2004 | Posts: 62
    Hi EndianX,
    I often run into this type of problem, but I use "alternative" solutions:
    Code:
    grep -i '<title' xxx.html | cut -d '<' -f2 | cut -d '>' -f2
    With this pipeline you get only the title text (without the tags), and it
    works even if the tag has attributes.
    A bit of explanation:
    Code:
    grep -i '<title' xxx.html  = find the title line, ignoring case; the pattern omits the '>' so attributes are allowed
    cut -d '<' -f2             = split on '<' and keep the 2nd field, e.g. 'title>My Page'
    cut -d '>' -f2             = split on '>' and keep the 2nd field, i.e. the title text itself
    Bye.
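The grep/cut pipeline above can be wrapped in a small shell function for reuse. A sketch under the same assumptions as the post, stated explicitly: the title fits on one line and <title ...> is the first tag on that line (if, say, <head><title>...</title> shares a line, the 2nd '<' field is 'head>' and the pipeline prints an empty line). title_via_cut is a hypothetical name:

```shell
#!/bin/sh
# Sketch of the grep | cut | cut pipeline as a function.
# Assumption: <title ...> is the first tag on its line and the title is on one line.
# title_via_cut is a hypothetical helper name used only for illustration.
title_via_cut() {
    # 1. keep lines containing "<title", case-insensitively (attributes allowed)
    # 2. split on '<' and keep field 2, e.g. 'title lang="en">My Page'
    # 3. split on '>' and keep field 2, i.e. 'My Page'
    grep -i '<title' "$@" | cut -d '<' -f2 | cut -d '>' -f2
}
```

Usage: title_via_cut xxx.html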

  5. #5
    Linux Newbie | Join Date: Aug 2005 | Location: Sterling, VA | Posts: 100
    Thanks all.

    These all look very helpful. I like the sed and perl solutions. And somehow I'd never even heard of the cut command before.

    Thanks again,

    Dave
    - EndianX -

...