- 02-23-2009 #1 Linux Newbie
- Join Date
- Aug 2005
- Location
- Sterling, VA
- Posts
- 100
Regex Matching
This should be a very simple question but I can't find the answer anywhere.
I want to perform a regex match but only have the match returned, not the line containing the match. For example, consider the following:
Code:
cat webpage.html | awk '/<title>[^<]*<\/title>/'
That will return the line containing the title element, but I just want the match, not the whole line. How can I accomplish this using awk, or some other standard utility?
Thank you,
Dave
- EndianX -
- 02-23-2009 #2
It is possible in awk to use a regular expression as a field separator. I can't remember the syntax for defining it, but if you used \<.*title\> it would turn the title into an awk field which you could then print out.
"I'm just a little old lady; don't try to dazzle me with jargon!"
- 02-23-2009 #3 Just Joined!
- Join Date
- Feb 2009
- Posts
- 45
Attention: the regular expression in the following answer is simplified for the sake of readability.
hazel already gave the right hint above: awk can use a regular expression as a field separator. The correct syntax would be:
Code:
cat webpage.html | awk -F '</?title>' '/(<title>.*<\/title>)/ { print $2 }'
Alternative solutions (which totally depend on what you define as a “standard utility”) are:
Code:
cat webpage.html | grep '<title>' | sed -e 's/^.*\(<title>.*<\/title>\).*$/\1/'
cat webpage.html | perl -ne 'print "$1\n" if /(<title>.*<\/title>)/'
cat webpage.html | grep -o '<title>.*</title>'
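For an awk-only route that prints just the matched text, one more option is awk's built-in match() function together with substr(). What follows is only a minimal sketch: the sample HTML and the function name awk_title are made up for illustration, and it uses the stricter [^<]* pattern from the original question rather than the simplified .* one.

```shell
# Hypothetical sample input; any HTML with a one-line <title> element would do.
sample='<html><head><title>My Page</title></head></html>'

# match() sets RSTART (start position) and RLENGTH (length) of the first
# match on the line; substr() then cuts exactly that span out of $0.
awk_title() {
    awk 'match($0, /<title>[^<]*<\/title>/) { print substr($0, RSTART, RLENGTH) }'
}

printf '%s\n' "$sample" | awk_title
```

Unlike the -F version, this keeps the surrounding tags in the output, so it behaves more like grep -o than like print $2.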
- 02-23-2009 #4 Just Joined!
- Join Date
- Oct 2004
- Posts
- 62
Hi Endianx,
I often encounter this type of problem, but I use "alternative" solutions:
Code:
cat xxx.html | grep -i '<title' | cut -d'<' -f2 | cut -d'>' -f2
With such a shell expression, you get "only" the title (without tags), and it works even if the tag has attributes.
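To see why attributes on the tag do not get in the way, it may help to trace the two cut steps on one line. This is just a sketch; the sample line and the variable names step1 and step2 are assumptions made up for the illustration.

```shell
# Hypothetical input line: a title tag with an attribute, at the start of the line.
line='<title lang="en">My Page</title>'

# Step 1: split on '<' and keep field 2, giving: title lang="en">My Page
step1=$(printf '%s\n' "$line" | grep -i '<title' | cut -d'<' -f2)

# Step 2: split on '>' and keep field 2, giving: My Page
step2=$(printf '%s\n' "$step1" | cut -d'>' -f2)

printf '%s\n' "$step2"
```

The attribute ends up in the field before the '>' and is simply discarded. Note that this depends on the title tag being the first tag on its line: with something like <head><title>, the second '<'-delimited field would be head> instead.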
A bit of explanation:
Code:
cat xxx.html = list the HTML file to be treated
grep -i '<title' = grep (ignoring case) for the tag (without the '>')
cut -d'<' -f2 | cut -d'>' -f2 = isolate the title as the 2nd field (-f2), with '<' and then '>' as the delimiter (-d)
Bye.
- 02-23-2009 #5 Linux Newbie
- Join Date
- Aug 2005
- Location
- Sterling, VA
- Posts
- 100
Thanks all.
These all look very helpful. I like the sed and perl solutions. And somehow I'd never even heard of the cut command before.
Thanks again,
Dave
- EndianX -

