- 04-11-2008 #1Just Joined!
- Join Date
- Oct 2006
- Posts
- 24
extracting specific line instead of printing whole line using awk/sed
Hi all,
I understand how to find a specific pattern, but how do you "print" just the pattern you're looking for instead of printing the whole line that pattern is on? In my case, I have to extract website addresses that share a similar pattern:
"http://www.address.com"
- All of the addresses in the file start with a double quote followed by "http://" and end with a double quote, so that's what I need to match to get them out. But how do I get the output to print only what I'm looking for and not the whole line?
- A sample of the type of file I'm dealing with is below:
asdasdsad9999123123===080"http://www.linux.com"kajnsknd123123
ijiasdkkjnas"http://www.ign.com/reviews"nas,mdn123123as
etc....
Any suggestions or help would be greatly appreciated,
Thanks
- 04-11-2008 #2Just Joined!
- Join Date
- Oct 2006
- Posts
- 24
Oh yeah, I used the following sed command to match the pattern "http://".
Hope that gives you an idea of what I'm trying to do.
sed -n '/\"http:\/\/.*"/ p' bookmarks.html
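For comparison, here is a minimal sketch of how sed itself can print just the address rather than the whole matching line. It swaps the address test for a substitution with a capture group. The function name `extract_url_sed` is made up for illustration, and the sketch assumes at most one quoted address per line (with several, the greedy `.*` keeps only the last one).

```shell
# Hypothetical helper: read lines on stdin, print only the quoted http:// address.
extract_url_sed() {
  # -n turns off automatic printing; the trailing p prints a line only when
  # the substitution succeeded, so lines without an address are dropped.
  # \( ... \) captures the text between the quotes and \1 replaces the whole
  # line with it. Using | as the delimiter avoids escaping the slashes.
  sed -n 's|.*"\(http://[^"]*\)".*|\1|p'
}

# Demo with the two sample lines from the first post.
printf '%s\n' \
  'asdasdsad9999123123===080"http://www.linux.com"kajnsknd123123' \
  'ijiasdkkjnas"http://www.ign.com/reviews"nas,mdn123123as' | extract_url_sed
```

On a real file this would be called as `extract_url_sed < bookmarks.html`.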
- 04-11-2008 #3Linux User
- Join Date
- Jun 2007
- Posts
- 318
If you don't mind using a bash script instead here's one. It's crude but it works.
Code:
#!/bin/bash
typeset -i _pos
while read _lne
do
  _pos="`echo "$_lne" | awk -F"^" '{print match($1,"http://")}'`"
  _tmp="`echo "$_lne^$_pos" | awk -F"^" '{print substr($1,$2)}'`"
  _tmp2='"'
  _pos="`echo "$_tmp^$_tmp2" | awk -F"^" '{print match($1,$2)}'`"
  _pos=_pos-1
  _tmp2="`echo "$_tmp^$_pos" | awk -F"^" '{print substr($1,1,$2)}'`"
  echo "$_tmp2"
done < bookmarks.html
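The same idea (find "http://", then cut at the next double quote) can probably be done without calling awk at all, using the shell's own parameter expansion. This is only a sketch, not a drop-in replacement: the function name `extract_url_sh` is hypothetical, it requires the opening quote right before `http://`, and it returns only the first address on each line.

```shell
# Hypothetical helper: read lines on stdin, print the first quoted http:// address.
extract_url_sh() {
  while IFS= read -r line; do
    # Strip the shortest prefix ending in "http:// (opening quote included).
    rest=${line#*\"http://}
    # If nothing was stripped, this line has no quoted address: skip it.
    if [ "$rest" = "$line" ]; then
      continue
    fi
    # Drop everything from the closing quote onward and put the scheme back.
    printf 'http://%s\n' "${rest%%\"*}"
  done
}

# Demo with one of the sample lines from the first post.
printf '%s\n' 'ijiasdkkjnas"http://www.ign.com/reviews"nas,mdn123123as' | extract_url_sh
```

Since no external process is started per line, this should also be noticeably faster on a large bookmarks file.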
- 04-12-2008 #4Linux User
- Join Date
- Aug 2006
- Posts
- 458
Code:
awk 'BEGIN{FS="\""}
{
  for(i=1;i<=NF;i++){
    if( $i ~ /http/ ) print $i
  }
}' file
output:
Code:
# ./test.sh
http://www.linux.com
http://www.ign.com/reviews
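One caveat with splitting on the double quote: any field that merely contains "http" gets printed, quoted or not. A slightly stricter variant, sketched below, uses awk's `match()` with `RSTART` and `RLENGTH` so that only text of the form "http://..." inside double quotes is printed, and it loops to pick up several addresses on one line. The function name `extract_urls_awk` is made up for this example.

```shell
# Hypothetical helper: print every double-quoted http:// address found on stdin.
extract_urls_awk() {
  awk '{
    s = $0
    # match() returns nonzero on success and sets RSTART and RLENGTH
    # for the leftmost match in s.
    while (match(s, /"http:\/\/[^"]*"/)) {
      # Skip the opening quote and leave out the closing one.
      print substr(s, RSTART + 1, RLENGTH - 2)
      # Keep scanning the remainder of the line after this match.
      s = substr(s, RSTART + RLENGTH)
    }
  }'
}

# Demo: two addresses on a single line.
printf '%s\n' 'a"http://www.linux.com"b"http://www.ign.com/reviews"c' | extract_urls_awk
```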
- 04-12-2008 #5Linux User
- Join Date
- Aug 2006
- Posts
- 458
- 04-12-2008 #6Just Joined!
- Join Date
- Oct 2006
- Posts
- 24
thanks for the help!! greatly appreciated
- 04-14-2008 #7Linux User
- Join Date
- Jun 2007
- Posts
- 318
I manage over 30 servers that run either Red Hat Linux or (Tru64) UNIX. Some of those Tru64 UNIX servers are in one of two TruCluster UNIX clusters, which are very complicated. I have to know bash, ksh, csh, perl, plus a couple of other scripting languages, plus numerous software products used on the servers. I simply don't have the time or desire to learn another cryptic scripting language.
- 12-08-2008 #8Just Joined!
- Join Date
- Dec 2008
- Posts
- 1
If you like perl
Just to note you could use something like
Code:perl -e 'while(<>) {s#.+?(https?://.+?)".+#$1#;print;}' <file>
- 12-09-2008 #9Linux Newbie
- Join Date
- Jul 2008
- Posts
- 181
grep can do that:
Code:grep -o '"http://[^"]\+"'
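Note that `-o` prints the whole match, so the surrounding double quotes come along. A small sketch of one way to drop them is to pipe through `tr`. It assumes a grep that supports `-o` (GNU and BSD grep do, but it is not in POSIX), uses `*` instead of the GNU-only `\+`, and the function name `extract_urls_grep` is hypothetical.

```shell
# Hypothetical helper: print every double-quoted http:// address found on stdin.
extract_urls_grep() {
  # grep -o prints each match on its own line; tr -d then deletes the quotes.
  grep -o '"http://[^"]*"' | tr -d '"'
}

# Demo with the two sample lines from the first post.
printf '%s\n' \
  'asdasdsad9999123123===080"http://www.linux.com"kajnsknd123123' \
  'ijiasdkkjnas"http://www.ign.com/reviews"nas,mdn123123as' | extract_urls_grep
```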


