- 09-22-2006 #1 Just Joined!
- Join Date
- Sep 2006
- Posts
- 6
regexp filter a file
I would like to run a regexp across a large file and print out only the submatches. I'm happy to use sed, awk or perl, but I'm looking for the simplest way (a one-liner, hopefully) to find all [HEADLINE](.*)[WORDS] and return only the submatch, i.e. the (.*) part.
For example I would like to parse this:
[HEADLINE]
headline 1
[Words]34
[HEADLINE]
headline 2
[Words]66
[HEADLINE]
headline 3
[Words]55
to give this:
headline 1
headline 2
headline 3
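Since awk is one of the acceptable tools, here is a minimal awk sketch that produces the output above. It assumes each block is a [HEADLINE] line, one or more body lines, then a line starting with [Words]; the file name infile.txt is only illustrative.

```shell
#!/bin/sh
# Recreate the sample input (infile.txt is an illustrative name).
printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > infile.txt

# f is a flag: set after a [HEADLINE] line, cleared on a [Words] line.
# Rule order matters: the [Words] check runs before the print and the
# [HEADLINE] check runs after it, so neither marker line is printed.
awk '/^\[Words\]/ { f = 0 } f { print } /^\[HEADLINE\]/ { f = 1 }' infile.txt
```

Because the flag approach works line by line, it also prints every line of a headline that happens to span several lines.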
I am also interested, in general, in the quickest one-liner to filter a text file and return all submatches of a regexp, where a match may or may not span multiple lines.
Thanks
Adam
- 09-22-2006 #2
I don't know which would be quickest, but I would use sed.
- 09-22-2006 #3 Just Joined!
- Join Date
- Sep 2006
- Posts
- 6
How would I do it in sed
Thanks for your reply - how would I do it in sed? I'm looking for something like:
match the regexp [HEADLINE]\n(.*)[WORDS].
and then print \1, i.e. the characters that were matched in the (.*) bit.
But I don't know how to get sed to do this. Can anyone suggest a sed script?
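One way to do exactly this in sed (match across the line break, then print only \1) is to keep appending lines to the pattern space until the [Words] line arrives. This is a minimal sketch, assuming GNU sed (it relies on `.` matching the embedded newlines in the pattern space) and the illustrative file name infile.txt.

```shell
#!/bin/sh
# Sample input from the question; infile.txt is an illustrative name.
printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > infile.txt

# On a [HEADLINE] line: append the next input line (N) and loop back to
# label a until the pattern space contains "\n[Words]". Then capture the
# text in between with \(...\) and print only \1 (-n suppresses the rest).
# Assumes GNU sed, where '.' also matches the newlines added by N.
sed -n '/^\[HEADLINE\]$/{
  :a
  N
  /\n\[Words\]/!ba
  s/^\[HEADLINE\]\n\(.*\)\n\[Words\].*$/\1/p
}' infile.txt
```

If a headline spans several lines, \1 contains all of them, so they are printed together.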
- 09-22-2006 #4 Linux Engineer
- Join Date
- Apr 2006
- Location
- Saint Paul, MN, USA / CentOS, Debian, Solaris, SuSE
- Posts
- 1,117
Hi.
I have used agrep for things like this. You might need to install it, but it's a useful utility to know about. Part of the man page is below.
You might consider csplit, which would create a file for each occurrence; in some circumstances, that is actually the best part.
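A minimal sketch of the csplit route, assuming GNU coreutils csplit (the `'{*}'` repeat count and `-z` are GNU features) and the illustrative names infile.txt and part. Each output file then holds one [HEADLINE] block.

```shell
#!/bin/sh
# Sample input from the question; infile.txt is an illustrative name.
printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > infile.txt

# Start a new piece before every [HEADLINE] line. '{*}' repeats the
# pattern until the input runs out, -z drops the empty piece that would
# come before the first match, and -f sets the output prefix "part".
# csplit prints the size in bytes of each piece it writes.
csplit -z -f part infile.txt '/^\[HEADLINE\]/' '{*}'

# Each piece is one block; its non-marker lines are the headline text.
grep -h -v '^\[' part??
```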
However, perl or sed may be your best bet. I've worked on and off on a multi-line matching script in perl, but it is definitely not a one-liner.
Let us know what your solution turns out to be ... cheers, drl
Code:
agrep -d '^From ' 'breakdown;internet' mbox
outputs all mail messages (the pattern '^From ' separates mail messages in a mail file) that contain keywords 'breakdown' and 'internet'.
-- excerpt from man agrep
- 09-25-2006 #5
Substitute foo with bar on lines between RE1 and RE2, but not ON lines containing RE1 and RE2:
Code:
sed '/RE1/,/RE2/{;/RE1/b;/RE2/b;s/foo/bar/;}' file
Applied here: substitute anything between [HEADLINE] and [Words] with itself, on lines between [HEADLINE] and [Words], but not ON lines containing [HEADLINE] and [Words].
So basically, print anything between [HEADLINE] and [Words]; the final p stands for print.
Don't print the rest; that's what the -n option does.
Code:
$ cat infile.txt
[HEADLINE]
headline 1
[Words]34
[HEADLINE]
headline 2
[Words]66
[HEADLINE]
headline 3
[Words]55
$ sed -n '/\[HEADLINE\]/,/\[Words\]/{;/\[HEADLINE\]/b;/\[Words\]/b;s/.*/&/p;}' infile.txt
headline 1
headline 2
headline 3
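Because `s/.*/&/p` replaces each line with itself and then prints it, a plain `p` does the same job. A slightly simpler variant of the same range command, written one command per line: POSIX sed reads a label after `b` up to the end of the line, so this layout avoids relying on GNU sed accepting `b;`. The file name infile.txt is illustrative and recreated so the sketch runs on its own.

```shell
#!/bin/sh
# Sample input from the thread; infile.txt is an illustrative name.
printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > infile.txt

# The range selects each [HEADLINE]...[Words] block; the two 'b'
# commands jump past the marker lines, and 'p' prints what is left.
sed -n '/\[HEADLINE\]/,/\[Words\]/{
  /\[HEADLINE\]/b
  /\[Words\]/b
  p
}' infile.txt
```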
