  1. #1
    Just Joined!
    Join Date
    Sep 2006
    Posts
    6

    regexp filter a file


    I would like to run a regexp across a large file and print out only the submatches. I'm happy to use sed, awk or perl, but I'm looking for the simplest way (a one-liner, hopefully) to find every [HEADLINE](.*)[WORDS] and return only the submatch, i.e. the (.*) part.

    For example I would like to parse this:

    [HEADLINE]
    headline 1
    [Words]34
    [HEADLINE]
    headline 2
    [Words]66
    [HEADLINE]
    headline 3
    [Words]55

    to give this:

    headline 1
    headline 2
    headline 3

    More generally, I am also interested in the quickest one-liner for filtering a text file and returning all submatches of a regexp, whether or not they span multiple lines.

    Thanks

    Adam
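For input laid out exactly like the sample above (a `[HEADLINE]` line, the headline text, then a `[Words]` line), one minimal awk sketch keeps a flag that the header switches on and the `[Words]` line switches off, printing every line read while the flag is set. The file name `infile.txt` and the flag name `f` are assumptions for illustration.

```shell
#!/bin/sh
# Sample input from the question; the file name infile.txt is an assumption.
cat > infile.txt <<'EOF'
[HEADLINE]
headline 1
[Words]34
[HEADLINE]
headline 2
[Words]66
[HEADLINE]
headline 3
[Words]55
EOF

# Set flag f on a [HEADLINE] line (and skip that line), clear it on a
# [Words] line, and print every line read while f is set.
# Only "[" needs escaping: an unmatched "]" is already literal in a regex.
awk '/^\[HEADLINE]$/ { f = 1; next } /^\[Words]/ { f = 0 } f' infile.txt
# prints:
# headline 1
# headline 2
# headline 3
```

Because it prints every line between the markers, this also copes with a headline spread over several lines, although each of those lines is printed separately.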

  2. #2
    Javasnob (Linux Engineer)
    Join Date
    Jul 2005
    Location
    Wisconsin
    Posts
    942
    I don't know which would be quickest, but I would use sed.
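If every headline is exactly one line, as in the sample, a very short sed sketch is enough: it prints the line that follows each `[HEADLINE]` marker. This assumes GNU sed and a file named `infile.txt`, and it does not handle headlines spanning several lines.

```shell
#!/bin/sh
# Sample input from the question (file name assumed).
printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > infile.txt

# -n: print nothing by default.
# On each [HEADLINE] line, n replaces the pattern space with the next
# input line (the headline) and p prints it. Written for GNU sed.
sed -n '/^\[HEADLINE]$/{n;p;}' infile.txt
# prints:
# headline 1
# headline 2
# headline 3
```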

  3. #3
    Just Joined!
    Join Date
    Sep 2006
    Posts
    6

    How would I do it in sed

    Thanks for your reply. How would I do it in sed? I'm looking for something like this:
    match the regexp [HEADLINE]\n(.*)[WORDS]

    and then print \1, i.e. the characters that were matched by the (.*) part.

    But I don't know how to get sed to do this. Can anyone suggest a sed script?
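A sed script close to what is described here is possible: one minimal sketch, assuming GNU sed and a file named `infile.txt`, loops with `N` to gather a whole record into the pattern space, then uses `\(...\)` and `\1` to keep only the text between the markers. Note that inside a regex the brackets must be escaped; an unescaped `[HEADLINE]` would be a character class matching a single letter.

```shell
#!/bin/sh
# Sample input from the question (file name assumed).
printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > infile.txt

# On a [HEADLINE] line, keep appending input lines (N) until the pattern
# space contains a "\n[Words]" line (label a loops back), then replace the
# whole record with the captured text \1 and print it. In GNU sed "."
# also matches newlines, so a multi-line headline is captured whole.
sed -n '/^\[HEADLINE]$/{
  :a
  N
  /\n\[Words]/!ba
  s/^\[HEADLINE]\n\(.*\)\n\[Words].*/\1/p
}' infile.txt
# prints:
# headline 1
# headline 2
# headline 3
```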

  4. #4
    drl (Linux Engineer)
    Join Date
    Apr 2006
    Location
    Saint Paul, MN, USA / CentOS, Debian, Slackware, {Free, Open, Net}BSD, Solaris
    Posts
    1,286
    Hi.

    I have used agrep for things like this. You might need to install it, but it's a useful utility to know about. Part of the man page is below.

    You might also consider csplit, which creates a separate file for each occurrence; in some circumstances, that is actually the most useful part.
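A minimal csplit sketch, assuming GNU coreutils (the `'{*}'` repeat count is a GNU extension): it starts a new piece at every `[HEADLINE]` line, then strips the first and last line of each piece. The working directory, the input name `infile.txt` and the output prefix `part` are hypothetical choices.

```shell
#!/bin/sh
set -e
# Work in a throwaway directory that is removed when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > "$workdir/infile.txt"

# Split before every [HEADLINE] line. '{*}' repeats the pattern as often as
# it matches (GNU), -z drops the empty piece before line 1, -s silences the
# byte counts, and -f sets the hypothetical output prefix "part".
csplit -s -z -f "$workdir/part" "$workdir/infile.txt" '/^\[HEADLINE]$/' '{*}'

# Each piece holds one record: drop its first ([HEADLINE]) and last
# ([Words]) line, leaving only the headline text.
for piece in "$workdir"/part*; do
  sed '1d;$d' "$piece"
done
```

The pieces stay on disk until the script exits, which is the point if each record needs further processing on its own.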

    However, perl or sed may be your best bet. I've worked on and off on a multi-line matching script in perl, but it is definitely not a one-liner.

    Let us know what your solution turns out to be ... cheers, drl
    Code:
           agrep -d '^From ' 'breakdown;internet' mbox
                  outputs  all  mail messages (the pattern '^From ' separates mail
                  messages in a mail file) that contain keywords  'breakdown'  and
                  'internet'.
    -- excerpt from man agrep

  5. #5
    muha (Linux User)
    Join Date
    Jan 2006
    Posts
    290
    Substitute foo with bar on lines between RE1 and RE2, but not on the lines containing RE1 and RE2 themselves:
    Code:
    sed '/RE1/,/RE2/{;/RE1/b;/RE2/b;s/foo/bar/;}' file
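To see what the range and the two `b` commands (branch to the end of the script, skipping the substitution) do, here is the same command run on a tiny hypothetical input, with `BEGIN` and `END` standing in for RE1 and RE2. GNU sed is assumed, since the `;` after `{` and after each `b` relies on it.

```shell
#!/bin/sh
# BEGIN/END are hypothetical stand-ins for RE1/RE2.
printf '%s\n' 'foo before' BEGIN 'foo inside' END 'foo after' |
  sed '/BEGIN/,/END/{;/BEGIN/b;/END/b;s/foo/bar/;}'
# prints:
# foo before
# BEGIN
# bar inside
# END
# foo after
```

Only the line strictly inside the range is changed; the marker lines themselves are printed untouched.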
    Applied to your file: substitute each line between [HEADLINE] and [Words] with itself (s/.*/&/), on lines between [HEADLINE] and [Words] but not on the lines containing [HEADLINE] and [Words].
    So basically, this prints anything between [HEADLINE] and [Words]; the final p stands for print.
    Nothing else is printed, because of the -n option:
    Code:
    $ cat infile.txt
    [HEADLINE]
    headline 1
    [Words]34
    [HEADLINE]
    headline 2
    [Words]66
    [HEADLINE]
    headline 3
    [Words]55
    $ sed -n '/\[HEADLINE\]/,/\[Words\]/{;/\[HEADLINE\]/b;/\[Words\]/b;s/.*/&/p;}' infile.txt
    headline 1
    headline 2
    headline 3
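The one-liner above relies on GNU sed, which accepts `;` directly after `{` and after a bare `b`. In POSIX sed a branch label runs to the end of the line, so other seds may misread it. A minimal sketch of a more portable spelling splits the script with separate `-e` options (each becomes its own script line) and uses a plain `p`, which prints the same thing as `s/.*/&/p`. The `^`/`$` anchors add the assumption that each marker starts its line.

```shell
#!/bin/sh
# Sample input from the question (file name as in the transcript).
printf '%s\n' '[HEADLINE]' 'headline 1' '[Words]34' \
              '[HEADLINE]' 'headline 2' '[Words]66' \
              '[HEADLINE]' 'headline 3' '[Words]55' > infile.txt

# Each -e contributes one script line, so every bare b is ended by a
# newline: branch to the end of the script without printing (-n).
sed -n -e '/^\[HEADLINE]$/,/^\[Words]/{' \
       -e '/^\[HEADLINE]$/b' \
       -e '/^\[Words]/b' \
       -e 'p' \
       -e '}' infile.txt
# prints:
# headline 1
# headline 2
# headline 3
```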
