  1. #1
    Just Joined!
    Join Date
    Apr 2007
    Posts
    2

    Question Problems with sed matches

    Hi all - I have been working on a sed script to pull some information out of an html file.

    The original file is like this:

    ...
    <span class="text1">50mL $210<br>100mL $340<br>250mL $690<br>500mL $1190<BR></span>
    ...

    I am trying to *just* get the prices so I do the following command:

    sed -n 's/.*100mL \(.*\)<br>.*/\1/p;T;q;' <filename>

    and I thought it would give me: $340
    but instead I get:

    $340<br>250mL $690

    How can I get it to stop on the first match of <br>?

    I've looked all over and haven't found a way to do this - any help would be great!

    Thanks.
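
For context on why the capture runs past the first <br>: sed's `.*` is greedy and sed has no non-greedy quantifier, so `\(.*\)<br>` extends to the last lowercase <br> on the line. A common workaround is a negated bracket expression. The following is a minimal sketch, assuming the price itself never contains a `<`; the variable names `line` and `price` are only for illustration.

```shell
# Sample line copied from the question.
line='<span class="text1">50mL $210<br>100mL $340<br>250mL $690<br>500mL $1190<BR></span>'

# [^<]* cannot cross a "<", so the capture ends at the first <br> after 100mL.
price=$(printf '%s\n' "$line" | sed -n 's/.*100mL \([^<]*\)<br>.*/\1/p')
echo "$price"
```

On the sample line this should print $340.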

  2. #2
    Banned CodeRoot's Avatar
    Join Date
    Sep 2005
    Posts
    567
    It might be worthwhile to simplify it first... Try each of these commands to see what I mean:

    Code:
    sed 's/<br>/ /gi' <filename>
    sed 's/<br>/ /gi' <filename> | sed 's.</span>..'
    sed 's/<br>/ /gi' <filename> | sed 's.</span>..' | sed 's/<.*>//'

  3. #3
    Linux Newbie birdman's Avatar
    Join Date
    Mar 2006
    Location
    Ireland
    Posts
    141
    Try this:

    Code:
     sed -n "s/.*100mL \(\\$\)\([0-9]\+\).*/\1\2/p" <filename>
    Regards

  4. #4
    Just Joined!
    Join Date
    Apr 2007
    Posts
    18
    This gets you all the prices line by line:

    Code:
    sed -e 's/[><]/\n/g' <filename> | sed -e '/\$/!d' | cut -f2 -d' '
    Without 'cut' you get the pairs like:

    50mL $210
    100mL $340
    250mL $690
    500mL $1190

    As for sed, I guess the 'g' flag makes the substitution carry on to every occurrence of the regexp on the line; without 'g', only the first occurrence is replaced.

    Cheers
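
To illustrate the remark about the 'g' flag, here is a small sketch on a shortened version of the sample data (the shortened line and the variable names are assumptions made for the example). Note that 'g' controls how many matches get replaced on a line; it does not change how much text a single `.*` consumes.

```shell
# Shortened sample data, for illustration only.
line='50mL $210<br>100mL $340<br>250mL $690'

# Without g: only the first <br> on the line is replaced.
first_only=$(printf '%s\n' "$line" | sed 's/<br>/ /')

# With g: every <br> on the line is replaced.
all=$(printf '%s\n' "$line" | sed 's/<br>/ /g')

echo "$first_only"
echo "$all"
```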

  5. #5
    Just Joined!
    Join Date
    Apr 2007
    Posts
    2

    Thumbs up

    Thanks Rtoip and CodeRoot!

    I ended up using

    sed -e 's/[<>]/\n/g' 2001 | sed -e '/mL/!d'

    and it returned

    50mL $210
    100mL $340
    250mL $690
    500mL $1190

    There was some other cruft in the file with $ that kept getting into the result, but the mL seemed to be unique. Now I can use awk to sort through the results.

    Birdman - if I try the line you gave, I get something that looks right, but if I use 50mL, it returns the price for 250mL - I'm guessing because that is the last instance of "50mL" on that line...

    Thanks everyone.
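
The 250mL result fits the guess above: the leading `.*` is greedy, so the pattern settles on the last "50mL" on the line, which sits inside "250mL". One possible fix, sketched here under the assumption that the size is always preceded by some non-digit character (in the sample it is `>`), is to require that character explicitly. The variable names are illustrative only.

```shell
line='<span class="text1">50mL $210<br>100mL $340<br>250mL $690<br>500mL $1190<BR></span>'

# Greedy leading .* : "50mL" is matched inside "250mL".
greedy=$(printf '%s\n' "$line" | sed -n 's/.*50mL \(\$[0-9][0-9]*\).*/\1/p')

# A non-digit is required before the size, which rules out 250mL.
anchored=$(printf '%s\n' "$line" | sed -n 's/.*[^0-9]50mL \(\$[0-9][0-9]*\).*/\1/p')

echo "$greedy"
echo "$anchored"
```

This variant would not match a size that starts the line, since `[^0-9]` needs a character to consume.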

  6. #6
    Linux Newbie Sangal-Arun's Avatar
    Join Date
    May 2006
    Location
    Gurgaon, India + Denver Colorado USA
    Posts
    101

    Thumbs up

    [/E*Fare/Users/qabuild/aksutil/P/pawan] $ echo '<span class="text1">50mL $210<br>100mL $340<br>250mL $690<br>500mL $1190<BR></span>' | grep -o "[0-9]*mL \$[0-9]*"
    50mL $210
    100mL $340
    250mL $690
    500mL $1190

    Just by using GREP!!!

    /A
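
Since awk was mentioned earlier in the thread as the next step, the grep output can be piped straight into it to pick a single price. This is only a sketch: the awk variable `size` and the shell variable `price` are names invented for the example, and `grep -o` is assumed to be available (it is in GNU and BSD grep, but is not required by POSIX).

```shell
line='<span class="text1">50mL $210<br>100mL $340<br>250mL $690<br>500mL $1190<BR></span>'

# grep -o prints one "size price" pair per line; awk compares the whole
# first field, so 50mL cannot be confused with 250mL.
price=$(printf '%s\n' "$line" \
    | grep -o '[0-9]*mL \$[0-9]*' \
    | awk -v size=50mL '$1 == size { print $2 }')

echo "$price"
```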



    Brgds,

    ARUN SANGAL
    SCM: 1- 720 251 9962
    Email: sangal.ak04@gmail.com
    Email: sangal_ak04@yahoo.com

  7. #7
    Just Joined!
    Join Date
    Apr 2007
    Posts
    18
    Sangal, fine. As is nearly always the case, there are many utilities that lead to the same result.
