  1. #1
    Just Joined!
    Join Date
    Jan 2012
    Posts
    2

    sed: trying to replace everything before the pattern with an empty string

    I'm trying to strip everything before and after a pattern and print what is left, using sed.

    For instance, I have a text or HTML file and I want to print only the URLs. I first run $ grep pattern filename to find all the lines with URLs in them, and I was thinking of piping that to sed to remove the rest of each line before and after the URL.

    <td><....><href="URLs goes here">blah</...>
    <td><...><href="URLs goes here">blahblah</...>

    Assume there is no fixed pattern before or after the URLs; anything could come before or after them.

    I'm trying to print just this to standard output:
    URLs
    URLs
    (I'm not allowed to post URLs yet, but I hope you get the point.)


    I was thinking that I should find a way to remove whatever comes before and after the URLs. I have spent hours looking up examples and options for sed and I'm lost. Please help!

  2. #2
    Just Joined!
    Join Date
    Aug 2011
    Posts
    48
    Try:
    $ sed -n 's/\(.*<href=\"\)\(.*\)\(\">.*\)/\2/p' yourInputFile.txt
    URLs goes here
    URLs goes here
    $
    Check to make sure that it's not too greedy.
    Last edited by histrungalot; 01-29-2012 at 03:49 PM.
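One way to act on the greediness warning, sketched under assumptions: the sample line below is made up, and it keeps the <href=" marker from the original post. Swapping the middle .* for [^"]* stops the capture at the URL's closing quote, so a later "> on the same line can no longer be swallowed.

```shell
# Hypothetical sample line: a second "> later on the line exposes the greediness.
line='<td><a class="x"><href="http://example.com/a">blah</a><b id="y">more</b>'

# Pattern from the post above: the middle .* runs on to the LAST "> on the line.
greedy=$(printf '%s\n' "$line" | sed -n 's/\(.*<href=\"\)\(.*\)\(\">.*\)/\2/p')

# Tightened variant: [^"]* cannot cross the closing quote of the URL.
tight=$(printf '%s\n' "$line" | sed -n 's/.*<href="\([^"]*\)".*/\1/p')

printf 'greedy: %s\ntight:  %s\n' "$greedy" "$tight"
```

On that made-up line the pattern from the post captures http://example.com/a">blah</a><b id="y, while the tightened one captures only http://example.com/a.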

  3. #3
    Just Joined!
    Join Date
    Jan 2012
    Posts
    2
    Quote: Originally posted by histrungalot
    Try:
    $ sed -n 's/\(.*<href=\"\)\(.*\)\(\">.*\)/\2/p' yourInputFile.txt
    URLs goes here
    URLs goes here
    $
    Check to make sure that it's not too greedy.
    I think it should work, thanks!
    But what if whatever comes before h t t p : / / is not always href? That's my problem.

    Is there any option in sed that lets me print something by matching a pattern from its beginning to its end? I can't use ^ or $ because there might not be a space in front of h t t p : / / or after the .com, .net, etc.

    $sed some-other-option?/http:\/\/.*.[a-z][a-z][a-z]/..


    h t t p:\/\/.*.\.[a-z][a-z][a-z] <-- just printing the pattern?
    Someone suggested getting rid of everything before and after the pattern, but I don't know how to do that. Is that even efficient?

  4. #4
    Just Joined!
    Join Date
    Aug 2011
    Posts
    48
    Please post an example .txt file.

    sed -n 's/.*\(http:\/\/[^"]*\)".*/\1/p'

    Not at my computer to test.
    Last edited by histrungalot; 01-29-2012 at 08:30 PM.
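For the narrower question in post #3 (printing only the matched pattern), a possible alternative to deleting the text on both sides is grep -o, which prints just the part of each line that matches. This is only a sketch: the input lines are invented, -o is a GNU/BSD grep extension rather than POSIX, and the character class assumes a URL never contains a double quote, angle bracket, space, or comma.

```shell
# Hypothetical input; nothing fixed precedes or follows the URLs.
input='<td><a href="http://example.com/page">blah</a>
see http://example.org/x.html, then <img src=http://example.net/i.png>'

# -o prints only the matched text, one match per output line.
# [^"<> ,]* ends the match at the first character assumed not to belong to a URL.
urls=$(printf '%s\n' "$input" | grep -o 'http://[^"<> ,]*')

printf '%s\n' "$urls"
```

Because grep -o reports every match, a line holding two URLs yields two output lines, which a single sed substitution per line would not do.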
