  1. #1
    Just Joined!
    Join Date
    Jan 2009
    Location
    Halifax, NS
    Posts
    19

    A little grep problem

    Hi everyone,

    I have a file that contains links (of cities) and each is on a separate line.
    Problem is that one of the cities in the link has a space.

    for instance, if these were the links:

    http://www.website.com/Toronto....
    http://www.website.com/Montreal....
    http://www.website.com/Niagara Falls... <-- this one is the problem

    With this code:

    Code:
    cat cityLinks.txt | 
    while read a
    do echo `grep -o 'destination.*' | 
    sed 's/destination=//' | 
    sed 's/&country.*RACK//'`
    done
    it lists the cities all on one line, like this:

    Toronto Montreal Niagara Falls

    I want to make directories named after the cities in the links, so I replaced 'echo' with 'mkdir'. The problem is that this creates two folders for Niagara Falls (one for Niagara and one for Falls).

    How can I get it to create one folder for cities with a space in between (thus a folder called 'Niagara Falls' in this case)?

    Any help is appreciated.

  2. #2
    Just Joined!
    Join Date
    Oct 2004
    Posts
    62
    Why not substitute the space with '_' (underscore)?
    By hand if such towns are few, or with
    Code:
     cat cityLinks.txt | sed 's/ /_/'    # replaces only the first blank on each line
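
    A small sketch of what "first blank" means in practice, assuming a made-up name with more than one space (it is not taken from the real cityLinks.txt). Without the g flag only the first space is replaced; with it, every space is.

    ```shell
    # Hypothetical sample name, not from the real cityLinks.txt.
    name='Niagara on the Lake'

    # Without the g flag, sed replaces only the first space on the line.
    first_only=$(printf '%s\n' "$name" | sed 's/ /_/')

    # With the g flag, every space on the line is replaced.
    all_spaces=$(printf '%s\n' "$name" | sed 's/ /_/g')

    echo "$first_only"    # Niagara_on the Lake
    echo "$all_spaces"    # Niagara_on_the_Lake
    ```

    For a name with a single space, such as Niagara Falls, both forms give the same result.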

  3. #3
    Just Joined!
    Join Date
    Jan 2009
    Location
    Halifax, NS
    Posts
    19
    Quote Originally Posted by fiomba
    Why not substitute the space with '_' (underscore)?
    By hand if such towns are few, or with
    Code:
     cat cityLinks.txt | sed 's/ /_/'    # replaces only the first blank on each line
    Not by hand because I want it to be automated as much as possible.
    Your second suggestion works though, I can't believe I didn't think of that.. my brain is too fried at the moment.

    Thanks!

  4. #4
    Linux Newbie
    Join Date
    Jul 2008
    Posts
    181
    Quote Originally Posted by dennis89
    Code:
    cat cityLinks.txt | 
    while read a
    do echo `grep -o 'destination.*' | 
    sed 's/destination=//' | 
    sed 's/&country.*RACK//'`
    done
    This is horribly inefficient. Since you did not give the correct links, I'll have to make a few guesses, but it seems as if you wish to discard everything up to and including "destination=", and everything including and following "&country". Sed can do all that in one go, and handle quoting:

    Code:
    sed 's/.*destination=\(.*\)&country.*/"\1"/' cityLinks.txt| xargs mkdir
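
    Since the real links were not posted, here is a sketch of the one-liner run against two made-up URLs in a scratch directory. The URL layout (destination=...&country=...) and the file contents are assumptions based on the guesses above.

    ```shell
    # Work in a scratch directory so nothing is created in the current one.
    workdir=$(mktemp -d)
    cd "$workdir" || exit 1

    # Hypothetical links; the real ones were not posted in the thread.
    printf '%s\n' \
        'http://www.website.com/page?destination=Toronto&country=CA' \
        'http://www.website.com/page?destination=Niagara Falls&country=CA' \
        > cityLinks.txt

    # sed replaces each line with the captured name, wrapped in double quotes.
    sed 's/.*destination=\(.*\)&country.*/"\1"/' cityLinks.txt

    # xargs treats each double-quoted string as a single argument to mkdir.
    sed 's/.*destination=\(.*\)&country.*/"\1"/' cityLinks.txt | xargs mkdir

    ls -1
    ```

    One caveat: xargs also interprets quote characters in its input, so a name that itself contains an apostrophe or a double quote would need different handling.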

  5. #5
    Just Joined!
    Join Date
    Jan 2009
    Location
    Halifax, NS
    Posts
    19
    Quote Originally Posted by burschik
    This is horribly inefficient. Since you did not give the correct links, I'll have to make a few guesses, but it seems as if you wish to discard everything up to and including "destination=", and everything including and following "&country". Sed can do all that in one go, and handle quoting:

    Code:
    sed 's/.*destination=\(.*\)&country.*/"\1"/' cityLinks.txt| xargs mkdir
    Thanks for showing me that. I was taught with all the piping, but I like your method.
    Since I'm still new to this, can you explain what xargs does? I've never seen it used before.

    edit: I found out that \( \) treats the expression inside as a group and saves the matched characters into a temporary holding area. So I'm assuming that you reference that temporary holding area with xargs?

  6. #6
    Linux Newbie
    Join Date
    Jul 2008
    Posts
    181
    Quote Originally Posted by dennis89
    edit: I found out that \( \) treats the expression inside as a group and saves the matched characters into a temporary holding area. So I'm assuming that you reference that temporary holding area with xargs?
    No! When using sed's "s" command, you can create groups in the regular expression and refer to the string they matched in the substitute text. There is no way to reference this information from another process! In my example, sed replaces the entire line with the part we are interested in and then prints it out. The pipe redirects the output to xargs, which uses the input to construct a command line for the following program argument, in this case "mkdir". This is a very common pattern.
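
    To see how xargs turns its input into arguments, a sketch that swaps mkdir for printf, so that each argument xargs passes along is printed in brackets on its own line. The sample text is only an illustration.

    ```shell
    # Unquoted input: xargs splits on the space and passes two arguments.
    unquoted=$(printf '%s\n' 'Niagara Falls' | xargs printf '[%s]\n')

    # Double-quoted input: xargs passes the whole name as one argument.
    quoted=$(printf '%s\n' '"Niagara Falls"' | xargs printf '[%s]\n')

    echo "$unquoted"    # prints [Niagara] and [Falls] on separate lines
    echo "$quoted"      # prints [Niagara Falls]
    ```

    That is why the sed command wraps the captured text in double quotes before piping it on.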

  7. #7
    Just Joined!
    Join Date
    Jan 2009
    Location
    Halifax, NS
    Posts
    19
    Ok I believe I understand everything so far, but I have another question.

    I want to download the pages (the links inside cityLinks.txt) with wget, each into the directory I just created for that city (e.g. download Toronto's page from its link in the txt file and store it in the Toronto directory).

    Would it look something like this?

    Code:
    sed 's/.*destination=\(.*\)&country.*/"\1"/' cityLinks.txt| xargs mkdir | wget -w 5 -i cityLinks.txt -P ./city's_directory
    I'm pretty sure that's wrong, but a hint in the right direction would be appreciated.

    edit: I've used wget with the -i option before, but that downloads all links in the file to one location.
    What I'm trying to do is download each link in the file to a different location, how would I be able to do that?

  8. #8
    Linux Newbie
    Join Date
    Jul 2008
    Posts
    181
    Code:
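    while IFS= read -r line
    do
            # Strip everything up to and including "destination="
            dir=${line##*destination=}
            # Strip "&country" and everything after it
            dir=${dir%%&country*}
            mkdir "$dir"
            # Quote "$line" so a link containing a space stays one argument
            wget -P "$dir" "$line"
    done < cityLinks.txt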
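
    For anyone unfamiliar with the two ${...} expansions in the loop, a sketch of what each one does to a single line, with no downloading involved. The URL is made up, since the real links were not shown.

    ```shell
    # Hypothetical link; the real URLs were not shown in the thread.
    line='http://www.website.com/page?destination=Niagara Falls&country=CA'

    # "##" removes the longest leading part matching the pattern *destination=
    dir=${line##*destination=}
    echo "$dir"    # Niagara Falls&country=CA

    # "%%" removes the longest trailing part matching the pattern &country*
    dir=${dir%%&country*}
    echo "$dir"    # Niagara Falls
    ```

    Both expansions are done by the shell itself, so no sed or grep process is started for each line.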

  9. #9
    Just Joined!
    Join Date
    Jan 2009
    Location
    Halifax, NS
    Posts
    19
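    Quote Originally Posted by burschik
    Code:
    while IFS= read -r line
    do
            # Strip everything up to and including "destination="
            dir=${line##*destination=}
            # Strip "&country" and everything after it
            dir=${dir%%&country*}
            mkdir "$dir"
            # Quote "$line" so a link containing a space stays one argument
            wget -P "$dir" "$line"
    done < cityLinks.txt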
    Thanks again burschik.
    I just needed to change the wget line to
    Code:
    wget "$line" -P "$dir"

...