  1. #1
    Just Joined!
    Join Date
    Jun 2010
    Posts
    5

    wget: "following links" vs. "retrieving directory tree"


    (Sorry for the xxx below, newbs are not allowed to post certain banned strings.)

    I want to mirror a website using wget -r, but even after reading the wget manual, I'm still a little unclear on how it works. With a command like:

    Code:
    wget -r --level=inf hxxp://xxx.foo.com/bar
    will wget automatically follow all html links and recreate the entire directory tree under the bar/ directory? What if I don't want to follow any html links but still want to recreate the entire directory tree under the bar/ directory? What if I want the opposite: I don't need the directory tree but just want to follow all links in bar/index.html, up to say, N links away from index.html? I don't see how these operations could be distinguished using wget's recursive retrieval capabilities. Thanks for any input!

  2. #2
    Linux Guru
    Join Date
    Nov 2007
    Posts
    1,754
    Code:
    man wget
    
    --mirror
               Turn on options suitable for mirroring.  This option turns on recursion and time-stamping, sets infinite recursion depth and
               keeps FTP directory listings.  It is currently equivalent to -r -N -l inf --no-remove-listing.
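    Wget doesn't know about a "directory tree" under /bar. Over HTTP it only discovers files by following links in the pages it downloads, so a file or folder that nothing links to is never fetched. If you want a backup that is independent of what the webserver links to (folders, files, etc.), then you're better off using something like rsync.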
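
    To make the distinction concrete, here is a rough sketch of the variants asked about, wrapped in small shell functions. The function names, the example.com URL and the rsync source path are made up for illustration; the wget and rsync options themselves (-r, -l, -np, -nd, -a) are standard. In every wget case only pages reachable through links are fetched. Note the trailing slash on the URL: without it, wget treats bar as a file and -np would apply to the site root instead.

```shell
#!/bin/sh
# Hypothetical URL standing in for the masked address in the question.
URL="http://www.example.com/bar/"

# Follow links to unlimited depth, but never climb above bar/ (-np).
# The local directory tree mirrors the URL paths of whatever was linked.
mirror_below_bar() {
    wget -r -l inf -np "$1"
}

# Follow links at most N hops from the start page ($2) and save every
# file into the current directory (-nd) instead of recreating the tree.
follow_n_links() {
    wget -r -l "$2" -nd "$1"
}

# Copy the real directory tree, linked or not. This assumes rsync/SSH
# access to the server; $1 is a source such as user@host:/path/to/bar/
# (hypothetical), $2 is the local destination directory.
copy_real_tree() {
    rsync -a "$1" "$2"
}

# Example calls (not run here):
#   mirror_below_bar "$URL"
#   follow_n_links "$URL" 2
#   copy_real_tree user@www.example.com:/var/www/bar/ ./bar/
```

    There is no wget option for "the whole directory tree without following links", because following links is the only way wget learns that a file exists over HTTP.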

  3. #3
    Just Joined!
    Join Date
    Jun 2010
    Posts
    5
    OK, that would explain my confusion. So wget really can only follow links, and those "mirroring options" don't guarantee anything about recreating the complete directory tree?