  1. #1
    Just Joined!
    Join Date: Jan 2011
    Posts: 7

    Question [SOLVED] Recursive Site Downloader


    Hi, fellow members of LinuxForums.org.

    I have a friend who would like me to edit his website.

    To do this, I would like to recursively download all the pages within the domain, such that their link structure is preserved.

    This would be tedious to do by hand, however.

    As it stands, I could probably use wget for this, but I would prefer something designed specifically for downloading whole sites.

    I have already tried webHTTrack, but found it unsuitable. Perhaps httrack with a particular set of command line parameters would work better? Please advise, if you have anything that could get me started.

    Thanks for reading. Feel free to PM me, also.

  2. #2
    Linux Guru reed9
    Join Date: Feb 2009
    Location: Boston, MA
    Posts: 4,651
    wget is pretty straightforward for this. Why is it not an option for you?

    Downloading an Entire Web Site with wget | Linux Journal
    Create a mirror of a website with Wget | FOSSwire
    Using wget to mirror a website | townx

    I would just create an alias for the command you want to use to make it easy. Something in ~/.bashrc like

    Code:
    alias grab='wget -mk'
    Then you could just type
    Code:
    grab www.mywebsite.com
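
    If you want the flags spelled out long-form, something like the following should do it (just a sketch; www.example.com is a placeholder, and the extra options are documented in man wget):
    Code:
    # Long-form equivalent of -mk, plus page requisites (CSS, images, etc.)
    # and --no-parent to stay below the starting directory.
    wget --mirror --convert-links --page-requisites --no-parent http://www.example.com/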

  3. #3
    Just Joined!
    Join Date: Jan 2011
    Posts: 7

    Thumbs up wget OR httrack

    Someone PM'd me about httrack's wizard. I used that with the N0 option and successfully got the site.
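
    For reference, the command-line equivalent of those wizard settings would be roughly this (a sketch only; the URL, output directory, and filter are placeholders, and -N0 selects the default site-structure layout per the httrack manual):
    Code:
    # Mirror the site into ./mirror, keeping the original site structure (-N0).
    httrack "http://www.example.com/" -O "./mirror" "+*.example.com/*" -N0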

    I just read your top link about wget. That would work perfectly, and I'll probably just use that, bloat-free, in the future.

    Never mind the stuff below:

    The main reason is that I am going to need to maintain the directory structure, etc., and I think that wget does not do this.

    Doesn't wget just download the target's index page?
    Or does it get the whole domain or subdomain if you do not pass a full URL as its parameter?
    Last edited by Slekvic; 04-23-2011 at 06:12 AM. Reason: just read something that made my reply pointless.
