  1. #1
    Just Joined!
    Join Date
    Nov 2008
    Location
    Nashville, TN
    Posts
    3

    wget - download single page

    I'm having trouble getting the exact syntax right for a wget command. I want to download a single page and all of its dependencies (CSS, images, etc.) into a single directory (not a directory for each subdomain) and have the target page named index.html.

    For example, suppose the following page is located at http://example.com/some/path/pageIwant.php:
    HTML Code:
    <html>
    <head>
    <link rel="stylesheet" href="http://static.example.com/style.css" />
    <script type="text/javascript" src="../js/main.js"></script>
    </head>
    <body>
    <h1>Hello World!</h1>
    <img src="http://images.example.com/img/kitty.jpg" />
    </body>
    </html>
    I want to be able to run this (I just need help with the options):
    Code:
    wget [options] http://example.com/some/path/pageIwant.php
    and end up with the following files in a directory that I specify (say "/archives/1/"), so that opening index.html locally looks exactly like the example.com page.

    Code:
    user$ ls /archives/1/
    index.html
    kitty.jpg
    main.js
    style.css
    Let me know if that doesn't make sense or if you need any more information. Also, it doesn't have to be wget; that just seems like the best fit, but I am open to any suggestions.

  2. #2
    Just Joined!
    Join Date
    Oct 2004
    Posts
    62
    I use an alias...
    Code:
    alias wgethtml='wget -E -H -k -K -p -nd -o logwget.txt'
    written in .bashrc (so it is always at your disposal).

    Usage example:
    Code:
    wgethtml http://www.linuxforums.org/
    If I run it from a folder with no other files, I can immediately identify what was downloaded:
    Code:
    -rw-r--r-- 1 d3 d3  6583 2008-12-02 21:56 bnr-UH-234x60-VPS.gif
    -rw-r--r-- 1 d3 d3  2772 2008-11-11 20:45 btn-vpsL10.gif
    -rw-r--r-- 1 d3 d3  2718 2008-11-11 20:45 btn-vpsL20.gif
    -rw-r--r-- 1 d3 d3  2715 2008-11-11 20:45 btn-vpsL40.gif
    -rw-r--r-- 1 d3 d3  2654 2008-11-11 20:30 btn-vpsL5.gif
    -rw-r--r-- 1 d3 d3 12139 2008-12-12 15:14 common.css
    -rw-r--r-- 1 d3 d3  2366 2008-12-16 21:17 front.asp?ipid=16081
    -rw-r--r-- 1 d3 d3 47988 2008-12-16 21:17 index.html <---------------
    -rw-r--r-- 1 d3 d3 48432 2008-12-16 21:17 index.html.orig
    -rw-r--r-- 1 d3 d3  5036 2008-11-20 22:04 logo.gif
    -rw-r--r-- 1 d3 d3  6398 2008-12-16 21:17 logwget.txt
    -rw-r--r-- 1 d3 d3  1217 2008-11-20 22:04 nav_bar_search.gif
    -rw-r--r-- 1 d3 d3   301 2006-10-09 20:09 robots.txt
    -rw-r--r-- 1 d3 d3   301 2006-10-09 20:09 robots.txt.1
    -rw-r--r-- 1 d3 d3    28 1997-01-10 02:21 robots.txt.2
    -rw-r--r-- 1 d3 d3    40 2008-12-16 21:17 robots.txt
    You can leave off "-o logwget.txt" if you don't want a log file.
    JS files are also saved (if the page uses any).
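
For anyone reading the alias and wondering what each flag does, here is the same command spelled with long options, as a rough sketch. The function name wgethtml_long is made up for illustration, and --adjust-extension assumes wget 1.12 or newer (as far as I know, older releases call it --html-extension).

```shell
# Long-option spelling of: wget -E -H -k -K -p -nd -o logwget.txt
# wgethtml_long is a hypothetical name, not something wget provides.
wgethtml_long() {
    # --adjust-extension  (-E)  add .html to saved HTML files that lack it
    # --span-hosts        (-H)  allow fetching from other hosts (e.g. a static/image host)
    # --convert-links     (-k)  rewrite links so the page works when viewed locally
    # --backup-converted  (-K)  keep the unconverted file as *.orig
    # --page-requisites   (-p)  also fetch images, CSS and scripts the page needs
    # --no-directories    (-nd) save everything flat, without host/path directories
    # --output-file       (-o)  write wget's log to logwget.txt instead of the terminal
    wget --adjust-extension \
         --span-hosts \
         --convert-links \
         --backup-converted \
         --page-requisites \
         --no-directories \
         --output-file=logwget.txt \
         "$@"
}
```

Usage would be the same as with the alias, e.g. `wgethtml_long http://www.linuxforums.org/`.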

  3. #3
    Just Joined!
    Join Date
    Nov 2008
    Location
    Nashville, TN
    Posts
    3

    Thanks

    Originally posted by fiomba:
    I use an alias...
    Code:
    alias wgethtml='wget -E -H -k -K -p -nd -o logwget.txt'
    Awesome, thank you very much! I'm going to play around with a few different scenarios but that appears to be exactly what I was looking for.

  4. #4
    Linux Newbie imranka
    Join Date
    Dec 2007
    Location
    Kolkata
    Posts
    177
    To download into a specific directory, use -P, for example:

    wget -P /path/to/<directory> etc etc .....

    The -nd option means "no directories": everything is downloaded without the original directory structure and lands flat in your directory.
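
Putting -P together with the flags from the alias in post #2 might look like the sketch below. fetch_page is a hypothetical helper name, and placing the log file inside the target directory is just one choice (-o takes a path relative to the current directory, not to -P).

```shell
# fetch_page DIR URL
# Hypothetical helper: fetch one page and its requisites, flat, into DIR.
fetch_page() {
    # Create the target directory first so the log file has somewhere to go.
    mkdir -p "$1" || return 1
    # -P sets the directory prefix for downloaded files;
    # -nd keeps wget from recreating host/path directories below it.
    wget -E -H -k -K -p -nd -P "$1" -o "$1/logwget.txt" "$2"
}
```

With the example from the first post this would be called as `fetch_page /archives/1 http://example.com/some/path/pageIwant.php`.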
    Imran
    Linux User #467555 | Debian Squeeze | Intel(R) Core(TM)2 Duo CPU 4500 @ 2.20GHz | Gigabyte GA-G41MT-ES2L
    | 2 GB RAM | 320 GB SATA | Kernel: 2.6.32-5-686

  5. #5
    Just Joined!
    Join Date
    Nov 2008
    Location
    Nashville, TN
    Posts
    3
    I've actually come across two scenarios that are causing issues.

    1) If you download http://www.example.com/page.html, wget saves it as "page.html" in the output directory. Is there a way to force it to always be renamed to "index.html"? Granted, that should perhaps be done with another tool (i.e. not wget), but if I can do it all in one command then I'd prefer that.
    2) Similar to #1: if you download an image such as http://www.example.com/test.jpg, wget just downloads the image (as you would expect). What I want, though, is to ensure that the directory root always contains an index.html file that displays the content.
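
One possible way to handle both cases is a small post-processing step after wget finishes, sketched below. This is not a wget feature: ensure_index is a hypothetical helper, it assumes you already know the name wget saved the main file under, and it assumes the flat layout produced by -nd so that renaming the page does not break its converted links. (-O is probably not the answer here, because combined with -p it writes every downloaded file into that single output file.)

```shell
# ensure_index DIR FILE
# Hypothetical helper: make sure DIR contains an index.html that shows FILE.
ensure_index() {
    dir=$1
    file=$2
    # Nothing to do if the download is missing.
    [ -f "$dir/$file" ] || return 1
    # Already named index.html: leave it alone.
    [ "$file" = "index.html" ] && return 0
    case $file in
        *.html|*.htm)
            # Case 1: an HTML page saved under another name; rename it.
            mv -- "$dir/$file" "$dir/index.html"
            ;;
        *)
            # Case 2: a non-HTML file such as an image; write a minimal
            # wrapper page that displays it. The file name is inserted
            # as-is, so names with HTML special characters need escaping.
            printf '<html>\n<body>\n<img src="%s" />\n</body>\n</html>\n' "$file" \
                > "$dir/index.html"
            ;;
    esac
}
```

For the two scenarios above that would be `ensure_index /archives/1 page.html` and `ensure_index /archives/1 test.jpg`.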
