- 11-11-2008 #1 Just Joined!
- Join Date
- Nov 2008
- Location
- Nashville, TN
- Posts
- 3
wget - download single page
I'm having trouble getting the exact syntax down for a wget command. I want to download a single page and all of its dependencies (CSS, images, etc.) into a single directory (not a directory for each subdomain) and have the target page named index.html.
So if the following page was located at http://example.com/some/path/pageIwant.php

HTML Code:
<html>
  <head>
    <link rel="stylesheet" href="http://static.example.com/style.css" />
    <script type="text/javascript" src="../js/main.js"></script>
  </head>
  <body>
    <h1>Hello World!</h1>
    <img src="http://images.example.com/img/kitty.jpg" />
  </body>
</html>

I want to be able to do (just need help with the options)

Code:
wget [options] http://example.com/some/path/pageIwant.php

and have the following in a directory that I specify (say "/archives/1/"), and I would get the following output, so that if I were to look at the index.html page locally it would look exactly like the example.com page.

Code:
user$ ls /archives/1/
index.html kitty.jpg main.js style.css

Let me know if that doesn't make sense or if you need any more information. Also, it doesn't necessarily have to be wget; that is just what seems like it would be the best solution, but I am up for any suggestions.
- 12-16-2008 #2 Just Joined!
- Join Date
- Oct 2004
- Posts
- 62
I use an alias...

Code:
alias wgethtml='wget -E -H -k -K -p -nd -o logwget.txt'

written in .bashrc (so it is always at your disposal....)

Usage example:

Code:
wgethtml http://www.linuxforums.org/

If I run it from a folder with no other files, I can immediately identify:

Code:
-rw-r--r-- 1 d3 d3  6583 2008-12-02 21:56 bnr-UH-234x60-VPS.gif
-rw-r--r-- 1 d3 d3  2772 2008-11-11 20:45 btn-vpsL10.gif
-rw-r--r-- 1 d3 d3  2718 2008-11-11 20:45 btn-vpsL20.gif
-rw-r--r-- 1 d3 d3  2715 2008-11-11 20:45 btn-vpsL40.gif
-rw-r--r-- 1 d3 d3  2654 2008-11-11 20:30 btn-vpsL5.gif
-rw-r--r-- 1 d3 d3 12139 2008-12-12 15:14 common.css
-rw-r--r-- 1 d3 d3  2366 2008-12-16 21:17 front.asp?ipid=16081
-rw-r--r-- 1 d3 d3 47988 2008-12-16 21:17 index.html <---------------
-rw-r--r-- 1 d3 d3 48432 2008-12-16 21:17 index.html.orig
-rw-r--r-- 1 d3 d3  5036 2008-11-20 22:04 logo.gif
-rw-r--r-- 1 d3 d3  6398 2008-12-16 21:17 logwget.txt
-rw-r--r-- 1 d3 d3  1217 2008-11-20 22:04 nav_bar_search.gif
-rw-r--r-- 1 d3 d3   301 2006-10-09 20:09 robots.txt
-rw-r--r-- 1 d3 d3   301 2006-10-09 20:09 robots.txt.1
-rw-r--r-- 1 d3 d3    28 1997-01-10 02:21 robots.txt.2
-rw-r--r-- 1 d3 d3    40 2008-12-16 21:17 robots.txt

You can leave off "-o logwget.txt".
Obviously js files are also saved (if present).
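For anyone unsure what the short options in that alias do, here is a commented sketch based on the wget manual. It wraps the same options in a shell function instead of an alias (that is my choice, not the poster's) so it can also be called from scripts; the name wgethtml is simply reused from the post.

```shell
# What each option in the alias does:
#   -E   save HTML pages with an .html suffix (e.g. page.php -> page.php.html)
#   -H   allow fetching from other hosts (needed for static./images. subdomains)
#   -k   rewrite links in the saved page so they point at the local copies
#   -K   keep the unconverted page as <name>.orig when -k rewrites it
#   -p   fetch page requisites: images, stylesheets, scripts
#   -nd  do not recreate the remote directory tree; save everything flat
#   -o   write wget's messages to the named log file instead of the terminal
wgethtml() {
    wget -E -H -k -K -p -nd -o logwget.txt "$@"
}
```

Called as `wgethtml http://www.linuxforums.org/`, this should behave like the alias. The -K option is what produces the index.html.orig file in the listing above.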
- 12-16-2008 #3 Just Joined!
- Join Date
- Nov 2008
- Location
- Nashville, TN
- Posts
- 3
- 12-17-2008 #4
To download into a specific directory, use -P, like

wget -P /path/to/<directory> etc etc .....

The -nd option stands for no-directories: everything will be downloaded without the original directory structure and land flat in your directory.

Imran
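Putting -P together with the options from post #2 might look like the sketch below. The function name fetch_page and its two arguments are made up for illustration. Note that -o takes a path relative to the current directory, not to the -P directory, so the log path is spelled out in full here.

```shell
# Hypothetical helper: download one page plus its requisites, flat,
# into the directory given as the first argument.
fetch_page() {
    dest=$1   # target directory, e.g. /archives/1
    url=$2    # page to download

    # Create the target directory first so the log file can be written there.
    mkdir -p "$dest" || return 1

    # -P sets the directory prefix; -nd keeps everything flat inside it.
    wget -E -H -k -K -p -nd -P "$dest" -o "$dest/logwget.txt" "$url"
}
```

Usage would be something like `fetch_page /archives/1 http://example.com/some/path/pageIwant.php`.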
- 12-17-2008 #5 Just Joined!
- Join Date
- Nov 2008
- Location
- Nashville, TN
- Posts
- 3
I've actually come across two scenarios that are causing issues.
1) If you download http://www.example.com/page.html then it saves it as "page.html" in the output directory. Is there a way to force it to always be renamed to "index.html"? Granted, that should perhaps be done with another tool (i.e. not wget), but if I can do it all in one command then I'd prefer that.
2) Similar to #1, if you download an image http://www.example.com/test.jpg then it just downloads the image (as you would expect). However, what I want is to ensure that when you go to the directory root, there will always be an index.html file that displays the content.
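As far as I know, wget has no single option that covers both cases: -O writes everything it downloads into one file, which does not combine well with -p. A small post-processing step after the download is one way around it. The sketch below is only that, a sketch: the function name make_index is made up, it assumes a flat (-nd) download into the given directory, it assumes the saved file is named after the last component of the URL (with ".html" appended by -E where applicable), and it treats any non-HTML file as an image.

```shell
# Hypothetical helper: make sure "$dir/index.html" exists after a flat
# download of "$url" into "$dir".
make_index() {
    dir=$1
    url=$2

    # Nothing to do if wget already saved the page as index.html.
    [ -e "$dir/index.html" ] && return 0

    # Assumed local name: the last URL component, without any #fragment.
    name=${url%%#*}
    name=${name##*/}
    [ -n "$name" ] || return 1

    if [ -f "$dir/$name.html" ]; then
        # Case 1a: -E appended .html (e.g. page.php -> page.php.html).
        mv "$dir/$name.html" "$dir/index.html"
    elif [ -f "$dir/$name" ]; then
        case $name in
            *.html|*.htm)
                # Case 1b: an ordinary page; just rename it.
                mv "$dir/$name" "$dir/index.html" ;;
            *)
                # Case 2: not a page (assumed to be an image);
                # write a minimal wrapper that displays it.
                printf '<html><body><img src="%s" /></body></html>\n' \
                    "$name" > "$dir/index.html" ;;
        esac
    else
        # The expected file was not found; leave the directory untouched.
        return 1
    fi
}
```

After something like `wget -E -H -k -K -p -nd -P /archives/1 http://www.example.com/page.html`, running `make_index /archives/1 http://www.example.com/page.html` would rename the page. Since -k rewrites links relative to the same flat directory, the rename should not break them.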


