Thread: wget -K not functioning
- Join Date
- Oct 2004
I am trying to download around 2500 online journals from a website.
These journals are in .pdf format and are served by a CGI script.
For the past 2 weeks I have been able to download everything except the actual .pdf files themselves. I then realised this was because wget was not saving the CGI-generated pages as .html and parsing them to find the actual .pdf files. Instead it was saving the links as files without any extension and leaving it at that.
So I turned on --html-extension and then it started saving the files as .html.
Now the problem is that there are around 5000 .html files it needs to download before it gets to the .pdf files. It takes around 24 hours to download the 5000 .html files, and it re-downloads them every time I interrupt the download (when I need the PC for something else). So I am facing a 24 hour delay before it starts on the .pdf files each time I run the download.
I found that turning on -nc and -K did the trick for a test folder. Then I tried it on the main website, only to find that after 24 hours it had not created any .orig files as it was supposed to.
Now I am left with 5000 .html files containing links to the actual .pdf files that I need. I see no point in trying further if I am unable to resume my downloads, since I started this project to run only when I am not at work.
Any help will be appreciated.
The actual syntax I used:
wget -nc --html-extension -K --convert-links -l 6 --wait=2 --random-wait -X /cgi/issue_pdf,/cgi/content,/adsystem,/reports,/searchall,/subscriptions,/tips -rH -Djdr.iadrjournals.org http://jdr.iadrjournals.org/
- Join Date
- Apr 2004
I think it would be easier to download something like Teleport instead of using wget. I've used wget lots of times, but I'm not sure that you can solve your problem with wget.
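Another option that avoids re-crawling is to use the 5000 .html files already on disk. What follows is only a rough sketch, not a tested solution for this site. It assumes the saved pages sit in the host directory that wget -rH creates (jdr.iadrjournals.org/), that the .pdf links are plain <a href="..."> links whose path ends in .pdf, and that mapping a local file path back to a page URL is close enough to resolve relative links. The function names and the script name are made up for illustration, and the .pdf test may need changing to match how the site really links its files.

```python
import sys
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin, urlparse

# Site root taken from the wget command in the first post.
BASE_URL = "http://jdr.iadrjournals.org/"


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_pdf_links(html_text, page_url):
    """Return sorted absolute URLs of links whose path ends in .pdf.

    Assumption: the PDFs are linked with a path ending in ".pdf";
    change the test below if the site uses a different pattern.
    """
    parser = LinkCollector()
    parser.feed(html_text)
    links = set()
    for href in parser.hrefs:
        absolute = urljoin(page_url, href)
        if urlparse(absolute).path.lower().endswith(".pdf"):
            links.add(absolute)
    return sorted(links)


def collect_from_mirror(mirror_dir, base_url=BASE_URL):
    """Scan every saved .html file under mirror_dir for .pdf links."""
    root = Path(mirror_dir)
    found = set()
    for page in root.rglob("*.html"):
        # Approximate the original page URL from the local path so that
        # relative links resolve against the right directory.
        page_url = urljoin(base_url, page.relative_to(root).as_posix())
        text = page.read_text(encoding="utf-8", errors="replace")
        found.update(extract_pdf_links(text, page_url))
    return sorted(found)


if __name__ == "__main__" and len(sys.argv) > 1:
    for url in collect_from_mirror(sys.argv[1]):
        print(url)
```

Saved as, say, extract_pdfs.py, it could be run as "python extract_pdfs.py jdr.iadrjournals.org > pdf-urls.txt", and the list then handed back to wget with "wget -nc --wait=2 --random-wait -i pdf-urls.txt". With -nc, files that are already on disk are skipped, so an interrupted run should be able to pick up roughly where it stopped instead of spending another 24 hours on the .html pages.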