  1. #1
     Just Joined! | Join Date: Oct 2004 | Location: Bangalore | Posts: 9

    wget -K not functioning


    I am relatively new to wget, but I have been trying for weeks and am really frustrated, so any help will be appreciated. I have googled and read the wget man page inside out.

    I am trying to download around 2500 online journals from a website.

    The site:
    http://jdr.iadrjournals.org/contents-by-date.0.shtml

    These journals are in .pdf format and are served by a CGI script.
    For the past two weeks I have been able to download everything except the actual .pdf files themselves. I then realised this was because wget was not saving the CGI-generated pages with an .html extension, so it never parsed them to find the actual .pdf links; instead it saved them as extensionless files and left it at that.
    So I turned on --html-extension, and it started saving the files as .html.

    Now the problem is that there are around 5000 .html files it needs to download before it gets to the .pdf files. Downloading those 5000 .html files takes around 24 hours, and wget re-downloads all of them every time I interrupt the run (when I need the PC for something else). So I face a 24-hour delay before it even starts on the .pdfs, every time I restart the download.

    I found that turning on -nc and -K did the trick on a test folder. Then I tried it on the main website, only to find that after 24 hours it had not created any .orig files as it was supposed to.

    Now I am left with 5000 .html files that contain links to the actual .pdf files I need. I see no point in continuing if I cannot resume my downloads, as I started this project to run only while I am not at work.

    Any help will be appreciated.

    The actual syntax I used:

    wget -nc --html-extension -K --convert-links -l 6 --wait=2 --random-wait -X /cgi/issue_pdf,/cgi/content,/adsystem,/reports,/searchall,/subscriptions,/tips -rH -Djdr.iadrjournals.org http://jdr.iadrjournals.org/
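    One possible explanation for the missing .orig files: --convert-links (and hence the -K backups) only runs after the entire retrieval finishes, so an interrupted run never reaches the conversion step. A workaround that sidesteps link conversion entirely would be a two-stage approach: crawl the HTML pages once with -nc (so an interrupted crawl resumes by parsing already-saved .html files from disk), then harvest the PDF links from the saved pages and fetch them directly. A minimal sketch, assuming the site's saved-page layout and link pattern (the sample file, filenames, and regex below are illustrative, not taken from the thread):

    ```shell
    # Stage 1 (not run here): crawl the HTML pages once; with -nc, wget parses
    # already-saved .html files from disk instead of re-fetching them, so an
    # interrupted crawl can pick up where it left off.
    #   wget -r -nc --html-extension -l 6 --wait=2 --random-wait \
    #        -H -Djdr.iadrjournals.org http://jdr.iadrjournals.org/

    # Sample saved page standing in for one of the 5000 downloaded .html files
    # (hypothetical content, for illustration only).
    mkdir -p jdr.iadrjournals.org
    cat > jdr.iadrjournals.org/issue.html <<'EOF'
    <a href="/cgi/reprint/83/1/7.pdf">Full text (PDF)</a>
    <a href="/cgi/content/abstract/83/1/7">Abstract</a>
    EOF

    # Stage 2: harvest every PDF link from the saved pages into a list,
    # prefix the host to make the relative links absolute, then feed the
    # list to wget; -c resumes partially downloaded PDFs after interruptions.
    grep -rhoE 'href="[^"]*\.pdf"' jdr.iadrjournals.org/ \
      | sed -e 's/^href="//' -e 's/"$//' -e 's|^|http://jdr.iadrjournals.org|' \
      | sort -u > pdf-urls.txt
    cat pdf-urls.txt
    #   wget -c --wait=2 --random-wait -i pdf-urls.txt
    ```

    This keeps the slow HTML crawl and the PDF fetch as separate, independently resumable steps, so an interruption during the PDF stage never forces re-crawling the 5000 index pages.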

  2. #2
     Just Joined! | Join Date: Apr 2004 | Location: Armenia | Posts: 26
     I think it would be easier to download something like Teleport instead of using wget. I've used wget lots of times, but I'm not sure you can solve your problem with it.
