  1. #1
    Just Joined!
    Join Date
    Sep 2011
    Posts
    3

    Downloading from Dynamic Sources

    I'm currently trying to write some form of update software for my work to throw in our server's crontab so it'll update our installation packages every night.

    The problem is that most of the software I want is located at dynamic URLs (they change whenever the software updates), so I can't just use wget to grab a fixed URL.

    Can anyone provide me with a method of essentially going to the site and clicking the link, but with a shell script? An example would be anything from FileHippo.com. I know it's possible because Ketarin does it, but I need this to be scripted.
    Last edited by Virucyde; 01-23-2012 at 09:40 AM.

  2. #2
    Linux Newbie
    Join Date
    Nov 2008
    Location
    Tokyo, Japan
    Posts
    243
    First off, you really should use a distribution with automated package management. If you can't use Yum or Apt-Get, you risk all kinds of problems with packages not updating correctly, which can break your OS. In the worst case, someone gets you to download a package that has been hacked, and you end up installing malware on your system with Sudo. Just don't do this.

    Secondly, can't Ketarin be used from the command line? I'm pretty sure it can. You may want to check their wiki.

    But since you asked...
    For stuff like this, I usually use a regular expression (please refer to this link for more information), and everyone's old favorite program: Sed. Now this doesn't guarantee success, because web developers more clever than you or I would rather people not mooch files off of them without seeing the ads that keep them in business. But you can try the simple way first:

    1. Use "wget" to download the page HTML source like this:
    Code:
    wget -p -l 1 http://some.site.whatever.com/
    A clever site operator will use redirects to prevent this from working smoothly. Wget may end up downloading several unrelated pages; they are doing this just to make things difficult for you. You need to figure out which downloaded page has the correct link. It may be different every time, so you need to use Grep and other tools to automate your search through the downloaded pages for the correct link. Grep uses the same regular expressions as Sed. If Wget only gets you one page, and it is the download page, then you are in luck, and things will be easier for you.
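    The Grep search described above can be sketched roughly as follows. This is only a sketch: the function name find_download_page is made up for illustration, and the pattern assumes the link looks like the FileHippo-style href shown in step 2 (a path with a "/download/" component). It also relies on "grep -r", which GNU and BSD grep support but POSIX does not require.

```shell
#!/bin/sh
# find_download_page DIR  (hypothetical helper name)
# Print the name of every file under DIR that contains a link whose href
# has a "/download/" path component. The pattern is an assumption based on
# the FileHippo-style link quoted in this post; adjust it for other sites.
find_download_page() {
    # -r: search the directory recursively, -l: print only matching file names
    grep -rl 'href="/[^"]*/download/[^"]*"' "$1"
}
```

    Assuming Wget saved the pages under a directory named after the host, you would call it as: find_download_page some.site.whatever.com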

    2. Look for a link like this:
    Code:
    <a href="/download_scribus/download/f8b5080fa7f6bce5358e29b8999bb5a1/"><img src="http://cache.filehippo.com/img/down5.png" alt="Download"/></a>
    A clever site operator embeds the actual link in JavaScript, so this may not work without a running script engine, in which case you are out of luck. You may be able to find programs that can execute JavaScript from the command line without rendering the page, to recover the URL to which the link redirects, but I don't know of any offhand.

    3. Now you can use Sed. You want to output only the path inside the href, "download_scribus/download/f8b5080fa7f6bce5358e29b8999bb5a1" (the expression below captures it without the leading and trailing slashes), so you tell Sed to output only the part of the line that matches this regular expression.
    Code:
    sed -ne 's@^.*<a href="/\([[:alnum:]_]\+/download/[0-9A-Fa-f]\+\)/"><img src="http://cache[.]filehippo[.]com/img/.*[.]png" alt="Download"/></a>.*$@\1@p' file-downloaded-by-wget.html
    I didn't test it, so that regular expression may not actually work. You will need to tweak it and experiment to make it work for your purposes.

    The first letter in the sed command is "s", which means to do a "substitution". Then you have the @regular-expression-search@and-replace@ expression, where "regular-expression-search" is what you are searching for, and "and-replace" is "\1", which means that the part of the search expression surrounded by \( parentheses \) should be written back to the output buffer. The dot has a special meaning in regular expressions, so a literal dot must be escaped, here with square brackets, like this: example[.]com. The last letter in the sed command is "p", which writes the output buffer when the substitution succeeded; the "-n" option suppresses the default printing, so only matching lines are output.
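    To make the substitution concrete, here is a rough sketch that wraps the same expression in a function and turns the captured path back into a full URL. The function names extract_download_path and build_download_url are made up for illustration, the site base URL is a placeholder, and the GNU-only "\+" is written as the POSIX "\{1,\}" so it should behave the same on non-GNU Sed. It is still tied to the exact link markup from step 2 and will need tweaking for anything else.

```shell
#!/bin/sh
# extract_download_path  (hypothetical helper name)
# Read HTML on stdin and print the captured href path of the FileHippo-style
# download link. Same pattern as the sed command in this post, except that
# the GNU extension \+ is spelled \{1,\} (POSIX basic regular expressions).
extract_download_path() {
    sed -n 's@^.*<a href="/\([[:alnum:]_]\{1,\}/download/[0-9A-Fa-f]\{1,\}\)/"><img src="http://cache[.]filehippo[.]com/img/.*[.]png" alt="Download"/></a>.*$@\1@p'
}

# build_download_url BASE PATH  (hypothetical helper name)
# Join a site base URL and the captured path into an absolute URL.
# BASE is a placeholder; the real site root is an assumption.
build_download_url() {
    # ${1%/} drops one trailing slash from BASE so the result has no "//"
    printf '%s/%s/\n' "${1%/}" "$2"
}
```

    You would then feed it the downloaded page, for example: wget "$(build_download_url http://some.site.whatever.com/ "$(extract_download_path < file-downloaded-by-wget.html)")". The page at that URL may itself redirect or need another round of the same treatment.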

    This technique is the same used by the Mark Zuckerberg character in that first "hacking" scene of the movie "The Social Network." What the movie didn't tell you was that it doesn't take a genius to do this kind of thing.
    Last edited by ramin.honary; 01-24-2012 at 02:35 AM.

  3. #3
    Just Joined!
    Join Date
    Sep 2011
    Posts
    3
    Awesome! That looks about along the lines of what I was looking for. The reason I'm using this instead of package management is that it is simply a script to update installation files to be run over the network on fresh Windows installs, not Linux. Ketarin is a Windows program, so short of a VM or a secondary Windows server (Wine appears to fail with Ketarin), I can't use it.

    Thanks though, I'm fairly certain that should answer my question. I was thinking along these lines, but seeing the sample code makes it a lot more obvious.
