  1. #1
    Just Joined! zabalex
    Join Date
    Sep 2011
    Posts
    6

    Newb question, how to use curl command to get "generated source"


    Hello,
    Please help me. I'm not new to programming, but I'm new to scripting and new to Linux altogether.

    If I have a web page open in Firefox, I can manually [view source] or [view generated source].

    Does curl have a similar parameter / option?

    Any advice will be highly appreciated.

    Thanks,

  2. #2
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,307
    If you grab the page with curl (or wget - my preferred grabber), you ARE getting the source, although I'm not sure what generated source means...

    e.g.
    Code:
    curl http://www.google.com -o google.html
    The program tidy can make it easier to read sometimes, e.g.:
    Code:
    tidy -o google-pretty.html google.html

  3. #3
    Just Joined! zabalex
    Join Date
    Sep 2011
    Posts
    6
    Thank you Atreyu,

    But that does not get the "generated source".

    Source:
    <body>
    <script>document.write("Hello, Linux Forums!");</script>
    </body>


    Generated source:
    <body>
    <script>document.write("Hello, Linux Forums!");</script>
    Hello, Linux Forums!
    </body>

    So, in the case above, "source" is what the browser receives in response to the page request.
    "Generated source" is what the browser holds after all the JavaScript has executed.

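    The two views can be reproduced from a shell. This is only a sketch, assuming a Chromium-based browser is installed: curl returns the page exactly as the server sent it, while the browser's headless mode with --dump-dom prints the document after its scripts have run. The function names and the list of candidate browser binaries are made up for illustration; adjust them for your system.

```shell
#!/bin/sh
# Print the first Chromium-style browser found on PATH, or fail if none exists.
# The candidate names below are assumptions; they differ between distributions.
find_browser() {
    for candidate in chromium chromium-browser google-chrome google-chrome-stable; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# "Source": the bytes the server sent. No JavaScript is executed.
raw_source() {
    curl -sL "$1"
}

# "Generated source": the DOM serialized after the page's scripts have run.
generated_source() {
    browser=$(find_browser) || {
        echo "no Chromium-based browser found" >&2
        return 1
    }
    "$browser" --headless --dump-dom "$1"
}
```

    For example, `raw_source URL > source.html` and `generated_source URL > generated.html`, followed by `diff source.html generated.html`, should show the lines that scripts added. Content that only appears after a click (such as a "next page" link handled by JavaScript) may still be missing from the dump.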
  4. #4
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,307
    Hmmmm, yes javascript, you may have a problem...see here and a link from that thread, here.

  5. #5
    Just Joined! zabalex
    Join Date
    Sep 2011
    Posts
    6
    Hello atreyu,
    Thank you for your response.
    I can't believe it, but I finally found a case where the generated source is significantly different from the view-source output that curl provides. Please open the following page in your browser. You should see a bunch of thumbnails and the navigation to go to the next page of results.
    If you click on the next page (for example, page 2), you'll of course see a different set of thumbnails. If you click on view-source, you'll see a totally different source than with the FF -> view generated source option. As a matter of fact, the regular source does not even contain references to those thumbnails, but the generated source does.
    freelancer.com/contest/Logo-Design-for-Q-Essence-2337.html

    Please let me know if you know the solution. Honestly, I don't even know how they did it that way, but I'd like to use curl or wget to get the thumbnails.

    Thanks again,

    Sasha

  6. #6
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,307
    maybe this?
    Code:
    wget http://www.freelancer.com/contest/Logo-Design-for-Q-Essence-2337.html
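    # Extract each single-quoted http...jpg URL; [^']* keeps a match from running past its closing quote
    urls=$(grep -o "src='http[^']*jpg" Logo-Design-for-Q-Essence-2337.html | sed -e "s|src='||")
    for url in $urls; do
      # Quote the URL so characters such as ? or & are not interpreted by the shell
      wget "$url"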
    done
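    The extraction step can be tried offline before pointing it at the real page. This is a minimal sketch, assuming (as in the snippet above) that the image tags use single-quoted src attributes holding absolute http URLs that end in .jpg. The sample file and the function name extract_jpg_urls are made up for illustration.

```shell
#!/bin/sh
# Print every single-quoted http(s) .jpg URL found in a src attribute of a file.
# [^']* stops at the closing quote, so two images on one line stay separate.
extract_jpg_urls() {
    grep -o "src='http[^']*\.jpg'" "$1" | sed -e "s/^src='//" -e "s/'\$//"
}

# Hypothetical sample page, standing in for the downloaded HTML file.
cat > sample.html <<'EOF'
<img src='http://example.com/thumbs/a.jpg'><img src='http://example.com/thumbs/b.jpg'>
<img src='/relative/c.jpg'>
EOF

extract_jpg_urls sample.html
```

    This prints the two absolute URLs, one per line, and skips the relative one. To download them, the output can be piped into a loop such as `extract_jpg_urls page.html | while read -r url; do wget "$url"; done`.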

  7. #7
    Just Joined! zabalex
    Join Date
    Sep 2011
    Posts
    6
    Thank you Atreyu,

    My goal is not to download the images (thumbs), but rather to figure out a way to use curl to download the source of page 2 through page n.
    What you did in one line of parsing code, I finally managed to do in about 10 separate steps: writing to temp files, parsing them, and writing again. So I did manage to download the images.
    The problem is that I can't download the 2nd page and the ones after it.
    Please help me if you can.

    Thanks,

  8. #8
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,307
    Don't think I can help you there. It is javascript that is the problem.

    I assume you know about the Page Info view in Firefox? You can go to the Media tab there and directly see links to all the JPEGs. You can even download them from there. I know you want a scriptable solution, so this does not really help. But at least with this hack (untested) you can download one whole page's worth at once.
