  1. #1
    Just Joined!
    Join Date
    Jan 2011
    Posts
    4

    Question Wget - URL hit generator


    Hi,

    I am trying to see whether wget can be used to generate actual url hits on a web page. This does not look good so far….

    I changed the following lines in /etc/wgetrc to:
    Code:
    http_proxy = http://<proxy_ip>:<port>/
    use_proxy = on
    Output :
    Code:
    root# wget -c <url>/ > /dev/null 
    --2011-01-16 12:26:38--  <url>
    Connecting to <proxy_ip>:<port>... connected.
    Proxy request sent, awaiting response... 200 OK
    Length: unspecified [text/html]
    Saving to: `index.html.3'
    
        [   <=>                                 ] 50 548      88,9K/s   in 0,6s    
    
    2011-01-16 12:26:39 (88,9 KB/s) - `index.html.3' saved [50548]
    This does NOT generate a hit on the actual web page!
    The > /dev/null part does not seem to work either: wget still saves index.html.3.

    How can I get this to work?

  2. #2
    Linux User
    Join Date
    Dec 2009
    Posts
    264
    What counter do you want to trigger?

    Is the proxy counting the hits?!

  3. #3
    Just Joined!
    Join Date
    Jan 2011
    Posts
    4
    I am not sure which counter this affects, but when I use the exact same proxy in Firefox (Preferences -> Advanced -> Network) I do get a hit on that web page.

    I would really like the same "mechanism" to be triggered by wget from the command line.

  5. #4
    Linux Engineer Kloschüssel's Avatar
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    What counts as a "hit" for you? Do you see it in a tool like Google Analytics?

    If so, there are at least two possibilities:

    1] Google Analytics doesn't treat requests with the Wget User-Agent (an HTTP header) as count-worthy, and you won't be able to do much about that unless you change the headers you send.

    2] Google Analytics has already counted the request and treats any further requests as part of the same visit. In that case you can't do much, because these checks use some data to identify you (usually your IP address together with the User-Agent and other data usable by heuristic algorithms), and you would have to change at least one part of that data (e.g. if you have a dial-up connection and your ISP supports it, reconnect to get a new IP address).

    If you use an anonymizing proxy like Tor and the hits still don't count up, you are probably out of luck: the analytics is well designed and doesn't rely on IP addresses to identify users. Good luck in that case!
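To check what wget actually sends before guessing at the counter's rules, you can look at its debug output. A minimal sketch, assuming your wget build has debug support (-d) and marks the request with "---request begin---" / "---request end---" lines; show_request is a made-up helper name, not part of wget:

```shell
# Print only the request section of `wget -d` output read from stdin.
# Assumes the debug log brackets the request with
# "---request begin---" and "---request end---" lines.
show_request() {
    sed -n '/---request begin---/,/---request end---/p'
}

# Usage (hypothetical URL; wget writes its log to stderr, hence 2>&1):
#   wget -d -O /dev/null http://example.com/ 2>&1 | show_request
```

Comparing the User-Agent line it prints with what your browser sends should show whether the headers are a plausible reason for the missing hit.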

  6. #5
    Just Joined!
    Join Date
    Jan 2011
    Posts
    3
    Most likely whatever is counting hits checks the User-Agent field, and a couple of others, of the HTTP request before deciding whether to count it as a hit. By default wget identifies itself as Wget/<version> in this field rather than as a browser. Check out the --user-agent and --header options in wget's man page.

    This link should answer more of your questions:
    www(.)askapache(.)com/dreamhost/wget-header-trick(.)html
    Last edited by mharv87; 01-25-2011 at 04:23 AM.
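A rough sketch of the --user-agent / --header suggestion above. The User-Agent string, the extra headers, and the function name fetch_like_browser are placeholders chosen for illustration; copy the values your own browser sends. The proxy settings from /etc/wgetrc still apply. There is no guarantee the counter will accept the request, since it may look at more than headers.

```shell
# Hypothetical User-Agent string; replace it with the one your browser sends.
UA="Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"

# fetch_like_browser URL
# Requests URL with browser-like headers and throws the page away.
# With DRY_RUN set, prints the wget arguments (one per line) instead of running them.
fetch_like_browser() {
    target_url=$1
    # -q quiets the log; -O /dev/null discards the page. A shell "> /dev/null"
    # has no effect on the download, because wget writes the page to a file
    # rather than to stdout.
    set -- wget -q -O /dev/null \
        --user-agent="$UA" \
        --header="Accept: text/html,application/xhtml+xml" \
        --header="Accept-Language: en-US,en;q=0.5" \
        "$target_url"
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$@"
    else
        "$@"
    fi
}
```

To preview the command without touching the network, run DRY_RUN=1 fetch_like_browser <url>; drop DRY_RUN to send the request.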
