- 03-19-2005 #1 Linux Guru
- Join Date
- May 2004
- Location
- forums.gentoo.org
- Posts
- 1,814
Is wget blocked?
I tried to download a multipage html document but all I got was the first page and "robots.txt":
Code:
User-agent: *
Disallow: /email/
Disallow: /cgi-bin/james
Disallow: /misc/
Disallow: /stuff/
Disallow: /test/
Disallow: /library/images/
Disallow: /library/js/
Disallow: /library/css/
Disallow: /yourselections/
Disallow: /links/
Disallow: /estore/
Disallow: /site/
Disallow: /comments/
Disallow: /general-comments/
Disallow: /register/
Disallow: /admin/
Disallow: /oradoc/
Disallow: /wp/display/117/
User-agent: WebReaper
User-agent: Anawave
User-agent: EmailCollector
User-agent: EmailSiphon
User-agent: ExtractorPro
User-agent: FlashSite
User-agent: Go-Get-It
User-agent: Grab-a-Site
User-agent: HotCargo
User-agent: HttpLoader
User-agent: MemoWeb
User-agent: NearSite
User-agent: NetAttache
User-agent: Radview
User-agent: Radview/HttpLoader
User-agent: Second Site
User-agent: SecondSite
User-agent: SiteSnagger
User-agent: SpidyBot
User-agent: Teleport
User-agent: Teleport Pro
User-agent: Visual Web
User-agent: VisualWeb
User-agent: WBI_Client
User-agent: WebCompass
User-agent: WebCopy
User-agent: WebDownloader
User-agent: WebRetriever
User-agent: WebSnake
User-agent: WebVCR
User-agent: WebWhacker
User-agent: WebZIP
User-agent: Wget
Disallow: /
I'm pretty sure that I know what that means, but is there a way around it?
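For anyone who wants to check what a robots.txt like the one above means for a given client, here is a minimal sketch using Python's standard `urllib.robotparser`. The excerpt is shortened from the file above, and the helper name `allowed` and the path `page2.html` are made up for illustration. It only shows how a standards-following parser reads the rules; it is not a claim about how wget itself evaluates them.

```python
from urllib.robotparser import RobotFileParser

# Shortened excerpt of the robots.txt quoted above (not the full file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /email/
Disallow: /misc/

User-agent: WebZIP
User-agent: Wget
Disallow: /
"""


def allowed(agent, url, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `url` under the given rules.

    `allowed` is a hypothetical helper name, not part of any library.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# A client that identifies as Wget falls under the "Disallow: /" record.
print(allowed("Wget/1.9.1", "http://target.com/page2.html"))
# Any other name falls back to the "*" record, which only blocks some paths.
print(allowed("MyGreatBot", "http://target.com/page2.html"))
print(allowed("MyGreatBot", "http://target.com/email/x"))
```

In other words, the second record shuts out Wget and the other listed download tools from the whole site, while everyone else is only kept out of the listed directories.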
/IMHO
//got nothin'
///this use to look better
- 03-19-2005 #2
You should respect people's robots.txt, but if you want to get the files, just use a grabber that lets you change the UA (user agent), since that is what the robots.txt is matching on. To change the UA of wget, use the command below.
Code:
wget --user-agent=MyGreatBot http://target.com
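To make it a bit clearer what changing the UA amounts to: the client just sends a different `User-Agent` header with each HTTP request. A rough sketch of the same idea in Python's standard `urllib.request` follows; the helper name `build_request` is made up for illustration, and the request is only constructed, never sent.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Build (but do not send) a request carrying a custom User-Agent.

    `build_request` is a hypothetical helper, not part of any library.
    """
    return Request(url, headers={"User-Agent": user_agent})


req = build_request("http://target.com", "MyGreatBot")
# urllib stores header names capitalized as "User-agent".
print(req.get_header("User-agent"))
```

The server only sees whatever string the client chooses to send, which is why this is a matter of etiquette rather than a technical barrier.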
- 03-19-2005 #3 Linux Guru
Thanks, Giro. I don't really know the bounds of good etiquette on this sort of thing. I see in the man page for wget it says this about --user-agent:
I don't even know what that means, except that I do know it's directed to people like me.
Originally Posted by man wget
- 03-20-2005 #4
That means that if you use that option, you will be ripping that site against the will of the webmaster. The message tells you to think about that and whether that really is what you want to do.
I'm so tired .....
