- 12-06-2011 #1 Just Joined!
- Join Date: Dec 2011
- Posts: 4
Bash script: wget a website and save urls containing a specific string
Hi,
I've spent hours trying to figure this out without any luck. I would appreciate your help.
I need to "crawl" a website and search for a specific string in the HTML. If the string is found, the URL of the page containing it is saved to a file.
So the end result should be a file containing a list of URLs.
I tried writing a bash script using wget. My knowledge of Linux is very basic. I am using Cygwin for Windows.
- 12-06-2011 #2
KrazyWorks » Wget examples and scripts
Number 5 sounds close to what you are trying to do.
linux user # 503963
- 12-06-2011 #3 Just Joined!
scathefire, thank you for taking the time to look into it.
The thing is that I don't have a file with URLs to loop through. I only have the main URL of the website (e.g. example.com), and I want wget to crawl it and check every page for my search string.
- 12-06-2011 #4
Does the --spider option not work?
Of course, I don't think it will download the page per se. There are other software options out there, though.
- 12-06-2011 #5 Just Joined!
--spider only checks whether the page exists; it doesn't save the actual page contents. I need the page contents to search for the string in the HTML.
My plan B is to download the whole site with wget, and then do the search on my local version:
wget -r -l 2 SITE_URL
But this will not give me the list of URLs.
- 12-06-2011 #6
Why not do something like:
Code:
wget -mk SITE_URL
Then, with your local copy, filter through the files.
- 12-06-2011 #7 Just Joined!
Yes, this is what I ended up doing. Thank you for your help!
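For anyone who finds this thread later, here is a rough sketch of the "filter through the files" step, since the thread never spells it out. It relies on the fact that a recursive wget saves pages under a directory named after the host, so a local file path can be turned back into a URL. The function name urls_containing and the output file found_urls.txt are made up for illustration, and the http:// prefix is an assumption (change it if the site uses https). It also assumes you run it from the directory where you ran wget.

```shell
#!/bin/bash
# Hypothetical helper (not from the thread): print the URLs of mirrored pages
# whose contents include a given string.
# Assumes the mirror was created in the current directory with
#   wget -mk SITE_URL
# so that pages live under ./<hostname>/...
urls_containing() {
    local mirror_dir=$1   # directory wget created, e.g. example.com
    local needle=$2       # literal text to look for

    # grep: -r recurse, -l print only the names of matching files,
    #       -F treat the search text as a fixed string, not a regex.
    # sed:  drop a leading "./", prepend the (assumed) http:// scheme,
    #       and turn ".../index.html" back into the directory URL ".../".
    grep -rlF -- "$needle" "$mirror_dir" |
        sed -e 's|^\./||' -e 's|^|http://|' -e 's|/index\.html$|/|'
}

# Example usage (hypothetical names):
#   urls_containing example.com "my search string" > found_urls.txt
```

Note that the -k option rewrites links inside the saved pages, so if the search string can appear inside a link, mirroring without -k may give more faithful matches.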

