- 12-07-2011 #1 Just Joined!
- Join Date
- Dec 2011
- Location
- France
- Posts
- 1
CGI bash script
Hi,
I'm a linguistics researcher and have put together a bash script that scrapes text from web forums: it uses lynx to dump the page and sed to keep only the text between a left-limit and a right-limit marker. I'd now like to put this on a server behind an HTML form where the user can enter the URL, and possibly a name for the output file, and then download the file once the script has run. I've hunted around the web a bit but have not found anything that explains how to go about this. Any help for a scripting novice would be much appreciated!
The script is given below. Thanks in advance!
Code:
#!/bin/bash
# This script extracts text from any of the wordreference forums
# The next two lines ask for the url and for the output file name; these are variables in the command
echo "Please enter the url"
read -r URL
echo "Please enter your output filename"
read -r OUTPUT
# This lynx command takes the text from the webpage, with image links etc.
lynx -dump "$URL" |
# These sed commands 1. eliminate all but the text between the markers of left- and right-limits;
# 2. and 3. remove text of the form [image.png] or [image.gif];
# 4. removes the remaining bracketed link numbers, e.g. [12].
# Then the output is transferred to the named file.
sed -n '/Thread:/,/Quick Navigation/p' |
sed 's/\[[^]]*[.]png\]//g' |
sed 's/\[[^]]*[.]gif\]//g' |
sed 's/\[[^]]*\]//g' > "$OUTPUT"
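For what it's worth, here is a rough sketch of how the script above might be wrapped as a CGI program. It is only an illustration under several assumptions, none of which come from the original script: the file name scrape.cgi, the form field names url and output, the default name forum.txt, and a server that runs CGI scripts from /cgi-bin/. The helper names urldecode, get_param and clean_forum_text are made up for this example. The idea is that the form sends the two values in the query string, and the script decodes them, runs lynx and the sed filters, and returns the result as a download instead of writing a file on the server.

```bash
#!/bin/bash
# Hypothetical CGI wrapper (assumed name: scrape.cgi); a sketch, not a finished tool.
# Assumed form: <form action="/cgi-bin/scrape.cgi" method="get"> with text
# fields named "url" and "output".

# Decode a form-encoded value: "+" means space, %XX is a hex-encoded byte.
urldecode() {
    local s="${1//+/ }"
    printf '%b' "${s//%/\\x}"
}

# Print the decoded value of field $1 from query string $2.
get_param() {
    local name="$1" pair pairs
    IFS='&' read -r -a pairs <<< "$2"
    for pair in "${pairs[@]}"; do
        if [ "${pair%%=*}" = "$name" ]; then
            urldecode "${pair#*=}"
            return 0
        fi
    done
    return 1
}

# Same kind of filtering as the original script, reading from stdin:
# keep the text between the two markers, then drop anything in [brackets].
clean_forum_text() {
    sed -n '/Thread:/,/Quick Navigation/p' |
        sed 's/\[[^]]*\]//g'
}

# Run the request handling only when a web server calls the script
# (CGI sets REQUEST_METHOD), so the functions can be tried out by hand.
if [ -n "${REQUEST_METHOD:-}" ]; then
    url=$(get_param url "${QUERY_STRING:-}")
    output=$(get_param output "${QUERY_STRING:-}")
    # Keep only harmless characters in the download name.
    output=$(printf '%s' "$output" | tr -cd 'A-Za-z0-9._-')
    [ -n "$output" ] || output=forum.txt

    case "$url" in
        http://*|https://*) ;;
        *)
            printf 'Status: 400 Bad Request\r\nContent-Type: text/plain\r\n\r\n'
            echo 'Please supply an http or https URL.'
            exit 0
            ;;
    esac

    # Headers first, then a blank line, then the body.
    printf 'Content-Type: text/plain; charset=utf-8\r\n'
    printf 'Content-Disposition: attachment; filename="%s"\r\n\r\n' "$output"
    lynx -dump "$url" | clean_forum_text
fi
```

The Content-Disposition header is what makes the browser offer the text as a file to save. Since anyone who can reach the form can make the server fetch any http(s) page, it would probably be wise to restrict the accepted URLs further (for instance to the forum's own host) before putting this online.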

