- 12-15-2011 #1 Just Joined!
- Join Date
- Dec 2011
- Posts
- 3
Easy way to extract HTML text from a .htm file
Suppose I have a .htm file that contains a lot of complex HTML code. I want to run a command from the shell that says "For this file index.htm, extract all anchor tags (everything from <a href...> through the closing </a>), print them to standard output, and separate each with a newline." What utility should I use for this? Should I use sed? awk? vi?
- 12-15-2011 #2 Just Joined!
- Join Date
- Mar 2007
- Location
- Bogotá, Colombia
- Posts
- 43
Well, I haven't tried this yet, but off the top of my head, I think this should work:
Code:
# \n in the sed replacement needs GNU sed; grep -E replaces the deprecated egrep
sed 's/a>/a>\n/g' index.htm | grep -E '<a |</a>' | sed 's/^.*<a /<a /'
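For what it's worth, here is a rough alternative sketch that leans on grep -o instead of splitting lines with sed. The function name extract_anchors is made up for illustration, and this assumes a grep that supports -o (GNU and BSD grep both do). It joins the file into one line first so an anchor that wraps across lines is still caught. It is still only a regex, so it will be fooled by anchors nested inside other anchors, closing tags inside the link whose name starts with "a" (such as </abbr>), or a ">" inside an attribute value.

```shell
# Hypothetical helper: print every <a ...>...</a> element of an HTML file,
# one per line. Not a real HTML parser, just a regex sketch.
extract_anchors() {
    # tr turns newlines into spaces so anchors spanning several lines still match.
    # grep -o prints only the matched text, one match per output line.
    # The middle group accepts any text or nested tag, but cannot step over
    # a closing tag that begins with "</a", so each match ends at the first </a>.
    tr '\n' ' ' < "$1" |
        grep -oE '<[aA][[:space:]][^>]*>([^<]|<[^/]|</[^aA])*</[aA]>'
}
```

You would call it as: extract_anchors index.htm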


