  1. #1

    Easy way to extract HTML text from a .htm file

    Suppose I have a .htm file full of complex HTML. I want to run a shell command that says: "For the file index.htm, extract every anchor element (from the opening <a href...> through the closing </a>, including everything in between) and print each one to standard output on its own line." What utility should I use for this? sed? awk? vi?

  2. #2
    Just Joined!
    Join Date
    Mar 2007
    Bogotá, Colombia
    Well, I haven't tried this yet, but off the top of my head, I think this should work:

    sed 's/<\/a>/<\/a>\n/g' index.htm | grep -E '<a |</a>' | sed 's/^.*<a /<a /'
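
The pipeline above works line by line, so an anchor whose opening and closing tags sit on different lines will not come out in one piece, and any text after a closing tag stays attached. As a rough alternative sketch, assuming GNU grep built with PCRE support (the -P option), you could join the file into one line first and let grep -o print each match on its own line. The function name extract_anchors is only made up for illustration, and a regex like this is not a real HTML parser, so unusual markup may still trip it up.

```shell
# Hypothetical helper: print every <a ...>...</a> element of the given
# HTML file on its own line. Assumes GNU grep with PCRE support (-P).
extract_anchors() {
    # tr joins all input lines into one, so anchors that span several
    # lines can still match. In the grep call:
    #   -o  prints only the matched text, one match per output line
    #   -i  ignores case, so <A HREF=...> is matched too
    #   -P  enables Perl-style regexes; .*? is non-greedy and stops at
    #       the first closing </a> instead of the last one
    tr '\n' ' ' < "$1" | grep -oiP '<a\s[^>]*>.*?</a>'
}
```

You would then call it as extract_anchors index.htm. Anchors without any attributes (a bare <a>) are deliberately not matched here, since the question asks about <a href...> tags.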
