- 07-21-2011 #1
Script to Extract from Website
Hi All,
This is a bit off topic but I'm trying to create an essential reading list for myself and have come across this site:
Lists of Bests
What I am hoping to do is make a compilation using books that others have in their list, such as this one:
jsherry's "Recommended Reading List for the Well Educated Adult" on Lists of Bests
I've entered 200 book titles by hand so far, with authors and so on, and it's taking too long. So now the question is: can I write an easy script to extract the book titles (and maybe even the Amazon links they have) from certain lists? I'm not even sure this is legal (if it's not, I apologize in advance and don't need to know how to do it).
Thanks in advance
Bodhi 1.3 & Bodhi 1.4 using E17
Dell Studio 17, Intel Graphics card, 4 gigs of RAM, E17
"The beauty in life can only be found by moving past the materialism which defines human nature and into the higher realm of thought and knowledge"
- 07-21-2011 #2
You mean you want to grab the titles from the page on Amazon? That's perfectly legal. Use wget, something like:
Code:
wget -O books-page1.html 'http://www.listsofbests.com/list/2366-recommended-reading-list-for-the-well-educated-adult?name=comment_page&page=1'
Quote the URL. Otherwise the shell treats the "&" as a command separator and wget never sees the "page=1" part.
Then parse the output file, "books-page1.html". You can use html2text.py (google for it) to convert the HTML to plain text and then grep for the titles. A program called "tidy" might help you clean up the HTML, too. If you want to grab the links, you'll have to parse the HTML itself, not a plain-text version of it.
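For the "grab the links" part, a crude regex pass over the saved HTML may be enough for a one-off job. The script below is only a sketch and isn't from the original reply. It assumes, without having checked the actual page markup, that each book link is a single-line `<a href="...amazon...">Title</a>` tag. The function name `extract_amazon_links` is made up. It also relies on `grep -o`, which GNU and BSD grep both support.

```shell
#!/bin/sh
# Sketch: print "URL | link text" for every Amazon link found in pages
# saved with wget (e.g. books-page1.html).
# ASSUMPTION: each book link is a plain <a href="...amazon...">Title</a>
# on one line. This is regex matching, not real HTML parsing, so tags that
# wrap across lines or contain nested markup (e.g. <i>) will be missed.

# Hypothetical helper name; takes one saved HTML file as its argument.
extract_amazon_links() {
    # grep -o prints each matching <a ...>...</a> tag on its own line;
    # sed then rewrites each tag as "URL | link text".
    grep -o '<a [^>]*href="[^"]*amazon\.[^"]*"[^>]*>[^<]*</a>' "$1" |
        sed 's/.*href="\([^"]*\)"[^>]*>\([^<]*\)<\/a>/\1 | \2/'
}

# Process every file named on the command line (does nothing if none given).
for file in "$@"; do
    extract_amazon_links "$file"
done
```

Usage would be something like `sh extract-links.sh books-page1.html > links.txt` (the script name is hypothetical). If titles come out missing or mangled, run the page through tidy first or switch to a real HTML parser.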

