- 12-11-2008 #1 Just Joined!
- Join Date
- Oct 2004
- Posts
- 62
Save web references
Hi all,
when I save a file from the web (HTML or other), I need to keep a reference
to the address it came from. Here is a solution I found:
Code:
- save the web reference in a file (I use "web~filename.txt")
  e.g. http://pybrary.net/pyPdf/ saved in web~pyPdf.html.txt
- get all the web references with an alias:
  alias web='find . -name "web~*.txt"> pippo; for l in `cat pippo`; do echo; echo $l ; echo $l |xargs cat ; done'
- executing the alias (web), you get something like this:
  Wed 10 ~ web
  ./python/pyPdf/pyPdf_files/web~pyPdf.html.txt
  http://pybrary.net/pyPdf/
  ./python/pyPdf/pyPdf_doc_files/web~pyPdf_doc.html.txt
  http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html
There is no need to sort the output list. At most, you can indent the
web reference to obtain a more readable output:
Code:
./macro/AutoIt/AutoIt_on_Linux_files/web~AutoIt_on_Linux.html.txt
    http://www.scripticon.com/2008/07/installing-autoit-on-linux-so-you-can-do-windows-scripting/
./macro/xmacro/xmacro-0.4.5/doc/web~xmacro0.4.5_QBallCow.txt.txt
    http://download.sarine.nl/xmacro/Description.html
./macro/xmacro/PDFRead_files/web~PDFRead.html.txt
    http://pdfread.sourceforge.net/
./macro/xmacro/IkeHall_files/web~IkeHall.html.txt
    http://ikester.blogspot.com/2007/01/im-huge-fan-of-autohotkey.html
./python/pyPdf/pyPdf_files/web~pyPdf.html.txt
    http://pybrary.net/pyPdf/
./python/pyPdf/pyPdf_doc_files/web~pyPdf_doc.html.txt
    http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html
If you need any help, I am at your disposal.
...
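The alias described in this post writes its file list to a temporary file (pippo) and splits paths on whitespace. A possible variant, shown only as a sketch, is a shell function that lets find hand the files straight to a small loop and indents each reference by four spaces. The function body and the optional start-directory argument are assumptions, not part of the original alias.

```shell
# Hypothetical replacement for the "web" alias: similar output, no temporary file.
web() {
    # The optional start directory is an assumption; it defaults to the current one.
    find "${1:-.}" -type f -name 'web~*.txt' -exec sh -c '
        for f in "$@"; do
            echo                    # blank line before each entry, as in the alias
            echo "$f"               # path of the reference file
            sed "s/^/    /" "$f"    # file content (the URL), indented four spaces
        done
    ' sh {} +
}
```

Called as web (or web some/dir), it prints each reference file followed by its indented URL. Paths containing spaces are handled because every file name is passed as a separate quoted argument.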
I am curious to hear the experts' comments, and also to find out whether there is a less
involved system...
Bye
- 12-11-2008 #2 Just Joined!
- Join Date
- Nov 2008
- Posts
- 4
If I understand this correctly, you are basically trying to create a little database on your file system.
Sort of metadata about your web data.
1) Having a one-line file for each web file is pretty expensive (each one takes up an entire inode for just one line).
So I would put the information of interest into one file, or one file per project. Then you just need to grep/process that file or those files (or better, fgrep, so you don't have to escape all the dots).
2) Did you think about duplicate file names (many web sites have an index.html)?
wget basically translates the URL into a directory structure.
If you run "wget -r -o yourURL.log 'http://your.url.net'"
you get a nice report in yourURL.log.
You could clean up that log file to build your 'database' of metadata.
3) Dynamic pages may all show the same URL for very different content (which changes dynamically).
How would you handle this?
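To make point 1 concrete, here is a minimal sketch of the single-file idea: one index file with a "path, tab, URL" record per saved document, searched with grep -F (the current spelling of fgrep). The file name webrefs.txt and the helper names addref and findref are hypothetical, chosen only for illustration.

```shell
# Assumed index file: one "path<TAB>URL" line per saved document.
WEBREFS="${WEBREFS:-webrefs.txt}"

# addref PATH URL: append one record to the index.
addref() {
    printf '%s\t%s\n' "$1" "$2" >> "$WEBREFS"
}

# findref TEXT: print every record that contains TEXT literally.
# grep -F matches fixed strings, so the dots in a URL need no escaping.
findref() {
    grep -F -- "$1" "$WEBREFS"
}
```

For example, after recording a page with addref, running findref pybrary.net would list every saved file whose record mentions that host. All records share one file, so the cost is one inode per project rather than one per reference.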
My $0.02 -- just a few thoughts.
- 12-13-2008 #3 Just Joined!
- Join Date
- Oct 2004
- Posts
- 62
Thank you, horsu, for your contribution.
You are basically correct (at least on the first point).
I could have saved all the web references in a single (bigger) file, or even used a small
database to organize the web reference metadata.
Unfortunately I'm not very organized
(I'll never be a "guru"!).
Your concern about inodes reminds me of the old days of DOS, with small hard disk capacities and FAT (not even FAT32!).
Fortunately my web references are few in number, generally referring to documents so interesting that I also printed them (in whole or in part).
As far as the database is concerned, it seems to me like shooting a fly with a gun...
Regarding your other two points: duplicate files are not possible, and in any case web references are not the primary key (so Codd's rules are safe!). Dynamic pages are not a problem either, since I'm not querying a web database (or, in any case, the web reference includes the query).
Bye... and excuse my poor English

