  1. #1
    Just Joined!
    Join Date
    Oct 2004
    Posts
    62

    Save web references

    Hi all,
    when I save a file from the web (HTML or other), I need to keep a
    reference to where it came from. Here is a solution I found:
    Code:
    - save the web reference in a file (I use "web~filename.txt")
            for ex. http://pybrary.net/pyPdf/
                    saved in web~pyPdf.html.txt
     
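    - list all the web references with an alias:
             # print each reference file's path, then its contents (the URL);
             # reading line by line avoids the temp file and handles spaces
             alias web='find . -name "web~*.txt" |
                        while IFS= read -r l;
                        do
                            echo;
                            echo "$l";
                            cat "$l";
                        done'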
    - running the alias (web) gives output like this:
      Wed 10 ~ web
      
      ./python/pyPdf/pyPdf_files/web~pyPdf.html.txt
      http://pybrary.net/pyPdf/
    
      ./python/pyPdf/pyPdf_doc_files/web~pyPdf_doc.html.txt   
      http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html
    There is no need to sort the output list. At most, you can indent the
    web reference to get more readable output:
    Code:
    ./macro/AutoIt/AutoIt_on_Linux_files/web~AutoIt_on_Linux.html.txt
            http://www.scripticon.com/2008/07/installing-autoit-on-linux-so-you-can-do-windows-scripting/
    
    ./macro/xmacro/xmacro-0.4.5/doc/web~xmacro0.4.5_QBallCow.txt.txt
            http://download.sarine.nl/xmacro/Description.html
    
    ./macro/xmacro/PDFRead_files/web~PDFRead.html.txt
            http://pdfread.sourceforge.net/
    
    ./macro/xmacro/IkeHall_files/web~IkeHall.html.txt
            http://ikester.blogspot.com/2007/01/im-huge-fan-of-autohotkey.html
    
    ./python/pyPdf/pyPdf_files/web~pyPdf.html.txt
            http://pybrary.net/pyPdf/
    
    ./python/pyPdf/pyPdf_doc_files/web~pyPdf_doc.html.txt
            http://pybrary.net/pyPdf/pythondoc-pyPdf.pdf.html
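
    The indented listing above could be produced along these lines. This is
    only a sketch: it swaps the alias for a shell function of the same name,
    and the eight-space indent and the optional start-directory argument are
    assumptions, not part of the original setup.

```shell
# Sketch: list every web~*.txt reference file under a directory
# (default: the current one), printing the file's path followed by
# its contents indented by eight spaces.
web() {
    find "${1:-.}" -name 'web~*.txt' | while IFS= read -r f; do
        echo                          # blank line between entries
        echo "$f"                     # path of the reference file
        sed 's/^/        /' "$f"      # indent each line of the file
    done
}
```

    A function avoids the nested-quoting problems of a long alias, and
    reading the paths line by line copes with spaces in directory names.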
    If you need any help, I am at your disposal.
    I am curious to hear experts' comments, and to find out whether there
    is a less involved approach...

    Bye

  2. #2
    Just Joined!
    Join Date
    Nov 2008
    Posts
    4
    If I understand this correctly, you are basically trying to create a little database on your file system:
    a sort of metadata about your web data.

    1) Having a one-line file for each web file is pretty expensive (each one takes up an entire inode for just one line).
    So I would put the information of interest into one file, or one file per project. Then you just need to grep/process that file or those files (or better, fgrep, so you don't have to escape all the dots).

    2) Did you think about duplicate file names? (Many web sites have an index.html.)
    wget basically maps the URL onto a directory structure.
    If you run "wget -r -o yourURL.log 'http://your.url.net'"
    you get a nice report in yourURL.log.
    You could clean up that log file to build your 'database' of metadata.

    3) Dynamic pages may all show the same URL for very different content (which changes dynamically).
    How would you handle this?
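
    To make point 1 concrete, a single index file could look roughly like
    the sketch below. The index location, the tab-separated "path, URL"
    layout, and the helper names webref_add and webref_find are all
    assumptions made up for illustration; grep -F is the modern spelling
    of fgrep.

```shell
# Hypothetical index file: one "saved-path<TAB>URL" line per reference.
WEBREF_INDEX="${WEBREF_INDEX:-$HOME/web~index.txt}"

# Record where a saved file came from.
# usage: webref_add <saved-path> <url>
webref_add() {
    printf '%s\t%s\n' "$1" "$2" >> "$WEBREF_INDEX"
}

# Look up entries by fixed string; with -F the dots in file names
# and URLs are matched literally, so nothing needs escaping.
# usage: webref_find <text>
webref_find() {
    grep -F -- "$1" "$WEBREF_INDEX"
}
```

    Storing the full saved path in each line also keeps two different
    index.html files apart, which touches on point 2.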

    My $0.02 -- just a few thoughts.

  3. #3
    Just Joined!
    Join Date
    Oct 2004
    Posts
    62
    Thank you, horsu, for your contribution.

    You are basically correct (at least on the first point).
    I could have saved all the web references in a single (bigger) file, or even used a small
    database to organize the web reference metadata.

    Unfortunately I'm not very organized (I'll never be a "guru"!).
    Your worry about inodes reminds me of the old days of DOS, with small hard disks and FAT (not even FAT32!).

    Fortunately my web references are few in number, generally pointing to documents so interesting that I also printed them (in whole or in part).
    As for the database, it seems to me like shooting a fly with a gun...

    As for your other two points: duplicate files are not possible, and in any case web references are not the primary key (so Codd's rules are safe!). Dynamic pages are not a problem either: I'm not querying a web db (and in any case the web reference includes the query).

    Bye... and excuse my poor English
