  1. #1
    Just Joined!
    Join Date: Sep 2012
    Posts: 1

    getting a web page in text format


    Hi,
    I am having trouble retrieving a web page into a shell variable and grepping it for a text pattern. For example, I would like to fetch a Google page, save it to a file, and grep for "google" in it. Could somebody please help with this? My final college project is due soon.

  2. #2
    tpl
    Linux User
    Join Date: Jan 2007
    Location: cleveland
    Posts: 477
    In most browsers, Ctrl+S will save the (HTML) page.
    Then use the "html2text" command to extract the text,
    and grep the result.
    the sun is new every day (heraclitus)

  3. #3
    Trusted Penguin
    Join Date: May 2011
    Posts: 4,353
    In reply to Jal123 (post #1):
    Code:
    wget -O /tmp/google.html http://www.google.com
    grep --color google /tmp/google.html
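
    The two commands above go through a file. Since the question also asks about a shell variable, here is a minimal sketch of that variant. The helper names fetch_page and page_has are made up for illustration, and the sketch assumes wget is installed.

```shell
# Sketch: hold a page in a shell variable and grep it without a temp file.
# fetch_page and page_has are hypothetical helper names, not from the thread.

# Download a URL to standard output.
# -q suppresses wget's progress output, "-O -" writes the page to stdout.
fetch_page() {
    wget -q -O - "$1"
}

# Succeed (exit status 0) if the text in $1 contains pattern $2.
# grep -q prints nothing and only sets the exit status; -i ignores case.
page_has() {
    printf '%s\n' "$1" | grep -q -i -- "$2"
}

# Example usage (needs network access):
#   page=$(fetch_page http://www.google.com)
#   if page_has "$page" google; then echo "found"; fi
```

    Quoting "$page" matters here: unquoted, the shell would split the HTML on whitespace and expand any wildcard characters in it.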

  4. #4
    drl
    Linux Engineer
    Join Date: Apr 2006
    Location: Saint Paul, MN, USA / CentOS, Debian, Slackware, {Free, Open, Net}BSD, Solaris
    Posts: 1,288
    Hi.

    See also text-mode browsers, like lynx:
    Code:
           -dump  dumps the formatted output of the default document or those
                  specified on the command line to standard output.  Unlike
                  interactive mode, all documents are processed.  This can be used
                  in the following way:
    
                  lynx -dump http://www.subir.com/lynx.html
    
    excerpt from man lynx
    This strips the markup, which can make searching the resulting text easier.
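
    As a sketch of how the lynx approach could be applied to the original question, the function below renders a page and prints the matching lines with their line numbers. grep_rendered is a made-up name, and -nolist is an addition that the post does not mention; it is assumed here only to keep link URLs out of the search.

```shell
# grep_rendered URL PATTERN
# Render URL as plain text with lynx, then print the matching lines.
# Assumes lynx is installed; grep_rendered is a hypothetical helper name.
grep_rendered() {
    # -dump    write the formatted page to standard output
    # -nolist  omit the numbered list of links lynx appends to a dump
    # grep -n  prefix each match with its line number, -i ignores case
    lynx -dump -nolist "$1" | grep -n -i -- "$2"
}

# Example (needs network access):
#   grep_rendered http://www.google.com google
```

    Because the search runs on rendered text, a pattern will not match tag names or attribute values the way it can when grepping raw HTML.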

    Best wishes ... cheers, drl

  5. #5
    Just Joined!
    Join Date: Sep 2012
    Location: India
    Posts: 29
    Hi,

    As per your requirement,

    you can fetch the page from the command line with elinks (using its -dump option, like the lynx -dump mentioned by drl) or with curl.
    You can perform the steps below:

    elinks -dump "web site name" > text1
    grep 'pattern' text1
    This renders the web page as plain text and saves it in the file text1, which you can then grep
    for the respective pattern.

    curl -s "web site name" > text1
    grep 'pattern' text1
    This fetches the HTML source of the web page and saves it in the file text1, which you can then grep.
    curl writes the page to standard output by default; -s only hides the progress meter.
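
    Once the page is saved in text1, grep can also summarise the matches instead of printing them. This is only a sketch: count_lines and count_hits are hypothetical helper names, and grep -o is assumed to be available (GNU and BSD grep both provide it).

```shell
# Sketch: summarise matches in a saved page such as text1.
# count_lines and count_hits are hypothetical helper names.

# count_lines FILE PATTERN
# Number of lines that contain the pattern (case-insensitive).
count_lines() {
    grep -c -i -- "$2" "$1"
}

# count_hits FILE PATTERN
# Total number of occurrences, counting repeats on the same line.
# grep -o prints each match on its own line, so wc -l counts matches;
# tr strips the padding some wc implementations add.
count_hits() {
    grep -o -i -- "$2" "$1" | wc -l | tr -d ' '
}

# Example:
#   count_lines text1 google
#   count_hits  text1 google
```

    The two numbers differ whenever a line contains the pattern more than once, which is common in raw HTML fetched with curl.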

    Regards,
    Best Wishes.
