  1. #1
    Just Joined!
    Join Date
    Feb 2011
    Posts
    3

    help with grep .. URGENT

    Hi, I am a newbie and need your expert help.

    I have a doc root (let's say /site/mysite/docs/) where I want to run a recursive grep over all the directories and save the list of matching files in file_list.txt.

    The search should work like this:

    1. Capture all files that contain "<!--# ((Any Text Here)) -->"
    2. Capture all files that contain BOTH "<!--# ((Any Text Here)) -->" and "<!--#include virtual= ((Path To SSI/HTML)) -->"
    3. Ignore all files that contain ONLY "<!--#include virtual= ((PATH TO SSI/HTML))-->"


    Can someone help?

  2. #2
    Just Joined!
    Join Date
    Feb 2011
    Posts
    3
    I was able to get the first two points done with the following:

    find /site/mysite/docs/ -exec grep -ls '<!--#' {} \; > ssi_file_list.txt

    However, my boss needs to exclude the files that contain ONLY "<!--#include virtual= ((PATH TO SSI/HTML))-->".

  3. #3
    Just Joined!
    Join Date
    Feb 2011
    Posts
    3
    Please help

  4. #4
    Just Joined!
    Join Date
    Feb 2011
    Posts
    12
    You may do,

    Code:
    uname@ubuntu:~$ grep -rl '<!--#' /site/mysite/docs/ > 1.tmp
    And then do,

    Code:
    uname@ubuntu:~$ xargs -d '\n' grep -l '<!--#[^i]' < 1.tmp > 2.tmp
    uname@ubuntu:~$ xargs -d '\n' grep -l '<!--#include [^v]' < 1.tmp >> 2.tmp
    And finally do,

    Code:
    uname@ubuntu:~$ grep -vxFf 2.tmp 1.tmp
    The first command lists the files that contain <!--# (that is, files that contain any SSI).
    The second step picks, from those, the files that contain <!--# followed by anything other than "i" (that is, an SSI directive other than an inclusion), and then the files that contain <!--#include followed by anything other than "v" (that is, an SSI inclusion that is not a virtual one).

    So far, 1.tmp contains a list of files that contain any SSI, and 2.tmp contains a list of files that contain SSI other than virtual inclusions. A file that matches both patterns is listed twice in 2.tmp; run sort -u on it if you need a clean list.
    The last command prints every line of 1.tmp that does not appear in 2.tmp, that is, the files that contain SSI, but no SSI other than virtual inclusions (they contain only virtual inclusions). Those are the files to ignore; the files to keep (your points 1 and 2) are the ones in 2.tmp.

    Note: I assume the files contain simple SSI; erroneous or over-sophisticated syntax will cause the script to fail, and taking into account these extreme cases really complicates the matter.
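
    As a rough sketch of how the two patterns could be folded into one and made a little more tolerant of spacing, the function below lists the files that contain at least one SSI directive other than include virtual (the files to keep). The function name ssi_keep_list is made up for illustration, and it assumes a grep that supports -r and -E (GNU grep does). The three branches of the pattern are: a directive not starting with "i", a directive starting with "i" but not "in" (such as if), and an include whose first argument does not start with "v".

```shell
#!/bin/sh
# Hypothetical helper (not from the original thread): print the files under
# a directory that contain at least one SSI directive other than
# `include virtual`. Assumes grep supports -r and -E.
ssi_keep_list() {
    # $1 = document root to search
    # Branch 1: <!--# followed by a character other than "i" (echo, set, ...)
    # Branch 2: <!--#i followed by a character other than "n" (if)
    # Branch 3: <!--#include, whitespace, then an argument not starting with "v"
    grep -rlE '<!--#([^i]|i[^n]|include[[:space:]]+[^v[:space:]])' "$1" | sort
}
```

    It could be called as ssi_keep_list /site/mysite/docs/ > file_list.txt. Files whose only SSI is include virtual are not printed, even if there is more than one space after include.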

    Also, remove the temporary files when you are done:

    Code:
    uname@ubuntu:~$ rm 1.tmp 2.tmp
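
    The steps above could also be wrapped in a small script that cleans up after itself. This is only a sketch under the same "simple SSI" assumption; the function name ssi_split and the two output file arguments are made up for illustration. It sorts both lists and uses comm, so the result does not depend on the order in which grep visits the files.

```shell
#!/bin/sh
# Hypothetical wrapper (not from the original thread).
#   $1 = document root
#   $2 = output: files containing SSI other than `include virtual`
#   $3 = output: files whose only SSI is `include virtual`
ssi_split() {
    all=$(mktemp) || return 1
    other=$(mktemp) || { rm -f "$all"; return 1; }

    # Every file that contains any SSI directive, sorted for comm.
    grep -rl '<!--#' "$1" | sort > "$all"

    # Files with a directive that is not an include, or with an include
    # that is not a virtual one; sort -u drops files matched by both.
    {
        grep -rl '<!--#[^i]' "$1"
        grep -rl '<!--#include [^v]' "$1"
    } | sort -u > "$other"

    cp "$other" "$2"

    # Lines present only in the first list: virtual-only files.
    comm -23 "$all" "$other" > "$3"

    rm -f "$all" "$other"
}
```

    For example, ssi_split /site/mysite/docs/ file_list.txt virtual_only.txt would write the files to keep into file_list.txt and the files to ignore into virtual_only.txt. Keep the output files outside the doc root so they are not searched themselves.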
