  1. #1
    Just Joined!
    Join Date: May 2009
    Posts: 5

    Retrieve Value between tags

    Hi all,

    Suppose the HTML below is my input file:

    <!--file.html-->

    <html>
    <head>
    <title>1_3-BF-01</title>
    </head>
    <body>
    <div class="TestPurpose">The content of the script element MUST
    be treated as if its display property were set to the value "none"
    and the content of the noscript
    element printed</div>
    <div class="AssertionsTested">Tests for assertions 1</div>
    <script type="plain/text">The text must not be printed</script><noscript><p>This text must be printed</p></noscript>
    </body>

    I want the content between <div class="TestPurpose"> and </div>, so the output should be:

    The content of the script element MUST
    be treated as if its display property were set to the value "none"
    and the content of the noscript
    element printed

    How can I do this?

    I have a simple gawk script. When I run it like this:

    $ gawk -f getvalue.awk file.html

    it prints only

    The content of the script element MUST

    so it only works partially (it gets just the first line). Where and how do I tweak it to get the output shown above?

    #getvalue.awk

    function stripInputRecord ( inputRecord )
    {

    gsub ( /^\t*<div.*">/, "", inputRecord);
    gsub ( /<\/div.*>/, "", inputRecord);
    return inputRecord;
    }


    /TestPurpose/ {
    str = str stripInputRecord( $0 );
    }
    END {
    print str;
    }
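The script above prints only one line because awk tests the pattern /TestPurpose/ against each input line separately, and only the first line of the div contains that word. One fix is to remember that you are inside the div and keep collecting lines until the closing </div> shows up; in awk, a range pattern such as /TestPurpose/,/<\/div>/ selects exactly that run of lines. Below is a minimal Python sketch of the same line-by-line idea, assuming the opening tag is written exactly as <div class="TestPurpose"> as in file.html; the function name extract_div_lines and the inline sample are illustrative only:

```python
# Assumption: the opening tag is spelled exactly like this, as in file.html.
OPEN_TAG = '<div class="TestPurpose">'
CLOSE_TAG = "</div>"


def extract_div_lines(lines):
    """Return the text of the TestPurpose div, which may span several lines."""
    collected = []
    inside = False
    for line in lines:
        if not inside:
            start = line.find(OPEN_TAG)
            if start == -1:
                continue  # still before the div: skip the line
            inside = True
            # drop everything up to and including the opening tag
            line = line[start + len(OPEN_TAG):]
        end = line.find(CLOSE_TAG)
        if end != -1:
            collected.append(line[:end])  # text before </div> is the last piece
            break
        collected.append(line.rstrip("\n"))
    return "\n".join(collected)


# Hypothetical inline sample, a shortened copy of file.html
sample = """<body>
<div class="TestPurpose">The content of the script element MUST
be treated as if its display property were set to the value "none"
and the content of the noscript
element printed</div>
<div class="AssertionsTested">Tests for assertions 1</div>
</body>
"""

print(extract_div_lines(sample.splitlines(keepends=True)))
```

To read the real file instead of the sample, pass the open file object, since iterating over it yields one line at a time: with open("file.html") as f: print(extract_div_lines(f)).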


    Thanks in advance for any input.
    --
    Thanks & Regards,
    Siddu

  2. #2
    Linux User
    Join Date: Aug 2006
    Posts: 458
    If you have Python:
    Code:
    #!/usr/bin/env python3
    import re
    # re.DOTALL lets . match newlines; the non-greedy (.*?) stops at the first </div>
    pat = re.compile(r'<div class="TestPurpose">(.*?)</div>', re.DOTALL)
    with open("file.html") as f:
        data = f.read()
    print(pat.findall(data)[0])
    Output:
    Code:
    # ./test.py
    The content of the script element MUST
    be treated as if its display property were set to the value "none"
    and the content of the noscript
    element printed
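The regular expression in this answer depends on two details: re.DOTALL makes . match newline characters too, so the capture can span several lines, and the non-greedy (.*?) stops at the first </div> rather than the last one in the text. A small sketch comparing the variants on a shortened, made-up sample:

```python
import re

# Made-up sample: a two-line TestPurpose div followed by another div
data = '''<div class="TestPurpose">first line
second line</div>
<div class="AssertionsTested">Tests for assertions 1</div>'''

# Without DOTALL, "." does not match "\n", so the multi-line div is not found.
no_dotall = re.findall(r'<div class="TestPurpose">(.*?)</div>', data)

# Greedy ".*" with DOTALL runs on to the LAST </div> in the text.
greedy = re.findall(r'<div class="TestPurpose">(.*)</div>', data, re.DOTALL)

# Non-greedy ".*?" with DOTALL stops at the FIRST </div>: the wanted result.
non_greedy = re.findall(r'<div class="TestPurpose">(.*?)</div>', data, re.DOTALL)

print(no_dotall)
print(greedy)
print(non_greedy)
```

Only the last combination returns just the text of the TestPurpose div; the greedy version swallows the AssertionsTested div as well.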

  3. #3
    Just Joined!
    Join Date: May 2009
    Posts: 5
    I do have Python.

    Thanks, dude!

  4. #4
    Just Joined!
    Join Date: May 2009
    Posts: 5
    Dude, but what if I had to iterate over several files?

    I'm sorry,
    I don't know Python.

    Help me!

  5. #5
    Linux User
    Join Date: Aug 2006
    Posts: 458
    Code:
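    #!/usr/bin/env python3
    import glob
    import re
    pat = re.compile(r'<div class="TestPurpose">(.*?)</div>', re.DOTALL)
    # Only .html files are read, so the script itself is skipped
    for filename in sorted(glob.glob("*.html")):
        with open(filename) as f:
            match = pat.search(f.read())
        # A file without a TestPurpose div is skipped instead of raising IndexError
        if match:
            print(match.group(1))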
    This assumes the script and all the HTML files are in the current directory. Please look at the documentation (see my sig) if you want to learn about Python.
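A regular expression tied to the exact text <div class="TestPurpose"> will miss divs written differently, for example with an extra id attribute or with the class listed after another one. As an alternative (a swapped-in technique, not what the answer above uses), here is a sketch built on the standard library's html.parser module that takes file names on the command line, similar to the gawk invocation in the question. The script name getvalue.py, the class PurposeDivParser and the function extract_test_purpose are hypothetical:

```python
import sys
from html.parser import HTMLParser


class PurposeDivParser(HTMLParser):
    """Collect the text inside any <div class="TestPurpose"> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # > 0 while inside a TestPurpose div; counts nested divs
        self.parts = []  # text chunks found inside the div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth:
            self.depth += 1  # a div nested inside the one being collected
        elif "TestPurpose" in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering the div we are looking for

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_test_purpose(html):
    parser = PurposeDivParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    # Hypothetical usage: python3 getvalue.py file1.html file2.html ...
    for filename in sys.argv[1:]:
        with open(filename) as f:
            print(extract_test_purpose(f.read()))
```

Run it as python3 getvalue.py *.html; the shell expands the wildcard into the list of files, so no directory listing is needed inside the script.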

  6. #6
    Just Joined!
    Join Date: May 2009
    Posts: 5
    Yes, I will, some time soon.
