  1. #1
    Just Joined!
    Join Date
    Oct 2008
    Posts
    3

    A script to extract XML tag data


    Hi,

    I am trying to extract the text inside one particular tag (<paragraph>...</paragraph>) from an XML file. The file contains many paragraph tags, most of which I do not need. The paragraphs I do need can be identified by another tag named code.

    The code tag carries a unique value, and that value is what makes the corresponding paragraph worth extracting and the others not.

    I have been trying this with bash, but I suspect that is not the right way to go. If anyone could help me with this, it would save my day.


    Code:
    <component>
                      <section ID="i4i_interactions_id_inv-0da0a5da-2792-4b25-8821-cefdbbe08819">
                         <id root="0f1e6adc-de1c-46ca-b1e3-fbad4cbbdb64"/>
                         <!-- this is the section code that makes the paragraph below important -->
                         <code codeSystem="2.16.840.1.113883.6.1" code="34073-7" displayName="DRUG INTERACTIONS SECTION"/>
                         <title>Drug Interactions </title>
                         <text>
                            <paragraph>(Also see <content styleCode="bold">CLINICAL PHARMACOLOGY</content>,<content styleCode="bold"> Pharmacokinetics</content>,* <content styleCode="italics">Drug Interactions</content>.)</paragraph>
                         </text>
                         <effectiveTime value="20100423"/>
                         <component>
                            <section ID="i4i_section_id_inv-0b406c9e-fb73-4796-84e4-47b4f22b3216">
                               <id root="0824f485-524a-438e-b0bc-ea9c8730ed95"/>
                               <code codeSystem="2.16.840.1.113883.6.1" code="42229-5" displayName="SPL UNCLASSIFIED SECTION"/>
                               <title>Estrogen/Hormone Replacement Therapy (HRT)</title>
                               <text>
                                  <paragraph>Concomitant use of HRT (estrogen ± progestin) and alendronate sodium was assessed in two clinical studies of one or two years’ duration in postmenopausal osteoporotic women. In these studies, the safety and tolerability profile of the combination was consistent with those of the individual treatments; however, the degree of suppression of bone turnover (as assessed by mineralizing surface) was significantly greater with the combination than with either component alone. The long-term effects of combined alendronate sodium and HRT on fracture occurrence have not been studied(this is what i need to extract) (see <content styleCode="bold">CLINICAL PHARMACOLOGY</content>, <content styleCode="bold">Clinical Studies</content>, <content styleCode="italics">Concomitant Use With Estrogen/Hormone Replacement Therapy (HRT)</content> and <content styleCode="bold">ADVERSE REACTIONS</content>, <content styleCode="bold">Clinical Studies</content>, <content styleCode="underline">Concomitant use with estrogen/hormone replacement therapy</content>). </paragraph>
                               </text>
                               <effectiveTime value="20100423"/>
                            </section>
                         </component>

    There are many structurally identical component elements; the only thing that differentiates them is the code value.

    This is making me work really hard and I am rather lost. Help!

  2. #2
    Linux Engineer Kloschüssel's Avatar
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    Use XPath or XSL transformations.

    See the man pages for xpath, xmllint and xsltproc.
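
For readers who want a starting point, here is a minimal sketch of the XPath idea using Python's standard library rather than the command-line tools named above. It assumes you want every paragraph under the section whose code element has a given code value, nested subsections included, which corresponds roughly to the XPath `//section[code/@code='34073-7']//paragraph`. The `SAMPLE` document is a shortened, well-formed stand-in for the posted excerpt, and the function name `paragraphs_for_code` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed stand-in for the excerpt posted above (an assumption,
# not the real file).
SAMPLE = """\
<component>
  <section ID="outer">
    <code codeSystem="2.16.840.1.113883.6.1" code="34073-7" displayName="DRUG INTERACTIONS SECTION"/>
    <title>Drug Interactions</title>
    <text>
      <paragraph>(Also see <content styleCode="bold">CLINICAL PHARMACOLOGY</content>.)</paragraph>
    </text>
    <component>
      <section ID="inner">
        <code codeSystem="2.16.840.1.113883.6.1" code="42229-5" displayName="SPL UNCLASSIFIED SECTION"/>
        <title>Estrogen/Hormone Replacement Therapy (HRT)</title>
        <text>
          <paragraph>Concomitant use of HRT and alendronate sodium was assessed (see <content styleCode="bold">CLINICAL PHARMACOLOGY</content>).</paragraph>
        </text>
      </section>
    </component>
  </section>
</component>
"""


def paragraphs_for_code(xml_text, code_value):
    """Return the plain text of every <paragraph> under each <section>
    whose direct <code> child has code == code_value."""
    root = ET.fromstring(xml_text)
    results = []
    for section in root.iter("section"):
        code = section.find("code")  # direct child only
        if code is None or code.get("code") != code_value:
            continue
        # './/paragraph' also reaches paragraphs in nested subsections.
        for para in section.findall(".//paragraph"):
            # itertext() flattens inline markup such as <content>;
            # split/join collapses runs of whitespace.
            results.append(" ".join("".join(para.itertext()).split()))
    return results


if __name__ == "__main__":
    for text in paragraphs_for_code(SAMPLE, "34073-7"):
        print(text)
```

If the real file declares a default XML namespace on its root element, the plain tag names in the paths above will not match and each one needs the namespace added; that part is left out of this sketch.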
    Last edited by Kloschüssel; 10-19-2010 at 11:22 AM.

  3. #3
    Just Joined!
    Join Date
    Oct 2008
    Posts
    3
    The problem is that I have no knowledge of those tools. Is that the only option?

  4. #4
    Linux Engineer Kloschüssel's Avatar
    Join Date
    Oct 2005
    Location
    Italy
    Posts
    773
    Well, it is the option I recommend. It's up to you (or your supervisor, or his supervisor) to decide which solution would best suit all your requirements. If you do not have the resources to do this on your own, you will need to hire an expert.

    Read this, this and this. Then read this and this. Finally this could also be of some help.
    Last edited by Kloschüssel; 10-19-2010 at 11:36 AM.
