- 10-19-2010 #1 Just Joined!
- Join Date
- Oct 2008
- Posts
- 3
A script to extract XML Tag data
Hi,

I am trying to extract a particular tag from an XML file: I need the text between <paragraph> and </paragraph>. The XML I am working with has a lot of paragraph tags, most of which are not needed. The paragraphs I need can be identified by another tag named code.
The code tag has unique values, which mark the corresponding paragraph tags as important to extract and the others as not so important.
I am trying this with bash and I don't think it is the right way to go. If anyone could help me with this, it would save my day.
Code:
<component>
  <section ID="i4i_interactions_id_inv-0da0a5da-2792-4b25-8821-cefdbbe08819">
    <id root="0f1e6adc-de1c-46ca-b1e3-fbad4cbbdb64"/>
    <!-- this is the section that makes the paragraphs below important -->
    <code codeSystem="2.16.840.1.113883.6.1" code="34073-7" displayName="DRUG INTERACTIONS SECTION"/>
    <title>Drug Interactions </title>
    <text>
      <paragraph>(Also see <content styleCode="bold">CLINICAL PHARMACOLOGY</content>,<content styleCode="bold"> Pharmacokinetics</content>,* <content styleCode="italics">Drug Interactions</content>.)</paragraph>
    </text>
    <effectiveTime value="20100423"/>
    <component>
      <section ID="i4i_section_id_inv-0b406c9e-fb73-4796-84e4-47b4f22b3216">
        <id root="0824f485-524a-438e-b0bc-ea9c8730ed95"/>
        <code codeSystem="2.16.840.1.113883.6.1" code="42229-5" displayName="SPL UNCLASSIFIED SECTION"/>
        <title>Estrogen/Hormone Replacement Therapy (HRT)</title>
        <text>
          <!-- this paragraph is what I need to extract -->
          <paragraph>Concomitant use of HRT (estrogen ± progestin) and alendronate sodium was assessed in two clinical studies of one or two years’ duration in postmenopausal osteoporotic women. In these studies, the safety and tolerability profile of the combination was consistent with those of the individual treatments; however, the degree of suppression of bone turnover (as assessed by mineralizing surface) was significantly greater with the combination than with either component alone. The long-term effects of combined alendronate sodium and HRT on fracture occurrence have not been studied (see <content styleCode="bold">CLINICAL PHARMACOLOGY</content>, <content styleCode="bold">Clinical Studies</content>, <content styleCode="italics">Concomitant Use With Estrogen/Hormone Replacement Therapy (HRT)</content> and <content styleCode="bold">ADVERSE REACTIONS</content>, <content styleCode="bold">Clinical Studies</content>, <content styleCode="underline">Concomitant use with estrogen/hormone replacement therapy</content>).</paragraph>
        </text>
        <effectiveTime value="20100423"/>
      </section>
    </component>
There are a lot of identical component elements; the only difference between them is the code value.
This is making me work really hard and I am kind of lost. Help!
- 10-19-2010 #2
Use XPath or XSL transformations.
See the man pages for xpath, xmllint and xsltproc.
Last edited by Kloschüssel; 10-19-2010 at 11:22 AM.
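To make the XPath-style suggestion concrete, here is a minimal sketch of the selection logic, written in Python with the standard-library `xml.etree.ElementTree` rather than the command-line tools named above. It finds every section whose direct code child carries the wanted code value and collects the text of all paragraph elements beneath it, including those in nested sections. The sample data is a shortened, hypothetical stand-in for the posted file, and `paragraphs_for_code` is an illustrative helper name, not part of any library. The sketch assumes the elements are not in an XML namespace, as in the posted excerpt; if the full file declares a default namespace, the tag names would need to be namespace-qualified.

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical stand-in for the posted file.
SAMPLE = """<component>
  <section>
    <code code="34073-7" displayName="DRUG INTERACTIONS SECTION"/>
    <title>Drug Interactions</title>
    <text>
      <paragraph>(Also see <content styleCode="bold">CLINICAL PHARMACOLOGY</content>.)</paragraph>
    </text>
    <component>
      <section>
        <code code="42229-5" displayName="SPL UNCLASSIFIED SECTION"/>
        <text>
          <paragraph>Concomitant use of HRT and alendronate sodium
            was assessed in two clinical studies.</paragraph>
        </text>
      </section>
    </component>
  </section>
</component>"""


def paragraphs_for_code(root, wanted_code):
    """Return the text of every <paragraph> found under a <section>
    whose direct <code> child has the attribute code == wanted_code."""
    results = []
    for section in root.iter("section"):
        code = section.find("code")  # direct child only
        if code is None or code.get("code") != wanted_code:
            continue
        # iter() also descends into nested sections.
        for para in section.iter("paragraph"):
            # itertext() includes text inside inline <content> children.
            text = "".join(para.itertext())
            results.append(" ".join(text.split()))  # collapse whitespace
    return results


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    for text in paragraphs_for_code(root, "34073-7"):
        print(text)
```

For a real file, `ET.parse(path).getroot()` would replace `ET.fromstring(SAMPLE)`. The same selection can be written as a single XPath expression along the lines of `//section[code/@code='34073-7']//paragraph` for tools with full XPath support; ElementTree only implements a subset, which is why the sketch filters in a loop.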
- 10-19-2010 #3 Just Joined!
The problem is that I have no domain knowledge of those tools.
Is that the only option?
- 10-19-2010 #4
Well, it is the option I recommend. It's up to you (or your supervisor, or his supervisor) to decide which solution would best suit all your requirements. If you do not have the resources to do this on your own, you will need to hire an expert.
Read this, this and this. Then read this and this. Finally this could also be of some help.
Last edited by Kloschüssel; 10-19-2010 at 11:36 AM.

