- 09-28-2012 #1
Parsing Large XML files with Perl Script
Hey there!
I'm having to parse through large XML files containing IP addresses (both IPv6 and IPv4), as well as hostnames (origins) and datacenter names. The XML format is pretty unreadable, and I would like to take those values and display them in a friendlier way.
I had a similar post previously and would like to emulate the same kind of behavior if possible.
<Sorry, I can't seem to link URLs until I've posted 15 or more>
This time the variables have changed, and I'd like to add a few minor things to the script since the data is a little different.
Okay, so first off, an example of the data that I'm trying to parse:
Code:
<roles>
  <role name="adminservice">
    <datacenter name="blu">
      <origin name="adminservice-s1.msoc.nsack.net">
        <resource name="adminservice-s1-blu" value="adminservice-s1-blu.msoc.nsack.net" />
Occasionally there will be a resource name containing both an IPv4 and an IPv6 address as the "value", and sometimes just an alias name for the "value". Regardless, the authoritative factor would be the "origin name" that either "value" type falls under, e.g.:
Code:
<origin name="p-a9-adm.msocs.nsack.net">
  <resource name="BLU-ADM-PROC-Anchor" value="140.40.140.40" />
  <resource name="ipv6_bba100-anchor" value="2001:222:f400:f00::66" />
So there are 5 variables that I'm hoping to output in a clean fashion, so I can compare which origin names have IP addresses (either v4 or v6) and see which Role, Role Name, and Data Center those origins belong to:
- roles
- role name
- datacenter name
- origin name
- value
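To compare which origins carry an IPv4 address, an IPv6 address, or only an alias, each "value" has to be classified first. Below is a minimal sketch of one way to do that with Perl's core Socket module. The helper name classify_value is hypothetical and not part of any script in this thread, and the sketch assumes a Perl whose Socket module provides inet_pton and a platform with IPv6 support.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET AF_INET6);

# classify_value() is a hypothetical helper for illustration.
# inet_pton() only parses the text; it does no DNS lookup and
# returns undef when the string is not an address of that family.
sub classify_value {
    my ($value) = @_;
    return 'ipv4' if defined inet_pton(AF_INET,  $value);
    return 'ipv6' if defined inet_pton(AF_INET6, $value);
    return 'alias';    # anything else is treated as a hostname/alias
}

# the three kinds of "value" seen in the sample data
for my $value ('140.40.140.40',
               '2001:222:f400:f00::66',
               'adminservice-s1-blu.msoc.nsack.net') {
    printf "%-40s %s\n", $value, classify_value($value);
}
```

Anything that is not a well-formed address falls through to 'alias', so malformed addresses would also land there; a stricter check may be wanted for real data.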
Sorry if this seems convoluted in any way, but it's just easier for me to see a clean output from such a huge XML database.
Thanks so much!
- 09-29-2012 #2
hello again, exelan,
Your XML does not look too cluttered, so that is good. I think you could do this with the XML::Simple Perl module (assuming Perl is good w/you).
I know you have specific needs on how you format your output, once you've parsed it, but here is some quick code that should do the parsing, based upon the XML you've supplied.
First, here's an XML data file I mocked up, based upon your input:
Code:
<?xml version="1.0" encoding="UTF-8"?>
<roles>
  <role name="adminservice">
    <datacenter name="blu">
      <origin name="adminservice-s1.msoc.nsack.net">
        <resource name="adminservice-s1-blu" value="adminservice-s1-blu.msoc.nsack.net" />
      </origin>
    </datacenter>
    <datacenter name="blu1">
      <origin name="p-a9-adm.msocs.nsack.net">
        <resource name="BLU-ADM-PROC-Anchor" value="140.40.140.40" />
        <resource name="ipv6_bba100-anchor" value="2001:222:f400:f00::66" />
      </origin>
    </datacenter>
  </role>
  <role name="adminservice1">
    <datacenter name="blu2">
    </datacenter>
  </role>
</roles>
Now here's the code:
Code:
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(:strict);

# the file containing the XML data
my $file = shift || die "Give me the XML file\n";

# options to pass to XML::Simple
my $ref_in = {
  ForceArray    => ['role','datacenter','origin','resource'],
  KeyAttr       => [ ],
  SuppressEmpty => '',
};

# parse the XML file and return a hash ref of it
my $ref = eval { XMLin($file,%$ref_in) or die "Can't read XML: $!\n" };
if($@){
  chomp($@);
  die $@;
}

# now loop thru the hash ref and print the data
# ("|| []" keeps each loop safe when an element has no children, and
# avoids "my @x = ... if(...)", whose behavior Perl leaves undefined)
for my $role(@{$ref->{'role'} || []}){
  print "\n------------------------------------------------------------\nRole: \"",$role->{'name'},"\"\n";

  for my $el(@{$role->{'datacenter'} || []}){
    print "\n\tDatacenter: \"",$el->{'name'},"\"\n";

    for my $el1(@{$el->{'origin'} || []}){
      print "\n\t\tOrigin: \"",$el1->{'name'},"\"\n";

      my @resources = @{$el1->{'resource'} || []};
      for(my $r=0;$r<=$#resources;$r++){
        for(qw/name value/){
          print "\n\t\t\tResource $r $_: \"",$resources[$r]->{$_},"\"";
        }
        print "\n";
      }
    }
  }
}
exit(0);
Running the perl code on the data file yields this for me:
Code:
------------------------------------------------------------
Role: "adminservice"

    Datacenter: "blu"

        Origin: "adminservice-s1.msoc.nsack.net"

            Resource 0 name: "adminservice-s1-blu"
            Resource 0 value: "adminservice-s1-blu.msoc.nsack.net"

    Datacenter: "blu1"

        Origin: "p-a9-adm.msocs.nsack.net"

            Resource 0 name: "BLU-ADM-PROC-Anchor"
            Resource 0 value: "140.40.140.40"

            Resource 1 name: "ipv6_bba100-anchor"
            Resource 1 value: "2001:222:f400:f00::66"

------------------------------------------------------------
Role: "adminservice1"

    Datacenter: "blu2"
Let me know if your XML data has more subtle differences that break it. You'll probably want to change how the data is printed out. Give it a whirl, or ping back if you are having trouble w/that.
Last edited by atreyu; 09-29-2012 at 03:47 AM. Reason: typo
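As one possible way to change how the data is printed, here is a sketch that flattens the parsed structure into one tab-separated row per resource (role, datacenter, origin, value), which makes it easier to compare origins side by side. The helpers flatten_roles and children are hypothetical names introduced for illustration. To stay runnable without an XML file, the sketch uses a hand-built hash that mirrors what XMLin() is assumed to return for the mock data with the ForceArray and KeyAttr options used in the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# children() is a hypothetical helper: returns the child hashes stored
# under $key, skipping missing keys and non-hash entries (such as the
# empty strings that SuppressEmpty => '' can produce).
sub children {
    my ($node, $key) = @_;
    return grep { ref($_) eq 'HASH' } @{ $node->{$key} || [] };
}

# flatten_roles() is a hypothetical helper: returns one
# [role, datacenter, origin, value] row per resource.
sub flatten_roles {
    my ($ref) = @_;
    my @rows;
    for my $role (children($ref, 'role')) {
        for my $dc (children($role, 'datacenter')) {
            for my $origin (children($dc, 'origin')) {
                for my $res (children($origin, 'resource')) {
                    push @rows, [ $role->{name}, $dc->{name},
                                  $origin->{name}, $res->{value} ];
                }
            }
        }
    }
    return @rows;
}

# hand-built stand-in for the parsed mock data (assumed shape)
my $ref = { role => [
    { name => 'adminservice', datacenter => [
        { name => 'blu', origin => [
            { name => 'adminservice-s1.msoc.nsack.net', resource => [
                { name  => 'adminservice-s1-blu',
                  value => 'adminservice-s1-blu.msoc.nsack.net' },
            ] },
        ] },
        { name => 'blu1', origin => [
            { name => 'p-a9-adm.msocs.nsack.net', resource => [
                { name => 'BLU-ADM-PROC-Anchor', value => '140.40.140.40' },
                { name => 'ipv6_bba100-anchor',  value => '2001:222:f400:f00::66' },
            ] },
        ] },
    ] },
    { name => 'adminservice1', datacenter => [ { name => 'blu2' } ] },
] };

print join("\t", qw(role datacenter origin value)), "\n";
print join("\t", @$_), "\n" for flatten_roles($ref);
```

In real use, the hand-built hash would be replaced by the hash ref that XMLin() returns. Datacenters without origins (like "blu2") produce no rows in this layout, which may or may not be what is wanted.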
- 10-01-2012 #3
Hello again atreyu!
Thanks so much for the reply! Sorry it took so long for me to try it out, I was out of town all weekend.
I ran it against my XML data and it works out just fine! A couple different XML files that have subtle differences are returning some errors but I think it's just the source material.
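When several files are processed and only some return errors, it can help to catch the failure per file so the bad inputs are easy to identify. The following is only a sketch: parse_each is a hypothetical helper, and the parser shown is a stand-in that fails on purpose for file names containing "bad". In practice the parser would presumably be a small wrapper around the XMLin() call from the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# parse_each() is a hypothetical helper: it runs $parser on every file
# and records failures instead of dying on the first one.
sub parse_each {
    my ($parser, @files) = @_;
    my (%parsed, %failed);
    for my $file (@files) {
        my $ref = eval { $parser->($file) };
        if (defined $ref) {
            $parsed{$file} = $ref;
        }
        else {
            my $err = $@ || 'parser returned nothing';
            chomp $err;
            $failed{$file} = $err;
        }
    }
    return (\%parsed, \%failed);
}

# stand-in parser for demonstration only (not a real XML parser)
my $parser = sub {
    my ($file) = @_;
    die "not well-formed: $file\n" if $file =~ /bad/;
    return { role => [] };
};

my ($parsed, $failed) = parse_each($parser, @ARGV);
print "parsed OK: $_\n" for sort keys %$parsed;
warn "$_: $failed->{$_}\n" for sort keys %$failed;
```

The error text kept for each failed file is whatever the parser died with, which should be enough to tell a malformed source file apart from a structural difference the script does not expect.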
I can't thank you enough again, man! This saved me a LOT of time and opens up doors for automating larger files down the road.
Many thanks!
- 10-02-2012 #4

