  1. #1
    Just Joined! exelan
    Join Date
    Jun 2012
    Location
    Montana
    Posts
    8

    Parsing Large XML files with Perl Script


    Hey there!

    I need to parse large XML files containing IP addresses (both IPv4 and IPv6), hostnames (origins), and datacenter names. The raw XML is pretty unreadable, so I would like to pull those values out and display them in a friendlier way.

    I had a similar post previously and would like to emulate the same kind of behavior if possible.

    <Sorry, I can't seem to link URLs until I've posted 15 or more>

    This time the variables have changed, and I'd like to add a few minor things to the script since the data is a little different.

    Okay so first off, an example of the data that I'm trying to parse:

    Code:
    <roles>
      <role name="adminservice">
        <datacenter name="blu">
          <origin name="adminservice-s1.msoc.nsack.net">
            <resource name="adminservice-s1-blu" value="adminservice-s1-blu.msoc.nsack.net" />
    Occasionally an origin will have resources whose "value" is an IPv4 or an IPv6 address, and sometimes the "value" is just an alias name. Either way, the authoritative key is the "origin name" that both kinds of "value" fall under, e.g.:

    Code:
    <origin name="p-a9-adm.msocs.nsack.net">
            <resource name="BLU-ADM-PROC-Anchor" value="140.40.140.40" />
            <resource name="ipv6_bba100-anchor" value="2001:222:f400:f00::66" />
    So there are 5 variables that I'm hoping to output in a clean fashion, so that I can compare which origin names have IP addresses (either v4 or v6) and see which role, role name, and datacenter those origins belong to.

    • roles
    • role name
    • datacenter name
    • origin name
    • value
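
    Telling the three kinds of "value" apart (IPv4, IPv6, or alias/hostname) can be sketched roughly as below. This is only an illustration: classify_value is a hypothetical helper name, and it assumes a reasonably recent Perl whose core Socket module provides inet_pton.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Socket qw(inet_pton AF_INET AF_INET6);

# Hypothetical helper: classify a resource "value" as ipv4, ipv6, or hostname.
# inet_pton() returns undef when the string is not a valid textual address
# in the given family, so anything that fails both checks is treated as an alias.
sub classify_value {
  my ($value) = @_;
  return 'ipv4' if defined inet_pton(AF_INET,  $value);
  return 'ipv6' if defined inet_pton(AF_INET6, $value);
  return 'hostname';
}

# sample values taken from the XML excerpts in this post
for my $value ('adminservice-s1-blu.msoc.nsack.net',
               '140.40.140.40',
               '2001:222:f400:f00::66') {
  printf "%-40s %s\n", $value, classify_value($value);
}
```

    Leaning on inet_pton instead of a hand-written regex avoids having to cover the many valid ways of writing an IPv6 address.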


    Sorry if this seems convoluted in any way, but it's just easier for me to see a clean output from such a huge XML database.

    Thanks so much!

  2. #2
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    hello again, exelan,

    Your XML does not look too cluttered, so that is good. I think you could do this with the XML::Simple Perl module (assuming Perl is good w/you).

    I know you have specific needs on how you format your output, once you've parsed it, but here is some quick code that should do the parsing, based upon the XML you've supplied.

    First, here's an XML data file I mocked up, based upon your input:

    Code:
    <?xml version="1.0" encoding="UTF-8"?>
    <roles>
      <role name="adminservice">
    
        <datacenter name="blu">
          <origin name="adminservice-s1.msoc.nsack.net">
            <resource name="adminservice-s1-blu" value="adminservice-s1-blu.msoc.nsack.net" />
          </origin>
        </datacenter>
    
        <datacenter name="blu1">
          <origin name="p-a9-adm.msocs.nsack.net">
            <resource name="BLU-ADM-PROC-Anchor" value="140.40.140.40" />
            <resource name="ipv6_bba100-anchor" value="2001:222:f400:f00::66" />
          </origin>
        </datacenter>
      </role>
    
      <role name="adminservice1">
        <datacenter name="blu2">
        </datacenter>
      </role>
    
    </roles>
    Now here's the code:
    Code:
    #!/usr/bin/perl
    use strict;
    use warnings;
    use XML::Simple qw(:strict);
    
    # the file containing the XML data
    my $file = shift || die "Give me the XML file\n";
    
    # options to pass to XML::Simple
    my $ref_in = {
      ForceArray    => ['role','datacenter','origin','resource'],
      KeyAttr       => [ ],
      SuppressEmpty => '',
    };
    
    # parse the XML file and return a hash ref of it
    my $ref = eval { XMLin($file,%$ref_in) or die "Can't read XML: $!\n" };
    if($@){
      chomp($@);
      die $@;
    }
    
    # now loop thru the hash ref and print the data
    # (ForceArray gives array refs for elements that exist; "|| []" covers
    # missing ones and avoids the unreliable "my ... if" construct)
    for my $role (@{ $ref->{'role'} || [] }) {
      print "\n", "------------------------------------------------------------", "\n";
      print "Role: \"", $role->{'name'}, "\"\n";
      for my $dc (@{ $role->{'datacenter'} || [] }) {
        print "\n\tDatacenter: \"", $dc->{'name'}, "\"\n";
        for my $origin (@{ $dc->{'origin'} || [] }) {
          print "\n\t\tOrigin: \"", $origin->{'name'}, "\"\n";
          my @resources = @{ $origin->{'resource'} || [] };
          for my $r (0 .. $#resources) {
            # print both attributes of each resource, numbered from 0
            for my $attr (qw/name value/) {
              print "\n\t\t\tResource $r $attr: \"", $resources[$r]->{$attr}, "\"";
            }
            print "\n";
          }
        }
      }
    }
    
    exit(0);
    Running the perl code on the data file yields this for me:

    Code:
    ------------------------------------------------------------
    Role: "adminservice"
    
            Datacenter: "blu"
    
                    Origin: "adminservice-s1.msoc.nsack.net"
    
                            Resource 0 name: "adminservice-s1-blu"
                            Resource 0 value: "adminservice-s1-blu.msoc.nsack.net"
    
            Datacenter: "blu1"
    
                    Origin: "p-a9-adm.msocs.nsack.net"
    
                            Resource 0 name: "BLU-ADM-PROC-Anchor"
                            Resource 0 value: "140.40.140.40"
    
                            Resource 1 name: "ipv6_bba100-anchor"
                            Resource 1 value: "2001:222:f400:f00::66"
    
    ------------------------------------------------------------
    Role: "adminservice1"
    
            Datacenter: "blu2"
    Let me know if your XML data has more subtle differences that break it. You'll probably want to change how the data is printed out. Give it a whirl, or ping back if you are having trouble w/that.
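
    If a flat, one-line-per-value layout is easier to compare, the same walk can collect rows instead of printing a tree. The sketch below is an assumption-laden illustration, not part of the script above: flatten_rows is a hypothetical helper, and $ref is a hand-written stand-in for the structure XMLin() is expected to return for the mock-up file with the ForceArray and KeyAttr options shown, so it runs without XML::Simple.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hand-written stand-in for the parsed mock-up file (assumed shape).
my $ref = {
  role => [
    { name => 'adminservice',
      datacenter => [
        { name => 'blu',
          origin => [
            { name => 'adminservice-s1.msoc.nsack.net',
              resource => [
                { name  => 'adminservice-s1-blu',
                  value => 'adminservice-s1-blu.msoc.nsack.net' },
              ] },
          ] },
        { name => 'blu1',
          origin => [
            { name => 'p-a9-adm.msocs.nsack.net',
              resource => [
                { name => 'BLU-ADM-PROC-Anchor', value => '140.40.140.40' },
                { name => 'ipv6_bba100-anchor',  value => '2001:222:f400:f00::66' },
              ] },
          ] },
      ] },
    { name => 'adminservice1', datacenter => [ { name => 'blu2' } ] },
  ],
};

# Hypothetical helper: return one [role, datacenter, origin, value] row
# per resource; datacenters without origins contribute no rows.
sub flatten_rows {
  my ($data) = @_;
  my @rows;
  for my $role (@{ $data->{role} || [] }) {
    for my $dc (@{ $role->{datacenter} || [] }) {
      for my $origin (@{ $dc->{origin} || [] }) {
        for my $res (@{ $origin->{resource} || [] }) {
          push @rows, [ $role->{name}, $dc->{name}, $origin->{name}, $res->{value} ];
        }
      }
    }
  }
  return @rows;
}

# tab-separated output, one resource value per line
print join("\t", @$_), "\n" for flatten_rows($ref);
```

    Tab-separated rows like these can be sorted or grepped by origin name, which makes it easier to spot origins that have no IP address at all.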
    Last edited by atreyu; 09-29-2012 at 03:47 AM. Reason: typo

  3. #3
    Just Joined! exelan
    Join Date
    Jun 2012
    Location
    Montana
    Posts
    8
    Hello again atreyu!

    Thanks so much for the reply! Sorry it took so long for me to try it out, I was out of town all weekend.

    I ran it against my XML data and it works out just fine! A couple different XML files that have subtle differences are returning some errors but I think it's just the source material.

    I can't thank you enough again man! This saved me a LOT of time and opens up doors for automating larger files down the road


    Many thanks!

  4. #4
    Trusted Penguin
    Join Date
    May 2011
    Posts
    4,353
    Quote Originally Posted by exelan
    I ran it against my XML data and it works out just fine! A couple different XML files that have subtle differences are returning some errors but I think it's just the source material.
    That's good. If you can't figure out what's causing the errors, feel free to post them, and I'll see if I can give you some pointers.
