- 08-04-2008 #1 Just Joined!
- Join Date
- Apr 2005
- Location
- Clinton Township, MI
- Posts
- 84
Perl syntax for sed searches
I am aware that Perl has a lot of features that originally came from sed and awk. I have a pattern that I am using like this:
sed -n '/|Y|/p'
I want to do the same thing in Perl and be able to either save that value in some kind of variable or array or potentially write it out to a file.
For the simple case, writing out to a file, I think the syntax is very close to the sed syntax. I would like to get a few recommendations, first, on a few alternative ways to write a similar expression in Perl, then how to do I/O properly.
My second question involves two files, both with pipe-separated data. In the first (larger) file, I want to do a large data reduction first, using the pattern above to retain only records containing |Y|. The second file has a field containing an employee number with an upper case A as the first character. The larger file contains this same data, but with a lower case a.
The complete exercise, then, is to first reduce the first file to records containing Y in a field, surrounded by the pipe symbol. Next, I compare the employee ID field of those records with the employee ID field in the second file, after making sure it is lower cased in both files. Then I will need to print out the matching records, possibly including the first, third, and fourth fields.
Can anyone give me a few key snippets on this so I don't keep struggling with it? I will then apply them to my modest-sized application, which I am writing in Perl for speed and portability. I have used Perl before, but I have never become an expert, and it has been years since I used it last. I am confusing myself with pieces of different syntax and making a lot of silly mistakes, so I would appreciate some sound advice to set me back on course. I am reading up using a few classics (the Perl Cookbook and Programming Perl), but both are large books and daunting to get through. Until I can digest them, I'd appreciate some pointers to accelerate my learning and, more importantly, to get a script into at least a minimally usable form ASAP. Therefore, I appreciate specific tips. I'll get better at it once I have digested the classic resources and actually done more coding to regain the experience.
- 08-04-2008 #2 Linux User
- Join Date
- Aug 2006
- Posts
- 458
Regular expressions, although powerful, are not necessarily the first thing to reach for when solving such problems. Since you have a fixed delimiter, simple splitting and string manipulation is much easier to understand than a regexp. Please post a sample input and the expected output you want.
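A minimal sketch of the splitting approach, for illustration only: the records and the helper name has_flag are made up and not taken from the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: true if any pipe-separated field is exactly $flag.
sub has_flag {
    my ($line, $flag) = @_;
    chomp $line;
    # A limit of -1 keeps trailing empty fields instead of dropping them.
    my @fields = split /\|/, $line, -1;
    return scalar(grep { $_ eq $flag } @fields);
}

# Made-up records, not real data.
my @records = ("a1|Jane|Y|x\n", "a2|Sam|N|y\n", "a3|YES|N|z\n");

# Keep only the records that carry a field equal to 'Y'.
my @flagged = grep { has_flag($_, 'Y') } @records;
print @flagged;
```

Unlike the pattern /\|Y\|/, this test also matches a Y sitting in the first or last field, which may or may not be what is wanted.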
- 08-04-2008 #3 Just Joined!
- Join Date
- Apr 2005
- Location
- Clinton Township, MI
- Posts
- 84
Examples
The first file contains a list of IDs and a lot of other information similar to this:
Code:
A227776|a227776|NET1\A227776|John|Brown|Brown, John|PLAN YEAR END SUMMARY SPECIALIST|EMP|OPTRPCOOOA8R|RSO-RSO1|RSO-RETIREMENT SERVICES OPS1|A|02695|TESTING & REPORTING SERVICES - COVINGTON - 02695|RSO-RETIREMENT SERVICES OPS|RSO-RSO|RSO|RS|PWI RETIREMENT SERVICES|RETIREMENT SERVICES OPERATIONS|COV|COVINGTON|KY|41015|KS2J|1547||John.Brown@myco.com|/o=mycompany/ou=CENTRAL1/cn=Recipients/cn=A227776|a216767|N|NOT ASSIGNED|NOT REGISTERED|SMTP:John.Brown@myco.com,smtp:A227776@PROD.myco.COM,smtp:A227776@mycompany.com,smtp:A227776@myco.com,smtp:John.Brown@mycompany.com
a401204|a401204|NET1\A401204|Sam|Smith|Smith, Sam|SENIOR ASSOCIATE|EMP|BPOPSOOIA3|ISS-INBPO1|ISS-myco INDIA BPO1|A|13659|EP&S - OTHERS|ISS- myco INDIA BOOKS|ISS-INDBKS|ISS|IC|myco INDIA1|myco INDIA|KAR|BANGALORE|XX|560 071|XINB1B|2001||Sam.Smith@myco.com|/o=mycompany/ou=NORTHEAST1/cn=RECIPIENTS/cn=A401204|a437745|N|NOT ASSIGNED|NOT REGISTERED|SMTP:Sam.Smith@myco.com,smtp:a401204@PROD.myco.COM,smtp:a401204@mycompany.com,smtp:a401204@myco.com,smtp:Sam.Smith@mycompany.com
a356002|a356002|NWK1\a356002|Sarah|Smith|Smith, Sarah (NFS)|TRANSACTION PROCESSING REPRESENTATIVE VI|EMP|OPTRPCOOOA7R|OSG-AM1|OSG-ACCOUNT MANAGEMENT1|A|10451|RIA NEW ACCOUNTS & MAINTENANCE (NFS)|OSG-ACCOUNT MANAGEMENT|OSG-AM|NFS|NF|OPERATIONS AND SVCS GROUP|OPERATIONS AND SERVICES GROUP|COV|COVINGTON|KY|41015|HSGD|6110||Sarah.Smith.OSG-CUSTPR@myco.com|/o=mycompany/ou=CENTRAL1/cn=Recipients/cn=a356002|a034229|N|NOT ASSIGNED|NOT REGISTERED|SMTP:Sarah.Smith.OSG-CUSTPR@myco.com,smtp:Sarah.Smith.OSG-CUSTPR@mycompany.com,smtp:a356002@mycompany.com,smtp:a356002@myco.com,smtp:a356002@PROD.myco.COM
a421566|a421566|NWK1\a421566|Dan|Della Aiken|Della Aiken, Dan|TRANSACTION PROCESSING REPRESENTATIVE VI|EMP|OPTRPCOOOA7R|OSG-TPS1|OSG-TRADE PROCESSING & SETTLEMENTS1|A|04441|R&D DTC|OSG-TRADE PROCESSING & SETTLEMENT|OSG-TPS|NFS|NF|OPERATIONS AND SVCS GROUP|OPERATIONS AND SERVICES GROUP|NYC|JERSEY CITY|NJ|07311|NJAC2|6019||Dan.della.Aiken@myco.com|/o=mycompany/ou=NORTHEAST1/cn=Recipients/cn=a421566|a433432|N|NOT ASSIGNED|NOT REGISTERED|SMTP:Dan.della.Aiken@myco.com,smtp:a421566@mycompany.com,smtp:a421566@myco.com,smtp:a421566@prod.myco.com,smtp:Dan.della.Aiken@mycompany.com
a429378|a429378|NET1\a429378|Mary|Jones|Jones, Mary E.|CUSTOMER SERVICE REPRESENTATIVE - RETIREMENT|EMP|CSTOBGOOOA1R|FCS-NKU1|FCS-NKUNIVERSITY1|A|05240|NKU PHONES-COV|FCS-CUSTOMER SUPPORT|FCS-CUSSUP|FCS|OA|PWI CHJaneL MANAGEMENT|CUSTOMER SUPPORT SERVICES|COV|COVINGTON|KY|41015|NKU1|1252||mary.e.Jones@myco.com|/o=mycompany/ou=CENTRAL1/cn=Recipients/cn=a429378|a364401|N|NOT ASSIGNED|NOT REGISTERED|SMTP:mary.e.Jones@myco.com,smtp:a429378@mycompany.com,smtp:a429378@myco.com,smtp:a429378@prod.myco.com,smtp:mary.e.Jones@mycompany.com
A269542|a269542|NWK1\A269542|Jane|Green|Green, Jane|VP, training AND finance|EMP|MKGENROOOEDR|BMP-ADBRD1|BMP-ADVETISING&BRAND1|A|15119|ONLINE MARKETING- NEW GL 012|BMP-ADVERTISING & BRAND|BMP-ADBRD|FRB|PI|PERSONAL engineering|PERSONAL engineering PRODUCT & MARKETING|BOS|BOSTON|MA|02109|R5A|1083||Jane.Green@myco.COM|/o=mycompany/ou=NORTHEAST1/cn=Recipients/cn=A269542|a345841|Y|a345841|MY LLC|SMTP:Jane.Green@myco.COM,smtp:A269542@mycompany.COM,smtp:A269542@myco.COM,smtp:A269542@PROD.myco.COM,smtp:Jane.Green@mycompany.COM
The second file looks similar to this in syntax:
8889967|SMITH,SAM|3078|a004256|SKYWORDPLUS
1573147|BROWN,RICK|3904|a004321|SKYWRITER
1157433|JONES,JOE|7016|a004829|SKYWRITER
1084453|FERRIS,JOE|1205|a004833|SKYWORDPLUS
1087549|MERCY,MARY|4853|a006064|SKYWRITER
In this contrived example, we would only have one record from the first list containing a |Y|.
In the second list, none of the entries would match, regardless of whether we restricted our search to lines containing only Y or not.
In the real data, there are at least 22,000 lines of data, but fewer than half of them contain a |Y|. First, I need to cut down the comparison between the two files to only those lines containing the |Y| flag. Then I need to search through the second file for lines that have a fourth field that matches the second field in the first file. If I do get a match, I want to print out the first, second, and fourth fields in the matching records in the second file.
So the questions I have are:
1. I know how to write a one-liner in sed to extract records from the first file containing fields with |Y|. That syntax in sed is sed -n '/|Y|/p'. How do I write that in Perl so that I flag those records for comparison to the second file?
2. Once reduced by that amount, how do I compare the second field in the first file to the fourth field in the second file (converting both to lower case before making the comparison, to make sure that the fields actually match)?
3. For matching fields in the first and second files, print matching records, taking the first, second, and fourth fields from the second file to output. Put this output in another file and iterate until all remaining records in the two files are evaluated.
Can you help me sketch this, at least the key parts? Would be much appreciated. I know the design of what I want to do, but the Perl techniques are tripping me up. Unfortunately, Perl is the one language I can count on having in the environment where I would be using such a script, so help is appreciated.
- 08-04-2008 #4 Linux User
- Join Date
- Aug 2006
- Posts
- 458
It's the same syntax, something like this:
Code:
if ( /\|Y\|/ )
See perldoc perlre for more.
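To put that test in context, here is a rough sketch of a complete filter that both keeps the matching records in an array and writes them to a file. The file names and sample records are invented for illustration; the script creates its own small input file so that it runs on its own.

```perl
use strict;
use warnings;

# Command-line equivalent of sed -n '/|Y|/p':
#   perl -ne 'print if /\|Y\|/' input_file

# Hypothetical file names, for illustration only.
my ($in_name, $out_name) = ('demo_input.txt', 'demo_output.txt');

# Create a tiny made-up input file so the sketch is self-contained.
open(my $mk, '>', $in_name) or die "Cannot write $in_name: $!";
print $mk "a1|x|Y|y\n", "a2|x|N|y\n", "a3|x|Y|z\n";
close($mk);

open(my $in,  '<', $in_name)  or die "Cannot open $in_name: $!";
open(my $out, '>', $out_name) or die "Cannot open $out_name: $!";

my @matches;
while (my $line = <$in>) {
    next unless $line =~ /\|Y\|/;   # the pipe must be escaped in a Perl regex
    push @matches, $line;           # keep the record in memory
    print $out $line;               # and write it to the output file
}
close($in);
close($out);

print scalar(@matches), " matching records\n";
```

The three-argument form of open with lexical filehandles is used here as a matter of style; the bareword filehandles used elsewhere in this thread work as well.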
- 08-04-2008 #5 Just Joined!
- Join Date
- Apr 2005
- Location
- Clinton Township, MI
- Posts
- 84
That should help with the first part (which I have done in sed previously, as I mentioned). I will try that.
Any suggestions on how to compare the second field in the first file with the fourth field in the second file, once I cut down the search in the (large) first file? I want to compare the two fields, then output records in the second file (first, second, and fourth fields).
- 08-04-2008 #6 Just Joined!
- Join Date
- Apr 2005
- Location
- Clinton Township, MI
- Posts
- 84
First part works, help with the second part?
I implemented this first part and got the first file reduced to records containing |Y| using the following code:
Code:
while (<MINPUT>) {
    if ( /\|Y\|/ ) {
        print MOUTPUT;
        my $line = <SINPUT>;
    }
}
Now if I can take that data and compare the second field (pipe separated) in this file and the fourth field in the second file, then output to a third file the records found in both files, including the first, second, and fourth fields from that second file, this tool will be complete.
Any suggestions on that part?
- 08-05-2008 #7 Linux Newbie
- Join Date
- Jul 2008
- Posts
- 181
I think you need to do this: find the matching lines in the first file, split them on "|" and put the second field (the ID, is it?) into an associative array. Then, you read the second file, split each line on "|" and check whether field four is already in your associative array. If so, print the required fields from that record.
You may find that it is more effective to split the lines in the first file at once and then compare whether a particular field has the value "Y".
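A minimal sketch of the associative array (hash) approach described above. The two arrays stand in for the two files and contain invented records; the field positions (second field of the first file, fourth field of the second) and the lower-casing follow the description in this thread.

```perl
use strict;
use warnings;

# Made-up stand-ins for the two files; real code would read them from disk.
my @first  = ("A100|a100|Jane|Y|x\n", "a200|A200|Sam|Y|x\n", "a300|a300|Dan|N|x\n");
my @second = ("111|GREEN,JANE|10|A100|PLAN\n", "222|SMITH,SAM|20|a200|PLAN\n",
              "333|AIKEN,DAN|30|a300|PLAN\n");

# Pass 1: remember the lower-cased second field of every |Y| record.
my %wanted;
for my $line (@first) {
    next unless $line =~ /\|Y\|/;
    my @f = split /\|/, $line;
    $wanted{ lc $f[1] } = 1;
}

# Pass 2: for second-file records whose fourth field is in the hash,
# keep the first, second, and fourth fields.
my @report;
for my $line (@second) {
    chomp(my $copy = $line);
    my @f = split /\|/, $copy;
    next unless defined $f[3] && exists $wanted{ lc $f[3] };
    push @report, join('|', @f[0, 1, 3]);
}

print "$_\n" for @report;
```

Because the hash lookup takes roughly constant time, each file is read only once, which matters with tens of thousands of records.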
- 08-05-2008 #8 Just Joined!
- Join Date
- Apr 2005
- Location
- Clinton Township, MI
- Posts
- 84
CLOSED: Program written, thank you very much!
Here is the final version (edited) of what I came up with:
Code:
#!/usr/bin/perl
# File: MatchID.pl
# Author: Brian Masinick
# Initial Creation Date: August 1, 2008
# Inputs:
#   M input feed file
#   S input feed file
# Intermediate outputs:
#   M output feed matching |Y|
# Final outputs:
#   S output file matching records in M output file with those in S input file.
# Required software: Perl 5.10

open (MINPUT,"<M_input_file") || die("Could not open M_input_file");
open (MOUTPUT,">M_output_file") || die("Could not open M_output_file");
open (SOUTPUT,">S_output_file") || die("Could not open S_output_file");

# This produces the intermediate file, so we have a written record of which set
# of records we are actually processing to produce the final result.
while (<MINPUT>) {
    if ( /\|Y\|/ ) {
        print MOUTPUT;
    }
}
# close takes only the filehandle, not the file name
close (MINPUT);
close (MOUTPUT);
print "M registered users recorded.\n";

open (MOUTPUT,"<M_output_file") || die("Could not open M_output_file");
while (<MOUTPUT>) {
    # split the input line into the @fields array
    @fields = split "[|]";
    # store the line in the %moutput hash indexed by the second field
    $moutput{$fields[1]}=$_;
}

open (SINPUT,"<S_input_file") || die("Could not open S_input_file");
print "Reading S users file.\n";
while (<SINPUT>) {
    # split the input line into the @fields array
    @fields=split "[|]";
    # if the fourth field exists in the %moutput hash, print the current
    # input line
    if (exists $moutput{$fields[3]}) {
        print SOUTPUT;
    }
}
print "Processing complete.\n";

close (MOUTPUT);
close (SINPUT);
close (SOUTPUT);
print "Files closed. Program complete.\n";


