Welcome to Linux Forums! With a comprehensive Linux Forum, information on various types of Linux software and many Linux Reviews articles, we have all the knowledge you need a click away, or accessible via our knowledgeable members.


In the previous lesson we learnt about if statements and loops. We also started to study how to open files and how to write to and read from them. In this lesson we will learn how to compare strings, and we will study some advanced techniques for parsing files.

String comparisons

The =~ operator

Perl provides an operator which you'll find very useful to parse and search files: "=~". If you are not familiar with this operator, think of it as a "contains" operator. For instance:

"hello world" =~ "world" returns true, as "hello world" contains "world".
"hello world" =~ "o worl" also returns true, since "o worl" is included in the string "hello world".
"hello world" =~ "wrld" returns false, because the string "hello world" does not contain "wrld".

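To try the three examples above yourself, here is a small sketch. It stores "hello world" in a variable (the names $text, $search and @found are only illustrative) and reports whether each search string is found:

```perl
use strict;
use warnings;

# $text, $search and @found are illustrative variable names
my $text = "hello world";
my @found;    # search strings that were found in $text

for my $search ("world", "o worl", "wrld") {
    # a quoted string on the right of =~ is used as the search pattern
    if ($text =~ $search) {
        push @found, $search;
        print "\"$text\" contains \"$search\"\n";
    } else {
        print "\"$text\" does not contain \"$search\"\n";
    }
}
```

Only "world" and "o worl" end up in @found.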
Using the =~ operator you can easily test whether a variable contains a particular string, and this will help you a lot while parsing text files. You can also use regular expressions in conjunction with the =~ operator. Although it is too early at this stage to study regular expressions in detail, here are some techniques that you can use with =~.

We replace the double quotes with forward slashes to tell our =~ operator that we're no longer simply looking for a string but for a matching pattern (with a bit of logic inside it):

"hello world" =~ "world" is the same as "hello world" =~ /world/

Although "world" represents a string and /world/ represents a regular expression, both instructions return true. By adding logic to the expression, we can refine the meaning of our =~ operator.
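A detail worth knowing even at this stage: Perl actually treats a quoted string on the right of =~ as a pattern too, so characters such as "." keep their special meaning (an unescaped "." matches any single character). The slashes simply make the pattern explicit. This sketch (variable names are illustrative) shows both forms, then uses Perl's built-in quotemeta function to escape the dot when a literal match is wanted:

```perl
use strict;
use warnings;

# illustrative variable names
my $text = "hello world";

# both forms perform the same pattern match
my $quoted  = ($text =~ "world") ? 1 : 0;
my $slashes = ($text =~ /world/) ? 1 : 0;

# "." matches any single character, so "o.w" matches the "o w" in "hello world"
my $dot_pattern = ($text =~ "o.w") ? 1 : 0;

# quotemeta("o.w") returns "o\.w", which only matches a literal dot
my $escaped     = quotemeta("o.w");
my $dot_literal = ($text =~ $escaped) ? 1 : 0;

print "quoted=$quoted slashes=$slashes dot_pattern=$dot_pattern dot_literal=$dot_literal\n";
# prints: quoted=1 slashes=1 dot_pattern=1 dot_literal=0
```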

=~ /^Starts with/

A leading ^ sign changes the meaning of the operator from "contains" to "starts with":

"hello world" =~ /world/ returns true because "hello world" contains "world".
"hello world" =~ /^world/ returns false since "hello world" doesn't start with "world".
"hello world" =~ /^hell/ returns true because "hello world" starts with "hell".

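In a real script the "starts with" test usually sits inside an if statement. In this sketch the list of prefixes is made up for illustration; each prefix is placed into the pattern right after the ^ sign (this works here because the prefixes contain no special characters):

```perl
use strict;
use warnings;

my $text = "hello world";
my @matches;    # prefixes that $text starts with (illustrative)

for my $prefix ("world", "hell", "hello w") {
    # ^ ties the match to the very beginning of $text
    if ($text =~ /^$prefix/) {
        push @matches, $prefix;
        print "\"$text\" starts with \"$prefix\"\n";
    } else {
        print "\"$text\" does not start with \"$prefix\"\n";
    }
}
```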
=~ /Ends with$/

By adding a $ sign at the end of the expression you can change the meaning of the operator from "contains" to "ends with":

"hello world" =~ /world/ returns true because "hello world" contains "world".
"hello world" =~ /world$/ also returns true, but this time it's because "hello world" ends with "world".
"hello world" =~ /hello$/ returns false, because "hello world" doesn't end with "hello".

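A common practical use of "ends with" is checking file extensions. In this sketch the file names are invented for illustration; note the backslash before the dot, because inside a pattern an unescaped "." would match any character:

```perl
use strict;
use warnings;

# illustrative file names, not taken from the lesson
my @files = ("employees.txt", "report.pdf", "notes.txt.bak", "todo.txt");
my @text_files;

for my $file (@files) {
    # \. is a literal dot; $ ties the match to the end of the name
    push @text_files, $file if $file =~ /\.txt$/;
}

print "$_\n" for @text_files;   # employees.txt and todo.txt
```

"notes.txt.bak" contains ".txt" but does not end with it, so it is left out.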
You can use both the ^ and $ signs in the same expression, which means you're looking for a string that your variable both starts and ends with. For instance:

"hello world" =~ /^hello world$/ returns true because "hello world" starts and ends with "hello world".
"hello world" =~ /^hello$/ returns false because, although "hello world" starts with "hello", it doesn't end with it.

The eq and ne operators

There is not much point in using both ^ and $ in the same expression: if your string starts and ends with something, it is likely to be equal to that something. If you want to test the equality of two strings, you can simply use the eq operator:

"hello world" eq "hello world" returns true because the two strings are identical.

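The ^...$ test and eq are close but not quite identical: $ also matches just before a newline at the very end of the string. A line read from a file, which still ends with "\n", therefore passes the /^...$/ test but fails the eq test. A sketch (the variable $line simply stands for a line read from a file):

```perl
use strict;
use warnings;

# $line stands for a line just read from a file (illustrative)
my $line = "hello world\n";

my $anchored = ($line =~ /^hello world$/) ? 1 : 0;  # 1: $ matches before the final "\n"
my $equal    = ($line eq "hello world")   ? 1 : 0;  # 0: eq compares every character

# chomp (covered later in this lesson) removes the trailing newline
chomp $line;
my $equal_after_chomp = ($line eq "hello world") ? 1 : 0;   # 1

print "anchored=$anchored equal=$equal equal_after_chomp=$equal_after_chomp\n";
```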
The ne operator tests the non-equality of two strings. It returns true if the strings are different and false otherwise:

"hello world" ne "good night" returns true.

"hello world" ne "Hello worlD" returns true (remember that Perl is case-sensitive).

"hello world" ne "hello world" returns false because both strings are the same.

Remember to use the eq and ne operators to test the equality of strings in Perl, and their numeric equivalents == and != to test numerical values.
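Mixing the two families gives surprising results, because == converts both sides to numbers before comparing them, while eq compares them character by character. A small sketch with two illustrative values:

```perl
use strict;
use warnings;

# illustrative values
my $left  = "10";
my $right = "10.0";

my $num_equal = ($left == $right) ? 1 : 0;   # 1: both are the number 10
my $str_equal = ($left eq $right) ? 1 : 0;   # 0: the characters differ
my $str_diff  = ($left ne $right) ? 1 : 0;   # 1

# eq and ne are case-sensitive
my $case_diff = ("hello world" ne "Hello worlD") ? 1 : 0;   # 1

print "num_equal=$num_equal str_equal=$str_equal str_diff=$str_diff case_diff=$case_diff\n";
```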

The !~ operator

The !~ operator is used as a “does not contain” operator. What != is to == and ne is to eq, !~ is to =~. For instance:

"hello world" !~ "world" returns false because “hello world” does contain “world”.

"hello world" !~ "wwt" returns true because “hello world” does not contain “wwt”.

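A typical use of !~ is keeping only the lines that do not contain some word. The sample lines in this sketch are invented for illustration:

```perl
use strict;
use warnings;

my $text = "hello world";
print "\"$text\" does not contain \"wwt\"\n"   if $text !~ "wwt";    # printed
print "\"$text\" does not contain \"world\"\n" if $text !~ "world";  # not printed

# illustrative sample lines
my @lines = ("all good", "disk error", "still fine");
my @kept;    # lines that do not contain "error"
for my $line (@lines) {
    push @kept, $line if $line !~ /error/;
}
print "$_\n" for @kept;
```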
Case insensitive search

When you use the =~ operator to test whether a string matches within another, the test is case sensitive by default. For instance:

"hello world" =~ "world" returns true.

"hello world" =~ "woRld" returns false.

If you want to make the =~ operator case insensitive, add an “i” after the closing slash of the expression:

"hello world" =~ /world/i returns true.

"hello world" =~ /woRld/i also returns true.

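The i modifier can be combined with the ^ and $ signs seen earlier. To compare two whole strings without regard to case, a common alternative is to lower-case both sides with Perl's built-in lc function and then use eq. A sketch:

```perl
use strict;
use warnings;

my $text = "hello world";

my $sensitive   = ($text =~ /woRld/)   ? 1 : 0;  # 0: the case differs
my $insensitive = ($text =~ /woRld/i)  ? 1 : 0;  # 1: /i ignores case
my $starts      = ($text =~ /^HELLO/i) ? 1 : 0;  # 1: works with ^ too

# whole-string comparison ignoring case: lower-case both sides first
my $same = (lc("Hello World") eq lc($text)) ? 1 : 0;   # 1

print "sensitive=$sensitive insensitive=$insensitive starts=$starts same=$same\n";
```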
Substitutions

The =~ operator can also be used to find occurrences of a string within a variable and replace them with another string. For instance, if you have a variable which contains text, and you want to change all occurrences of “aaa” to “aab” within that text, you can use the following substitution:

$variable =~ s/aaa/aab/g;

All occurrences of “aaa” within $variable will then be changed to “aab”. Note that we prefixed our expression with an “s” to change the meaning of the operator from “contains” to “substitute”, and added a “g” (global) at the end so that every occurrence is replaced: without the “g”, only the first occurrence would be changed.
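The effect of the "g" is easy to see by running both forms side by side. In this sketch the sample strings are illustrative; it also relies on the fact that s/// returns the number of substitutions it made:

```perl
use strict;
use warnings;

# illustrative sample strings
my $first_only = "aaa-aaa-aaa";
my $all        = "aaa-aaa-aaa";

$first_only =~ s/aaa/aab/;          # no /g: only the first "aaa" changes
my $count = ($all =~ s/aaa/aab/g);  # /g: every "aaa" changes

print "$first_only\n";   # aab-aaa-aaa
print "$all\n";          # aab-aab-aab
print "$count\n";        # 3
```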

Parsing files

There are many ways to parse a text file. In Perl, if the file has its data organized line by line with delimiters, it is very easy to parse it.

Let's study a simple example. We have a set of employees in a file called employees.txt. In this file, each line represents an employee. The information about each employee is separated by tabs: the first column is the name of the employee, the second column indicates his department and the third one his salary. Here is an overview of the file:

Mr John Doe R&D 21000
Miss Gloria Dunne HR 23000
Mr Jack Stevens HR 45000
Mrs Julie Fay R&D 30000
Mr Patrick Reed R&D 33000

In order to obtain some statistics, the HR department wants to establish a list of all male employees who work in the R&D department and whose salary is more than 25000.

To obtain this list, we design a simple Perl script, which:

  1. opens the employees.txt file

  2. loops through each line

  3. identifies the name, department and salary of the employee

  4. ignores and goes to the next line if the employee is female (the name does not start with “Mr ”)

  5. ignores and goes to the next line if the salary is less than or equal to 25000.

  6. ignores and goes to the next line if the department is not “R&D”.

  7. prints the name and the salary of the employee on the screen.

To do this, we'll introduce two Perl functions:

  • “chomp” is used to remove the newline character found at the end of a line. For instance, chomp $variable removes the trailing newline from $variable, if there is one; newlines elsewhere in the string are left untouched.

  • “split” is used to cut the line into different parts wherever it finds a delimiter. For instance, split /o/, “hello world” returns a list containing “hell”, “ w” and “rld”. In our example we'll split the lines on the tab delimiter, which in Perl is written “\t”.
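Here is a quick sketch of both functions on a single line shaped like the ones in employees.txt (the line is hard-coded for illustration, with the tabs written as \t):

```perl
use strict;
use warnings;

# one hard-coded line shaped like a line of employees.txt (illustrative)
my $line = "Mr John Doe\tR&D\t21000\n";

chomp $line;    # removes the trailing "\n"

# split on the tab character, written \t inside the pattern
my ($name, $department, $salary) = split /\t/, $line;
print "name=$name department=$department salary=$salary\n";

# the example from the text: splitting on the letter "o"
my @parts = split /o/, "hello world";    # ("hell", " w", "rld")
print join("|", @parts), "\n";           # hell| w|rld
```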

Here is the script which establishes the list of male employees from the R&D department with a salary greater than 25000. To make things a bit clearer, comments were added within the script (comments in Perl start with a # sign):

# open the employees file
open (EMPLOYEES, "employees.txt");

# for each line
while ($line = <EMPLOYEES>) {

    # remove the trailing newline
    chomp $line;

    # split the line on tabs
    # and get the different elements
    ($name, $department, $salary) = split /\t/, $line;

    # go to the next line unless the name starts with "Mr "
    # (the space after "Mr" rules out "Mrs")
    next unless $name =~ /^Mr /;

    # go to the next line unless the salary is more than 25000
    next unless $salary > 25000;

    # go to the next line unless the department is R&D
    next unless $department eq "R&D";

    # since all employees here are male,
    # remove the title in front of their name
    $name =~ s/^Mr //;

    # print the name and the salary, separated by a tab
    print "$name\t$salary\n";
}
close (EMPLOYEES);

Study the script carefully and make sure you understand every part of it. Each instruction was explained either in this lesson or in one of the previous ones. If you have any questions, do not hesitate to ask.
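For comparison, here is a sketch of the same filter in a style many Perl programmers prefer: use strict and use warnings, variables declared with my, a lexical filehandle, the three-argument form of open, and an "or die" check in case the input cannot be opened. To keep the example self-contained, it reads the sample data from an in-memory string rather than from employees.txt; that substitution is only for illustration:

```perl
use strict;
use warnings;

# sample rows from employees.txt, kept in a string so the sketch runs on its own
my $data = "Mr John Doe\tR&D\t21000\n"
         . "Miss Gloria Dunne\tHR\t23000\n"
         . "Mr Jack Stevens\tHR\t45000\n"
         . "Mrs Julie Fay\tR&D\t30000\n"
         . "Mr Patrick Reed\tR&D\t33000\n";

# opening a reference to a string reads from memory;
# for the real file you would write: open(my $fh, '<', 'employees.txt')
open(my $fh, '<', \$data) or die "Cannot open data: $!";

my @selected;    # "name<TAB>salary" entries for matching employees

while (my $line = <$fh>) {
    chomp $line;
    my ($name, $department, $salary) = split /\t/, $line;

    next unless $name =~ /^Mr /;         # male employees only ("Mrs" fails)
    next unless $salary > 25000;
    next unless $department eq "R&D";

    $name =~ s/^Mr //;                   # drop the title
    push @selected, "$name\t$salary";
}
close($fh);

print "$_\n" for @selected;
```

With this sample data only Patrick Reed passes all three tests, so the output is a single line: his name and 33000, separated by a tab.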

In the next lesson we'll look at how to interact with the filesystem and the Linux operating system from our Perl scripts.


Comments about this article
Typo?
written by: Mark C. on 2006-08-02 14:16:52
Is this a typo in the =~ //i discussion? [quote]"hello world" =~ "world" returns true. "hello world" =~ "woRld" returns true.[/quote]
Lesson 5
written by: Keith R. Martin on 2006-08-09 07:58:14
These were very insightful articles. When is lesson number 5 due out? Again thank you for the articles.
Error in last script
written by: Guru on 2006-11-03 03:20:37
The script only prints the name, not the salary as described above. It should also print the salary: print $name,"\t",$salary,"\n";
Mr
written by: Gattu on 2007-11-04 06:11:58
At the place where the while loop begins, there is no filehandle mentioned. Shouldn't it be while ($line = <EMPLOYEES>) { .... }? Not sure there is any default that works!
something wrong
written by: Jeffrey on 2008-11-06 04:36:44
Are you sure all the employees in R&D department are male? Actually the employee Julie Fay will be printed out: Mrs Julie Fay R&D 30000
error
written by: johnny on 2009-01-14 01:57:36
I used the same example of the open file (We have a set of employees in a file called employees.txt. In this file, each line represents an employee. The information relative to each employee is delimited with tabs, the first column is the name of the employee, the second column indicates his department and the third one his salary.) and got these errors: syntax error at salary.pl line 6, near ")=" syntax error at salary.pl line 29, near "}" Execution of salary.pl aborted due to compilation errors.
how we parse the file that has details i
written by: abduljaleel.k on 2009-02-12 08:09:58
if the employee file is like following, then how we do parsing. Mr John Doe R&D 21000 Miss Gloria Dunne HR 23000 Mr Jack Stevens HR 45000 Mrs Julie Fay R&D 30000 Mr Patrick Reed R&D 33000
The code will also display a female employee
written by: Leo-G on 2009-11-17 01:37:29
next unless$name =~ /^Mr /;

Mrs Julie Fay R
error in code (split function)
written by: mehdar on 2010-01-04 14:03:35
($name,$department, $salary) = split /t/, $line;
should be
($name,$department, $salary) = split /\t/, $line;
the tab should be represented by \t between the / /
Thank you
Another way to do it
written by: imed on 2011-04-20 16:51:23
open (CLIENT,"employees.txt") or print "failed to open file !\n";
$count =0;
while ($line=<CLIENT>)
{
#remove carriage
chomp $line;
@array = split /\t/,$line;
if ($array[2] > 25000 and (!($array[o] =~/^Mrs/)) and ($array[o] =~/^Mr/) and ($array[1] eq "R