Old 05-22-2005   #1 (permalink)
Just Joined!
 
hernandeangelis's Avatar
 
Join Date: Mar 2005
Location: Planet Earth
Posts: 79
Reading and writing multicolumn text files with Perl

Hi folks,

I am still new to Perl and trying to migrate away from Fortran for simple tasks, so if this question sounds silly there is a 99.9% probability that it really is. I already checked the Perl tutorials at www.perl.com but could not find an answer there. I have a multicolumn file like this:

number number number number
number number number number
number number number number
.............. ............... ............... ...............
number number number number

which is the output of a certain program. I want to read each line into four different variables, do some conditional sorting and calculations, and then write the results to a new file. That would be a very simple task in Fortran, like:

read (file) var1, var2, var3, var4

and then

write (file) varX, varY, varZ

Unfortunately I cannot figure out how to do that in Perl. As far as my poor brain can see, I can either put a whole line into a scalar ($variable) or the whole file into an array (@variable) of individual lines. Can you please shed some light on this?

Many thanks

Hernan
hernandeangelis is offline  


Reply With Quote
Old 05-22-2005   #2 (permalink)
Trusted Penguin
 
Cabhan's Avatar
 
Join Date: Jan 2005
Location: Boston, MA, USA
Posts: 2,691
Well, I'm thinking: read each line and push each column onto a separate array. Repeat until you have 4 arrays, where the first element of each is the top element of its column.

I'm also thinking regular expressions, something like:

(.*)\s+(.*)\s+(.*)\s+(.*)


Basically, the elements in each set of parentheses are then accessible outside of the regexp as $1, $2, $3, and $4. So you could simply put that regexp in a while loop ( while(<FILE>) ), and push each element onto a separate array (@col1, @col2, @col3, @col4).

Tell me what you think.
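
A minimal sketch of the "one array per column" idea, for illustration only. It assumes four whitespace-separated columns and swaps the `.*` groups for `\S+` (a run of non-whitespace), which is a different pattern from the one quoted above. The sub name `read_columns` and the sample numbers are made up for the example.

```perl
use strict;
use warnings;

# Read whitespace-separated lines from a filehandle and return one
# array reference per column (four columns assumed here).
sub read_columns {
    my ($fh) = @_;
    my (@col1, @col2, @col3, @col4);
    while (my $line = <$fh>) {
        # \S+ matches a run of non-whitespace, so each group captures one field.
        if ($line =~ /^\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
            push @col1, $1;
            push @col2, $2;
            push @col3, $3;
            push @col4, $4;
        }
    }
    return (\@col1, \@col2, \@col3, \@col4);
}

# Demonstration on an in-memory "file" with made-up numbers.
my $sample = "1 2 3 4\n5 6 7 8\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my ($c1, $c2, $c3, $c4) = read_columns($fh);
close $fh;
print "first column: @$c1\n";
```

With a real file, the in-memory string would be replaced by a file name in the `open` call; lines that do not have four fields are silently skipped in this sketch.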
__________________
DISTRO=Gentoo
Registered Linux User #388732
Gentoo Linux, 410 GB HD, 1.2 GB RAM, Fluxbox, These are a Few of my Favorite Things
Cabhan is offline   Reply With Quote
Old 05-22-2005   #3 (permalink)
scm
Linux Engineer
 
Join Date: Feb 2005
Posts: 1,004
Re: Reading and writing multicolumn text files with Perl

Quote:
Originally Posted by hernandeangelis
read (file) var1, var2, var3, var4

and then

write (file) varX, varY, varZ
Well, going for the simplest approach, the Perl equivalent of the read would be:
Code:
while (<>)
{
    (var1, var2, var3, var4) = $_;
}
You'll load the 4 variables for each line (the <> reads a line, the while loop ensures you'll suck in the whole file), so you'll need to process and write results within the loop. The writing can be done by
Code:
    print FH varX, varY, varZ;
where FH is a previously opened filehandle. Note the absence of a comma separator after the FH value.

HTH,
Steve
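
For reference, a hedged sketch of how the read/write pattern described in this post might look as runnable Perl. Two things differ from the snippets above and are assumptions of this sketch: Perl scalars need a `$` sigil, and assigning `$_` to a list does not break the line apart, so `split` is used to get the fields. The sub name `process_line`, the placeholder arithmetic, and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Split one input line into four variables and return an output line.
# The arithmetic is a placeholder; only the read/write pattern matters.
sub process_line {
    my ($line) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my ($var1, $var2, $var3, $var4) = split ' ', $line;
    my $sum = $var1 + $var2 + $var3 + $var4;
    return "$var1 $var2 $sum\n";
}

my $input  = "1 2 3 4\n10 20 30 40\n";   # stands in for the input file
my $output = '';                          # stands in for the output file
open my $in,  '<', \$input  or die "Cannot open input: $!";
open my $out, '>', \$output or die "Cannot open output: $!";
while (my $line = <$in>) {
    print {$out} process_line($line);     # no comma after the filehandle
}
close $in;
close $out;
print $output;
```

The braces around `$out` are optional but make it obvious that the first item is the filehandle rather than something to print.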
scm is offline   Reply With Quote
Old 05-22-2005   #4 (permalink)
Just Joined!
 
hernandeangelis's Avatar
 
Join Date: Mar 2005
Location: Planet Earth
Posts: 79
OK! Many thanks Cabhan and Steve!!! However, I had trouble implementing your suggestions. Either I am very stupid or Perl is not the best tool for what I want to do. I tried both Cabhan's and Steve's solutions. The programs and results/errors look like this:

1. Cabhan's solution

The program:

open (INFILE, "xxxx");
open (OUTFILE, ">yyyy");
while (<INFILE>)
{
# OBS: my xxxx file contains actually 9 columns
/((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)/;
$xf = $1 + $6;
$yf = $2 + $7;
print OUTFILE $1, $2, $xf, $yf, "\n";
}
close INFILE;
close OUTFILE;

The output is garbage, more or less like this:
64 64 2.351 7.120 1 1.652 1.673 64 64 2.351 7.120 1 1.652 1.6736464
64 89 2.712 6.000 1 2.079 1.741 64 89 2.712 6.000 1 2.079 1.7416464
.............. (up to EOF)

where the first 7 numbers are those present in INFILE. The other 2 were apparently not loaded.
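
A likely contributor to the garbage, sketched on a tiny made-up input: capture groups are numbered by their opening parenthesis, so in a pattern where every column is wrapped as `((...)\s)` the variables `$1`, `$2`, ... do not line up with column 1, column 2, and so on. The greedy `.*` makes things worse, because a single group can swallow several columns. The example below uses `\d+` instead of `.*` purely to keep the demonstration predictable.

```perl
use strict;
use warnings;

my $line = "12 34 ";   # made-up two-column input

# Four opening parentheses, so four capture groups, numbered left to right.
my @groups = $line =~ /((\d+)\s)((\d+)\s)/;

# $groups[0] is $1 ("12 " with the trailing space), $groups[1] is $2 ("12"),
# $groups[2] is $3 ("34 "), $groups[3] is $4 ("34").
printf "group %d: [%s]\n", $_ + 1, $groups[$_] for 0 .. $#groups;

# print with a comma-separated list inserts nothing between the items
# (unless $, is set), so fields run together; interpolating into a string
# with explicit spaces avoids that.
my $joined = join '', $groups[1], $groups[3];
my $spaced = "$groups[1] $groups[3]";
print "$joined\n$spaced\n";
```

The second half illustrates why the printed values in the output above are glued together: `print OUTFILE $1, $2, ...` writes the items back to back.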

2. Steve's solution

The program is:

open (INFILE, "imcorr.out");
open (OUTFILE, ">imcorr.gmt");
while (<INFILE>)
{
# OBS: my xxxx file contains actually 9 columns
(v1, v2, v3, v4, v5, v6, v7, v8, v9) = $_;
$xf = $v1 + $v6;
$yf = $v2 + $v7;
print OUTFILE $v1, $v2, $xf, $yf, "\n";
}
close INFILE;
close OUTFILE;

And the output is the error message at the konsole:

Can't modify constant item in list assignment at ./im2gmt line 20, near "$_;"
Execution of ./im2gmt aborted due to compilation errors.

Well, I gave up for today; we will see tomorrow. Thanks anyway, guys!

Hernan
hernandeangelis is offline   Reply With Quote
Old 05-22-2005   #5 (permalink)
Linux Enthusiast
 
Join Date: Jan 2005
Posts: 575
Apologies for throwing this off topic but awk was made for jobs like that:
Code:
awk '{print $1,$2,$1+$6,$2+$7}' xxxx > yyyy
Santa's little helper is offline   Reply With Quote
Old 05-23-2005   #6 (permalink)
Trusted Penguin
 
Cabhan's Avatar
 
Join Date: Jan 2005
Location: Boston, MA, USA
Posts: 2,691
Well, for Steve's solution (which is actually pretty crazy), you need your list to be:

Code:
 ($v1, $v2, $v3, $v4, $v5, $v6, $v7, $v8, $v9) = $_;
And for mine, the only thing I can maybe think of is eliminating the last "\s". The way you have it, the line will only match if there is a space at the end.
Cabhan is offline   Reply With Quote
Old 05-23-2005   #7 (permalink)
Just Joined!
 
hernandeangelis's Avatar
 
Join Date: Mar 2005
Location: Planet Earth
Posts: 79
OK guys! Thanks for your patience. Here is what I did.

As Santa's little helper said, awk was made for this kind of purpose. However, the problem is that my file contains both integers and floating point numbers. Here is a sample of my original file:

64 64 2.351 7.120 1 1.652 1.673 0.059 0.180
64 89 2.712 6.000 1 2.079 1.741 0.050 0.104
64 114 2.357 11.226 1 1.660 1.674 0.096 0.106
64 139 2.573 13.114 1 1.520 2.076 0.114 0.096

I want to read the 9 values, then (if $5 == 1) compute ($xf = $1+$6 and $xy = $2+$7) and finally (print $1, $2, $xf, $xy), which gives a four-column file that is used by another program.
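
The task as described here could be sketched in Perl roughly as follows. This is only an illustration under stated assumptions: the post does not say what should happen when column 5 is not 1, so this sketch simply skips such lines, and the sub name `convert_line` is invented. The two data lines are taken from the sample above.

```perl
use strict;
use warnings;

# Turn one nine-column line into the four-column output line,
# or return nothing when the line is malformed or column 5 is not 1.
sub convert_line {
    my ($line) = @_;
    my @v = split ' ', $line;
    return unless @v == 9;        # skip blank or malformed lines
    return unless $v[4] == 1;     # the "$5 == 1" condition (arrays are 0-based)
    my $xf = $v[0] + $v[5];       # $1 + $6 in the post's numbering
    my $xy = $v[1] + $v[6];       # $2 + $7
    return "$v[0] $v[1] $xf $xy\n";
}

# Two sample lines from the post stand in for the input file.
my $data = "64 64 2.351 7.120 1 1.652 1.673 0.059 0.180\n"
         . "64 89 2.712 6.000 1 2.079 1.741 0.050 0.104\n";
open my $in, '<', \$data or die "Cannot open input: $!";
my @result;
while (my $line = <$in>) {
    my $converted = convert_line($line);
    push @result, $converted if defined $converted;
}
close $in;
print @result;
```

Using `split` and an array keeps the column numbering in one place; the only thing to remember is that `$v[0]` corresponds to the first column.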

Today I tried your advice, and what I can say is that Steve's solution did not work in my case (I cannot rule out my inexperience). My awk solution worked, as did Cabhan's, but both produced the following file:

64 64 65 65

64 89 66 90

64 114 65 115

64 139 65 141

64 164 65 165

That is, NO FLOATING POINT values in xf and xy!!! I do not understand this in the case of awk. Perl is still a big question mark for me.

Well, I am stuck for today, guys. Thanks for your help, if you still want to spend your time on this.

Hernan
hernandeangelis is offline   Reply With Quote
Old 05-23-2005   #8 (permalink)
Linux Enthusiast
 
Join Date: Jan 2005
Posts: 575
Trying the awk solution with the numbers you gave I get
Code:
64 64 65.652 65.673
64 89 66.079 90.741
64 114 65.66 115.674
64 139 65.52 141.076
It is certainly weird that you're not getting the same. Try something simple like gawk 'BEGIN{print 1.423+5.761}' and see what it gives you.

What machine have you got, by the way?
Santa's little helper is offline   Reply With Quote
Old 05-24-2005   #9 (permalink)
Trusted Penguin
 
Cabhan's Avatar
 
Join Date: Jan 2005
Location: Boston, MA, USA
Posts: 2,691
Well, with mine, I actually got nothing. So I did some fiddling, and I now get the following file with your test data:

Code:
64 64 65.652 65.673
64 89 66.079 90.741
64 114 65.66 115.674
64 139 65.52 141.076
I forgot one minor detail in the regexp, which is that .* will match ANYTHING. Also sadly, \d will not match a ., so decimals don't work. So I made my own character class, [0-9.], and used that. So my regexp line looks like:

Code:
/([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)/;
Basically, that reads "At least one digit or period, then at least one space, repeat for nine columns."

So my loop became:

Code:
while (<INFILE>)
{
	/([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)/;
	
	my $xf = $1 + $6;
	my $yf = $2 + $7;
	
	print OUTFILE "$1 $2 $xf $yf\n";
}
So yeah.
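
A hedged variation on the loop above, for illustration: the nine-field pattern can be built by repetition rather than typed out, and the result of the match can be checked before the captures are used, since a failed match does not set `$1` and friends to anything useful. The character class is the same `[0-9.]` as above, so signs and exponents are not handled; the names `$field`, `$pattern`, and `parse_fields` are made up for this sketch.

```perl
use strict;
use warnings;

# One numeric field: digits and periods only (no sign, no exponent),
# the same character class as in the post above.
my $field = '([0-9.]+)';
# Nine fields separated by whitespace, built by repetition instead of typing it out.
my $pattern = join '\s+', ($field) x 9;
my $re = qr/^\s*$pattern\s*$/;

# Return the nine captured fields, or an empty list if the line does not match.
sub parse_fields {
    my ($line) = @_;
    my @fields = $line =~ $re;
    return @fields;
}

my @ok  = parse_fields("64 64 2.351 7.120 1 1.652 1.673 0.059 0.180\n");
my @bad = parse_fields("64 64 not-a-number\n");
print scalar(@ok), " fields matched, ", scalar(@bad), " for the malformed line\n";
```

In a read loop, an empty return value would be the cue to skip or report the line instead of computing with leftover values.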
Cabhan is offline   Reply With Quote
Old 05-24-2005   #10 (permalink)
Just Joined!
 
hernandeangelis's Avatar
 
Join Date: Mar 2005
Location: Planet Earth
Posts: 79
Thanks for the answers. If I execute from the konsole:

gawk 'BEGIN{print 1.1 + 2.2}'

I get

3,3

I have a weird, stupid problem with " , " appearing instead of " . " as the decimal point, and please do not tell me that I have not configured it in the Control Center. I did, but it still does not work. I guess it has something to do with my Swedish keyboard, but I do not think that this is the cause of the problem.
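
The "3,3" result suggests, though it does not prove, that the session's numeric locale (LC_NUMERIC) uses a comma as the decimal separator, which is a locale setting rather than a keyboard setting. Some awk versions apply that locale when parsing input as well, in which case "1.652" would be read as 1, which would be consistent with 64 + 1.652 coming out as 65. As a sketch of one way to rule the locale out on the Perl side, the numeric locale can be set to "C" explicitly; whether this is needed at all depends on the Perl version and on whether `use locale` is in effect, so treat it as an assumption to test rather than a confirmed fix.

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_NUMERIC);

# Force the "C" locale for numbers so that "." is the decimal separator
# regardless of the environment's LANG / LC_* settings.
setlocale(LC_NUMERIC, "C");

my $sum  = 1.1 + 2.2;
my $text = sprintf "%.1f", $sum;
print "$text\n";
```

For awk, the usual way to run the same experiment is to set `LC_ALL=C` in the environment for that one command and see whether the decimals come back.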
hernandeangelis is offline   Reply With Quote