  1. #1
    Just Joined! hernandeangelis's Avatar
    Join Date
    Mar 2005
    Location
    Planet Earth
    Posts
    79

    Reading and writing multicolumn text files with Perl


    Hi folks,

    I am still new to Perl and trying to migrate out of Fortran for simple tasks, so if this question sounds silly there is a 99.9% probability that it really is. I already checked the Perl tutorials at www.perl.com but could not find an answer to this. I have a multicolumn file like this:

    number number number number
    number number number number
    number number number number
    .............. ............... ............... ...............
    number number number number

    which is the output of a certain program. I want to read the columns of each line into four different variables, do some conditional sorting and calculations, and then write the results to a new file. That would be a very simple task in Fortran, like:

    read (file) var1, var2, var3, var4

    and then

    write (file) varX, varY, varZ

    Unfortunately I cannot figure out how to do that in Perl. As far as my poor brain can see, I can either put the whole line into a scalar ($variable) or the whole file into an array (@variable) of individual lines. Can you please shed some light on this?

    Many thanks

    Hernan

  2. #2
    Linux Guru Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,252
    Well, I'm thinking read each line, push each column onto a separate array. Repeat until you have 4 arrays where the first element in each is the top element of each column.

    I'm also thinking regular expressions, something like:

    (.*)\s+(.*)\s+(.*)\s+(.*)


    Basically, the elements in each set of parentheses are now accessible outside of the regexp as $1, $2, $3, and $4. So you could simply put that regexp in a while loop ( while(<FILE>) ), and push each element onto a separate array (@col1, @col2, @col3, @col4).
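
    A minimal sketch of the "one array per column" idea, using split instead of a regexp. The sample data and the in-memory filehandle are assumptions made only so the example is self-contained; with a real file you would open it by name instead.

```perl
use strict;
use warnings;

# Hypothetical sample standing in for the real four-column input file.
my $data = "1 2 3 4\n5 6 7 8\n9 10 11 12\n";
open(my $in, '<', \$data) or die "Cannot open in-memory data: $!";

my (@col1, @col2, @col3, @col4);
while (my $line = <$in>) {
    # split ' ' drops leading whitespace and splits on runs of whitespace.
    my @fields = split ' ', $line;
    next unless @fields >= 4;    # skip blank or short lines

    push @col1, $fields[0];
    push @col2, $fields[1];
    push @col3, $fields[2];
    push @col4, $fields[3];
}
close $in;

# Each array now holds one column, top to bottom.
print "col1: @col1\n";
print "col4: @col4\n";
```

    Splitting on whitespace avoids the greedy .* problem, where the first group can swallow most of the line.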

    Tell me what you think.

  3. #3
    scm
    Linux Engineer
    Join Date
    Feb 2005
    Posts
    1,044

    Re: Reading and writing multicolumn text files with Perl

    Quote Originally Posted by hernandeangelis
    read (file) var1, var2, var3, var4

    and then

    write (file) varX, varY, varZ
    Well, going for the simplest approach, the Perl equivalent of the read would be:
    Code:
    while (<>)
    {
        # split with no arguments splits $_ on whitespace
        my ($var1, $var2, $var3, $var4) = split;
    }
    You'll load the 4 variables for each line (the <> reads a line into $_, the while loop ensures you'll suck in the whole file), so you'll need to process and write results within the loop. The writing can be done by
    Code:
        print FH "$varX $varY $varZ\n";
    where FH is a previously opened filehandle. Note the absence of a comma separator after the FH value.
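
    Putting the read and the write together, a rough sketch could look like the following. The input numbers, the calculations for $varX, $varY and $varZ, and the in-memory filehandles are all made up for illustration; a real script would open the input and output files by name.

```perl
use strict;
use warnings;

# Hypothetical input; in-memory handles keep the sketch self-contained.
my $input  = "1 2 3 4\n5 6 7 8\n";
my $output = '';
open(my $in,  '<', \$input)  or die "Cannot open input: $!";
open(my $out, '>', \$output) or die "Cannot open output: $!";

while (my $line = <$in>) {
    # Like Fortran's: read (file) var1, var2, var3, var4
    my ($var1, $var2, $var3, $var4) = split ' ', $line;
    next unless defined $var4;    # skip lines with fewer than 4 fields

    # Placeholder calculations, not taken from the thread.
    my $varX = $var1 + $var2;
    my $varY = $var3 * $var4;
    my $varZ = $var1;

    # Like Fortran's: write (file) varX, varY, varZ
    print {$out} "$varX $varY $varZ\n";
}
close $in;
close $out;

print $output;
```

    Lexical filehandles (my $in, my $out) behave like the bareword FH above but are scoped like any other variable.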

    HTH,
    Steve

  4. #4
    Just Joined! hernandeangelis's Avatar
    Join Date
    Mar 2005
    Location
    Planet Earth
    Posts
    79
    OK! Many thanks Cabhan and Steve!!! However, I had trouble implementing your suggestions. I am either very stupid or Perl is not the best tool for what I want to do. I tried Cabhan's as well as Steve's solution. The programs and results/errors look like this:

    1. Cabhan's solution

    The program:

    open (INFILE, "xxxx");
    open (OUTFILE, ">yyyy");
    while (<INFILE>)
    {
    # OBS: my xxxx file contains actually 9 columns
    /((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)+((.*)\s)/;
    $xf = $1 + $6;
    $yf = $2 + $7;
    print OUTFILE $1, $2, $xf, $yf, "\n";
    }
    close INFILE;
    close OUTFILE;

    The output is garbage more or less like this:
    64 64 2.351 7.120 1 1.652 1.673 64 64 2.351 7.120 1 1.652 1.6736464
    64 89 2.712 6.000 1 2.079 1.741 64 89 2.712 6.000 1 2.079 1.7416464
    .................................................. .................................................. ..(up to EOF)

    where the first 7 numbers are those present in INFILE. The other 2 were apparently not loaded.

    2. Steve's solution

    The program is:

    open (INFILE, "imcorr.out");
    open (OUTFILE, ">imcorr.gmt");
    while (<INFILE>)
    {
    # OBS: my xxxx file contains actually 9 columns
    (v1, v2, v3, v4, v5, v6, v7, v8, v9) = $_;
    $xf = $v1 + $v6;
    $yf = $v2 + $v7;
    print OUTFILE $v1, $v2, $xf, $yf, "\n";
    }
    close INFILE;
    close OUTFILE;

    And the output is the error message at the konsole:

    Can't modify constant item in list assignment at ./im2gmt line 20, near "$_;"
    Execution of ./im2gmt aborted due to compilation errors.

    Well, I gave up for today, tomorrow will see. Thanks anyway guys!

    Hernan

  5. #5
    Linux Enthusiast
    Join Date
    Jan 2005
    Posts
    575
    Apologies for throwing this off topic but awk was made for jobs like that:
    Code:
    awk '{print $1,$2,$1+$6,$2+$7}' xxxx > yyyy

  6. #6
    Linux Guru Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,252
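    Well, for Steve's solution (which is actually pretty crazy), every variable in the list needs a $ in front of it, and the line has to be split into fields, because assigning $_ directly would put the whole line into $v1:

    Code:
     ($v1, $v2, $v3, $v4, $v5, $v6, $v7, $v8, $v9) = split ' ', $_;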
    And for mine, the only thing I can maybe think of is eliminating the last "\s". The way you have it, the line will only match if there is a space at the end.

  7. #7
    Just Joined! hernandeangelis's Avatar
    Join Date
    Mar 2005
    Location
    Planet Earth
    Posts
    79
    OK guys! Thanks for your patience. Here is what I did.

    As santaslittlehelper said, awk was made for this kind of purpose. However, the problem is that my file involves both integers and floating point numbers. Here is a sample of my original file:

    64 64 2.351 7.120 1 1.652 1.673 0.059 0.180
    64 89 2.712 6.000 1 2.079 1.741 0.050 0.104
    64 114 2.357 11.226 1 1.660 1.674 0.096 0.106
    64 139 2.573 13.114 1 1.520 2.076 0.114 0.096

    I want to read the 9 values; then, if $5 == 1, compute $xf = $1+$6 and $xy = $2+$7; and finally print $1, $2, $xf, $xy, which gives a four-column file that is used by another program.

    I tried your advice today, and what I can say is that Steve's solution did not work in my case (I cannot rule out my inexperience). My awk solution worked, as did Cabhan's, but both produced the following file:

    64 64 65 65

    64 89 66 90

    64 114 65 115

    64 139 65 141

    64 164 65 165

    That is, NO FLOATING POINT values at xf and xy !!! This is something that I do not understand in the case of awk. Perl is still a big question mark for me.

    Well, I am stuck for today, guys. Thanks for your help, if you still want to lose your time with this.

    Hernan

  8. #8
    Linux Enthusiast
    Join Date
    Jan 2005
    Posts
    575
    Trying the awk solution with the numbers you gave I get
    Code:
    64 64 65.652 65.673
    64 89 66.079 90.741
    64 114 65.66 115.674
    64 139 65.52 141.076
    It is certainly weird that you're not getting the same. Try something simple like gawk 'BEGIN{print 1.423+5.761}' and see what it gives you.

    What machine have you got by the way ?

  9. #9
    Linux Guru Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,252
    Well, with mine, I actually got nothing. So I did some fiddling, and I now get the following file with your test data:

    Code:
    64 64 65.652 65.673
    64 89 66.079 90.741
    64 114 65.66 115.674
    64 139 65.52 141.076
    I forgot one minor detail in the regexp, which is that .* will match ANYTHING. Also sadly, \d will not match a ., so decimals don't work. So I made my own character class, [0-9.], and used that. So my regexp line looks like:

    Code:
    /([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)/;
    Basically, that reads "At least one digit or period, then at least one space, repeat for nine columns."

    So my loop became:

    Code:
    while (<INFILE>)
    {
    	/([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)\s+([0-9.]+)/;
    	
    	my $xf = $1 + $6;
    	my $yf = $2 + $7;
    	
    	print OUTFILE "$1 $2 $xf $yf\n";
    }
    So yeah.
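
    As a variation on the nine-column pattern, the regexp can be built by repetition rather than typed out nine times, with the captures taken in list context. This is only a sketch: the names $num, $ncols and @f are invented here, the sample line is the first row of the data posted earlier in the thread, and the $5 == 1 check is the condition described in post #7.

```perl
use strict;
use warnings;

my $num   = qr/[0-9.]+/;    # one unsigned number: digits and periods only
my $ncols = 9;

# Nine capture groups joined by "\s+", anchored at the start of the line.
my $pat = join '\s+', ("($num)") x $ncols;
my $re  = qr/^\s*$pat/;

my $line = "64 64 2.351 7.120 1 1.652 1.673 0.059 0.180\n";

# In list context a successful match returns the captured groups.
my @f = $line =~ $re;

my $result = '';
if (@f && $f[4] == 1) {          # fifth column must equal 1
    my $xf = $f[0] + $f[5];      # column 1 + column 6
    my $yf = $f[1] + $f[6];      # column 2 + column 7
    $result = "$f[0] $f[1] $xf $yf";
}
print "$result\n";
```

    Checking that the match succeeded before using the captures matters, because after a failed match $1, $2 and friends still hold values from the last successful match. Note that [0-9.]+ does not accept negative numbers or exponents.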

  10. #10
    Just Joined! hernandeangelis's Avatar
    Join Date
    Mar 2005
    Location
    Planet Earth
    Posts
    79
    Thanks for the answers. If I execute from the konsole:

    gawk 'BEGIN{print 1.1 + 2.2}'

    I get

    3,3

    I have a weird, stupid problem with " , " being used instead of " . " as the decimal point, and please do not tell me that I do not have it configured in the Control Center. I did, but I still could not get it to work. I guess it is a problem with my Swedish keyboard, but I do not think that this is the cause of the problem.
