  1. #1
    Just Joined!
    Join Date
    Nov 2010
    Posts
    15

    gawk file limitation

    Hi everyone,
    I am using a gawk script in a file called AwkTexComments (see below), and I am running it on a file by typing:

    Code:
    gawk -f AwkTexComments testAwkarticleRes.tex
    The script works, but only for a limited number of lines (around 100). Any idea how to make it work for the whole content of testAwkarticleRes.tex (which is 3000 lines)?

    Thanks

    Code:
    #          AwkTexComments
    # Remove comment line from tex file
    # usage:
    # gawk -f ~/awkfile file
    {
    FILEDELETED="supprime.txt";
    where = match($0, /\%/)
    if (where == 1){
         print $0 > FILEDELETED
        }
        else
        {
        print $0  > FILENAME
        }
    }
    Last edited by louisJ; 12-16-2010 at 08:59 PM.

  2. #2
    Just Joined!
    Join Date
    Nov 2010
    Posts
    15
    up!
    Is everybody that afraid of awk?

  3. #3
    Linux Enthusiast
    Join Date
    Apr 2004
    Location
    UK
    Posts
    658
    Hi there,

    I suspect the lack of answers is because you didn't say how the script was failing. To get things moving, I ran your example against a test file, and I'm assuming you get the same problem I got.

    Specifically: when you start with a 3000-line file, the lines in the result files add up to significantly fewer than 3000 (I got 201 lines in total).

    My educated guess is that gawk reads its input through a buffer of limited size, so it loads all of a small file but only the first portion of a large one. As soon as it starts writing output, it clobbers your source file; it then finds that it has already read at least as much data as the file now contains and concludes that it has finished.

    Change your script to send the data to two new files. You can then copy the keeper back over the source file if you really want to.
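    The suggestion above can be sketched roughly as follows. This is only an illustration, not the original poster's script: the file names (sample.tex, kept.tex, deleted.tex) are made up, and plain awk is used, although gawk should accept the same program.

```shell
#!/bin/sh
# Build a small hypothetical test file: two comment lines, three ordinary lines.
printf '%s\n' '% a comment' 'first line' '% another comment' 'second line' 'third line' > sample.tex

# Write kept and deleted lines to two NEW files, never to the file being read.
awk '
    /^%/ { print > "deleted.tex"; next }   # lines starting with % are comments
         { print > "kept.tex" }            # everything else is kept
' sample.tex

# Only after awk has exited, copy the keeper back over the source.
mv kept.tex sample.tex
```

    Recent gawk releases (4.1 and later) also ship an "inplace" extension, used as gawk -i inplace, which handles the temporary file for you. Whether it is available depends on the installed version.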

    Let us know how you get on.
    To be good, you must first be bad. "Newbie" is a rank, not a slight.

  4. #4
    Just Joined!
    Join Date
    Nov 2010
    Posts
    15
    Yes, I suspected a buffer size problem, but I think I did not want to accept this from gawk.
    Isn't it a big limitation that gawk can't handle a big file as several buffer-sized blocks?

    Thank you very much kakariko81280.
    I rewrote the script for those interested.

    Code:
    # Remove comment line from tex file
    # usage:
    # gawk -f ~/thisawkfile file.tex
    # note: same as sed '/^%.*/d' file.tex > new.tex
    
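    BEGIN {
        # Output files: removed comment lines and kept lines.
        FILEDELETED = "zsupprime.tex"
        FILENEW = "zgood.tex"
    }
    {
        # match() returns the position of the first "%", so 1 means the line starts with a comment.
        where = match($0, /%/)
        if (where == 1) {
            # debug
            #jojo = match($0, /Plumerault/)
            #if (jojo != 0) {
            #    print where
            #    print jojo
            #    print $0
            #}
            print $0 > FILEDELETED
        } else {
            print $0 > FILENEW
        }
    }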
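    As a quick sanity check on the sed note in the script header, the sketch below compares an awk one-liner (a condensed stand-in for the script, not the script itself) with the sed command, and confirms that kept plus removed lines add up to the original count. The file names check.tex, awk_kept.tex and sed_kept.tex are made up for the example.

```shell
#!/bin/sh
# Hypothetical sample: two full-line comments, three lines to keep.
printf '%s\n' '% header comment' '\documentclass{article}' \
    'text % trailing comment' '% another' '\end{document}' > check.tex

# Keep every line that does not start with %.
awk '!/^%/' check.tex > awk_kept.tex
sed '/^%.*/d' check.tex > sed_kept.tex

# cmp -s is silent and exits 0 when the two files are identical.
if cmp -s awk_kept.tex sed_kept.tex; then
    echo "same output"
fi

# Kept plus removed lines should add up to the original line count.
total=$(grep -c '' check.tex)
kept=$(grep -c '' awk_kept.tex)
removed=$(grep -c '^%' check.tex)
echo "$total = $kept + $removed"
```

    Note that a line with a trailing comment is kept whole by both commands, since only a % in the first column counts.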

  5. #5
    Linux Enthusiast
    Join Date
    Apr 2004
    Location
    UK
    Posts
    658
    Hi there,

    It isn't really a big limitation: your new script can now dig through comically large files without chewing up all of your RAM and swap space. The buffer is really just there to improve gawk's performance.

    Most other scripting programs will behave similarly. Redirecting the output of grep straight back to its source file will leave you with an empty file every time, because the shell truncates the file before grep gets to read it.
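    A small sketch of the grep case, using made-up file names (notes.tex, clobbered.tex): the first command shows the emptied file, the second shows the usual workaround of going through a temporary file.

```shell
#!/bin/sh
# Hypothetical sample file with one comment line.
printf '%s\n' 'keep me' '% drop me' 'keep me too' > notes.tex
cp notes.tex clobbered.tex

# Unsafe: the shell truncates clobbered.tex before grep starts reading it.
# Some grep versions also print an error here, hence 2>/dev/null and || true.
grep -v '^%' clobbered.tex > clobbered.tex 2>/dev/null || true
if [ ! -s clobbered.tex ]; then
    echo "clobbered.tex is now empty"
fi

# Safer: write to a temporary file, then replace the source once grep is done.
tmp=$(mktemp)
grep -v '^%' notes.tex > "$tmp" && mv "$tmp" notes.tex
cat notes.tex
```

    One side effect worth knowing: the replaced file inherits the temporary file's permissions, which mktemp usually makes private to the owner.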

  6. #6
    Just Joined!
    Join Date
    Nov 2010
    Posts
    15
    I am quite new to Linux, so I am still discovering this, but OK, I understand what you are saying.
    Thank you again for helping and shedding light.
