  1. #1
    Just Joined!
    Join Date
    Feb 2013
    Posts
    19

    Running sed on HUGE strings -- error message


    Hello once again.

    I'm actually working on a genetics project in flies. I've created a script that pulls out genome coordinates (essentially character positions in a 23 million character-long file). I want to exclude these regions from analysis and replace the sequences I don't want with a string of repeated "x's" of the same length to mark those regions.

    Using bash, this is the script I came up with:

    Code:
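    #!/bin/bash
    
    # Coordinate ranges on 2L, in the form that cut -c accepts
    coords=$(grep -v -- "--" *clean | sed 's/;/-/' | grep 2L | sed 's/2L://')
    
    for coord in $coords
        do
    
            # Sub-string found at those character positions
            sequ=$(cut -c$coord ./genome/2L.raw)
            # String of x's of the same length
            exes=$(echo $sequ | sed 's/./x/g')
            # Swap the sub-string for the x's
            sed "s/$sequ/$exes/" 2L.test
    
        done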
    Essentially, I take the range of characters I don't want (the coords, i.e., coordinates), cut them out of the large text file, and assign the corresponding sub-string to the variable name "sequ". Using sed, I create a second variable of equal length of all "x's" by using global substitution; I call this variable "exes".

    So to solve my problem, all I need to do is replace the unwanted sub-strings with the sub-strings composed of all "x's" in my original very large text file.

    Unfortunately, I get an "Argument list too long" error on every iteration of the for loop. I think this has something to do with the length of the substrings -- they are often tens or hundreds of thousands of characters long.

    I've echoed $sequ and $exes and they both produce long substrings as expected, the latter being all "x's" of the same length. Why then isn't sed swapping them?

  2. #2
    tpl
    Linux User
    Join Date
    Jan 2007
    Location
    cleveland
    Posts
    478
    interesting problem:

    Linux / UNIX : Argument list too long error in shell and solution

    The shell can pass a maximum of 131072 bytes of command line
    arguments to a program (the exact limit depends on the system). If you
    try to pass more than that, you will be greeted
    with the "Argument list too long" error.
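
    A quick way to check this on your own machine, as a rough sketch. `getconf ARG_MAX` is the standard query for the total space allowed for arguments plus environment. On Linux there is usually also a cap on each single argument (commonly 131072 bytes, though that depends on the kernel and page size, so treat the figure as an assumption). The variable names `limit` and `big` are just for illustration.

```shell
#!/bin/bash
# Total bytes allowed for arguments + environment when starting a program.
limit=$(getconf ARG_MAX)
echo "ARG_MAX: $limit bytes"

# Build a 200000-character string inside the shell.
big=$(head -c 200000 /dev/zero | tr '\000' 'A')
echo "length of test string: ${#big}"

# Shell builtins are not started via exec, so they can handle the string.
# External programs (sed, grep, ...) receive it as an argument and may refuse it.
if env true "$big" 2>/dev/null; then
    echo "external command accepted the argument"
else
    echo "external command failed (most likely: Argument list too long)"
fi
```

    That would explain why `echo $sequ` works in the script above (echo is a bash builtin) while the `sed "s/$sequ/$exes/"` line fails: the whole sub-string ends up inside a single argument to sed.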

    How To Find and Overcome Shell Command Line Length Limitations

    the sun is new every day (heraclitus)

  3. #3
    Just Joined!
    Join Date
    Feb 2013
    Posts
    19
    I decided to cut --complement the sequences I don't want out of the genome instead of generating huge strings to replace. I'll lose the positional data but I don't have much choice ... I can use grep or a DNA alignment tool to show me where the sequence originally belongs. Thanks anyway ....
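
    For anyone who finds this thread later and does need to keep the positions: one possible approach is to never put the sequence on a command line at all and instead stitch the file together by byte offsets. The sketch below is not the method used above. `mask_range` is a made-up helper name, and it assumes GNU coreutils, a plain ASCII file (one byte per character), and a sequence stored as one long line so that `cut -c` positions and byte offsets agree.

```shell
#!/bin/bash
# mask_range RANGE FILE   (hypothetical helper, not part of the original script)
# Prints FILE with bytes START..END (1-based, inclusive) replaced by 'x'.
# RANGE is "START-END", the same form the original script hands to cut -c.
mask_range() {
    local start=${1%-*} end=${1#*-} file=$2
    # Everything before the region
    head -c "$((start - 1))" "$file"
    # The mask: as many x's as the region is long
    head -c "$((end - start + 1))" /dev/zero | tr '\000' 'x'
    # Everything after the region
    tail -c "+$((end + 1))" "$file"
}

# Small demonstration on a made-up 10-base sequence.
demo=$(mktemp)
printf 'ACGTACGTAC\n' > "$demo"
masked=$(mask_range 3-6 "$demo")
echo "$masked"
rm -f "$demo"
```

    The demo should print ACxxxxGTAC. Because every masked file has the same length as the input, several ranges can be applied one after another (write to a temporary file, then move it over the previous result) without shifting any coordinates.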
