Running sed on HUGE strings -- error message
Hello once again.
I'm working on a genetics project in flies. I've created a script that pulls out genome coordinates (essentially character positions in a file about 23 million characters long). I want to exclude these regions from analysis by replacing each unwanted sequence with a string of repeated "x's" of the same length to mark those regions.
Essentially, I take the range of characters I don't want (the coords, i.e., coordinates), cut them out of the large text file, and assign the resulting sub-string to the variable "sequ". Using sed with global substitution, I create a second variable of equal length consisting entirely of "x's"; I call this variable "exes".
Using bash, this is the script I came up with:
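# one START-END range per word, taken from the 2L entries of the *clean files
coords=`grep -v -- "--" *clean | sed 's/;/-/' | grep 2L | sed 's/2L://'`
for coord in $coords
do
    sequ=`cut -c$coord ./genome/2L.raw`
    exes=`echo $sequ | sed 's/./x/g'`
    # this is the line that fails with "Argument list too long"
    sed "s/$sequ/$exes/" 2L.test
done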
So to solve my problem, all I need to do is replace the unwanted sub-strings with the sub-strings composed of all "x's" in my original very large text file.
Unfortunately, I get an "Argument list too long" error on every iteration of the for loop. I think this has something to do with the length of the substrings, which are often tens or hundreds of thousands of characters long.
I've echoed $sequ and $exes and they both produce long substrings as expected, the latter being all "x's" of the same length. Why then isn't sed swapping them?
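For comparison, here is a minimal sketch of the same masking done by position instead of by pattern, so the long sequence never has to be passed to sed as a command-line argument. It assumes the sequence sits on a single line and that each coordinate is a 1-based, inclusive START-END pair (the form cut -c expects). The name mask_range is hypothetical and not part of my script above, and I have not run this against the real 2L file.

```shell
# mask_range START END FILE
# Print FILE with characters START..END (1-based, inclusive) of each line
# replaced by "x". Only the two numbers are passed as arguments; awk reads
# the sequence itself from the file.
mask_range() {
    awk -v s="$1" -v e="$2" '{
        mid = substr($0, s, e - s + 1)   # the unwanted stretch
        gsub(/./, "x", mid)              # same idea as sed s/./x/g
        print substr($0, 1, s - 1) mid substr($0, e + 1)
    }' "$3"
}

# Hypothetical usage for a single range such as 1000-2500:
#   mask_range 1000 2500 ./genome/2L.raw > 2L.masked
```

Inside the for loop, a coord such as 1000-2500 could be split with ${coord%-*} and ${coord#*-} to get the two numbers, with each pass reading the output of the previous one.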