Results 1 to 3 of 3
hello everyone,
I've been trying to figure out how to write this program but I can't get it yet, you'll see, I have this bunch of DNA sequences for which ...
- 05-03-2007 #1Just Joined!
- Join Date
- Aug 2006
- Location
- Mexico
- Posts
- 17
Bioinformatic challenge!! :)
hello everyone,
I've been trying to figure out how to write this program but I can't get it yet, you'll see, I have this bunch of DNA sequences for which I wanna analyze the frecuency of dinucleotides CG. I already have a program for that but i want to modify it so that i can chomp my sequence in smaller pieces and analize them individually. The program I have for taking the whole sequence is:
#!/usr/bin/perl
$ARGV[0] || die "Use cuenta.pl <archivo fasta>\n";
$file = $ARGV[0];
open (DAT, $file) || die ("No puedo abrir $file");
while ( <DAT> ) {
if ( />/) {
print $_;
}
else
{
chomp;
$cadena = $_;
$numero = length ($cadena);
$cuentag = ($cadena =~ s/G/G/gi);
$cuentac = ($cadena =~ s/C/C/gi);
$cuentagc = ($cadena =~ s/CG/CG/gi);
$resultado = $cuentagc / (($cuentag * $cuentac) / $numero);
print "obs/exp $resultado\n";
}
}
close DAT;
now, what i want to do is cut these large sequences in 10 smaller fragments and analyze each of them and all together, I'll try to explain briefly. Let's suppose there's a 10,000 bp sequence, so I want to have 10 pieces of 1,000 bp. then analyze fragment 1, and then from 1-2 and 1-3 and so on until i reach 9-10.
All ideas welcome!! Thank you all very much!
- 05-06-2007 #2
Oh, FASTA. How I love thee.
Assuming that your FASTA file is already in the format:
(that is, the actual sequence is not split across multiple lines), then in your 'else' statement, you can just keep a counter (incrementing by 1000 each time) and use a while loop, and take a substring of length 1000, and work on it, then take the next substring of length 1000, etc. Check the length of the returned string, and if it's less than 1000 (or if the returned string is undef), you know that you're done with that sequence.Code:>TITLE < long line > > TITLE < long line >
If you want to join particular ones together, then keep an array of each substring, then you can use an array slice with join to add certain substrings together.
Let us know how it goes.DISTRO=Arch
Registered Linux User #388732
- 05-07-2007 #3Just Joined!
- Join Date
- Aug 2006
- Location
- Mexico
- Posts
- 17
hi Cabhan,
thanks for the advise, luckly I saw that there's this command called cpgplot in Emboss which does a sliding window and even graphs the observed / expected CpG frecuency. I might do what you suggest later anyway. Thanks a lot, have a nice day!
rodpck


Reply With Quote