Find the answer to your Linux question:
Results 1 to 3 of 3
hello everyone, I've been trying to figure out how to write this program but I can't get it yet, you'll see, I have this bunch of DNA sequences for which ...
  1. #1
    Just Joined!
    Join Date
    Aug 2006
    Location
    Mexico
    Posts
    17

    Bioinformatic challenge!! :)

    hello everyone,

    I've been trying to figure out how to write this program but I can't get it yet, you'll see, I have this bunch of DNA sequences for which I wanna analyze the frecuency of dinucleotides CG. I already have a program for that but i want to modify it so that i can chomp my sequence in smaller pieces and analize them individually. The program I have for taking the whole sequence is:

    #!/usr/bin/perl

    $ARGV[0] || die "Use cuenta.pl <archivo fasta>\n";
    $file = $ARGV[0];
    open (DAT, $file) || die ("No puedo abrir $file");
    while ( <DAT> ) {

    if ( />/) {
    print $_;
    }
    else
    {
    chomp;
    $cadena = $_;
    $numero = length ($cadena);
    $cuentag = ($cadena =~ s/G/G/gi);
    $cuentac = ($cadena =~ s/C/C/gi);
    $cuentagc = ($cadena =~ s/CG/CG/gi);
    $resultado = $cuentagc / (($cuentag * $cuentac) / $numero);
    print "obs/exp $resultado\n";
    }
    }
    close DAT;

    now, what i want to do is cut these large sequences in 10 smaller fragments and analyze each of them and all together, I'll try to explain briefly. Let's suppose there's a 10,000 bp sequence, so I want to have 10 pieces of 1,000 bp. then analyze fragment 1, and then from 1-2 and 1-3 and so on until i reach 9-10.

    All ideas welcome!! Thank you all very much!

  2. #2
    Trusted Penguin Cabhan's Avatar
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    Oh, FASTA. How I love thee.

    Assuming that your FASTA file is already in the format:
    Code:
    >TITLE
    < long line >
    > TITLE
    < long line >
    (that is, the actual sequence is not split across multiple lines), then in your 'else' statement, you can just keep a counter (incrementing by 1000 each time) and use a while loop, and take a substring of length 1000, and work on it, then take the next substring of length 1000, etc. Check the length of the returned string, and if it's less than 1000 (or if the returned string is undef), you know that you're done with that sequence.

    If you want to join particular ones together, then keep an array of each substring, then you can use an array slice with join to add certain substrings together.

    Let us know how it goes.
    DISTRO=Arch
    Registered Linux User #388732

  3. #3
    Just Joined!
    Join Date
    Aug 2006
    Location
    Mexico
    Posts
    17
    hi Cabhan,

    thanks for the advise, luckly I saw that there's this command called cpgplot in Emboss which does a sliding window and even graphs the observed / expected CpG frecuency. I might do what you suggest later anyway. Thanks a lot, have a nice day!

    rodpck

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  
...