  1. #1
    Just Joined!
    Join Date
    Aug 2006
    Location
    Mexico
    Posts
    17

    cutting a file in lots of pieces

    hi everyone,

    i'm having a bit of trouble with a file that i want to split. i tried csplit but i just can't figure out how to do it. i'm almost certain it's something easy, but i'm no linux expert

    the file is like this:

    >m1
    AAAATTTATCAAGTCGTCGTGTCTTGATGAGAGGAACCCGACGTTTCTTA AATCCTAAATTGCACAGATTTCTTTATCTTTTTCCACATGCAACATTGTT GACATCTAGATTCTCGTTACAGCTAGATCTTTAAAAAAAACTGTAAAAAA AACCAAGCCCTTAGTCATACGCGCACGCCTCGTAGTTTTTTTTGGATGAT GATCTGATCAGGGAAAAAGAAACAGGAATTGGGGAGCAAAAATTCCAGGA TCTACAGGTGGTTGGCATAAACGAAATAACGATCTGAAAACAGTAACGGT TTCCCTTCTGACGATCTGACCCACAAAAATGCAGATTAAGCAGACCCACA TAAACGAAATAATGCGACTCTCGGCAGGGAGTCGCGCACATTCTGCCAAC CTGCCTGGTAGGG
    >m2
    TTAAAAAAAACTGTAAAAAAAACCAAGCCCTTAGTCATACGCGCACGCCT CGTAGTTTTTTTTGGATGATGATCTGATCAGGGA
    >m3
    AAGTCGTCGTGTCTTGATGAGAGGAACCCGACGTTTCTTAAATCCTAAAT TGCACAGATTTCAAAACCAAGCCCTTAGTCATACGCGCACGCCTCGTAGT TTTTTTTGGATGATGATCTGATCAGGGAAAAAGAAAC
    >m4
    TATCAAGTCGTCGTGTCTTGATGAGAGGAACCCGACGTTTCTTAAATCCT AAATTGCACAGATTTCTTTATCTTTTTCCACATGCAACATTGTTGACATC TAGATTCTCGTTACAGCTAGATCTTTAAAAAAAACTGTAAAAAAAACCAA GCCCTTAGTCATACGCGCACGCCTCGTAGTTTTTTTTGGATGATGATCTG ATCAGGGAAAAAGAAACAGGAATTGGGGAGCAAAAATTCCAGGATCTACA GGTGGTTGGCATAAACGAAATAACGATCTGAAAACAGTAACGGTTTCCCT TCT


    and so on....

    so i'd like to have each m in its own file instead of all the m's in a single one. thank you very much for your help!

  2. #2
    Trusted Penguin Cabhan
    Join Date
    Jan 2005
    Location
    Seattle, WA, USA
    Posts
    3,230
    Oh FASTA files...how I love thee.

    While there may be a program out there that does this, I recommend just using this Perl script. I wrote a thousand of these when I interned at NIH...
    Code:
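    #!/usr/bin/perl

    use strict;
    use warnings;

    my $startfile = $ARGV[0];
    die "Usage: $0 <file to start with>\n" unless $startfile;

    # Three-argument open: the file name is never parsed as part of the mode.
    open my $in, '<', $startfile
        or die "$0: Cannot open $startfile for reading: $!\n";

    my $fh;    # output handle for the record currently being written

    while (<$in>)
    {
        chomp;

        # A header line (">name") starts a new output file called "name".
        if (/^>/)
        {
            my ($fname) = /^>(.+)$/
                or die "$0: Header without a name at line $.\n";
            close $fh if $fh;
            open $fh, '>', $fname
                or die "$0: Cannot open $fname for writing: $!\n";
        }

        # Lines before the first header are skipped: $fh is still undefined.
        print {$fh} "$_\n" if $fh;
    }

    close $fh if $fh;
    close $in;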
    This will split the file into individual files called m1, m2, etc.
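
    Since the question started with csplit, here is a rough shell sketch of the same idea using awk instead of Perl. It is only an illustration under a few assumptions: sample.fa and its contents are made-up test data (not the file from the first post), each header starts with ">" immediately followed by the name, and only the first word of the header is used as the file name.

```shell
#!/bin/sh
# Work in a scratch directory so the sketch does not touch real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: two short records in FASTA layout.
printf '>m1\nAAAATTT\nCCGG\n>m2\nTTAAAA\n' > sample.fa

# On each header line, close the previous output file and switch to a new
# one named after the header's first word, minus the leading ">".
# Every line (header included) is then written to the current output file.
awk '
    /^>/ { if (out) close(out); out = substr($1, 2) }
    out  { print > out }
' sample.fa
```

    Closing each file before opening the next one should keep awk from running out of open file handles when the input has a lot of records. As with the Perl script, each output file keeps its ">" header line.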
    DISTRO=Arch
    Registered Linux User #388732

  3. #3
    Just Joined!
    Join Date
    Aug 2006
    Location
    Mexico
    Posts
    17
    thanks a lot Cabhan! so you were an intern at NIH...nice! i'm working on my master's on DNA methylation, but in plants instead of humans. i suppose you might have seen bioinformatics work on cancer at the NIH, right? that's cool. thanks again, the script works very nicely.

  4. #4
    Linux User Dark_Stang
    Join Date
    Jun 2006
    Location
    Around St. Louis
    Posts
    284
    You just gave me a flashback to Biology class. When I saw those letters I was like "That looks so familiar..."
    Two levels higher than a newb.
    (I can search google)
