  1. #1
    Just Joined!
    Join Date
    Jul 2007
    Posts
    15

    shell scripting help

    Hi all. I need some help developing a shell script.
    I'm working in a chemistry research department (undergrad) and I'm taking over where someone else left off when they graduated.
    Here's the problem:

    I have a file (called mol_names) that contains a list of molecule names and a property value. The format is the following:
    train molecule #<number>: <name> IC50: <double value>

    Here's an example:
    train molecule #1: A01 IC50: 0.01
    train molecule #2: A02 IC50: 2.38
    train molecule #3: D30 IC50: 3.12

    ...
    train molecule #n: C93 IC50: 0.12
    where n is close to 200 in this case.

    Now I have a set of databases that correspond to the names: database foo holds the files for A01, A02, ...; database bar holds the files for the D set, and database baz holds those of the C set, and so forth.
    So if I do an ls ./foo I'd get
    A01
    A02


    What I need to do is copy all the molecules listed in the mol_names text file into a new directory, called known_mols.

    Since I'm new at shell scripting I'm not sure how to proceed. I'm thinking that I need to somehow cat the mol_names file, extract the name to a variable, say $name, pipe that into a find . -name $name then somehow pipe that into a cp $molecule dir/$molecule.

    I don't know how to start though. Any help?

    Thanks.

  2. #2
    Linux Engineer Freston's Avatar
    Join Date
    Mar 2007
    Location
    The Netherlands
    Posts
    1,047
    If I understand you correctly, you want to use a loop to run through the output of a <cat> command, and use <touch $name> to create the files.
    Can't tell an OS by it's GUI

  3. #3
    Just Joined!
    Join Date
    Jul 2007
    Posts
    15
    Quote Originally Posted by Freston View Post
    If I understand you correctly, you want to use a loop to run through the output of a <cat> command, and use <touch $name> to create the files.
    That sounds like it could work. The tough point for me is putting the correct name into the $name variable. To use the following line from the file as an example:
    train molecule #1: H07 IC50: 0.28
    I want to do name=H07 so that echo $name yields H07. This would be done within the loop, along with finding and copying the file.

    If I was using java I would split the string train molecule #1: H07 IC50: 0.28 around the white space and then assign the name variable to the contents of array[3]. Can something similar be done using bash?
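    For reference, bash can split a line on whitespace much like a Java split. The following is only a minimal sketch, assuming the line format shown above; the variable names line and fields are illustrative.

```shell
#!/bin/bash
line='train molecule #1: H07 IC50: 0.28'

# read -a splits the line on whitespace into an array (indices start at 0)
read -r -a fields <<< "$line"

# Same position as array[3] in the Java version
name=${fields[3]}
echo "$name"
```

    To process a whole file the same idea can be written as a loop, for example: while read -r _ _ _ name _; do echo "$name"; done < mol_names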

    Quote Originally Posted by Freston View Post
    use <touch $name>
    After looking at the man pages I don't think that touch is what I need. I need to find the file named in the text file ($name = H07 in the example) which is located in one (or more) of several subdirs below the current working dir. Assuming I can set $name up correctly, I have a couple ideas on how to proceed:

    1. Try a find . $name* to get the path to whatever file the name points to.

    2. Do a ls -R | egrep $name

    Beyond this I'm not sure. I'm thinking of somehow using cp but I don't know if I can pipe anything into the copy command.

  4. #4
    Linux Engineer Freston's Avatar
    Join Date
    Mar 2007
    Location
    The Netherlands
    Posts
    1,047
    Quote Originally Posted by aliaseer
    If I was using java I would split the string train molecule #1: H07 IC50: 0.28 around the white space and then assign the name variable to the contents of array[3]. Can something similar be done using bash?
    Ehm... There are lots of things you can do in Bash

    Something like:
    Code:
    for i in `awk '{print $4}' mol_names` ; do
       name=$i
       echo $name # testing precaution ;)
       #touch $path/$name
    done
    Quote Originally Posted by aliaseer
    After looking at the man pages I don't think that touch is what I need. I need to find the file named in the text file ($name = H07 in the example) which is located in one (or more) of several subdirs below the current working dir. Assuming I can set $name up correctly, I have a couple ideas on how to proceed:

    1. Try a find . $name* to get the path to whatever file the name points to.

    2. Do a ls -R | egrep $name
    I'm not quite sure I follow you here. But try it, and see if it works (or find out why it doesn't :P)

    Quote Originally Posted by aliaseer
    Beyond this I'm not sure. I'm thinking of somehow using cp but I don't know if I can pipe anything into the copy command.
    Well, as long as you can redirect output right to your speakers I see no reason why you couldn't feed anything to a <cp> command. As long as it makes sense to the system.
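    One caveat: cp takes its file names as arguments rather than reading them from standard input, so a plain pipe into cp does nothing useful. A common way around this is to let find run cp itself with -exec. A rough sketch, assuming a throwaway directory layout (the work, foo, bar and known_mols paths below are made up for the demonstration):

```shell
#!/bin/bash
# Throwaway sandbox so the sketch can run anywhere (layout is hypothetical)
work=$(mktemp -d)
mkdir -p "$work/foo" "$work/bar" "$work/known_mols"
touch "$work/foo/A01" "$work/bar/D30"

name=A01
# -exec runs cp once per match, with {} replaced by the path that was found.
# Only the database dirs are searched, so the copies are not found again.
find "$work/foo" "$work/bar" -name "$name" -exec cp {} "$work/known_mols/" \;

ls "$work/known_mols"
```

    Quoting "$name" keeps the shell from expanding any wildcard before find sees it.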

  5. #5
    Just Joined!
    Join Date
    Jul 2007
    Posts
    15
    Well, it's done! Thanks a bunch for your help Freston.
    I'm posting the code and asking for critiques. Since this may not be the appropriate venue for that, does anyone know of an online community that does code critiques and otherwise helps one develop their coding habits?

    A couple improvements I'm thinking of already but don't know how to implement just yet:
    1. Is there a better way to name the temp files than with the <date> command? Maybe something like a hash function? Is there a way to completely forgo the use of the tmp files?
    2. What other ways are there to implement the commandline args check?
    3. A couple subdirs of the DB_DIR directory contain files with the same name (hence the <-n> flag for the <cp> command). So while copying into the NEW_DIR directory, how would one check if the current file is already in the directory, and if so append a number to the name?
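    On the second point, one possible alternative is to wrap the check in a small function and compare the argument count numerically. This is only a sketch; check_args is a hypothetical helper name, not part of the script posted in this thread.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only when exactly three arguments are given
check_args() {
    if [ "$#" -ne 3 ]; then     # -ne compares numbers; == compares strings
        echo "usage: findm2 <molname file> <database top dir> <new dir>" >&2
        return 1
    fi
}

check_args mol_names db known_mols && echo "three args: ok"
check_args mol_names 2>/dev/null || echo "one arg: rejected"
```

    In a real script the call would be check_args "$@" || exit 1, which passes the script's own arguments through unchanged.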

    I've also got an issue I'd like to clarify:
    What's the difference between the tick mark (`) and the single quote ('). And what do the curly braces ({}) do? What is it that they signify and what do they accomplish under different contexts?

    Thanks.

    Code:
    #!/bin/bash
    
    #---------------- magic stuff ----------------#
    NAME_PATH=$1
    DB_DIR=$2
    NEW_DIR=$3
    TMP_DIR=~/tmp
    TMP_FILE1=bar-$(date "+%y.%m.%d_%H.%M.%S")
    sleep 1 # sleeping one second so the tmp files have different names
    TMP_FILE2=bar-$(date "+%y.%m.%d_%H.%M.%S")
    NUM_PARAMS=3
    SCRIPT_NAME=findm2
    SUFFIX=.mol2
    
    USAGE="$SCRIPT_NAME <molname file> <database top dir> <new dir>
    \twhere
    \t<molname file> is the file containing the names of the molecules to be found and copied.
    \t<database top dir> is the top level directory containing all the files to be checked.
    \t<new dir> is the directory into which the files are to be copied. If <new dir> does not exist, it will be created.
    "
    		
    #---------------- error codes ----------------#
    E_GOOD=0
    E_BAD_USAGE=1
    
    
    #---------------- check if params passed ----------------#
    echo "checking params"
    if [ ! $# == $NUM_PARAMS ]; then
    	printf "$USAGE"
    	exit $E_BAD_USAGE
    fi
    	
    #---------------- check $NEW_DIR ----------------#
    echo "checking $NEW_DIR"
    if [ ! -d "$NEW_DIR" ]; then	# -d is true when the directory already exists
    	echo "Creating $NEW_DIR"
    	mkdir "$NEW_DIR"
    fi
    
    #---------------- find names ----------------#
    # this finds all the molecules listed in $NAME_PATH in the $DB_DIR 
    #	and prints theirs paths to a file
    echo "finding names in $NAME_PATH"
    for i in `awk '{print $4}' $NAME_PATH` ;do
    	name=$i
    	find $DB_DIR -name $name$SUFFIX | tee -a $TMP_DIR/$TMP_FILE1
    done
    
    #---------------- sort the tmp file----------------#
    echo "sorting $TMP_DIR/$TMP_FILE1 into $TMP_DIR/$TMP_FILE2"
    cat $TMP_DIR/$TMP_FILE1 | sort > $TMP_DIR/$TMP_FILE2
    
    #---------------- copy into $NEW_DIR ----------------#
    echo "copying"
    for j in `cat $TMP_DIR/$TMP_FILE2`; do
    	echo "cp -n $j $NEW_DIR"
    	cp -n $j $NEW_DIR
    done
    
    #---------------- delete tmp files ----------------#
    echo "removing $TMP_DIR/$TMP_FILE1 and $TMP_DIR/$TMP_FILE2"
    rm $TMP_DIR/$TMP_FILE1
    rm $TMP_DIR/$TMP_FILE2
    
    
    exit $E_GOOD

  6. #6
    Linux Engineer Freston's Avatar
    Join Date
    Mar 2007
    Location
    The Netherlands
    Posts
    1,047
    Quote Originally Posted by aliaseer
    Well, it's done! Thanks a bunch for your help Freston.
    Don't mention it

    Quote Originally Posted by aliaseer
    I'm posting the code and asking for critiques. Since this may not be the appropriate venue for that, does anyone know of an online community that does code critiques and otherwise helps one develop their coding habits?
    Good question! The move to the 'Programming & Scripting' forum is a first step.

    Critiques:
    You're not keeping to some scripting conventions, but my bet is that's because you don't have enough commands at your disposal right now. You make do with what you do know. That's a good sign of intelligence.

    A good example of this is the way you define error codes. You didn't have to do it, because of the built-in exit code functionality. Look into that, you were very close. Still! It shows you're thinking right.
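    One reading of "built-in exit codes" is that every command already reports success or failure through its exit status, which the shell stores in $? and which if can test directly. A small sketch under that assumption (the directory used is a temporary one created only for the demonstration):

```shell
#!/bin/bash
# Every command leaves its exit status in $? (0 means success)
true
ok_status=$?
false
fail_status=$?
echo "true -> $ok_status, false -> $fail_status"

# 'if' tests a command's exit status directly, no [ ] needed
dir=$(mktemp -d)
if mkdir "$dir/known_mols"; then
    echo "directory created"
else
    echo "mkdir failed" >&2
fi
```

    A script can then end with a plain exit 0 or exit 1 instead of named constants.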

    Your <sleep> command is an ugly hack. You're wasting your own time, and that's a sin in some religions. Even an `expr $S + 1` would be better, although you may want to get rid of the tmp files altogether.
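    If the temp files are kept, one option that avoids both the sleep and the date stamp is mktemp, assuming it is available (it ships with GNU coreutils). A minimal sketch; tmp1 and tmp2 are illustrative names:

```shell
#!/bin/bash
# mktemp creates a uniquely named empty file and prints its path
tmp1=$(mktemp)
tmp2=$(mktemp)

# Clean up on exit, even if the script stops early
trap 'rm -f "$tmp1" "$tmp2"' EXIT

echo "first:  $tmp1"
echo "second: $tmp2"
```

    Because each call returns a fresh name, no delay is needed between the two.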

    I don't really see why you <sort> from tmp1 to tmp2.

    Something about the <tee> command:
    Code:
    find $DB_DIR -name $name$SUFFIX | tee -a $TMP_DIR/$TMP_FILE1
    
    Can be:
    
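    find $DB_DIR -name $name$SUFFIX >> $TMP_DIR/$TMP_FILE1
    (>> appends on every pass through the loop, like tee -a, but does not echo the paths to the screen; error messages still go to the terminal and stay out of the file)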
    But I'm wondering. First you look in mol_names, edit the output to get a clean_name.
    Then you search a certain path to find the location of a file called clean_name. You store those locations in TMP_FILE1.
    You <sort> the contents of TMP_FILE1 and store that in TMP_FILE2 (why?)
    And then you search for files in DB_DIR and copy them to NEW_DIR. I think you can do this in one pass without too much difficulty.

    Something like:
    Code:
    Find name section
    (...)
    OLD_NAME=`find blabla 2> /dev/null`
    cp $OLD_NAME $NEW_DIR/$name
    done
    echo It's been done, my master
    exit 0
    You'll need to tune it a little, but structurally it fits your intent I'd think.
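    To make the one-pass idea concrete, here is a rough runnable sketch. It is not the script from this thread: the sandbox layout (work, db, known_mols) and the two sample lines are assumptions made only so the example can run on its own.

```shell
#!/bin/bash
# Hypothetical sandbox standing in for the real database directories
work=$(mktemp -d)
mkdir -p "$work/db/foo" "$work/db/bar" "$work/known_mols"
touch "$work/db/foo/A01.mol2" "$work/db/bar/D30.mol2"
printf '%s\n' 'train molecule #1: A01 IC50: 0.01' \
              'train molecule #2: D30 IC50: 3.12' > "$work/mol_names"

# Read the fourth field of each line straight into $name, then let
# find copy every match; no temp files and no second loop
while read -r _ _ _ name _; do
    find "$work/db" -name "$name.mol2" -exec cp {} "$work/known_mols/" \;
done < "$work/mol_names"

ls "$work/known_mols"
```

    Note that a plain cp silently overwrites a file of the same name, so duplicate names across subdirectories still need separate handling.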

    Quote Originally Posted by aliaseer
    What's the difference between the tick mark (`) and the single quote ('). And what do the curly braces ({}) do? What is it that they signify and what do they accomplish under different contexts?
    There are some excellent resources online. Here you'll find a lot of answers, and better worded than I could
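    As a short illustration of the quoting question: single quotes keep text literal, double quotes allow variable expansion, backticks run a command and substitute its output, and braces after a $ mark where a variable name ends. The variable names below are illustrative only.

```shell
#!/bin/bash
name=H07

single='name is $name'                   # single quotes: everything is literal
double="name is $name"                   # double quotes: $name is expanded
ticks=`echo "$name" | tr 'A-Z' 'a-z'`    # backticks: run a command, keep its output
modern=$(echo "$name" | tr 'A-Z' 'a-z')  # $( ) does the same and nests more easily
braces="${name}_copy"                    # braces mark where the variable name ends

echo "$single"
echo "$double"
echo "$ticks $modern"
echo "$braces"
```

    The braces in awk '{print $4}' are a different matter: they sit inside single quotes, so bash passes them through untouched and awk reads them as its own block syntax.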

  7. #7
    Linux User
    Join Date
    Aug 2006
    Posts
    458
    Quote Originally Posted by Freston View Post
    Ehm... There are lots of things you can do in Bash
    Code:
    for i in `awk '{print $4}' mol_names` ; do
       name=$i
       echo $name # testing precaution ;)
       #touch $path/$name
    done
    and there's a lot of things you can do in awk too.
    Code:
    path="/home"
    awk -v path="$path" '{
      cmd = "touch " path "/" $4
      system(cmd)   # run the command; getline is only needed to read its output
    }' "mol_names"

  8. #8
    Linux Enthusiast
    Join Date
    Aug 2006
    Posts
    631
    Quote Originally Posted by aliaseer View Post

    3. A couple subdirs of the DB_DIR directory contain files with the same name (hence the <-n> flag for the <cp> command). So while copying into the NEW_DIR directory, how would one check if the current file is already in the directory, and if so append a number to the name?

    If the file exists you can add a number to the new filename like this (not tested):

    Code:
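    #---------------- copy into $NEW_DIR ----------------#
    echo "copying"
    for j in `cat $TMP_DIR/$TMP_FILE2`; do
    	base=`basename "$j"`		# file name without the source directory
    	dest="$NEW_DIR/$base"
    	i=0
    	while [ -e "$dest" ]; do	# name taken: try <name>_0, <name>_1, ...
    		dest="$NEW_DIR/${base}_$i"
    		i=`expr $i + 1`
    	done
    	echo "cp $j $dest"
    	cp "$j" "$dest"			# the source path $j itself stays unchanged
    done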

    Regards
