- 07-19-2007 #1Just Joined!
- Join Date
- Jul 2007
- Posts
- 15
shell scripting help
Hi all. I need some help developing a shell script.
I'm working in a chemistry research department (undergrad) and I'm taking over where someone else left off when they graduated.
Here's the problem:
I have a file (called mol_names) that contains a list of molecule names and a property value. The format is the following:
train molecule #<number>: <name> IC50: <double value>
Here's an example:
train molecule #1: A01 IC50: 0.01
train molecule #2: A02 IC50: 2.38
train molecule #3: D30 IC50: 3.12
...
train molecule #n: C93 IC50: 0.12
where n is close to 200 in this case.
Now I have a set of databases that correspond to the names: database foo holds the files for A01, A02, ...; database bar holds the files for the D set, and database baz holds those of the C set, and so forth.
So if I do an ls ./foo I'd get
A01
A02
What I need to do is copy all the molecules listed in the mol_names text file into a new directory, called known_mols.
Since I'm new at shell scripting I'm not sure how to proceed. I'm thinking that I need to somehow cat the mol_names file, extract the name to a variable, say $name, pipe that into a find . -name $name then somehow pipe that into a cp $molecule dir/$molecule.
I don't know how to start though. Any help?
Thanks.
- 07-20-2007 #2
If I understand you correctly, you want to use a loop to run through the output of a <cat> command, and use <touch $name> to create the files.
Can't tell an OS by it's GUI
- 07-20-2007 #3Just Joined!
- Join Date
- Jul 2007
- Posts
- 15
That sounds like it could work. The tough point for me is putting the correct name into the $name variable. To use the following line from the file as an example:
train molecule #1: H07 IC50: 0.28
I want to do name=H07 so that echo $name yields H07. This would be done within the loop, along with finding and copying the file.
If I were using Java I would split the string train molecule #1: H07 IC50: 0.28 around the white space and then assign the name variable to the contents of array[3]. Can something similar be done using bash?
After looking at the man pages I don't think that touch is what I need. I need to find the file named in the text file ($name = H07 in the example) which is located in one (or more) of several subdirs below the current working dir. Assuming I can set $name up correctly, I have a couple of ideas on how to proceed:
1. Try a find . $name* to get the path to whatever file the name points to.
2. Do a ls -R | egrep $name
Beyond this I'm not sure. I'm thinking of somehow using cp but I don't know if I can pipe anything into the copy command.
- 07-20-2007 #4
Originally Posted by aliaseer
Ehm... There are lots of things you can do in Bash
Something like:
Code:
for i in `awk '{print $4}' mol_names` ; do
    name=$i
    echo $name # testing precaution ;)
    #touch $path/$name
done
Originally Posted by aliaseer
I'm not quite sure I follow you here. But try it, and see if it works (or find out why it doesn't :P)
Originally Posted by aliaseer
Well, as long as you can redirect output right to your speakers I see no reason why you couldn't feed anything to a <cp> command. As long as it makes sense to the system.
Can't tell an OS by it's GUI
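For the Java-style split asked about earlier in the thread, the shell can also do the splitting itself with the `read` builtin, which breaks a line on whitespace into named variables. A minimal sketch, assuming every line has exactly the layout from the first post (`train molecule #<number>: <name> IC50: <value>`); the sample file and the throwaway `_` variables are made up for illustration:

```shell
#!/bin/bash
# Build a small sample file in a temporary directory (illustrative data only).
workdir=$(mktemp -d)
cat > "$workdir/mol_names" <<'EOF'
train molecule #1: A01 IC50: 0.01
train molecule #2: A02 IC50: 2.38
train molecule #3: D30 IC50: 3.12
EOF

names=""
# read splits each line on whitespace: field 4 is the molecule name,
# field 6 is the IC50 value; the underscores soak up the fields we ignore.
while read -r _ _ _ name _ ic50; do
    names="${names:+$names }$name"
    echo "name=$name ic50=$ic50"
done < "$workdir/mol_names"

echo "all names: $names"
rm -r "$workdir"
```

Compared with looping over the output of `awk '{print $4}'`, this keeps the other fields (such as the IC50 value) available inside the loop if they are needed later.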
- 07-21-2007 #5Just Joined!
- Join Date
- Jul 2007
- Posts
- 15
Well, it's done! Thanks a bunch for your help Freston.
I'm posting the code and asking for critiques. Since this may not be the appropriate venue for that, does anyone know of an online community that does code critiques and otherwise helps one develop their coding habits?
A couple improvements I'm thinking of already but don't know how to implement just yet:
1. Is there a better way to name the temp files than with the <date> command? Maybe something like a hash function? Is there a way to completely forego the use of the tmp files?
2. What other ways are there to implement the commandline args check?
3. A couple subdirs of the DB_DIR directory contain files with the same name (hence the <-n> flag for the <cp> command). So while copying into the NEW_DIR directory, how would one check if the current file is already in the directory, and if so append a number to the name?
I've also got an issue I'd like to clarify:
What's the difference between the tick mark (`) and the single quote (')? And what do the curly braces ({}) do? What is it that they signify and what do they accomplish under different contexts?
Thanks.
Code:
#!/bin/bash
#---------------- magic stuff ----------------#
NAME_PATH=$1
DB_DIR=$2
NEW_DIR=$3
TMP_DIR=~/tmp
TMP_FILE1=bar-$(date "+%y.%m.%d_%H.%M.%S")
sleep 1 # sleeping one second so the tmp files have different names
TMP_FILE2=bar-$(date "+%y.%m.%d_%H.%M.%S")
NUM_PARAMS=3
SCRIPT_NAME=findm2
SUFFIX=.mol2
USAGE="$SCRIPT_NAME <molname file> <database top dir> <new dir>
\twhere
\t<molname file> is the file containing the names of the molecles to be found and copied.
\t<database top dir> is the top level directory containing all the files to be checked.
\t<new dir> is the directory into which the files are to be copied. If <new dir> does not exit, it will be created.'
"
#---------------- error codes ----------------#
E_GOOD=0
E_BAD_USAGE=1
#---------------- check if params passed ----------------#
echo "checking params"
if [ ! $# == $NUM_PARAMS ]; then
    printf "$USAGE"
    exit $E_BAD_USAGE
fi
#---------------- check $NEW_DIR ----------------#
echo "stat $NEW_DIR"
if [ ! '{stat $NEW_DIR}' == 0 ];then
    echo "Creating $NEW_DIR"
    mkdir $NEW_DIR
fi
#---------------- find names ----------------#
# this finds all the molecules listed in $NAME_PATH in the $DB_DIR
# and prints theirs paths to a file
echo "finding names in $NAME_PATH"
for i in `awk '{print $4}' $NAME_PATH` ;do
    name=$i
    find $DB_DIR -name $name$SUFFIX | tee -a $TMP_DIR/$TMP_FILE1
done
#---------------- sort the tmp file----------------#
echo "sorting $TMP_DIR/$TMP_FILE1 into $TMP_DIR/$TMP_FILE2"
cat $TMP_DIR/$TMP_FILE1 | sort > $TMP_DIR/$TMP_FILE2
#---------------- copy into $NEW_DIR ----------------#
echo "copying"
for j in `cat $TMP_DIR/$TMP_FILE2`; do
    echo "cp -n $j $NEW_DIR"
    cp -n $j $NEW_DIR
done
#---------------- delete tmp files ----------------#
echo "removing $TMP_DIR/$TMP_FILE1 and $TMP_DIR/$TMP_FILE2"
rm $TMP_DIR/$TMP_FILE1
rm $TMP_DIR/$TMP_FILE2
exit $E_GOOD
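On the quoting question in this post: backticks (and the equivalent `$(...)`) run a command and substitute its output, single quotes keep text literal, and curly braces in `${var}` mark where a variable name ends (braces have other uses too, which this sketch does not cover). That is also why the test `'{stat $NEW_DIR}' == 0` in the script compares a fixed piece of text with 0 and never runs `stat`. A small sketch; the `known_mols` directory inside a temporary directory is hypothetical:

```shell
#!/bin/bash
name=H07

literal='$name'            # single quotes: no expansion, the text stays $name
expanded="$name"           # double quotes: the variable is expanded
old_style=`echo $name`     # backticks: run the command, capture its output
new_style=$(echo "$name")  # $(...) does the same and nests more easily
braced="${name}.mol2"      # braces mark where the variable name ends

echo "$literal $expanded $old_style $new_style $braced"

# Checking for a directory: [ -d ... ] asks the filesystem directly,
# and mkdir -p creates the directory only if it is missing.
workdir=$(mktemp -d)
NEW_DIR="$workdir/known_mols"   # hypothetical target directory
if [ ! -d "$NEW_DIR" ]; then
    echo "Creating $NEW_DIR"
    mkdir -p "$NEW_DIR"
fi
rm -r "$workdir"
```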
- 07-21-2007 #6
Originally Posted by aliaseer
Don't mention it
Originally Posted by aliaseer
Good question! The move to the 'Programming&Scripting' is a first step.
Critiques:
You're not keeping to some scripting conventions, but my bet is that's because you don't have enough commands at your disposal right now. You make do with what you do know. That's a good sign of intelligence.
A good example of this is the way you define error codes. You didn't have to do it, because of the built-in exit code functionality. Look into that, you were very close. Still! It shows you're thinking right.
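The built-in exit status mentioned here is available as `$?` after every command, and `if` can branch on a command directly. A minimal sketch; the function name `check_params` is made up for illustration, and the usage text is shortened from the posted script:

```shell
#!/bin/bash
# Hypothetical helper: succeeds (status 0) only when given exactly 3 arguments.
check_params() {
    [ "$#" -eq 3 ]
}

check_params a b c
status_ok=$?        # 0 means success

check_params a b
status_bad=$?       # any non-zero value means failure

# if runs the command and branches on its exit status directly.
if check_params mol_names ./db ./known_mols; then
    echo "parameters look fine"
else
    echo "usage: findm2 <molname file> <database top dir> <new dir>" >&2
fi

echo "ok=$status_ok bad=$status_bad"
```

In a real script the check would be called as `check_params "$@"`, followed by `exit 1` in the failure branch.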
Your <sleep> command is an ugly hack. You're wasting your own time, that's a sin in some religions. Even an `expr $S + 1` would be better, although you may want to get rid of the tmp files altogether.
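If the temporary files stay, `mktemp` is one common way to get unique names without sleeping: it creates the file and prints its path. A sketch, assuming `mktemp` is installed (it is on typical Linux systems); the `findm2` prefix is borrowed from the posted script's name and is otherwise arbitrary:

```shell
#!/bin/bash
# Each call creates a new, uniquely named empty file and prints its path.
TMP_FILE1=$(mktemp "${TMPDIR:-/tmp}/findm2.XXXXXX")
TMP_FILE2=$(mktemp "${TMPDIR:-/tmp}/findm2.XXXXXX")

# Remove both files when the script exits, even after an error.
trap 'rm -f "$TMP_FILE1" "$TMP_FILE2"' EXIT

echo "using $TMP_FILE1 and $TMP_FILE2"
```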
I don't really see why you <sort> from tmp1 to tmp2.
Something about the <tee> command:
Code:
find $DB_DIR -name $name$SUFFIX | tee -a $TMP_DIR/$TMP_FILE1
Can be:
find $DB_DIR -name $name$SUFFIX 1>> $TMP_DIR/$TMP_FILE1
(only standard output goes into the file, so potential error messages are left out of it; >> appends like tee -a does, whereas a plain 1> would overwrite the file on every pass of the loop)
But I'm wondering. First you look in mol_names, edit the output to get a clean_name.
Then you search a certain path to find the location of a file called clean_name. You store those locations in TMP_FILE1.
You <sort> the contents of TMP_FILE1 and store that in TMP_FILE2 (why?)
And then you search for files in DB_DIR and copy them to NEW_DIR. I think you can do this in one pass without too much difficulty.
Something like:
Code:
Find name section
(...)
    OLD_NAME=`find blabla 2> /dev/null`
    cp $OLD_NAME $NEW_DIR/$name
done
echo "It's been done, my master"
exit 0
You'll need to tune it a little, but structurally it fits your intent I'd think.
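Filled in under some assumptions, the one-pass idea could look like the sketch below. It assumes the `.mol2` suffix and the three arguments from the posted script and file names without spaces, and it builds a tiny throwaway database so it can run on its own. The function name `copy_known` and the demo files are hypothetical.

```shell
#!/bin/bash
# copy_known <molname file> <database top dir> <new dir>
copy_known() {
    name_path=$1
    db_dir=$2
    new_dir=$3
    mkdir -p "$new_dir"
    # Field 4 of each line is the molecule name.
    awk '{print $4}' "$name_path" | while read -r name; do
        # find prints every match; copy each one as soon as it is found.
        find "$db_dir" -name "$name.mol2" 2> /dev/null | while read -r path; do
            cp "$path" "$new_dir/"
        done
    done
}

# Throwaway demo data (illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/db/foo" "$demo/db/bar"
touch "$demo/db/foo/A01.mol2" "$demo/db/foo/A02.mol2" "$demo/db/bar/D30.mol2"
printf 'train molecule #1: A01 IC50: 0.01\ntrain molecule #2: D30 IC50: 3.12\n' > "$demo/mol_names"

copy_known "$demo/mol_names" "$demo/db" "$demo/known_mols"
copied=$(ls "$demo/known_mols" | tr '\n' ' ')
echo "$copied"
rm -r "$demo"
```

No temporary files are needed this way. Files with the same name in different subdirectories simply overwrite each other in this version.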
Originally Posted by aliaseer
There are some excellent resources online. Here you'll find a lot of answers, and better worded than I could.
Can't tell an OS by it's GUI
- 07-21-2007 #7Linux User
- Join Date
- Aug 2006
- Posts
- 458
- 07-21-2007 #8Linux Enthusiast
- Join Date
- Aug 2006
- Posts
- 631
If the file exists you can add a number to the new filename like this (not tested):
Code:
#---------------- copy into $NEW_DIR ----------------#
echo "copying"
for j in `cat $TMP_DIR/$TMP_FILE2`; do
    i=0
    while [ -e $NEW_DIR"/"$j ]; do
        j=$j"_"$i
        i=`expr $i + 1`
    done
    echo "cp $j $NEW_DIR"
    cp $j $NEW_DIR
done
Regards
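One detail to watch in the loop above: `$j` is a full path as printed by `find`, so the existence test and the new name need the file's base name, and the source path has to stay unchanged for `cp` to find it. A sketch of that variation, with a hypothetical helper `copy_unique` and throwaway demo files:

```shell
#!/bin/bash
# copy_unique <source file> <target dir>
# Copies the file; if the name is already taken, appends _0, _1, ... to it.
copy_unique() {
    src=$1
    dest_dir=$2
    base=$(basename "$src")
    target=$base
    i=0
    while [ -e "$dest_dir/$target" ]; do
        target="${base}_$i"
        i=$((i + 1))
    done
    cp "$src" "$dest_dir/$target"
}

# Throwaway demo data: three files with the same name (illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/foo" "$demo/bar" "$demo/baz" "$demo/known_mols"
echo one > "$demo/foo/A01.mol2"
echo two > "$demo/bar/A01.mol2"
echo three > "$demo/baz/A01.mol2"

for j in "$demo/foo/A01.mol2" "$demo/bar/A01.mol2" "$demo/baz/A01.mol2"; do
    copy_unique "$j" "$demo/known_mols"
done

result=$(ls "$demo/known_mols" | tr '\n' ' ')
echo "$result"
rm -r "$demo"
```

As in the loop above, the number lands after the `.mol2` extension; putting it before the extension would take an extra step to split the name.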


