- 05-17-2008 #1 Just Joined!
- Join Date
- Feb 2005
- Posts
- 11
A wee bit of help with gawk, please
I took a great many photos of the 2006, 2007 and 2008 Belfast marathons, 1400+ for the 2008 marathon. The marathon goes past my back gate and the photography is for fun, not reward.
I have gone through a lot of these photos and, to the name of each photo, I have added the runners' shirt numbers and the organisations that they were 'running for', e.g. P5050006 809 samaratans x3760.JPG
(photo#, runner#, organisation, runner#)
I have managed to contact several of these organisations to see if they would pass on photos to the runners etc.
I want to create spreadsheets listing:
1) the runner appearing in each photo, and
2) the photos in which each runner appears.
This is actually why I reinstalled Linux: a few years back I wrote some simple awk/gawk programmes and knew I could do this, or most of it, myself. Whoops, I have forgotten more than I thought.
I have written the script for part 1 and for the above example its output would be
P5050006 809 x3760
plus some for use later on in this post
P5050007 3
P5050008 508 605 809
P5050009 57 1000
P5050010 124 596 7545 809
but I am struggling with regard to 2).
Say the output of the script mentioned above is written to a file "trial", I want to read or scan "trial" and for each mentioned runner print out, in file "trial2", a line listing all the photos in which they are mentioned.
So for the above group of lines in "trial" I would get something like
3 P5050007
57 P5050009
124 P5050010
596 P5050010
605 P5050008
809 P5050006 P5050008 P5050010
1000 P5050009
etc
Currently I am halfway there, by building an "associative array" with this code:
gawk '{
    for (j = 2; j <= NF; j++) {
        rpn[$j] = rpn[$j] sprintf("%s\t", $1)
        printf("%s\t %s\t %s\n", $j, (nopfr[$j] + 1), rpn[$j])
        nopfr[$j] = nopfr[$j] + 1
    }
}' /home/sean/trial > /home/sean/trial2
Excuse any layout errors; the layout doesn't survive the posting.
The obvious problem is that for runner 809, for example, I get the following lines in "trial2":
809 1 P5050006
809 2 P5050006 P5050008
809 3 P5050006 P5050008 P5050010
when all I want is "809 P5050006 P5050008 P5050010", i.e. the 'value' of rpn[$j] for the last instance of runner $j (ignore the nopfr[$j] bit; that's just for my seeing what is happening).
I am certain this can be done and simply done but I have forgotten how at the moment and the spiders are valiantly fighting to keep the dust on my awk manual, so any help would be gratefully received by them and me.
- 05-17-2008 #2 Linux Engineer
- Join Date
- Feb 2005
- Posts
- 1,044
If you're going for a Linux scripting solution, the first thing you should do is get rid of those spaces in your filenames - you'll probably struggle to cope with them. Use underscores instead to separate the parts.
- 05-17-2008 #3 Just Joined!
- Join Date
- Feb 2005
- Posts
- 11
Thanks. I know that in the general case the spaces in the original file names could/would be a problem, but in this case they are not.
I should have said that in this case, under Linux, to generate the input to all this I just "ls" the original Windows folder/directory containing the photos (on a Windows partition) and output the result to a file in my home directory. This file is then just a text file where the problematic gaps become the field separators between the photo number and the runners' shirt numbers etc.
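The part 1 script is not shown in this thread, so the following is only a sketch of how such a step might look, not the poster's actual code. It assumes every line of the "ls" output has the form shown in the first post (photo number first, ".JPG" at the end) and that a shirt number is a run of digits with an optional leading "x", as in the example. The function name runners_per_photo is made up for illustration, and gawk can be substituted for awk.

```shell
# Hypothetical helper: turn "ls" output lines such as
#   P5050006 809 samaratans x3760.JPG
# into "photo runner runner ..." lines, dropping the organisation names.
runners_per_photo() {
    awk '
    {
        # Strip the extension; changing $0 makes awk re-split the fields.
        sub(/\.[Jj][Pp][Gg]$/, "")
        line = $1
        for (j = 2; j <= NF; j++) {
            # Assumption: a shirt number is digits, optionally prefixed by "x".
            if ($j ~ /^x?[0-9]+$/) {
                line = line " " $j
            }
        }
        print line
    }' "$@"
}
```

Something like `ls /path/to/photos | runners_per_photo > trial` would then produce the "trial" file, with the path being a placeholder. An organisation name made up only of digits would be mistaken for a shirt number under this assumption.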
- 05-17-2008 #4
I'm not much of an awk expert, but one idea would be to create an array with the numbers of the runners you've added. Then, before you add a line for a runner, search that array to see if you've already added him/her. If so, then search for that line, remove it, add the new photo number to the end, and put that runner back into the file.
Or....
Each time you add a runner, go back and search all the previous entries. If you find him/her already mentioned, then remove that line and continue.
Like I said, I'm not an awk expert, but there's bound to be a way to do this. I hope I was of some help.
The good ol' Belfast marathon. I remember watching that pass by my street on the North Road in East Belfast, and trying to collect as many sponges as possible. It's weird thinking about it -- why did a bunch of kids want to collect all the nasty sponges the runners used to cool themselves off? Pretty disgusting now that I think about it :S Kids'll compete over anything I suppose
Registered Linux user #388328 || Registered LFS user #15880
AMD 64 X2 4600+ :: 2X1GB DDR2 800 :: GeForce 9400 GT 512MB :: ASUS M2N32 Deluxe :: 4X250GB SATAII
Need instant help? Try us on IRC -- #linuxforums on freenode
- 05-19-2008 #5 Just Joined!
- Join Date
- Feb 2005
- Posts
- 11
smolloy, thanks for your reply. I have been trying to 'picture' how to use your suggested route but couldn't, and I didn't want to reply until I had either figured it out or given up.
It turns out there's a very easy answer, once you know how. Again, excuse the format:
gawk '{
    for (j = 2; j <= NF; j++) {
        rpn[$j] = rpn[$j] sprintf("%s\t", $1)
    }
}
END {
    for (p in rpn) {
        printf("%s\t %s\n", p, rpn[p])
    }
}'
My understanding is this: the array "rpn" exists for the whole of the program between the apostrophes ' and ', so it keeps its contents from one input line to the next. The loop ending just before "END" runs for every input line; it finds the photos in which a runner is noted and appends each photo number to the element of "rpn" that is indexed by the runner's number. The loop inside "END" runs once, after all the input has been read; it goes through all the elements of "rpn" and for each one outputs its 'value' on a line starting with the index/runner number.
- 05-19-2008 #6
Well done. That's a much cleaner solution than the piece of crap I tried to explain in my previous post.
- 05-19-2008 #7 Just Joined!
- Join Date
- Feb 2005
- Posts
- 11
Oh, sadly I didn't come up with the solution; someone else did, but it is neat.

