- 08-05-2005 #1 Just Joined!
- Join Date
- Jun 2005
- Posts
- 47
rsh problems
I have a fileserver running FC4 that stores CAD files (about 40,000 files) for an engineering group. This server is simply a mirror of our main file server in another state, so we have to sync the files every few hours for things to work correctly. To do this, we use a shell script that uses rsh and rcp to collect a list of files and copy them over to the local fileserver. This setup was working flawlessly with FC3, but ever since I installed FC4 the script hangs (the rsh process becomes defunct) at the first rsh command every time it is launched by crond. The weird thing is that I can launch the script manually and it works without a problem.
Does anyone have any ideas what could be causing this? I need to get it fixed because I'm having to run the updates manually right now and it's starting to get old.
- 08-06-2005 #2 Linux Engineer
- Join Date
- Feb 2005
- Posts
- 1,044
It sounds like the rsh process is terminating for some reason and its parent isn't waiting for it (hence the defunct status). Are you getting any diagnostics in the logs, or emails from cron, that might give a clue as to what's happening? There may be something in your environment that enables you to run the script successfully that cron doesn't have. If there is, find it, and you're home and dry!
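One way to test the environment theory is to save the environment cron hands to a job and compare it with the one from a login shell. The following is only a sketch: the file names under /tmp and the helper name `env_diff` are made up for illustration.

```shell
#!/bin/sh
# Sketch: show the lines that differ between two saved environments.
# Both files are expected to hold the output of `env | sort`.
env_diff() {
    # $1 = environment saved from the interactive shell
    # $2 = environment saved from inside a cron job
    # diff exits with 1 when the files differ; that is the expected
    # case here, so the status is masked.
    diff "$1" "$2" || true
}

# Capture step (each command runs in its own context):
#   interactive shell:       env | sort > /tmp/env.login
#   temporary crontab line:  * * * * * env | sort > /tmp/env.cron
# Then compare:
#   env_diff /tmp/env.login /tmp/env.cron
```

Lines starting with `<` exist only in the login environment, lines starting with `>` only under cron. PATH, HOME and SHELL are the usual suspects.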
- 08-06-2005 #3
Sorry to be curt, but don't use rsh / rcp in the first place. There are obviously secure alternatives: ssh / scp.
Who knows, maybe the Fedora project has deprecated rsh / rcp. (Just speculating.)
- 08-07-2005 #4 Linux Engineer
- Join Date
- Feb 2005
- Posts
- 1,044
In reply to anomie: rsh and rcp are fine if you're running in a secure environment. It'd be very arrogant of the Fedora guys (or anyone) to banish useful commands (and UNIX standard) just because they don't think you're capable of using them safely. (Yes, I know they'd like to banish Windoze ....)
- 08-07-2005 #5 Just Joined!
- Join Date
- Jun 2005
- Posts
- 47
In reply to anomie: I'm fully aware that ssh and scp are more secure, and I would be using them if I had my way. The problem is that this is a big corporation that still uses old Unix servers, and the fileserver replication is only supported by the IT group via rsh/rcp.
The good news is that this is on a secure network and I have firewall rules to limit access to rsh/rlogin/rcp to a specific IP address.
In reply to scm: Nothing is showing up in the logs and I'm not getting e-mail from cron because the cron job never finishes. You can let it try to finish for days and it will still be hung up and will never send an e-mail. BTW, the scripts and cron job have to be executed by a specific user with a specific UID for the whole thing to work. I have checked to make sure that cron is launching the script as that user, and it is, so I don't know what could be wrong.
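Since cron only mails output once the job exits, a job that hangs stays silent. A possible workaround is to write output and a command trace to a file as the job runs, so the last traced line shows where it stopped. This is a minimal sketch; the helper name `trace_to_log` and the log path in the example are hypothetical.

```shell
#!/bin/sh
# Sketch: run a command with stdout, stderr and a shell trace appended
# to a log file that can be read while the command is still running.
trace_to_log() {
    # $1 = log file, remaining args = command to run
    log=$1; shift
    (
        exec >> "$log" 2>&1   # everything inside the subshell goes to the log
        set -x                # print each command before it is executed
        "$@"
    )
}

# Example (paths are placeholders):
#   trace_to_log /tmp/replicate.trace /path/to/replication-script arg
```

The subshell keeps the redirection and tracing from leaking into the caller, and the function returns the exit status of the traced command.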
- 08-08-2005 #6 Just Joined!
- Join Date
- Jun 2005
- Posts
- 47
Is there any way this could be a cron problem rather than an rsh problem? I've read several websites that discuss how cron can be really screwy at times.
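One way to separate the two is to run the script by hand under conditions close to what cron provides: a nearly empty environment, no terminal on standard input, and output captured to a file. The sketch below assumes typical cron defaults for PATH and SHELL; those values and the helper name `run_like_cron` are assumptions, not something taken from this system.

```shell
#!/bin/sh
# Sketch: run a command roughly the way cron would.
run_like_cron() {
    # $1 = file that receives stdout and stderr, remaining args = command
    out=$1; shift
    # env -i starts from an empty environment; only the variables
    # listed here are passed on to the command.
    env -i HOME="${HOME:-/}" LOGNAME="${LOGNAME:-$(id -un)}" \
        PATH=/usr/bin:/bin SHELL=/bin/sh \
        "$@" < /dev/null > "$out" 2>&1
}

# Example (paths are placeholders):
#   run_like_cron /tmp/cron-sim.log /path/to/replication-script arg
```

If the script also hangs when started this way, the cause is more likely the environment or the missing terminal than crond itself.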
- 08-09-2005 #7 Just Joined!
- Join Date
- Jun 2005
- Posts
- 47
Someone please help me!
- 08-09-2005 #8
Would it be possible to post the script? (You could obfuscate the IPs.)
Maybe a new set of eyes can spot the problem. It's a little suspicious that it worked ok under FC3, but who knows.
- 08-09-2005 #9 Just Joined!
- Join Date
- Jun 2005
- Posts
- 47
I cannot post the whole script, but I can post the relevant snippet that is causing problems.
The last line is where the script hangs. It will not get past that line no matter what I try.
Code:
SERVER=$1
#
# Abort if no servername
#
if [ "$SERVER" = "" ] ; then
  echo "SERVER name missing - Aborting"
  exit
fi
#
SCRIPTLOC=/home/egnhxfr/scripts
SETUP=$SCRIPTLOC/bu_data.setup
CMNUSG=`grep ",$SERVER," $SETUP | cut -d',' -f3`
DATA=`grep ",$SERVER," $SETUP | cut -d',' -f4`
DWGS=`grep ",$SERVER," $SETUP | cut -d',' -f5`
MAILTO=`grep ",$SERVER," $SETUP | cut -d',' -f6`
TODAY=$(date +%y%m%d)
#
COPYLST=$SCRIPTLOC/bu_requests
TMPdir=$SCRIPTLOC/bu
TMPScript=$TMPdir/script-$SERVER
ERRORLOG="$TMPdir/replicate.bu_data.errorlog.$TODAY"
#
# Clean up the "deleted" files
#
rm -f $CMNUSG/links_del/* 2>/dev/null
rm -rf $TMPdir 2>/dev/null
mkdir $TMPdir
#
NOW=$(date)
echo "Starting cron.replicate.bu_data: $NOW" >> $ERRORLOG
echo "" >> $ERRORLOG
#
# Make temp script to run on primary server
#
echo 'cd /mailbox/to-nhe' > $TMPScript
awk '{print "ls "$0".* >> PROE_PARTS.bur"}' $COPYLST >> $TMPScript
echo 'grep bu_crawler /bu/users2/tdsdb/lists/PROE_PARTS.bu | egrep "(\.prt\.|\.asm\.|\.lay\.|\.drw\.)" >> PROE_PARTS.bur' >> $TMPScript
echo 'grep bu_common_parts /bu/users2/tdsdb/lists/PROE_PARTS.bu | egrep "(\.prt\.|\.asm\.|\.lay\.|\.drw\.)" >> PROE_PARTS.bur' >> $TMPScript
echo 'grep "racine_drive_trains_hin/crawler/hydrostatic" /bu/users2/tdsdb/lists/PROE_PARTS.dt | egrep "(\.prt\.|\.asm\.|\.lay\.|\.drw\.)" >> PROE_PARTS.bur' >> $TMPScript
echo 'egrep -v "(_history|_pdbase|_trans_sbm|submission_forms)" PROE_PARTS.bur > PROE_PARTS.bu1' >> $TMPScript
echo 'sort -t "." -k1,1 -k2,2 -k3,3nr PROE_PARTS.bu1 | sort -t "." -mu -k1,2 > PROE_PARTS.bur' >> $TMPScript
chmod 777 $TMPScript
/usr/bin/rcp $TMPScript XXX.XXX.XXX.XXX/mailbox/to-nhe
#
# Run the script on primary server then retrieve the file
#
/usr/bin/rsh XXX.XXX.XXX.XXX /mailbox/to-nhe/script-$SERVER # This is where everything goes wrong
BTW, I'm launching this as user "egnhxfr" with cron. My crontab entry looks like this:
Code:
30 * * * /home/egnhxfr/scripts/cron.replicate.bu_data ca
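One thing that may be worth trying, since the hang only occurs under cron: rsh forwards its standard input to the remote command, and under cron there is no terminal attached to it. rsh has a `-n` option that takes input from /dev/null instead. Whether this is the cause of the hang here is not established; the sketch below only shows the shape of the change. The helper name `remote_run` and the `RSH` variable are illustrative, not part of the original script.

```shell
#!/bin/sh
# Sketch: run a remote command with standard input detached, so it
# cannot block on whatever cron left on stdin.
RSH=${RSH:-/usr/bin/rsh}

remote_run() {
    # $1 = host, remaining args = remote command
    host=$1; shift
    # -n tells rsh to read from /dev/null; the explicit redirect does
    # the same for the local process.
    "$RSH" -n "$host" "$@" < /dev/null
}

# In the script above this would stand in for the last line, e.g.:
#   remote_run XXX.XXX.XXX.XXX /mailbox/to-nhe/script-$SERVER
```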
- 08-09-2005 #10
Just a few questions:
1. For the rsh command, why are you using script-$SERVER instead of $TMPScript? edit: Never mind - I see now.
2. Could you add some error checking immediately after the rcp command at the bottom? Like:
Code:
RC1=$?
if [ "$RC1" -ne 0 ]
then
  echo "rcp returned a code of $RC1. Aborting now..." >&2
  exit 1
fi