Strange NFS issue
Hello everyone. I'm using LFS (not my choice) Linux.
root@filer:~# uname -a
Linux eb-arch-1 2.6.18 #1 SMP Fri Nov 17 00:16:14 GMT 2006 i686 pentium4 i386 GNU/Linux
The destination server is an Open-E box running the latest and greatest version of DSS. The NFS options are:
- All Squash
- No root squash. (I know it's insecure, but I don't care.)
On "filer", the NFS share is being mounted with just a straight mount command.
0.0.0.0:/archive01 on /mnt/nfs type nfs (rw,addr=0.0.0.0)
(IP changed to protect the innocent.) Now, this issue seems to be related to using LFS, because it doesn't happen using CentOS.
The issue is that if I perform an rsync like so:
rsync -avz --delete /filebackup /mnt/nfs/filebackup
The rsync gets about halfway in and gets stuck. Eventually, every single service on "filer" becomes unresponsive, logins stop working, and the box has to be rebooted. Fortunately, I was already logged in when this started happening, so I was able to kill the rsync job with repeated kill -9 commands. However, doing so left a zombie process on the machine.
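For anyone trying to reproduce this, it may help to see which processes are blocked while the hang is in progress. Processes that ignore kill -9 on an NFS client are often sitting in uninterruptible sleep (state D), though that is an assumption here and not something confirmed on "filer". A minimal sketch, assuming a procps-style ps; the function name filter_dstate is made up for illustration:

```shell
# Keep the header line plus any process whose STAT column starts with "D"
# (uninterruptible sleep). Expects ps-style lines on stdin:
#   PID STAT WCHAN COMMAND...
filter_dstate() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Intended use on the affected box (assumes procps ps is installed):
#   ps -eo pid,stat,wchan:32,args | filter_dstate
```

The WCHAN column shows the kernel function each process is waiting in, which can hint at whether the stall is in the NFS/RPC layer or somewhere else.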
Now, the strange part of this issue is that if the amount of data is small (i.e. 5-15 GB), I don't seem to have any issues and the file copy happens quite quickly (about 3 minutes per 10 GB). The issue only seems to happen when large amounts of data are being copied.
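To put the 3 minutes per 10 GB figure in more familiar terms, here is a rough conversion to average throughput. It is only a sketch: it treats GB as GiB and ignores any effect of rsync's -z compression, and the function name is made up for illustration.

```shell
# Average throughput in MiB/s for a copy of SIZE_GIB that took MINUTES.
throughput_mib_s() {
    awk -v gib="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gib * 1024 / (min * 60) }'
}

throughput_mib_s 10 3   # prints 56.9
```

So the small transfers are running at roughly 57 MiB/s on average.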
In case any of you are curious, it's not the fault of the DSS box, because I have the exact same problem when I mount an NFS share from another LFS box. The only difference in that case is that both the source and the destination LFS boxes lock up completely and both need to be rebooted.
Any ideas? I want to get some opinions before I consider compiling/installing a new version of NFS on the box.
An addendum to this: I had to reboot the DSS box because the NFS daemon on it was also becoming unresponsive, but the rest of the box was working fine.
It looks like anything "filer" touches ends up doing this, at least from an NFS point of view.
Would changing the connections to asynchronous help? I heard that this isn't recommended due to possible data loss issues.