- 01-23-2008 #1 Just Joined!
- Join Date: Jun 2006
- Posts: 2
NFS Stale File Handle
I wrote a distributed build program that runs on 8 different systems sequentially, using expect to ssh into each one. The build takes place on a single NFS file system. It runs fine on every platform (Red Hat, CentOS, Gentoo, Solaris, SUSE) except Debian 4.0 (kernel 2.6.18.xs4.0.1.900.5799).
The NFS server is CentOS 4.
The clean targets in the Makefiles run "rm -rf", which fails with a "Stale NFS file handle" error when run from the nightly build. I can run the same clean by hand on the Debian machine with no errors. I can also run the expect script that builds only on the Debian machine over and over with no errors. The error occurs only in the context of the full nightly build.
[exec] rm -rf dist_debian_2.6_i686
[exec] rm: cannot chdir from `dist_debian_2.6_i686/tva' to `bin': Stale NFS file handle
[exec] rm: cannot chdir from `dist_debian_2.6_i686/tva' to `bin': Stale NFS file handle
[exec] make[1]: *** [clean] Error 1
I changed:
rm -rf foo
to
rm -rf foo/*
rm -rf foo
and the error goes away, most of the time. The problem is that there are a lot of places in our Makefiles that do this, and I can't make sure everyone knows to use this workaround. It's also ugly, and I still don't know exactly what the underlying problem is.
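One way to keep the workaround in a single place instead of in every Makefile is a small shell helper. This is only a sketch: the name `nfs_rm_rf` and the retry count are hypothetical, and the retry loop goes beyond the two-step delete described above on the assumption that the stale handle is transient. It hides the error rather than explaining it.

```sh
#!/bin/sh
# nfs_rm_rf DIR [TRIES]   (hypothetical helper, not part of any tool)
# Wraps the two-step workaround: delete the directory's contents first,
# then the directory itself, and retry a few times if the directory is
# still there afterwards (e.g. after "Stale NFS file handle").
# This works around the error; it does not fix the underlying NFS problem.
nfs_rm_rf() {
    local dir="$1"
    local tries="${2:-3}"
    local i=1

    # Refuse empty or root paths: "$dir"/* would otherwise expand to /*.
    case "$dir" in
        ''|/) echo "nfs_rm_rf: refusing to remove '$dir'" >&2; return 2 ;;
    esac

    while [ "$i" -le "$tries" ]; do
        # Step 1: remove the contents. The glob skips dotfiles; step 2
        # removes those. Errors here are ignored; step 2 decides success.
        rm -rf "$dir"/* 2>/dev/null || :
        # Step 2: remove the directory itself and confirm it is gone.
        if rm -rf "$dir" && [ ! -e "$dir" ]; then
            return 0
        fi
        echo "nfs_rm_rf: attempt $i of $tries failed for $dir" >&2
        sleep 1
        i=$((i + 1))
    done
    return 1
}
```

Because make runs each recipe line in its own shell by default, a clean target would have to call a script that defines this function rather than use it directly.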
I know what "Stale NFS file handle" means: the file handle the client is using refers to a file or directory that no longer exists on the server. The error only occurs on a recursive "rm -rf". The possible causes I can think of are:
- I need different NFS mount options. Right now I am using rw,rsize=4096,wsize=4096,hard,intr,async,nodev,actimeo=5. I shortened the attribute cache timeout (actimeo) to see if that would help.
- There is a bug on the Debian side (kernel, NFS client, etc.) that keeps it from working properly with this NFS server.
- There is a performance problem on the NFS server. The file system is very busy while the nightly builds run, which may be why the error only shows up then.
I have searched the Internet, but none of the solutions I found have helped. Unmounting and remounting the file systems does not help.
Anyway, I am at a loss!!! Any ideas?
- 01-27-2008 #2 Just Joined!
Resolved
We had to reinstall the system to match the OS version our customer was running: Debian 3.1 with kernel 2.6.19.7. The problem has gone away, so there is either a bug in Debian 4.0 or a problem with the way our machine was configured (the kernel, maybe?).
I am posting this because nowhere could I find an explanation or solution other than "remount the file system" or something similar. Hopefully this will help someone else. I still don't know the exact cause, but I'll have a better idea of what to look at if we need to reinstall the other OS.