  1. #1
    Just Joined!
    Join Date
    Jun 2006
    Posts
    2

    NFS Stale File Handle


    I wrote a distributed build program that runs on 8 different systems sequentially and uses expect to ssh into each one. The build takes place on a single NFS file system. It runs fine on all platforms (Red Hat, CentOS, Gentoo, Solaris, SUSE) except for Debian 4.0. (2.6.18.xs4.0.1.900.5799.)

    The NFS server is CentOS 4.

    The clean targets of the Makefiles do an "rm -rf". This results in a "Stale NFS file handle" error when run from the nightly build. I can run it by hand on the Debian machine with no errors. I can also run the expect script that builds only on the Debian machine over and over with no errors. The error occurs only in the context of the nightly build.

    [exec] rm -rf dist_debian_2.6_i686
    [exec] rm: cannot chdir from `dist_debian_2.6_i686/tva' to `bin': Stale NFS file handle
    [exec] rm: cannot chdir from `dist_debian_2.6_i686/tva' to `bin': Stale NFS file handle
    [exec] make[1]: *** [clean] Error 1


    I changed:

    rm -rf foo

    to

    rm -rf foo/*
    rm -rf foo

    And the error goes away, most of the time. The problem is that we do this in a lot of places in the Makefiles, and I can't make sure everyone knows to use this workaround. And it's ugly. Plus, I still don't know exactly what the problem is.
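    One way to avoid repeating the two-step removal in every Makefile is to put it in a single helper. The following is only a sketch of that idea, not something from the original build: the name safe_rm_rf, the retry count, and the one-second pause are all assumptions, and the retry is there only because the workaround was reported to succeed "most of the time". It does not address the underlying cause.

```shell
# Hypothetical helper (not part of the original build scripts).
# Applies the two-step workaround: empty the directory, then remove it.
# Retries a few times; the retry count and the pause are assumptions.
safe_rm_rf() {
    dir=$1
    tries=${2:-3}

    # Refuse an empty argument or "/", since "$dir"/* would expand to /*.
    if [ -z "$dir" ] || [ "$dir" = "/" ]; then
        echo "safe_rm_rf: refusing to remove '$dir'" >&2
        return 2
    fi

    i=0
    while [ "$i" -lt "$tries" ]; do
        rm -rf "$dir"/* 2>/dev/null   # step 1: remove the contents
        rm -rf "$dir" 2>/dev/null     # step 2: remove the directory itself
        if [ ! -e "$dir" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 1                       # brief pause before trying again
    done

    echo "safe_rm_rf: could not remove $dir" >&2
    return 1
}
```

    Because each Makefile recipe line runs in its own shell, the function would have to live in a small script on the shared file system that the clean targets call in place of a bare "rm -rf".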

    I know what NFS Stale File Handle means - it means a file or directory the NFS client is trying to access is no longer there. This only occurs on a recursive "rm -rf". I think the possible problems are:

    - Need different NFS mount options. Right now I am using rw,rsize=4096,wsize=4096,hard,intr,async,nodev,actimeo=5. I shortened the cache time to see if that would help.
    - There is a bug on the Debian side (kernel, NFS, etc.) that is causing it not to work properly with the NFS server.
    - A performance problem on the NFS server. I think one of the reasons it may only happen in the context of a nightly build is that the file system is very active when those run.
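
    To check the first possibility, it can help to compare the options each client is actually using with what was requested, since the kernel may negotiate or default some of them. This is a small sketch assuming a Linux client that exposes /proc/mounts (it would not apply as-is to the Solaris machine); the function name nfs_mount_opts is made up for illustration.

```shell
# Hypothetical helper: print "mountpoint: options" for every NFS mount.
# Reads /proc/mounts by default (Linux only); another file in the same
# six-field format can be passed as the first argument.
nfs_mount_opts() {
    # Fields: device, mount point, fs type, options, dump, pass
    awk '$3 == "nfs" || $3 == "nfs4" { print $2 ": " $4 }' "${1:-/proc/mounts}"
}
```

    Running it on the Debian client and on one of the clients that works would show whether the effective options really match.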

    I have hunted on the Internet, but none of the solutions out there have helped me. Unmounting/mounting the file systems does not help.

    Anyway, I am at a loss!!! Any ideas?

  2. #2
    Just Joined!
    Join Date
    Jun 2006
    Posts
    2

    Resolved

    We had to re-install the system to match the OS version our customer was running: Debian 3.1 with kernel version 2.6.19.7. The problem has gone away, so there is either a bug in 4.0 or a problem with the way our machine was configured (the kernel, maybe?).

    I am posting this because nowhere could I find an explanation or solution other than "re-mount the filesystem" or something similar. Hopefully this will help someone else. I still don't know the exact cause, but I'll have a better idea of what to look at if we need to re-install the other OS.
