  1. #1
    djl
    djl is offline
    Just Joined!
    Join Date
    Jul 2011
    Posts
    3

    Intermittent FIFO Pipe Failures


    The script below succeeds 100% of the time when run in a directory on a locally-mounted disk volume, but exhibits a failure rate of between 0.05% and 0.5% when run in a directory mounted on any of our NFS volumes.

    It appears the FIFO created by mkfifo intermittently behaves like a regular file: when the script backgrounds the write, the subsequent read receives no data.

    Code:
    #!/bin/ksh
    
    failcount=0
    nreps=1000
    
    # Generate a 1000-line data file.
    awk 'BEGIN{for(i=0;i<1000;++i){print i}}' >datafile
    
    # Build a command that cats the data file 101 times, so each
    # iteration pushes a nontrivial amount of data through the pipe.
    docat="cat datafile"
    for i in {1..100} ; do
    	docat="$docat && cat datafile"
    done
    
    for reps in {1..$nreps} ; do
    	mkfifo mypipe
    	# Background the writer, then read the FIFO in the foreground.
    	eval "$docat" | awk '{print}' > mypipe &
    	cat mypipe >myfile
    	# An empty read counts as a failure.
    	[[ -z $(head myfile) ]] && failcount=$(( $failcount + 1 ))
    	echo "failrate ($failcount/$reps)" >status
    	rm -f mypipe myfile
    done
    
    cat status
    rm -f status datafile
    We have a hunch this is related to how NFS is configured (e.g., synchronous vs. asynchronous writes), and we're interested to know if anyone else can reproduce the same behavior. Any additional insight would also be greatly appreciated.
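
    For anyone comparing notes, the client-side mount options (including sync/async) can be read straight from the kernel; the mount point below is just a placeholder:

    Code:
    # List every NFS mount with the options the kernel is actually using;
    # look for "sync" or "async" (and rsize/wsize) in the option field.
    grep ' nfs' /proc/mounts
    
    # Or check a single mount point (replace /mnt/nfsvol with yours):
    mount | grep /mnt/nfsvol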

    Note: it is trivial to make these failures go away by modifying the script. The point is not to make the script work, but to find out why the failures occur in the first place, as we have other scripts that exhibit similar behavior (although far less frequently).

    Version info:

    Code:
    $ uname -a
    Linux myhost.mydomain.com 2.6.18-194.32.1.el5 #1 SMP Mon Dec 20 10:52:42 EST 2010 x86_64 x86_64 x86_64 GNU/Linux
    
    $ /bin/ksh
    $ echo ${.sh.version}
    Version M 93s+ 2008-01-31
    
    $ /bin/awk --version
    GNU Awk 3.1.5
    
    $ /usr/bin/mkfifo --version
    mkfifo (GNU coreutils) 5.97
    
    $ yum --version nfs
    3.2.22
    Last edited by djl; 07-08-2011 at 09:27 PM. Reason: remove proprietary info

  2. #2
    scm
    scm is offline
    Linux Engineer
    Join Date
    Feb 2005
    Posts
    1,044
    Would you need a small delay before you cat the fifo, to allow data to flow into the remote pipe? If the cat starts while the pipe is empty, it may see end-of-file and conclude the writer has terminated.
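
    Something along these lines inside the loop would test that theory (the one-second delay is an arbitrary guess, not a tuned value):

    Code:
    mkfifo mypipe
    eval "$docat" | awk '{print}' > mypipe &
    sleep 1             # give the backgrounded writer time to open the FIFO
    cat mypipe >myfile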

  3. #3
    djl
    djl is offline
    Just Joined!
    Join Date
    Jul 2011
    Posts
    3
    It's been a while, but this bug continues to be problematic for us.

    As scm suggested, adding a sleep before catting the pipe helps, in that it lessens the frequency of this (kernel?) bug. Another thing we have tried is doing a stat on the pipe prior to the cat; this also helps, but does not eliminate the problem.
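
    For reference, the stat mitigation looks roughly like this (a sketch of the workaround, not a fix):

    Code:
    mkfifo mypipe
    eval "$docat" | awk '{print}' > mypipe &
    stat mypipe >/dev/null   # touching the FIFO's metadata before reading lessens the failures
    cat mypipe >myfile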

    We have documented this bug on the following file systems: ext3, ext4, nfs, nfs4, tmpfs, xfs

    By increasing nreps to 10000 we usually see at least one occurrence on any of these, with the exception of xfs, although our testing on xfs is not yet conclusive. Below are some recent results:

    Code:
    FAILSPER10K FSTYPE LOCALOS REMOTEOS
    0           xfs    RHEL6   ()
    1           ext4   RHEL6   ()
    1           tmpfs  RHEL5   ()
    3           ext3   RHEL5   ()
    326         nfs    RHEL5   RHEL5
    373         nfs4   RHEL5   RHEL6
    We would very much appreciate it if someone else would run the above script to confirm that this problematic behavior can be replicated elsewhere. Because we make heavy use of backgrounded pipes in our shell scripts, this bug is important for us to identify and address. Please include the OS version and file system type in your results; the snippet below collects both.
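
    For convenience, something like this gathers that info (run it from the test directory):

    Code:
    uname -r                  # kernel version
    cat /etc/redhat-release   # distro release, on RHEL-family systems
    df -T .                   # file system type of the current directory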

    Thanks!!
    Last edited by djl; 10-24-2012 at 05:06 PM. Reason: remove DOTNFSFILES from results to keep things simpler

  4. #4
    cnamejj
    Linux Newbie
    Join Date
    Jun 2012
    Location
    SF Bay area
    Posts
    217
    Frankly, I wouldn't expect FIFOs to work on an NFS volume in the general case. They might work sometimes, but since lots of systems can have simultaneous read/write access to an NFS volume, I think certain use cases would just cause ugly, confusing problems. So unless you can carefully control access to the FIFO (and basically limit it to processes from a single server at a time), I wouldn't advise relying on it at all.

    Here's a NIST.gov page that looked into this and might be interesting:

    FIFOs in a Network Environment
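
    The underlying reason is that only the FIFO's directory entry lives on the shared volume; the actual byte stream is a rendezvous inside each client's kernel. A quick thought experiment (hostnames and path are hypothetical) shows why cross-host use can't work:

    Code:
    # On hostA (NFS client 1):
    mkfifo /mnt/shared/testpipe
    cat /mnt/shared/testpipe            # blocks waiting for a writer on hostA
    
    # On hostB (NFS client 2):
    echo hello > /mnt/shared/testpipe   # blocks waiting for a reader on hostB
    
    # Both sides hang forever: each kernel pairs FIFO readers and writers
    # locally, so the two hosts never see each other's end of the pipe.
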
    Last edited by cnamejj; 10-24-2012 at 09:28 PM. Reason: typo. s/NFA/NFS/

  5. #5
    djl
    djl is offline
    Just Joined!
    Join Date
    Jul 2011
    Posts
    3
    Appreciate the feedback and the link. Thanks.

    The way we use fifos is entirely within a single process: they are created with a unique name that includes the PID and are cleaned up (deleted) when the process exits, so I do not believe the issue of accessing the fifo from different clients applies; that would be asking for trouble. A sketch of the pattern is below.
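
    For the curious, the pattern is roughly this (a sketch with placeholder names, not our production code):

    Code:
    #!/bin/ksh
    # FIFO name is unique to this process; clean it up however we exit.
    pipe=./mypipe.$$
    trap 'rm -f "$pipe"' EXIT
    mkfifo "$pipe"
    
    some_producer > "$pipe" &    # placeholder for the backgrounded writer
    some_consumer < "$pipe"      # placeholder for the foreground reader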

    The issue appears to have something to do with the state of the fifo when it is created. The vast majority of the time the scripts work as expected, and the problem also occurs on locally-mounted (non-network) volumes, just less often. So this discussion can be limited to a non-networked, single-host, single-process scenario.

    My guess is that this happens everywhere, at least on RHEL; no one notices because it's rare, and few people rely on fifo pipes as heavily as we do.
