  1. #1
    Just Joined!
    Join Date
    Jun 2006
    Posts
    2

    HowTo: Optimizing NFS?


    I've done a lot of research on how to optimize NFS, but sadly I haven't found anything that helped me optimize the NFS server on one of my clusters.

    I run an 8-node cluster. Two of the nodes are frontends with quad 2GHz CPUs, using heartbeat + DRBD and sharing a 250GB /apps mount point over NFS to 6 backends, each with dual 1GHz CPUs.

    The application running on the frontend mainly does a lot of calculations based on radar volume scans to detect precipitation types, etc. The results are dumped in flat files under a /apps/.../db/RADAR/ directory. Under severe weather there can be as many as 10000 sub-directories under that db/RADAR/ directory.

    Every backend runs 1 client per CPU (since dual CPU = 2 clients each). These clients simply check for a job to run in a specific /apps/.../run directory. These job files are auto-generated scripts which call a graphics module that generates pretty big radar images at a relatively high resolution... (about 70 radars are merged when doing composite images). These images are then stored back into a sub-directory under the db/RADAR/xyz directory.
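
    Roughly what each client daemon does, as a simplified sketch (the real paths and job handling are more involved; /apps/example/run below is just a stand-in):

    #!/bin/sh
    # Simplified polling worker: watch a run directory on the NFS mount,
    # execute the oldest job script found there, then go back to polling.
    RUNDIR=/apps/example/run    # stand-in for the real /apps/.../run path

    while true; do
        job=`ls -1tr "$RUNDIR" 2>/dev/null | head -n 1`
        if [ -n "$job" ]; then
            sh "$RUNDIR/$job"       # generates the radar image under db/RADAR/...
            rm -f "$RUNDIR/$job"    # job is consumed once it has run
        else
            sleep 5                 # nothing queued, poll again shortly
        fi
    done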

    That being said, the problem is that after a few weeks the load average on the backends shoots up and graphics generation starts to slow down dramatically. The only way I can make them work at full speed again is to either reboot the backends when they slow down or simply umount and remount the /apps directory.
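
    (By "umount and remount" I mean nothing fancier than this, run on the affected backend once its jobs are idle:)

    # run on the affected backend while nothing is using /apps
    umount /apps     # may need fuser -km /apps or umount -f if processes hold it open
    mount /apps      # re-reads the options from /etc/fstab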

    Every server (frontends and backends) is sadly running Red Hat 7.3... At least we upgraded the kernel to 2.4.32, which helps. I have tried different ways to mount the /apps filesystem from the backends to optimize NFS, but it never fixed the problem... and I'm pretty sure my problem resides in NFS tweaking.

    So recently I decided to use two totally different ways to export and mount the /apps filesystem, sadly with no success (meaning that it didn't change much):

    Here is my /etc/exports on my frontends:
    /apps 192.168.1.101(rw,async,wdelay,no_root_squash)
    /apps 192.168.1.102(rw,async,wdelay,no_root_squash)
    /apps 192.168.1.103(rw,async,wdelay,no_root_squash)
    /apps 192.168.1.104(rw,sync,no_wdelay,no_root_squash)
    /apps 192.168.1.105(rw,sync,no_wdelay,no_root_squash)
    /apps 192.168.1.106(rw,sync,no_wdelay,no_root_squash)

    And here is how I mount it:
    (101-103)
    192.168.1.2:/apps /apps nfs rsize=2048,wsize=2048,hard,intr,acdirmin=1,acdirmax=2,timeo=4,retrans=9 1

    (104-106)
    192.168.1.2:/apps /apps nfs rsize=2048,wsize=2048,hard,tcp,intr,noac,actimeo=10 1
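
    (For reference, the options the kernel actually applies and the client-side RPC counters can be checked on a backend with the usual tools; nothing exotic:)

    # what the kernel actually negotiated for the mount (can differ from fstab)
    grep /apps /proc/mounts

    # client-side RPC statistics; a steadily growing "retrans" count usually
    # means timeouts, which fits a load average that creeps up over time
    nfsstat -c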

    To summarize:
    - The database is composed of a lot of sub-directories (can reach 10000) and a fair amount of small files
    - 6 backends = 12 daemons checking for a graphics-generation job to run.
    - About 2000 jobs are run on the backends per hour (about 50 000 / day)

    My results are:
    - Using TCP instead of UDP helps reduce fragmented packets considerably (see the fragmentation counters below)
    - Using TCP instead of UDP increased the network traffic
    - Neither UDP nor TCP with these configurations seems to behave better
    - It's not necessary to reset the NFS server... only resetting the client makes the backend behave normally as usual (its load average falls back from 7 to 2-3 when occupied with two jobs, i.e. one on each CPU)
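
    For anyone who wants to compare on their own setup, the fragmentation side of this is easy to watch with plain netstat on a backend:

    # IP fragmentation / reassembly counters; these climb quickly with NFS over UDP
    # (especially with larger rsize/wsize) and should stay flat with NFS over TCP
    netstat -s | grep -i -E "fragment|reassembl"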

    Conclusion:
    1- Even after a lot of reading, work and effort I still have not found the appropriate way to "optimize" my NFS server
    2- My gigabit network might be capped... although, unless I'm misreading the ntop results, it seems OK? (a quick throughput check is sketched below)
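
    The kind of raw-throughput sanity check I have in mind is just a timed sequential write from a backend onto the NFS mount (the file name is only an example):

    # write 1 GB over the NFS mount and time it; on an otherwise idle gigabit
    # link, far below ~100 MB/s sustained points at the network, the server
    # disks, or the small rsize/wsize rather than at the clients themselves
    time dd if=/dev/zero of=/apps/nfs-speed-test.tmp bs=1024k count=1024
    rm -f /apps/nfs-speed-test.tmp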

    Anybody have a clue?

    thnx

    - vin

  2. #2
    Linux User DThor
    Join Date
    Jan 2006
    Location
    Ca..na...daaa....
    Posts
    319
    I had nothing but grief when we went gigabit and had a mix of RH 7.3 and newer systems. Nothing I tried got things talking at top speed, consistently, to a 7.3 server. As I'm sure you've found, there are many open NFS speed issues with 7.3 - perhaps your kernel upgrade addressed some of the bad ones, but I get the feeling that's still at the root of your problems.

    I completely understand that it's easy for me to say "just upgrade" when the powers that be just can't allow it, but I wonder if it's worth testing a couple of more recent kernels on your network to verify that the problem really is tied to that kernel?

    DT

  3. #3
    Just Joined!
    Join Date
    Jun 2006
    Posts
    2


    In fact, using the original 2.4.20 (which is actually a 2.4.21-rc something) the NFS performance was even worse...

    The reason I built a 2.4.32 kernel RPM and installed it everywhere was to be able to use NFS over TCP... an option not available in 2.4.20.
