Hi All,
I have a problem with an NFS server connected to a small cluster (104 cores spread over 11 execution nodes). The server exports six directories, including the /home, /scratch, and /software partitions.
I have a user who submits large task arrays to the cluster, with each instance of the task generating large amounts of disk writes. This maxes out the IO bandwidth to the disk (sdb) in the NFS server that holds the /scratch partition. The result is that all of the nfsd threads go into the 'D' (uninterruptible sleep) state while they wait for local IO to complete (confirmed using top). My iostat output confirming the issue is:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 70.50 0.00 3.50 0.00 1188.50 339.57 0.00 0.00 0.00 0.00
sdb 0.00 82.00 0.00 109.00 0.00 2629.00 24.12 3.70 32.48 9.17 100.00
sdc 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Note the line for sdb. The problem is that this condition appears to *also* hold up client access to directories on the two other disks, and the entire cluster grinds to a halt. I have checked ethernet bandwidth (four bonded GigE links), CPU (8 Xeon cores), and memory (8 GB), and there is plenty of headroom (especially CPU and memory).
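For what it's worth, a sketch of one way to confirm the nfsd state from the command line (assuming a Linux server where the kernel threads appear in ps output; the thread name may show as "nfsd" or "[nfsd]" depending on kernel version):

```shell
# Count nfsd kernel threads currently in uninterruptible sleep ('D' state).
# A count near the total nfsd thread count suggests they are all blocked on IO.
ps -eo state,comm | awk '$1 == "D" && $2 ~ /nfsd/ { n++ } END { print n+0 }'
```

When every nfsd thread is stuck in D, no thread is free to service requests for the other exports, which would match the symptom above.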
Is this expected behaviour for NFS, and is there anything I can do to alleviate the issue?
Thanks,
Chris