Originally Posted by
dwhickok
Any thoughts or leads are very much appreciated!!!
If U16 is Ubuntu 16.04, it is past time to migrate to 18.04 or 20.04. Support for 16.04 ends this month, so it isn't worth troubleshooting anything on 16.04 at this point. https://ubuntu.com/about/release-cycle On my 18.04 NFS server, I run:
Code:
$ ubuntu-support-status
Support status summary of 'istar':
You have 1454 packages (62.4%) supported until April 2023 (Canonical - 5y)
You have 512 packages (22.0%) supported until April 2021 (Community - 3y)
You have 2 packages (0.1%) supported until April 2021 (Canonical - 3y)
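The upgrade path is sequential, 16.04 to 18.04 to 20.04; a minimal sketch, assuming current backups and the update-manager-core package are in place:
Code:
$ sudo apt update && sudo apt full-upgrade   # get fully current on 16.04 first
$ sudo do-release-upgrade                    # offers 18.04 when run from 16.04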
Why force NFSv3 when NFSv4 has been production ready since 2005 and has a number of enhancements for both security and performance?
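To see which version a client actually negotiated, check the vers= field in the mount options:
Code:
$ nfsstat -m           # shows each NFS mount with its effective options
$ mount -t nfs,nfs4    # same idea via mount's listing mode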
You've lost me on which system is actually running the NFS server. As I read the summary, it seems ESXi is providing NFS. If the NFS server is an Ubuntu VM, how does it efficiently access the block storage? Hopefully not using vmdk files, but raw block-storage access or PCIe controller passthru?
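For what it's worth, under KVM/libvirt (what I use) a whole raw block device can be handed to a guest instead of a disk-image file; a minimal sketch, where the guest name and device path are just examples:
Code:
# attach the raw disk to the guest as a virtio device (names are hypothetical)
$ virsh attach-disk guestname /dev/disk/by-id/ata-EXAMPLE-SERIAL vdb \
        --sourcetype block --cache none --persistent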
I only run NFS clients in VMs. NFS servers are on real hardware ... er ... for performance reasons.
Ok, sorry for all the junk below. I got into this a little more than I should have. You can ignore all of it - or not.
Were I troubleshooting this, I'd start with performance of the disks and the network as separate problems. Then I'd add in nfsstat and run tests from an NFS client using something like fio or bonnie++, and locally perform the same tests with fio or bonnie++. https://arstechnica.com/gadgets/2020...-way-with-fio/ has some test steps, explanations, and results.
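For the network half, iperf3 takes the disks out of the picture entirely (assuming it's installed on both ends):
Code:
$ iperf3 -s                 # on the NFS server
$ iperf3 -c istar -t 30     # on the client; ~940 Mbits/sec is healthy GbE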
The exact fio command from that article:
Code:
$ fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k \
--numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based \
--end_fsync=1
The test file is created in the CWD. I didn't run it as root. There is overhead in using a normal userid, but for NFS, remote root gets mapped to nobody, so it only seemed fair.
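For the nfsstat part, client-side counters before and after a run will show retransmits and the op mix:
Code:
$ nfsstat -r    # RPC stats only; retrans here points at network or server trouble
$ nfsstat -c    # full per-op client counts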
From an NFS client using virtio for networking and disk controllers, inside a VM on a different physical machine than the NFS server, the first test from Ars had these two key lines of output:
Code:
write: IOPS=3544, BW=13.8MiB/s (14.5MB/s)(1191MiB/86041msec); 0 zone resets
WRITE: bw=13.8MiB/s (14.5MB/s), 13.8MiB/s-13.8MiB/s (14.5MB/s-14.5MB/s), io=1191MiB (1249MB), run=86041-86041msec
On the physical system (NFS server), same directory, same disk:
Code:
write: IOPS=1927, BW=7710KiB/s (7895kB/s)(866MiB/115075msec)
WRITE: bw=7710KiB/s (7895kB/s), 7710KiB/s-7710KiB/s (7895kB/s-7895kB/s), io=866MiB (909MB), run=115075-115075msec
So, NFS was faster.
On the VM host (I use KVM), a much faster system with better CPU, RAM, and networking, same directory, same NFS disk:
Code:
write: IOPS=3445, BW=13.5MiB/s (14.1MB/s)(1712MiB/127224msec)
WRITE: bw=13.5MiB/s (14.1MB/s), 13.5MiB/s-13.5MiB/s (14.1MB/s-14.1MB/s), io=1712MiB (1795MB), run=127224-127224msec
I didn't run 10 runs on each system, nor did I reboot or turn off other jobs on any of the installs, so this is just FYI. Basically, the systems were doing what they normally do on a Tuesday afternoon.
My /etc/exports entry looks like this:
Code:
/d/D1 regulus(rw,async,root_squash)
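After any edit to /etc/exports, re-export and check what the server actually applied; exportfs -v fills in the defaults you didn't type:
Code:
$ sudo exportfs -ra    # re-read /etc/exports and apply changes
$ sudo exportfs -v     # list active exports with all effective options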
On the clients, I use autofs to mount NFS when requested. That looks like this:
Code:
/d/D1 -fstype=nfs,proto=tcp,intr,rw,async istar:/d/D1
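The absolute key makes that a direct map, so it needs a matching auto.master line; the map file name here is my assumption:
Code:
# /etc/auto.master
/-    /etc/auto.direct    --timeout=600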
LVM is used on D1 with an ext4 file system.
The drive is an HGST HUS726T4TALA6L4. That's an Ultrastar DC HC310 7200rpm 4TB HDD - basically a WD Gold series HDD.
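To confirm the ext4-on-LVM stack and the drive model, a quick sketch (the device name is an assumption):
Code:
$ lsblk -o NAME,FSTYPE,SIZE,MOUNTPOINT /dev/sda   # shows the PV/LV/ext4 layering
$ sudo smartctl -i /dev/sda                       # model, firmware, rotation rate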
I got curious - my NFS server is a 5-year-old dual-core Pentium with over 30TB of disks connected and a fairly crappy NIC. I've been meaning to swap in an Intel PRO/1000 or i210/i211 NIC for a few years. It's on the TODO list. Promise.
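If anyone wants to rule out the NIC, checking the negotiated link is quick (the interface name is an assumption):
Code:
$ sudo ethtool eno1 | grep -E 'Speed|Duplex'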
I re-ran the tests on my main VM host, which has a cheap HDD and an OK SATA SSD. For the HDD:
Code:
write: IOPS=4891, BW=19.1MiB/s (20.0MB/s)(2794MiB/146229msec)
WRITE: bw=19.1MiB/s (20.0MB/s), 19.1MiB/s-19.1MiB/s (20.0MB/s-20.0MB/s), io=2794MiB (2930MB), run=146229-146229msec
For the M.2 SATA SSD:
Code:
write: IOPS=54.8k, BW=214MiB/s (224MB/s)(13.3GiB/63784msec)
WRITE: bw=214MiB/s (224MB/s), 214MiB/s-214MiB/s (224MB/s-224MB/s), io=13.3GiB (14.3GB), run=63784-63784msec
Hummm.
Because I like bad results, I ran the same random-write test on another NFS server here. It is a Core i5-750 from 2010 with a few RAID1 arrays.
Locally, on the real hardware:
Code:
write: IOPS=1372, BW=5491KiB/s (5623kB/s)(1780MiB/331874msec)
WRITE: bw=5491KiB/s (5623kB/s), 5491KiB/s-5491KiB/s (5623kB/s-5623kB/s), io=1780MiB (1866MB), run=331874-331874msec
RAID1 takes a hit on writes, since every block goes to both disks.
NFS client:
Code:
write: IOPS=6530, BW=25.5MiB/s (26.7MB/s)(2114MiB/82884msec)
WRITE: bw=25.5MiB/s (26.7MB/s), 25.5MiB/s-25.5MiB/s (26.7MB/s-26.7MB/s), io=2114MiB (2217MB), run=82884-82884msec
One thing is certain: network disk caching is helpful!
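That's the async export plus the client page cache absorbing the 4k writes before they ever hit the platters. To take caching out of the comparison, fsync every write; expect much lower numbers:
Code:
$ fio --name=random-write-sync --ioengine=posixaio --rw=randwrite --bs=4k \
      --numjobs=1 --size=1g --iodepth=1 --fsync=1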