Hi,
Thanks for the comprehensive reply. Yeah this is an odd one. I've been using ZFS on Ubuntu for years and have never had any issues until now, not even the slightest blip.
As requested:
Code:
sudo zpool status -v Tank
  pool: Tank
 state: ONLINE
  scan: scrub repaired 0B in 10:00:01 with 0 errors on Mon Nov 13 23:01:24 2023
config:

        NAME                                 STATE     READ WRITE CKSUM
        Tank                                 ONLINE       0     0     0
          raidz2-0                           ONLINE       0     0     0
            ata-ST4000DM000-1F2168_S300XXXX  ONLINE       0     0     0
            ata-ST4000DM004-2CV104_ZTT4XXXX  ONLINE       0     0     0
            ata-ST4000DM004-2CV104_ZTT4XXXX  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_W300XXXX  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_W300XXXX  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_W300XXXX  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_W300XXXX  ONLINE       0     0     0
            ata-ST4000DM000-1F2168_W300XXXX  ONLINE       0     0     0

errors: No known data errors
arc_summary: https://pastebin.ubuntu.com/p/5ZDtsNzX7v/
FIO:
Code:
fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.28
Starting 1 process
TEST: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][46.7%][w=274MiB/s][w=274 IOPS][eta 00m:08s]
Jobs: 1 (f=1): [W(1)][65.0%][w=392MiB/s][w=392 IOPS][eta 00m:07s]
Jobs: 1 (f=1): [W(1)][73.1%][w=40.0MiB/s][w=40 IOPS][eta 00m:07s]
Jobs: 1 (f=1): [W(1)][73.5%][w=63.0MiB/s][w=63 IOPS][eta 00m:09s]
Jobs: 1 (f=1): [W(1)][83.8%][w=289MiB/s][w=289 IOPS][eta 00m:06s]
Jobs: 1 (f=1): [W(1)][97.4%][w=232MiB/s][w=232 IOPS][eta 00m:01s]
Jobs: 1 (f=1): [W(1)][97.7%][eta 00m:01s]
Jobs: 1 (f=1): [W(1)][97.8%][eta 00m:01s]
TEST: (groupid=0, jobs=1): err= 0: pid=1241014: Tue Nov 28 23:44:29 2023
write: IOPS=235, BW=235MiB/s (247MB/s)(10.0GiB/43556msec); 0 zone resets
slat (usec): min=252, max=53915, avg=3569.49, stdev=4839.84
clat (usec): min=3, max=7052.4k, avg=131421.17, stdev=405561.87
lat (usec): min=313, max=7056.0k, avg=134991.41, stdev=407262.28
clat percentiles (msec):
| 1.00th=[ 11], 5.00th=[ 13], 10.00th=[ 16], 20.00th=[ 21],
| 30.00th=[ 63], 40.00th=[ 72], 50.00th=[ 81], 60.00th=[ 88],
| 70.00th=[ 99], 80.00th=[ 120], 90.00th=[ 207], 95.00th=[ 426],
| 99.00th=[ 944], 99.50th=[ 1028], 99.90th=[ 7013], 99.95th=[ 7013],
| 99.99th=[ 7080]
bw ( KiB/s): min=28672, max=2134016, per=100.00%, avg=280756.99, stdev=332876.74, samples=74
iops : min= 28, max= 2084, avg=274.18, stdev=325.08, samples=74
lat (usec) : 4=0.02%, 10=0.03%, 500=0.02%, 1000=0.01%
lat (msec) : 2=0.05%, 4=0.13%, 10=0.63%, 20=18.89%, 50=7.08%
lat (msec) : 100=43.91%, 250=21.15%, 500=4.10%, 750=2.07%, 1000=1.32%
lat (msec) : 2000=0.29%, >=2000=0.30%
fsync/fdatasync/sync_file_range:
sync (nsec): min=1223, max=1223, avg=1223.00, stdev= 0.00
sync percentiles (nsec):
| 1.00th=[ 1224], 5.00th=[ 1224], 10.00th=[ 1224], 20.00th=[ 1224],
| 30.00th=[ 1224], 40.00th=[ 1224], 50.00th=[ 1224], 60.00th=[ 1224],
| 70.00th=[ 1224], 80.00th=[ 1224], 90.00th=[ 1224], 95.00th=[ 1224],
| 99.00th=[ 1224], 99.50th=[ 1224], 99.90th=[ 1224], 99.95th=[ 1224],
| 99.99th=[ 1224]
cpu : usr=1.68%, sys=10.99%, ctx=76215, majf=0, minf=15
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,10240,0,1 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=235MiB/s (247MB/s), 235MiB/s-235MiB/s (247MB/s-247MB/s), io=10.0GiB (10.7GB), run=43556-43556msec
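Side note for anyone repeating this: since I'll be running that exact fio command a few more times, it's tidier as a job file. This is just the command line above translated into fio's ini job-file format (the file name is my own choice):

```ini
; zfs-write.fio -- same workload as the fio command line in this post
[global]
ioengine=libaio
direct=1
iodepth=32
blocksize=1024k
runtime=60
group_reporting

[TEST]
rw=write
filename=temp.file
size=2g
io_size=10g
fsync=10000
numjobs=1
```

Then it's just `fio zfs-write.fio` each time.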
and again 10 mins later:
Code:
fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.28
Starting 1 process
TEST: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][11.7%][eta 00m:53s]
Jobs: 1 (f=1): [W(1)][20.0%][w=3072KiB/s][w=3 IOPS][eta 00m:48s]
Jobs: 1 (f=1): [W(1)][30.0%][w=3075KiB/s][w=3 IOPS][eta 00m:42s]
Jobs: 1 (f=1): [W(1)][40.0%][w=4096KiB/s][w=4 IOPS][eta 00m:36s]
Jobs: 1 (f=1): [W(1)][50.0%][w=4100KiB/s][w=4 IOPS][eta 00m:30s]
Jobs: 1 (f=1): [W(1)][60.0%][w=3072KiB/s][w=3 IOPS][eta 00m:24s]
Jobs: 1 (f=1): [W(1)][70.0%][w=2050KiB/s][w=2 IOPS][eta 00m:18s]
Jobs: 1 (f=1): [W(1)][80.0%][w=2048KiB/s][w=2 IOPS][eta 00m:12s]
Jobs: 1 (f=1): [W(1)][90.0%][w=3072KiB/s][w=3 IOPS][eta 00m:06s]
Jobs: 1 (f=1): [W(1)][100.0%][w=3075KiB/s][w=3 IOPS][eta 00m:00s]
Jobs: 1 (f=1): [W(1)][1.4%][w=3072KiB/s][w=3 IOPS][eta 01h:10m:46s]
TEST: (groupid=0, jobs=1): err= 0: pid=1368279: Tue Nov 28 23:53:39 2023
write: IOPS=2, BW=3003KiB/s (3075kB/s)(177MiB/60356msec); 0 zone resets
slat (msec): min=221, max=660, avg=340.97, stdev=83.41
clat (usec): min=13, max=14102k, avg=9758855.47, stdev=3033929.82
lat (msec): min=363, max=14413, avg=10099.83, stdev=3058.23
clat percentiles (msec):
| 1.00th=[ 363], 5.00th=[ 2567], 10.00th=[ 5470], 20.00th=[ 8221],
| 30.00th=[ 8792], 40.00th=[ 9597], 50.00th=[10268], 60.00th=[10671],
| 70.00th=[11073], 80.00th=[12281], 90.00th=[13355], 95.00th=[13758],
| 99.00th=[14026], 99.50th=[14160], 99.90th=[14160], 99.95th=[14160],
| 99.99th=[14160]
bw ( KiB/s): min= 2048, max= 4104, per=99.90%, avg=3000.53, stdev=1026.58, samples=99
iops : min= 2, max= 4, avg= 2.93, stdev= 1.00, samples=99
lat (usec) : 20=0.56%
lat (msec) : 500=0.56%, 750=0.56%, 1000=0.56%, 2000=1.69%, >=2000=96.05%
cpu : usr=0.03%, sys=0.28%, ctx=1432, majf=0, minf=10
IO depths : 1=0.6%, 2=1.1%, 4=2.3%, 8=4.5%, 16=9.0%, 32=82.5%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.3%, 8=0.0%, 16=0.0%, 32=0.7%, 64=0.0%, >=64=0.0%
issued rwts: total=0,177,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=3003KiB/s (3075kB/s), 3003KiB/s-3003KiB/s (3075kB/s-3075kB/s), io=177MiB (186MB), run=60356-60356msec
But if I use my previous testing method as something to measure against, I don't appear to be in a massive slowdown right now:
Code:
dd if=/dev/zero of=/mnt/Tank/testfile bs=1G count=6 oflag=dsync
6+0 records in
6+0 records out
6442450944 bytes (6.4 GB, 6.0 GiB) copied, 56.0559 s, 115 MB/s
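One caveat with that dd test, though: if compression is enabled on Tank, writing from /dev/zero compresses to almost nothing, while fio fills its buffers with pseudo-random (incompressible) data by default, so the two numbers aren't really comparable. A sketch of a zero-free variant (function name and paths are my own; note /dev/urandom itself can bottleneck at a few hundred MB/s, so treat the result as a floor):

```shell
# Write incompressible data instead of zeros, so compression on the
# dataset can't inflate the apparent throughput.
bench_write() {
  # $1 = target file, $2 = size in MiB; prints dd's throughput summary
  dd if=/dev/urandom of="$1" bs=1M count="$2" oflag=dsync 2>&1 | tail -n 1
  rm -f "$1"   # clean up the test file afterwards
}

# e.g.: bench_write /mnt/Tank/testfile 6144
```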
I may have to monitor this for a couple of days and re-run fio when I see things drop to the abysmal 35 MB/s... unless anything stands out to you? Nothing has changed at the physical level since Bionic, so this has to be some sort of config issue?
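Rather than watching it by hand for days, I could script the watching: log the pool's write bandwidth once a minute and fire off fio automatically the moment it dips, so the slow state gets captured rather than waited for. A rough sketch (log path, threshold, and the awk column are assumptions; check which column your `zpool iostat -p` prints write bandwidth in):

```shell
LOG=/var/log/tank-bw.log    # hypothetical log location
THRESHOLD_MB=50             # trigger when writes fall below this

below_threshold() {
  # $1 = measured MB/s (decimals truncated); success when under threshold
  [ "${1%.*}" -lt "$THRESHOLD_MB" ]
}

monitor() {
  while :; do
    # -p prints exact byte counts; first line is cumulative stats, so
    # take the last (second) sample and convert column 7 (write bw) to MB/s
    bw=$(zpool iostat -p Tank 60 2 | awk 'END { print $7 / (1024*1024) }')
    echo "$(date -Is) ${bw} MB/s" >> "$LOG"
    if below_threshold "$bw"; then
      fio --name SLOW --filename=/mnt/Tank/temp.file --rw=write --size=2g \
          --blocksize=1024k --ioengine=libaio --iodepth=32 --direct=1 \
          --runtime=60 --group_reporting >> "$LOG"
      break
    fi
  done
}
```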
Edit: here's another one:
Code:
fio --name TEST --eta-newline=5s --filename=temp.file --rw=write --size=2g --io_size=10g --blocksize=1024k --ioengine=libaio --fsync=10000 --iodepth=32 --direct=1 --numjobs=1 --runtime=60 --group_reporting
TEST: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 1024KiB-1024KiB, ioengine=libaio, iodepth=32
fio-3.28
Starting 1 process
TEST: Laying out IO file (1 file / 2048MiB)
Jobs: 1 (f=1): [W(1)][23.3%][w=134MiB/s][w=134 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][35.1%][w=188MiB/s][w=188 IOPS][eta 00m:24s]
Jobs: 1 (f=1): [W(1)][44.2%][w=167MiB/s][w=167 IOPS][eta 00m:24s]
Jobs: 1 (f=1): [W(1)][49.0%][eta 00m:26s]
Jobs: 1 (f=1): [W(1)][51.7%][eta 00m:29s]
Jobs: 1 (f=1): [W(1)][61.7%][w=4100KiB/s][w=4 IOPS][eta 00m:23s]
Jobs: 1 (f=1): [W(1)][71.7%][eta 00m:17s]
Jobs: 1 (f=1): [W(1)][81.7%][w=180MiB/s][w=180 IOPS][eta 00m:11s]
Jobs: 1 (f=1): [W(1)][91.7%][eta 00m:05s]
Jobs: 1 (f=1): [W(1)][100.0%][w=5120KiB/s][w=5 IOPS][eta 00m:00s]
TEST: (groupid=0, jobs=1): err= 0: pid=1759323: Wed Nov 29 00:20:56 2023
write: IOPS=100, BW=101MiB/s (106MB/s)(6076MiB/60205msec); 0 zone resets
slat (usec): min=49, max=6932.0k, avg=9330.20, stdev=147502.33
clat (msec): min=2, max=11969, avg=307.67, stdev=1279.78
lat (msec): min=2, max=11969, avg=317.00, stdev=1303.51
clat percentiles (msec):
| 1.00th=[ 39], 5.00th=[ 57], 10.00th=[ 65], 20.00th=[ 65],
| 30.00th=[ 69], 40.00th=[ 77], 50.00th=[ 80], 60.00th=[ 124],
| 70.00th=[ 167], 80.00th=[ 186], 90.00th=[ 262], 95.00th=[ 472],
| 99.00th=[10000], 99.50th=[11208], 99.90th=[12013], 99.95th=[12013],
| 99.99th=[12013]
bw ( KiB/s): min= 2048, max=466944, per=100.00%, avg=179392.93, stdev=151093.97, samples=69
iops : min= 2, max= 456, avg=175.19, stdev=147.55, samples=69
lat (msec) : 4=0.03%, 10=0.07%, 20=0.28%, 50=0.87%, 100=53.83%
lat (msec) : 250=34.53%, 500=5.74%, 750=1.10%, 1000=1.04%, 2000=0.64%
lat (msec) : >=2000=1.86%
cpu : usr=0.60%, sys=0.72%, ctx=2228, majf=0, minf=12
IO depths : 1=0.1%, 2=0.1%, 4=0.2%, 8=0.4%, 16=0.8%, 32=98.5%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=99.9%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
issued rwts: total=0,6076,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=32
Run status group 0 (all jobs):
WRITE: bw=101MiB/s (106MB/s), 101MiB/s-101MiB/s (106MB/s-106MB/s), io=6076MiB (6371MB), run=60205-60205msec
Disk stats (read/write):
sdg: ios=153/12344, merge=59/268, ticks=133897/2984697, in_queue=3133595, util=98.20%