
Thread: Seemingly sporadic slow ZFS IO since 22.04

  1. #111
    Join Date
    Aug 2016
    Location
    Wandering
    Beans
    Hidden!
    Distro
    Xubuntu Development Release

    Re: Seemingly sporadic slow ZFS IO since 22.04

    What's heat like on all drives?

    I've had mine under a moderate load for a couple hours now:
    Code:
    sudo smartctl -a /dev/nvme0n1p1 |grep Temperature
    Temperature:                        32 Celsius
    Warning  Comp. Temperature Time:    0
    Critical Comp. Temperature Time:    0
    Temperature Sensor 1:               32 Celsius
    Temperature Sensor 2:               33 Celsius
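    If it helps, here is a rough loop (assuming the array members really are sda through sdh) to pull the temperature attributes from every spinning disk in one pass:
    Code:
    # Print the temperature-related SMART attributes for each array member.
    for d in /dev/sd[a-h]; do
        echo "== $d =="
        sudo smartctl -A "$d" | grep -i temperature
    done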
    Last edited by 1fallen; January 30th, 2024 at 11:07 PM.

  2. #112
    Join Date
    Mar 2010
    Location
    USA
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: Seemingly sporadic slow ZFS IO since 22.04

    I'm starting to see a pattern there...

    First, this is what these columns mean:
    r_await is the average time (in milliseconds) for read requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.

    w_await is the average time (in milliseconds) for write requests issued to the device to be served. This includes the time spent by the requests in queue and the time spent servicing them.
    Look at the consistently high values for /dev/sdf1 in those two columns. That is only one disk in an 8-disk RAIDZ2 array, right?

    You would expect the load to be balanced across all the disks of the array, not concentrated on just one.
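
    For reference, those columns come from extended iostat output (sysstat package). Something along these lines, with the device names adjusted to match the array, reproduces them with 5-second samples:
    Code:
    # Extended per-device statistics, sampled every 5 seconds;
    # r_await and w_await are reported in milliseconds.
    iostat -dxm sda sdb sdc sdd sde sdf sdg sdh 5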

    First I would use smartmontools to confirm the health of that drive. If it checks out, then I would move the data off the array and write it back. That way the data is rewritten without fragmentation and spread evenly across all the drives.
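
    Something like this (assuming the drive is still /dev/sdf) covers the basic health check and an extended self-test:
    Code:
    sudo smartctl -H /dev/sdf           # overall health self-assessment
    sudo smartctl -t long /dev/sdf      # start an extended self-test in the background
    sudo smartctl -l selftest /dev/sdf  # read the self-test log once it finishes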

    Moving that much data off and back, though, is going to take some time and resources.

    EDIT: Sorry to be the bearer of that news. Please, both of you look at the numbers with that understanding of what they mean, and see if you reach the same conclusion.
    Last edited by MAFoElffen; January 31st, 2024 at 12:10 AM.

    "Concurrent coexistence of Windows, Linux and UNIX..." || Ubuntu user # 33563, Linux user # 533637
    Sticky: Graphics Resolution | UbuntuForums 'system-info' Script | Posting Guidelines | Code Tags

  3. #113
    Join Date
    Aug 2016
    Location
    Wandering
    Beans
    Hidden!
    Distro
    Xubuntu Development Release

    Re: Seemingly sporadic slow ZFS IO since 22.04

    Yeppers, that's a good spot, MAFoElffen.

  4. #114
    Join Date
    Nov 2023
    Beans
    76

    Re: Seemingly sporadic slow ZFS IO since 22.04

    Temps of array disks:

    Code:
    sda1 = 35c
    sdb1 = 36c
    sdc1 = 34c
    sdd1 = 32c
    sde1 = 31c
    sdf1 = 33c
    sdg1 = 31c
    sdh1 = 32c
    Yes, sdf is in the array, and it does seem to be the only one with issues.

    But its SMART checks out - no errors. It is one of the newer drives - I have had to replace two in the history of the pool.

    Some info for sdf:

    Code:
    Complete SCT temperature log:
    
    SCT Status Version:                  3
    SCT Version (vendor specific):       522 (0x020a)
    Device State:                        Active (0)
    Current Temperature:                    32 Celsius
    Power Cycle Min/Max Temperature:     29/35 Celsius
    Lifetime    Min/Max Temperature:     19/46 Celsius
    Under/Over Temperature Limit Count:   0/0
    
    SCT Temperature History Version:     2
    Temperature Sampling Period:         3 minutes
    Temperature Logging Interval:        59 minutes
    Min/Max recommended Temperature:     14/55 Celsius
    Min/Max Temperature Limit:           10/60 Celsius
    Temperature History Size (Index):    128 (41)
    And SMART stats for sdf:

    Code:
    smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.15.0-92-generic] (local build)
    Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Seagate BarraCuda 3.5
    Device Model:     ST4000DM004-2CV104
    Serial Number:    ZTT4XXXX
    LU WWN Device Id: 5 000c50 0e466fde3
    Firmware Version: 0001
    User Capacity:    4,000,787,030,016 bytes [4.00 TB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5425 rpm
    Form Factor:      3.5 inches
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-3 T13/2161-D revision 5
    SATA Version is:  SATA 3.1, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Wed Jan 31 02:13:03 2024 UTC
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    AAM feature is:   Unavailable
    APM feature is:   Unavailable
    Rd look-ahead is: Enabled
    Write cache is:   Enabled
    DSN feature is:   Unavailable
    ATA Security is:  Disabled, frozen [SEC2]
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00)    Offline data collection activity
                        was never started.
                        Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0)    The previous self-test routine completed
                        without error or no self-test has ever 
                        been run.
    Total time to complete Offline 
    data collection:         (    0) seconds.
    Offline data collection
    capabilities:              (0x73) SMART execute Offline immediate.
                        Auto Offline data collection on/off support.
                        Suspend Offline collection upon new
                        command.
                        No Offline surface scan supported.
                        Self-test supported.
                        Conveyance Self-test supported.
                        Selective Self-test supported.
    SMART capabilities:            (0x0003)    Saves SMART data before entering
                        power-saving mode.
                        Supports SMART auto save timer.
    Error logging capability:        (0x01)    Error logging supported.
                        General Purpose Logging supported.
    Short self-test routine 
    recommended polling time:      (   1) minutes.
    Extended self-test routine
    recommended polling time:      ( 496) minutes.
    Conveyance self-test routine
    recommended polling time:      (   2) minutes.
    SCT capabilities:            (0x30a5)    SCT Status supported.
                        SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 10
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
      1 Raw_Read_Error_Rate     POSR--   076   064   006    -    42410760
      3 Spin_Up_Time            PO----   097   097   000    -    0
      4 Start_Stop_Count        -O--CK   100   100   020    -    50
      5 Reallocated_Sector_Ct   PO--CK   100   100   010    -    0
      7 Seek_Error_Rate         POSR--   093   060   045    -    1964192642
      9 Power_On_Hours          -O--CK   089   089   000    -    10197 (64 162 0)
     10 Spin_Retry_Count        PO--C-   100   100   097    -    0
     12 Power_Cycle_Count       -O--CK   100   100   020    -    50
    183 Runtime_Bad_Block       -O--CK   100   100   000    -    0
    184 End-to-End_Error        -O--CK   100   100   099    -    0
    187 Reported_Uncorrect      -O--CK   100   100   000    -    0
    188 Command_Timeout         -O--CK   100   096   000    -    0 114 177
    189 High_Fly_Writes         -O-RCK   100   100   000    -    0
    190 Airflow_Temperature_Cel -O---K   068   054   040    -    32 (Min/Max 29/35)
    191 G-Sense_Error_Rate      -O--CK   100   100   000    -    0
    192 Power-Off_Retract_Count -O--CK   100   100   000    -    390
    193 Load_Cycle_Count        -O--CK   100   100   000    -    493
    194 Temperature_Celsius     -O---K   032   046   000    -    32 (0 19 0 0 0)
    195 Hardware_ECC_Recovered  -O-RC-   076   064   000    -    42410760
    197 Current_Pending_Sector  -O--C-   100   100   000    -    0
    198 Offline_Uncorrectable   ----C-   100   100   000    -    0
    199 UDMA_CRC_Error_Count    -OSRCK   200   200   000    -    0
    240 Head_Flying_Hours       ------   100   253   000    -    10170h+36m+58.864s
    241 Total_LBAs_Written      ------   100   253   000    -    13331335686
    242 Total_LBAs_Read         ------   100   253   000    -    85484145805
                                ||||||_ K auto-keep
                                |||||__ C event count
                                ||||___ R error rate
                                |||____ S speed/performance
                                ||_____ O updated online
                                |______ P prefailure warning
    
    General Purpose Log Directory Version 1
    SMART           Log Directory Version 1 [multi-sector log support]
    Address    Access  R/W   Size  Description
    0x00       GPL,SL  R/O      1  Log Directory
    0x01           SL  R/O      1  Summary SMART error log
    0x02           SL  R/O      5  Comprehensive SMART error log
    0x03       GPL     R/O      5  Ext. Comprehensive SMART error log
    0x04       GPL,SL  R/O      8  Device Statistics log
    0x06           SL  R/O      1  SMART self-test log
    0x07       GPL     R/O      1  Extended self-test log
    0x08       GPL     R/O      2  Power Conditions log
    0x09           SL  R/W      1  Selective self-test log
    0x0c       GPL     R/O   2048  Pending Defects log
    0x10       GPL     R/O      1  NCQ Command Error log
    0x11       GPL     R/O      1  SATA Phy Event Counters log
    0x21       GPL     R/O      1  Write stream error log
    0x22       GPL     R/O      1  Read stream error log
    0x24       GPL     R/O    512  Current Device Internal Status Data log
    0x30       GPL,SL  R/O      9  IDENTIFY DEVICE data log
    0x80-0x9f  GPL,SL  R/W     16  Host vendor specific log
    0xa1       GPL,SL  VS      24  Device vendor specific log
    0xa2       GPL     VS    8160  Device vendor specific log
    0xa6       GPL     VS     192  Device vendor specific log
    0xa8-0xa9  GPL,SL  VS     136  Device vendor specific log
    0xab       GPL     VS       1  Device vendor specific log
    0xb0       GPL     VS    9048  Device vendor specific log
    0xbd       GPL     VS       8  Device vendor specific log
    0xbe-0xbf  GPL     VS   65535  Device vendor specific log
    0xc0       GPL,SL  VS       1  Device vendor specific log
    0xc1       GPL,SL  VS      16  Device vendor specific log
    0xc3       GPL,SL  VS       8  Device vendor specific log
    0xc4       GPL,SL  VS      24  Device vendor specific log
    0xd1       GPL     VS     264  Device vendor specific log
    0xd3       GPL     VS    1920  Device vendor specific log
    0xe0       GPL,SL  R/W      1  SCT Command/Status
    0xe1       GPL,SL  R/W      1  SCT Data Transfer
    
    SMART Extended Comprehensive Error Log Version: 1 (5 sectors)
    No Errors Logged
    
    SMART Extended Self-test Log Version: 1 (1 sectors)
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    
    SCT Status Version:                  3
    SCT Version (vendor specific):       522 (0x020a)
    Device State:                        Active (0)
    Current Temperature:                    32 Celsius
    Power Cycle Min/Max Temperature:     29/35 Celsius
    Lifetime    Min/Max Temperature:     19/46 Celsius
    Under/Over Temperature Limit Count:   0/0
    
    SCT Temperature History Version:     2
    Temperature Sampling Period:         3 minutes
    Temperature Logging Interval:        59 minutes
    Min/Max recommended Temperature:     14/55 Celsius
    Min/Max Temperature Limit:           10/60 Celsius
    Temperature History Size (Index):    128 (41)
    
    Index    Estimated Time   Temperature Celsius
      42    2024-01-25 20:58    33  **************
      43    2024-01-25 21:57    33  **************
      44    2024-01-25 22:56    32  *************
      45    2024-01-25 23:55    33  **************
      46    2024-01-26 00:54    34  ***************
      47    2024-01-26 01:53    34  ***************
      48    2024-01-26 02:52    32  *************
      49    2024-01-26 03:51    33  **************
     ...    ..(  4 skipped).    ..  **************
      54    2024-01-26 08:46    33  **************
      55    2024-01-26 09:45    34  ***************
      56    2024-01-26 10:44    34  ***************
      57    2024-01-26 11:43    33  **************
     ...    ..(  7 skipped).    ..  **************
      65    2024-01-26 19:35    33  **************
      66    2024-01-26 20:34    34  ***************
     ...    ..(  3 skipped).    ..  ***************
      70    2024-01-27 00:30    34  ***************
      71    2024-01-27 01:29    33  **************
      72    2024-01-27 02:28    33  **************
      73    2024-01-27 03:27    32  *************
      74    2024-01-27 04:26    32  *************
      75    2024-01-27 05:25    33  **************
      76    2024-01-27 06:24    33  **************
      77    2024-01-27 07:23    33  **************
      78    2024-01-27 08:22    32  *************
      79    2024-01-27 09:21    32  *************
      80    2024-01-27 10:20    33  **************
      81    2024-01-27 11:19    32  *************
     ...    ..(  4 skipped).    ..  *************
      86    2024-01-27 16:14    32  *************
      87    2024-01-27 17:13    33  **************
      88    2024-01-27 18:12    32  *************
     ...    ..(  2 skipped).    ..  *************
      91    2024-01-27 21:09    32  *************
      92    2024-01-27 22:08    31  ************
      93    2024-01-27 23:07    32  *************
     ...    ..( 18 skipped).    ..  *************
     112    2024-01-28 17:48    32  *************
     113    2024-01-28 18:47    33  **************
     114    2024-01-28 19:46    33  **************
     115    2024-01-28 20:45    33  **************
     116    2024-01-28 21:44    32  *************
     ...    ..(  2 skipped).    ..  *************
     119    2024-01-29 00:41    32  *************
     120    2024-01-29 01:40    33  **************
     ...    ..(  6 skipped).    ..  **************
     127    2024-01-29 08:33    33  **************
       0    2024-01-29 09:32    34  ***************
       1    2024-01-29 10:31    34  ***************
       2    2024-01-29 11:30    34  ***************
       3    2024-01-29 12:29    33  **************
     ...    ..(  9 skipped).    ..  **************
      13    2024-01-29 22:19    33  **************
      14    2024-01-29 23:18    34  ***************
      15    2024-01-30 00:17    33  **************
      16    2024-01-30 01:16    34  ***************
      17    2024-01-30 02:15    33  **************
     ...    ..(  5 skipped).    ..  **************
      23    2024-01-30 08:09    33  **************
      24    2024-01-30 09:08    34  ***************
     ...    ..(  2 skipped).    ..  ***************
      27    2024-01-30 12:05    34  ***************
      28    2024-01-30 13:04    33  **************
     ...    ..( 12 skipped).    ..  **************
      41    2024-01-31 01:51    33  **************
    
    SCT Error Recovery Control command not supported
    
    Device Statistics (GP Log 0x04)
    Page  Offset Size        Value Flags Description
    0x01  =====  =               =  ===  == General Statistics (rev 1) ==
    0x01  0x008  4              50  ---  Lifetime Power-On Resets
    0x01  0x010  4           10197  ---  Power-on Hours
    0x01  0x018  6     13330824526  ---  Logical Sectors Written
    0x01  0x020  6       385929805  ---  Number of Write Commands
    0x01  0x028  6     85484126939  ---  Logical Sectors Read
    0x01  0x030  6       996034335  ---  Number of Read Commands
    0x01  0x038  6               -  ---  Date and Time TimeStamp
    0x03  =====  =               =  ===  == Rotating Media Statistics (rev 1) ==
    0x03  0x008  4           10192  ---  Spindle Motor Power-on Hours
    0x03  0x010  4           10191  ---  Head Flying Hours
    0x03  0x018  4             493  ---  Head Load Events
    0x03  0x020  4               0  ---  Number of Reallocated Logical Sectors
    0x03  0x028  4               0  ---  Read Recovery Attempts
    0x03  0x030  4               0  ---  Number of Mechanical Start Failures
    0x03  0x038  4               0  ---  Number of Realloc. Candidate Logical Sectors
    0x03  0x040  4             390  ---  Number of High Priority Unload Events
    0x04  =====  =               =  ===  == General Errors Statistics (rev 1) ==
    0x04  0x008  4               0  ---  Number of Reported Uncorrectable Errors
    0x04  0x010  4             176  ---  Resets Between Cmd Acceptance and Completion
    0x05  =====  =               =  ===  == Temperature Statistics (rev 1) ==
    0x05  0x008  1              32  ---  Current Temperature
    0x05  0x010  1              33  ---  Average Short Term Temperature
    0x05  0x018  1              32  ---  Average Long Term Temperature
    0x05  0x020  1              46  ---  Highest Temperature
    0x05  0x028  1               0  ---  Lowest Temperature
    0x05  0x030  1              44  ---  Highest Average Short Term Temperature
    0x05  0x038  1              30  ---  Lowest Average Short Term Temperature
    0x05  0x040  1              40  ---  Highest Average Long Term Temperature
    0x05  0x048  1              32  ---  Lowest Average Long Term Temperature
    0x05  0x050  4               0  ---  Time in Over-Temperature
    0x05  0x058  1              60  ---  Specified Maximum Operating Temperature
    0x05  0x060  4               0  ---  Time in Under-Temperature
    0x05  0x068  1               0  ---  Specified Minimum Operating Temperature
    0x06  =====  =               =  ===  == Transport Statistics (rev 1) ==
    0x06  0x008  4             430  ---  Number of Hardware Resets
    0x06  0x010  4              74  ---  Number of ASR Events
    0x06  0x018  4               0  ---  Number of Interface CRC Errors
                                    |||_ C monitored condition met
                                    ||__ D supports DSN
                                    |___ N normalized value
    
    SATA Phy Event Counters (GP Log 0x11)
    ID      Size     Value  Description
    0x000a  2           28  Device-to-host register FISes sent due to a COMRESET
    0x0001  2            0  Command failed due to ICRC error
    0x0003  2            0  R_ERR response for device-to-host data FIS
    0x0004  2            0  R_ERR response for host-to-device data FIS
    0x0006  2            0  R_ERR response for device-to-host non-data FIS
    0x0007  2            0  R_ERR response for host-to-device non-data FIS
    My pool currently contains about 11.6TB. I have a 14TB drive that I use to back up the essential parts of this pool.

    So you are saying that since the SMART checks out, this is pool fragmentation? I can certainly back up and rewrite the data. Yes, it will take a while, but I don't mind too much as long as it solves the problem. The pool is quite a few years old - but if fragmentation is responsible, it surprises me... is 2% really doing this?

    Code:
    NAME        SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    Tank         29T  16.3T  12.7T        -         -     2%    56%  1.00x    ONLINE  -
    (Sorry I could be totally missing the point here - it's 2.30am....)
    Last edited by tkae-lp; January 31st, 2024 at 03:38 AM.

  5. #115
    Join Date
    Mar 2010
    Location
    USA
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: Seemingly sporadic slow ZFS IO since 22.04

    How long was it faulted before you swapped in the new drive?

    If a drive is faulted, ZFS stops adding new data to that drive but continues to write to the others, causing an imbalance between the drives. The same thing can happen if you add a new drive to a normal pool: the old drives already hold data, while the new drive starts out empty. The allocation algorithms try to balance out new writes, but that drive will still sit there with less data on it. Also, if you change an option like compression or encryption, the data written earlier keeps whatever option was in effect when it was written, whereas anything written after the change uses the new option(s). Some people also do this because of fragmentation, though I have not had a big enough problem with fragmentation to bother. And now that they have added the ability to add a drive to an existing array, the same split between old and new data applies there as well.

    The common way to correct that, in all of those circumstances, is to rewrite the data to the pool so it gets re-balanced and/or written with the newer options. The reason this works is that it writes the data back across all the drives in one fell swoop.

    Here is an explanation: https://serverfault.com/a/859223

    On the TrueNAS Forums, I did read mention (https://www.truenas.com/community/th...38/post-689468) that some people used this script to do ZFS rebalancing: https://github.com/markusressel/zfs-inplace-rebalancing ... I have not tried that script. I usually just send/receive the data back and forth. Or, if I am changing an option that can only be set at pool creation, then I rsync a backup to a drive, destroy the pool, recreate it with the new options, then rsync the data back into the new pool/datasets.
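
    For the rsync variant, a rough sketch looks like this (the pool, dataset and mount point names here are made up for illustration, not anyone's actual layout):
    Code:
    # 1. Copy everything to a scratch drive big enough to hold it:
    sudo rsync -aHAX --info=progress2 /tank/data/ /mnt/backupdrive/data/
    
    # 2. Destroy and recreate the pool with whatever layout/options you want:
    sudo zpool destroy tank
    sudo zpool create tank raidz2 sda sdb sdc sdd sde sdf sdg sdh
    sudo zfs create tank/data
    
    # 3. Copy it back in; the rewrite is what spreads the data evenly across the drives:
    sudo rsync -aHAX --info=progress2 /mnt/backupdrive/data/ /tank/data/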
    Last edited by MAFoElffen; January 31st, 2024 at 05:11 AM.

    "Concurrent coexistence of Windows, Linux and UNIX..." || Ubuntu user # 33563, Linux user # 533637
    Sticky: Graphics Resolution | UbuntuForums 'system-info' Script | Posting Guidelines | Code Tags

  6. #116
    Join Date
    Nov 2023
    Beans
    76

    Re: Seemingly sporadic slow ZFS IO since 22.04

    The drive in question (sdf) was only faulted for a few hours before I noticed and shut down the server. I didn't have a drive to hand, so I ordered one, then powered on when it arrived and resilvered. In the history of the pool, only two drives have been replaced. I did the same for both: the server was shut down to avoid unnecessary use. As for when it was replaced... well over a year ago, and definitely months before the speed problem appeared. Compression has been lz4 throughout the history of the pool.

    I agree that this does seem like the next logical step... so next I will:

    1. Backup
    2. Destroy
    3. Recreate
    4. Restore

    Give me a few days (or more, I am busy this week) and I will report back when completed. I will then start monitoring again.

    However, as a side note: if this imbalance is the issue, it seems like a pretty big flaw in ZFS - one would think the resilvering process would take care of this? If that is the case, I must say I'm a little disappointed. I am very surprised that two drive replacements could make this necessary... It doesn't seem that intelligent for such an amazing FS. IMHO, OpenZFS [moving forward] should have a "re-balance and defrag" option... preferably one that can be chained onto a scrub! Just my 2p..

  7. #117
    Join Date
    Mar 2010
    Location
    USA
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: Seemingly sporadic slow ZFS IO since 22.04

    To give you a peek at some of what is going on...

    If I do this:
    Code:
    mafoelffen@msi-ubuntu:~$ sudo zpool list -v
    [sudo] password for mafoelffen: 
    NAME                                                   SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    bpool                                                 2.75G   707M  2.06G        -         -     2%    25%  1.00x    ONLINE  -
      nvme-Samsung_SSD_990_PRO_2TB_S73WNJ0W310240K-part2  2.75G   707M  2.06G        -         -     2%  25.1%      -    ONLINE
    dpool                                                  464G  2.10M   464G        -         -     0%     0%  1.00x    ONLINE  -
      nvme-PCIe_SSD_23011650000183-part1                   464G  2.10M   464G        -         -     0%  0.00%      -    ONLINE
    hpool                                                  358G  2.75G   355G        -         -     0%     0%  1.00x    ONLINE  -
      nvme-Samsung_SSD_990_PRO_2TB_S73WNJ0W310240K-part5   358G  2.75G   355G        -         -     0%  0.76%      -    ONLINE
    kpool                                                  928G   429G   499G        -     1.36T     1%    46%  1.00x    ONLINE  -
      nvme-Samsung_SSD_990_PRO_2TB_S73WNJ0W310240K-part6   464G   329G   135G        -         -     1%  71.0%      -    ONLINE
      nvme-Samsung_SSD_990_PRO_2TB_S73WNJ0W618194B-part1   464G  99.9G   364G        -     1.36T     2%  21.5%      -    ONLINE
    rpool                                                  748G  22.5G   726G        -         -     3%     3%  1.00x    ONLINE  -
      nvme-Samsung_SSD_990_PRO_2TB_S73WNJ0W310240K-part3   748G  22.5G   726G        -         -     3%  3.00%      -    ONLINE
    I can see how much is written to each disk in a pool. If you look at kpool, you can see that one disk was added after the other, so the older disk has more written to it.

    But if I do this:
    Code:
    mafoelffen@Mikes-B460M:~$ sudo zpool list -v
    [sudo] password for mafoelffen: 
    NAME                                                    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    bpool                                                  1.88G   305M  1.58G        -         -     0%    15%  1.00x    ONLINE  -
      0196ab45-3ee2-894e-b725-39560409109f                 1.88G   305M  1.58G        -         -     0%  15.9%      -    ONLINE
    datapool                                               9.09T  2.19T  6.90T        -         -     4%    24%  1.00x    ONLINE  -
      raidz2-0                                             9.09T  2.19T  6.90T        -         -     4%  24.1%      -    ONLINE
        ata-Samsung_SSD_870_EVO_2TB_S6PNNM0TA09560A-part1      -      -      -        -         -      -      -      -    ONLINE
        ata-Samsung_SSD_870_EVO_2TB_S6PNNM0TA11601H-part1      -      -      -        -         -      -      -      -    ONLINE
        ata-Samsung_SSD_870_EVO_2TB_S6PNNM0TA47393M-part1      -      -      -        -         -      -      -      -    ONLINE
        ata-Samsung_SSD_870_EVO_2TB_S6PNNS0W330507J-part1      -      -      -        -         -      -      -      -    ONLINE
        ata-Samsung_SSD_870_EVO_2TB_S6PNNM0TB08933B-part1      -      -      -        -         -      -      -      -    ONLINE
    logs                                                       -      -      -        -         -      -      -      -  -
      nvme-Samsung_SSD_970_EVO_2TB_S464NB0KB10521K-part2   9.50G   328K  9.50G        -         -     0%  0.00%      -    ONLINE
    cache                                                      -      -      -        -         -      -      -      -  -
      nvme-Samsung_SSD_970_EVO_2TB_S464NB0KB10521K-part1   1.25T  1.21G  1.25T        -         -     0%  0.09%      -    ONLINE
    kpool                                                  7.27T  3.40T  3.86T        -         -     0%    46%  1.00x    ONLINE  -
      raidz2-0                                             7.27T  3.40T  3.86T        -         -     0%  46.8%      -    ONLINE
        nvme-eui.0025385a2140bd61-part1                        -      -      -        -         -      -      -      -    ONLINE
        nvme-eui.0025385a21418769-part1                        -      -      -        -         -      -      -      -    ONLINE
        nvme-eui.0025385a2141f4fc-part1                        -      -      -        -         -      -      -      -    ONLINE
        nvme-eui.0025385b21407ef0-part1                        -      -      -        -         -      -      -      -    ONLINE
    rpool                                                   920G  19.4G   901G        -         -     1%     2%  1.00x    ONLINE  -
      ae13efaa-d1cf-ec40-86a6-2883f0e07102                  920G  19.4G   901G        -         -     1%  2.11%      -    ONLINE
    You can see that it doesn't show how much is written to the individual disks of a vdev or RAIDZ array. It's all a guess for those.
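
    If you want to see how the I/O itself is spread across the members, zpool iostat with the verbose flag reports per-disk operations and bandwidth even inside a RAIDZ vdev, so a lagging member tends to stand out there:
    Code:
    # Per-vdev and per-disk operations/bandwidth, refreshed every 5 seconds.
    sudo zpool iostat -v 5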

    "Concurrent coexistence of Windows, Linux and UNIX..." || Ubuntu user # 33563, Linux user # 533637
    Sticky: Graphics Resolution | UbuntuForums 'system-info' Script | Posting Guidelines | Code Tags

  8. #118
    Join Date
    Nov 2023
    Beans
    76

    Re: Seemingly sporadic slow ZFS IO since 22.04

    But on that logic (the older disk has more written to it), my case is the opposite - the disk that's flagging on my end is the newer one.

    My output shows very little:

    Code:
    NAME                                                                 SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
    Tank                                                                  29T  16.3T  12.7T        -         -     2%    56%  1.00x    ONLINE  -
      raidz2-0                                                            29T  16.3T  12.7T        -         -     2%  56.1%      -    ONLINE
        ata-ST4000DM000-1F2168_S300XXXX                                     -      -      -        -         -      -      -      -    ONLINE
        ata-ST4000DM004-2CV104_ZTT4XXXX   <--- SDF                          -      -      -        -         -      -      -      -    ONLINE
        ata-ST4000DM004-2CV104_ZTT4XXXX                                     -      -      -        -         -      -      -      -    ONLINE
        ata-ST4000DM000-1F2168_W300XXXX                                     -      -      -        -         -      -      -      -    ONLINE
        ata-ST4000DM000-1F2168_W300XXXX                                     -      -      -        -         -      -      -      -    ONLINE
        ata-ST4000DM000-1F2168_W300XXXX                                     -      -      -        -         -      -      -      -    ONLINE
        ata-ST4000DM000-1F2168_W300XXXX                                     -      -      -        -         -      -      -      -    ONLINE
        ata-ST4000DM000-1F2168_W300XXXX                                     -      -      -        -         -      -      -      -    ONLINE
    logs                                                                    -      -      -        -         -      -      -      -  -
      nvme-Samsung_SSD_980_PRO_with_Heatsink_2TB_S6WRNS0XXXXXXXX-part2   928G   164K   928G        -         -     0%  0.00%      -    ONLINE
    cache                                                                   -      -      -        -         -      -      -      -  -
      nvme0n1p1                                                          931G   928G  3.31G        -         -     0%  99.6%      -    ONLINE
    The original drives were Seagate ST4000DM000s; the two replacements are ST4000DM004s.

    I still think it's crazy that the resilvering process doesn't do more to correct this sort of thing, or that there is no 're-balance' option ;D

    In any case, I'll report back as soon as I have completed the backup/restore

  9. #119
    Join Date
    Nov 2023
    Beans
    76

    Re: Seemingly sporadic slow ZFS IO since 22.04

    It looks like I might have the time I need to do the backup/restore tomorrow. Quick question before I do: am I literally just copying the data off and then back onto the array, or should I destroy and recreate the pool in the process?

  10. #120
    Join Date
    Mar 2010
    Location
    USA
    Beans
    Hidden!
    Distro
    Ubuntu Development Release

    Re: Seemingly sporadic slow ZFS IO since 22.04

    The data needs to be written back as new. So if you are not destroying the pool, then destroying the filesystem dataset(s) within the pool would get rid of the data written to them, and you would not have to recreate the pool itself, right? That only saves, what, one or two steps out of many? I usually use rsync for that...

    Or... if you do a recursive snapshot of the pool (-r), then send a full replication stream (-R)... check to ensure the full stream was received... then send/receive it back. It should overwrite the old data.
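
    A rough sketch of that round trip, using your pool name "Tank" and a made-up "backuppool" standing in for the 14TB backup drive:
    Code:
    # Snapshot everything recursively, then replicate the whole tree to the backup pool:
    sudo zfs snapshot -r Tank@move
    sudo zfs send -R Tank@move | sudo zfs receive -F backuppool/Tank
    
    # Verify the copy (e.g. compare `zfs list -r -o name,used` on both sides),
    # destroy/recreate Tank or its datasets, then send it all back as freshly written data:
    sudo zfs snapshot -r backuppool/Tank@return
    sudo zfs send -R backuppool/Tank@return | sudo zfs receive -F Tank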
    Last edited by MAFoElffen; February 5th, 2024 at 10:33 AM.

    "Concurrent coexistence of Windows, Linux and UNIX..." || Ubuntu user # 33563, Linux user # 533637
    Sticky: Graphics Resolution | UbuntuForums 'system-info' Script | Posting Guidelines | Code Tags
