Help with interpreting SMART data

**CharlesA** · June 25th, 2013

Howdy,

I have been running 3 Hitachi DeskStar (DeathStar) drives in a RAID5 on my home server for a while now and recently added another drive to expand the array. In the end, I had to destroy the array and rebuild it with the new drive, but once that was done, it started "rebuilding" after the management software threw an error: "Array 'RAID5' data is not consistent."

That was on 6/21. Fast forward to today. The rebuilding completed with no errors and I have run a verify on the entire array with all 4 drives. No errors. No errors when accessing files that I am aware of and so far backups are up-to-date and good.

Here's the SMART data:

Code:

    Drive 1:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       262145
      2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       96
      3 Spin_Up_Time            0x0007   159   159   024    Pre-fail  Always       -       408 (Average 500)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       186
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       20
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   110   110   020    Pre-fail  Offline      -       40
      9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       32532
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       186
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       969
    193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       969
    194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 21/50)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       20
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
     
    Drive 2:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       65536
      2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       106
      3 Spin_Up_Time            0x0007   151   151   024    Pre-fail  Always       -       436 (Average 522)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       184
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       8
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
      9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       32536
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       184
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       950
    193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       950
    194 Temperature_Celsius     0x0002   150   150   000    Old_age   Always       -       40 (Min/Max 22/50)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       11
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
     
    Drive 3:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       5
      2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       101
      3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       439 (Average 524)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       184
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
      9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       32532
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       184
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       998
    193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       998
    194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 22/54)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       4
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0
     
    Drive 4:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       99
      3 Spin_Up_Time            0x0007   169   169   024    Pre-fail  Always       -       440 (Average 416)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       10
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       35
      9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       201
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
    192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       12
    193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       12
    194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 25/44)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

Now, I have read a couple of different threads saying that read errors and UDMA CRC errors don't necessarily mean the drive is going out, but that the cable might be going bad or is loose. So far I have reseated the cables on all the drives, but I have not replaced them.

Can I get some opinions on this? In the past I have only really been keeping an eye on Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector, and Offline_Uncorrectable because I read on Wikipedia (insert lulz here) that those were the important ones to watch.
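
For anyone who wants to pull just those four attributes out of `smartctl -A` output, here is a minimal Python sketch. It assumes the ten-column table layout shown above; the function name `watched_raw_values` and the `SAMPLE` text are only for illustration.

```python
# Attributes named in the post as the ones worth watching.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}

def watched_raw_values(table_text):
    """Return {attribute_name: raw_value} for the watched attributes.

    Assumes each attribute row has the columns
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    result = {}
    for line in table_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        if name in WATCHED:
            # Extra text after the raw value, e.g. "(Min/Max 21/50)", is ignored.
            result[name] = int(fields[9])
    return result

# Rows copied from Drive 1 above.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       20
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 21/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       20
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

for name, raw in watched_raw_values(SAMPLE).items():
    print(f"{name}: {raw}")
```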

Anyway, up until this totally random rebuild on the 21st, I haven't had any issues with the array since I put it into place. It has been moved from my old server running 10.04, to a new server running 12.04 and now Debian 7.0 with no issues. I only started having problems when I added another drive. I have been running a verify every two weeks and those have caught the current bad sectors. I'm now thinking of setting it up to run a verify every week and then to do a SMART test before emailing me the results, so I can keep an eye on them.
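
A rough Python sketch of the "test, then email" step, using only the standard library. It assumes the self-test has already been started separately (smartmontools provides `smartctl -t short <device>` for that) and only shows how the collected output could be wrapped into a message. The addresses, the host name, the `build_report_email` helper, and the localhost mail relay are all placeholders, not anything taken from this thread.

```python
from email.message import EmailMessage

def build_report_email(hostname, reports, sender, recipient):
    """Build a plain-text email from {device: smartctl_output} pairs."""
    msg = EmailMessage()
    msg["Subject"] = f"Weekly SMART report for {hostname}"
    msg["From"] = sender
    msg["To"] = recipient
    sections = []
    for device, text in sorted(reports.items()):
        sections.append(f"=== {device} ===\n{text.strip()}\n")
    msg.set_content("\n".join(sections))
    return msg

# Placeholder data; a real script would capture `smartctl -A <device>` output here.
example = build_report_email(
    "homeserver",
    {"/dev/sda": "5 Reallocated_Sector_Ct 20", "/dev/sdb": "5 Reallocated_Sector_Ct 8"},
    "smart@example.com",
    "admin@example.com",
)
print(example["Subject"])

# Sending is left commented out because it needs a reachable mail relay:
# import smtplib
# with smtplib.SMTP("localhost") as smtp:
#     smtp.send_message(example)
```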

Here's the main reason for this post: Should I be worried about replacing the two drives with a high Raw_Read_Error_Rate, or leave them be as I have yet to run into problems and I have good backups?
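
One thing worth noticing is that the "high" values sit right next to powers of two: 262145 is 4 times 65536 plus 1, and 65536 is exactly 2^16. The Python sketch below splits a raw value into 16-bit words to make that visible. Treating the number as packed fields is an assumption (nothing quoted in this thread documents the layout), so this only shows the arithmetic, not what each field means.

```python
def split_words(raw, width=16, count=3):
    """Split an integer into `count` fields of `width` bits, most significant first."""
    mask = (1 << width) - 1
    return [(raw >> (width * i)) & mask for i in reversed(range(count))]

# Raw_Read_Error_Rate raw values for Drives 1-4 as posted above.
for drive, raw in enumerate([262145, 65536, 5, 0], start=1):
    print(f"Drive {drive}: {raw:>6} = 0x{raw:08x} -> words {split_words(raw)}")
```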

I currently have 1 spare 2TB drive that I can replace one of the drives with, but I would need to order another one to get everything "in the green" again. Thoughts?

Thanks in advance.

EDIT: With all that being said, I am seeing similar things on both my OS drive and the external backup drive, but so far it seems like it is a Seagate thing. Any opinions would be welcome.

Code:

OS Drive:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST3500418AS
Serial Number:    9VM6WZ62
LU WWN Device Id: 5 000c50 019f6e395
Firmware Version: CC38
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       143554816
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       175
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       56698229
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21786
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       87
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       157
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   056   045    Old_age   Always       -       31 (Min/Max 28/33)
194 Temperature_Celsius     0x0022   031   044   000    Old_age   Always       -       31 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   034   024   000    Old_age   Always       -       143554816
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       105398497465792
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       465972698
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       4216579979

External backup drive:
=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda XT
Device Model:     ST33000651AS
Serial Number:    9XK06WXD
LU WWN Device Id: 5 000c50 02d0a77dd
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       84433708
  3 Spin_Up_Time            0x0003   089   089   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       546
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       18697178
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       870
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       17
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   038   028   045    Old_age   Always   FAILING_NOW 62 (0 47 67 31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1027
194 Temperature_Celsius     0x0022   062   072   000    Old_age   Always       -       62 (0 15 0 0)
195 Hardware_ECC_Recovered  0x001a   019   009   000    Old_age   Always       -       281470766177068
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       82046760256032
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       4092398426
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3580989180
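
The one row that does stand out in the external drive's table is `Airflow_Temperature_Cel`, flagged `FAILING_NOW`. smartctl marks an attribute as failed when its normalized VALUE is at or below THRESH, which is the case here (038 versus 045, with a raw reading of 62 degrees). A minimal Python sketch of the same comparison, assuming the column layout shown above; `failing_attributes` is a made-up helper name.

```python
def failing_attributes(table_text):
    """Return names of attributes whose normalized VALUE is at or below THRESH."""
    failing = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name, value, thresh = fields[1], int(fields[3]), int(fields[5])
        # A threshold of 0 can never be reached, so such attributes never "fail".
        if thresh > 0 and value <= thresh:
            failing.append(name)
    return failing

# Rows copied from the external backup drive above.
SAMPLE = """\
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   038   028   045    Old_age   Always   FAILING_NOW 62 (0 47 67 31)
194 Temperature_Celsius     0x0022   062   072   000    Old_age   Always       -       62 (0 15 0 0)
"""

print(failing_attributes(SAMPLE))
```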

**TheFu** · June 25th, 2013

According to all my reading, SMART data is next to worthless for predicting disk failures. The only SMART data that seems to be of use is when the disk subsystem gets a warning that a HDD is about to fail.

I'm with you on backups and I've pre-emptively replaced very old HDDs in an array in the last 6 months. I asked on my blog how long people use HDDs: http://blog.jdpfu.com/2012/08/20/how...se-hard-drives

The key that we know is to "replace every HDD just before it fails." Not much help, I know.

**CharlesA** · June 25th, 2013

Thanks for the input. That's pretty much the conclusion I've come to, too. Most of the stuff I have seen has been "Well, the drive could be failing, but have you tried it in another machine?"

So far the array is working well. The verify completed with no problems and from what I can tell, there have been no read errors. The verify would have picked up on that, wouldn't it?

I'm thinking of just running with the drives I have now until SMART shows a "FAIL." That might not be the smartest thing to do, but it's a home server and I have it set up to back up via CrashPlan every 15 minutes and back up to external media daily, so I wouldn't be losing data in the event more than one drive goes out.

Do you have any experience with those new WD Red drives? I'm thinking if/when I get around to replacing the disks, I'll replace the RAID card and/or start using mdadm at the same time.

**TheFu** · June 25th, 2013

Originally Posted by CharlesA

Do you have any experience with those new WD Red drives? I'm thinking if/when I get around to replacing the disks, I'll replace the RAID card and/or start using mdadm at the same time.

I **knew** you'd have a solid backup method.

Yes, my old array holds 4 drives - 2 are WD-Red drives. Added those about 2-3 months ago. As far as I can tell, there hasn't been any difference between those or the Seagates mentioned in the blog article. In theory, non-Black and non-Red HDDs should have issues when put into a RAID set. I'm not seeing that, but time will prove ... at least for my environment.

Code:

md2 : active raid1 sdf2[1] sde2[0]
      1338985536 blocks [2/2] [UU]
      
md1 : active raid1 sdc3[0] sdd3[1]
      1943010816 blocks super 1.2 [2/2] [UU]

I do weekly mdadm verify runs on each array just after backups, but still very early in the morning. So far, nothing bad has been reported.
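
For reference, the `[2/2] [UU]` part of `/proc/mdstat` is what shows whether every member is present; a missing member appears as `_`. Here is a small Python sketch that parses output like the listing above and flags degraded arrays. The regular expressions and the `degraded_arrays` name are just an illustration and only cover the simple layout shown here; the `DEGRADED` sample is hypothetical.

```python
import re

def degraded_arrays(mdstat_text):
    """Return names of md arrays whose status shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line looks like: "1338985536 blocks [2/2] [UU]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            if active < expected or "_" in flags:
                degraded.append(current)
    return degraded

HEALTHY = """\
md2 : active raid1 sdf2[1] sde2[0]
      1338985536 blocks [2/2] [UU]

md1 : active raid1 sdc3[0] sdd3[1]
      1943010816 blocks super 1.2 [2/2] [UU]
"""

# Hypothetical example of the same array after losing one member.
DEGRADED = """\
md2 : active raid1 sde2[0]
      1338985536 blocks [2/1] [U_]
"""

print(degraded_arrays(HEALTHY))
print(degraded_arrays(DEGRADED))
```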

I haven't run disk performance tests with these new drives, but I did tune the read-ahead values for both the individual disks AND the array after testing. Also went with a larger array stripe (256) than my old-school docs suggested (64/128). The box running this storage is a Core i5 with plenty of RAM and multiple GigE NICs.

BTW, I was burned by a RAID card a few years ago and couldn't get a replacement for anything less than 4x the cost. Switched to software-RAID and never looked back. I've moved the disk array between 3 different systems, never lost any data. Performance is not great, but I can't tell where the slowdown is. It is more than fast enough for my long running batch jobs.

**CharlesA** · June 25th, 2013

Originally Posted by TheFu

I **knew** you'd have a solid backup method.

Guess I am a bit insane when it comes to making sure I have backups of my data. I've lost data a few times before, mostly from bad CDs and ZIP disks and *gasp* floppy disks but so far I've had no issues when I moved to using HDD to back stuff up.

Also went with a larger array stripe (256) than my old-school docs suggested (64/128). The box running this storage is a Core i5 with plenty of RAM and multiple GigE NICs.

What are the benefits of using a larger stripe? My array defaulted to a 64K stripe (I think it's the stripe... the management software calls it "block size")

BTW, I was burned by a RAID card a few years ago and couldn't get a replacement for anything less than 4x the cost. Switched to software-RAID and never looked back. I've moved the disk array between 3 different systems, never lost any data. Performance is not great, but I can't tell where the slowdown is. It is more than fast enough for my long running batch jobs.

That is one of my worries as well, but not a huge one as I am totally anal about backups. Of course that might also be cuz I'm running a super cheap RAID card and I could just restore from backups if the card went out and a replacement didn't see the array.

I'm really considering software RAID, even though I originally went for hardware RAID cuz it was supposedly faster, but so far it has been a pain in the butt. The card I have (RocketRaid 2640x1) needs a proprietary driver and I've had issues with it compiling correctly via DKMS. The management software is pretty basic and doesn't give you as much info from SMART as I would like, but it works. The funny thing is I didn't want to use mdadm because it looked super complex to me but that was also 3-4 years ago and I've learned a lot since then.

**rubylaser** · June 25th, 2013

Originally Posted by CharlesA

What are the benefits of using a larger stripe? My array defaulted to a 64K stripe (I think it's the stripe... the management software calls it "block size")

64k chunk size is fine. That is a good value, and is the default for a reason. Going with a larger chunk size can be beneficial if you are dealing primarily with large, sequential reads.
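
To put numbers on that: in RAID5 one chunk per stripe holds parity, so a full stripe carries (drives - 1) times the chunk size of data. A quick Python sketch of the arithmetic for a 4-drive array; 128 and 256 are the other sizes mentioned earlier in the thread, and `full_stripe_kib` is just an illustrative helper.

```python
def full_stripe_kib(drives, chunk_kib, parity_chunks=1):
    """Data carried by one full stripe, in KiB (RAID5 has one parity chunk per stripe)."""
    if drives <= parity_chunks:
        raise ValueError("need more drives than parity chunks")
    return (drives - parity_chunks) * chunk_kib

for chunk in (64, 128, 256):
    data = full_stripe_kib(4, chunk)
    print(f"{chunk:>3}K chunk, 4 drives: {data} KiB of data per full stripe")
```

Reads or writes that are much smaller than a full stripe touch only some of the disks, which is why larger chunks mainly pay off for large sequential transfers.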

Also, I wouldn't say SMART data is completely worthless. If you set up smartmontools to monitor and test your disks periodically (with email alerts set up), it can be a decent early warning system. If values start increasing, this can be an indicator of a pre-fail. SMART data will never give you a definitive alert like, "This disk is going to fail tomorrow", but if you monitor your disks, you can often see a failing disk (and replace it) prior to it actually failing.

**CharlesA** · June 25th, 2013

I was just looking at your blog post on mdadm...

I totally agree with you there. So far I have had a script set up to check the SMART data every week on Sunday morning and have the software do a verify every two weeks, and I haven't run into any issues. I still have the full SMART logs instead of the snipped down logs I get via email, so I can look them over if I feel like it.

Thanks for helping clear up some of my confusion.

EDIT: I looked thru my logs from 6/3/12 and found that RAID#1 was sitting at a Raw_Read_Error_Rate of 262144:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       262144
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       97
  3 Spin_Up_Time            0x0007   141   141   024    Pre-fail  Always       -       477 (Average 545)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       174
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   110   110   020    Pre-fail  Offline      -       40
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23255
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       174
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       884
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       884
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 21/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

RAID#2:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       107
  3 Spin_Up_Time            0x0007   151   151   024    Pre-fail  Always       -       395 (Average 560)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       172
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23258
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       172
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       872
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       872
194 Temperature_Celsius     0x0002   181   181   000    Old_age   Always       -       33 (Min/Max 22/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       11
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

RAID#3:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   094   094   016    Pre-fail  Always       -       65555
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       103
  3 Spin_Up_Time            0x0007   151   151   024    Pre-fail  Always       -       395 (Average 563)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       172
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   107   107   020    Pre-fail  Offline      -       41
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23255
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       172
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       901
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       901
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 22/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

RAID #3 makes me go lolwut.
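
Putting the 6/3/12 logs next to the current values from the first post makes the changes easier to see. A short Python sketch of that comparison; the numbers are copied from the tables in this thread, while the dictionary layout and the `deltas` helper are only an illustration.

```python
# (old, new) raw values per drive, copied from the 6/3/12 logs and the first post.
SNAPSHOTS = {
    "Drive 1": {"Raw_Read_Error_Rate": (262144, 262145),
                "Reallocated_Sector_Ct": (0, 20),
                "Reallocated_Event_Count": (0, 20)},
    "Drive 2": {"Raw_Read_Error_Rate": (0, 65536),
                "Reallocated_Sector_Ct": (8, 8),
                "Reallocated_Event_Count": (11, 11)},
    "Drive 3": {"Raw_Read_Error_Rate": (65555, 5),
                "Reallocated_Sector_Ct": (3, 3),
                "Reallocated_Event_Count": (3, 4)},
}

def deltas(snapshots):
    """Return {drive: {attribute: new - old}} for every attribute that changed."""
    changed = {}
    for drive, attrs in snapshots.items():
        diff = {name: new - old for name, (old, new) in attrs.items() if new != old}
        if diff:
            changed[drive] = diff
    return changed

for drive, diff in deltas(SNAPSHOTS).items():
    for name, change in diff.items():
        print(f"{drive}: {name} changed by {change:+d}")
```

On these numbers, Drive 1 is the only one whose Reallocated_Sector_Ct grew (0 to 20), and Drive 3's read-error raw value actually went down, which suggests it is not a simple running total.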

**rubylaser** · June 26th, 2013

I hope that mdadm article helps you out. I love mdadm and still use it at work every day, but for my home use, I've actually switched over to SnapRAID for my media storage and ZFS (raidz2) for all my really important stuff. I still back up to my collocated fileserver and to Amazon Glacier for my irreplaceable stuff (pictures, home movies, and documents).

Also, my Seagate drives have always had ridiculously high Raw_Read_Error_Rates as well. Here are two of my Seagate drives. Both of these drives are healthy and work great.

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   097   006    Pre-fail  Always       -       73247840
  3 Spin_Up_Time            0x0003   093   074   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2190
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   058   056   030    Pre-fail  Always       -       283516086817
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       17969
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       95
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       477
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       4295032858
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   077   062   045    Old_age   Always       -       23 (Min/Max 21/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       81
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       3499
194 Temperature_Celsius     0x0022   023   040   000    Old_age   Always       -       23 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a   024   007   000    Old_age   Always       -       73247840
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       216603790677821
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       494625907
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       491619127

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       194365396
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   097   097   020    Old_age   Always       -       3338
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       8621789498
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       15641
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       58
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   086   000    Old_age   Always       -       68720787483
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   071   060   045    Old_age   Always       -       29 (Min/Max 20/33)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 15 0 0)
195 Hardware_ECC_Recovered  0x001a   040   022   000    Old_age   Always       -       194365396
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       240951960288185
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1527605445
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3684894397
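
A commonly repeated explanation for these huge Seagate numbers is that the 48-bit raw value packs two counters: an error count in the upper 16 bits and an operation count in the lower 32 bits. That layout is not confirmed by anything in this thread, so treat the split below as an assumption; the Python sketch only shows the arithmetic on two values from the first table above.

```python
def split_seagate_raw(raw):
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits).

    Reading these as (error count, operation count) is an assumption.
    """
    return raw >> 32, raw & 0xFFFFFFFF

# Raw values from the first drive listed above.
for name, raw in [("Raw_Read_Error_Rate", 73247840),
                  ("Seek_Error_Rate", 283516086817)]:
    upper, lower = split_seagate_raw(raw)
    print(f"{name}: upper={upper} lower={lower}")
```

Under that reading, a Raw_Read_Error_Rate below 2^32 would mean an upper field of zero, which would fit with the drives being healthy.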

**CharlesA** · June 26th, 2013

SnapRAID is actually pretty cool. It looks very similar to the setup for mdadm. Do you use it for storage only or for other stuff too? I'm currently using my array to handle storage for my VMs via KVM and OpenVZ.

I wonder if it could work with LVM. At first glance, I doubt it because the drives are pooled, but maybe.

Good to know my drives aren't the only ones that are kinda screwy. You would think there would be some standard for SMART.. but nope.

EDIT: The reason I am asking is because I am currently running Debian Wheezy with Proxmox on top of it on my server. LVM snapshots for backups sound handy.

**rubylaser** · June 26th, 2013

You could use SnapRAID for VMs, but that's not what it's designed for. SnapRAID is primarily for storage that doesn't change that often (movies, tv shows, pictures, etc.). Also, SnapRAID doesn't pool by itself, it's an option, but I don't cover it in my tutorial. I use AUFS to pool the disks.

I also use Proxmox (at home, work, and for sites I host for others at the datacenter). I use my ZFS array with an iSCSI share to Proxmox at home and at the datacenter, I use hardware LSI RAID cards for my (4) Proxmox hosts and then back up to a (10) drive mdadm RAID6 array. Both of these methods have worked very well for me.
