[SOLVED] Help with interpreting SMART data

Howdy,

I have been running 3 Hitachi DeskStar (DeathStar :p) drives in a RAID5 on my home server for a while now and recently added another drive to expand the array. In the end, I had to destroy the array and rebuild it with the new drive, but once that was done, it started "rebuilding" after the management software threw an error: "Array 'RAID5' data is not consistent."

That was on 6/21. Fast forward to today. The rebuilding completed with no errors and I have run a verify on the entire array with all 4 drives. No errors. No errors when accessing files that I am aware of and so far backups are up-to-date and good.

Here's the SMART data:

Code:

Drive 1:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       262145
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       96
  3 Spin_Up_Time            0x0007   159   159   024    Pre-fail  Always       -       408 (Average 500)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       186
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       20
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   110   110   020    Pre-fail  Offline      -       40
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       32532
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       186
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       969
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       969
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 21/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       20
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

Drive 2:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       65536
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       106
  3 Spin_Up_Time            0x0007   151   151   024    Pre-fail  Always       -       436 (Average 522)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       184
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       32536
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       184
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       950
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       950
194 Temperature_Celsius     0x0002   150   150   000    Old_age   Always       -       40 (Min/Max 22/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       11
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

Drive 3:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       5
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       101
  3 Spin_Up_Time            0x0007   150   150   024    Pre-fail  Always       -       439 (Average 524)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       184
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always       -       32532
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       184
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       998
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       998
194 Temperature_Celsius     0x0002   153   153   000    Old_age   Always       -       39 (Min/Max 22/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       4
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

Drive 4:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   133   133   054    Pre-fail  Offline      -       99
  3 Spin_Up_Time            0x0007   169   169   024    Pre-fail  Always       -       440 (Average 416)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       10
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   121   121   020    Pre-fail  Offline      -       35
  9 Power_On_Hours          0x0012   100   100   000    Old_age   Always       -       201
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       10
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       12
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       12
194 Temperature_Celsius     0x0002   166   166   000    Old_age   Always       -       36 (Min/Max 25/44)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

Now, I have read a couple of different threads saying that read errors and UDMA CRC errors do not necessarily mean the drive is going out; the cable might be going bad or be loose instead. So far I have reseated the cables on all the drives, but I have not replaced them.

Can I get some opinions on this? In the past I have only really been keeping an eye on Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector, and Offline_Uncorrectable because I read on Wikipedia that those were the important ones to watch (insert lulz here).
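For anyone who wants to watch just those four attributes, here is a minimal sketch that pulls their raw values out of `smartctl -A` style text. It assumes the ten-column attribute layout posted above; the function name `watched_raw_values` and the `WATCHED` set are made up for this example and are not part of smartmontools.

```python
# Attributes named in the post; the set itself is an illustrative choice.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def watched_raw_values(smart_text):
    """Return {attribute_name: raw_value} for the watched attributes.

    Assumes each attribute row has the columns
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    found = {}
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        if name in WATCHED:
            # RAW_VALUE can be followed by extras such as "(Min/Max 21/50)",
            # so read the tenth column rather than the last one.
            found[name] = int(fields[9])
    return found


# Rows copied from Drive 1 in this thread.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       20
194 Temperature_Celsius     0x0002   162   162   000    Old_age   Always       -       37 (Min/Max 21/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       20
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

print(watched_raw_values(sample))
```

In practice the text would come from running `smartctl -A` against each disk; the sample string only stands in for that output.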

Anyway, up until this totally random rebuild on the 21st, I haven't had any issues with the array since I put it into place. It has been moved from my old server running 10.04, to a new server running 12.04 and now Debian 7.0 with no issues. I only started having problems when I added another drive. I have been running a verify every two weeks and those have caught the current bad sectors. I'm now thinking of setting it up to run a verify every week and then to do a smart test before emailing me the results, so I can keep an eye on them.

Here's the main reason for this post: Should I be worried about replacing the two drives with a high Raw_Read_Error_Rate, or leave them be, as I have yet to run into problems and I have good backups?

I currently have 1 spare 2TB drive that I can replace one of the drives with, but I would need to order another one to get everything "in the green" again. Thoughts?

Thanks in advance.

EDIT: With all that being said, I am seeing similar things on both my OS drive and the external backup drive, but so far it seems like it is a Seagate thing. Any opinions would be welcome.

Code:

OS Drive:

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda 7200.12
Device Model:     ST3500418AS
Serial Number:    9VM6WZ62
LU WWN Device Id: 5 000c50 019f6e395
Firmware Version: CC38
User Capacity:    500,107,862,016 bytes [500 GB]
Sector Size:      512 bytes logical/physical

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   117   099   006    Pre-fail  Always       -       143554816
  3 Spin_Up_Time            0x0003   097   097   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       175
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   077   060   030    Pre-fail  Always       -       56698229
  9 Power_On_Hours          0x0032   076   076   000    Old_age   Always       -       21786
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       87
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   096   000    Old_age   Always       -       157
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   069   056   045    Old_age   Always       -       31 (Min/Max 28/33)
194 Temperature_Celsius     0x0022   031   044   000    Old_age   Always       -       31 (0 21 0 0)
195 Hardware_ECC_Recovered  0x001a   034   024   000    Old_age   Always       -       143554816
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       105398497465792
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       465972698
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       4216579979

External backup drive:

=== START OF INFORMATION SECTION ===
Model Family:     Seagate Barracuda XT
Device Model:     ST33000651AS
Serial Number:    9XK06WXD
LU WWN Device Id: 5 000c50 02d0a77dd
Firmware Version: CC43
User Capacity:    3,000,592,982,016 bytes [3.00 TB]
Sector Size:      512 bytes logical/physical

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       84433708
  3 Spin_Up_Time            0x0003   089   089   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always       -       546
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       18697178
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       870
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       17
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   100   000    Old_age   Always       -       0
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   038   028   045    Old_age   Always   FAILING_NOW 62 (0 47 67 31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       9
193 Load_Cycle_Count        0x0032   100   100   000    Old_age   Always       -       1027
194 Temperature_Celsius     0x0022   062   072   000    Old_age   Always       -       62 (0 15 0 0)
195 Hardware_ECC_Recovered  0x001a   019   009   000    Old_age   Always       -       281470766177068
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       82046760256032
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       4092398426
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3580989180
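The only row smartctl itself flags in the output above is the external drive's Airflow_Temperature_Cel (FAILING_NOW at 62 C), not any of the huge raw read-error numbers. smartctl marks an attribute as failed when its normalized VALUE is at or below THRESH, and a rough sketch of that check looks like the following. The function name `failing_attributes` is hypothetical and the rows are copied from the external backup drive.

```python
def failing_attributes(smart_text):
    """Return names of attributes whose normalized VALUE is at or below THRESH.

    Assumes the ten-column `smartctl -A` layout:
    ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE.
    """
    failing = []
    for line in smart_text.splitlines():
        fields = line.split()
        # Skip the header line and anything that is not an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        name = fields[1]
        value = int(fields[3])   # normalized value, e.g. "038" -> 38
        thresh = int(fields[5])  # vendor threshold, e.g. "045" -> 45
        if value <= thresh:
            failing.append(name)
    return failing


# Three rows from the external backup drive in this thread.
sample = """\
  1 Raw_Read_Error_Rate     0x000f   115   099   006    Pre-fail  Always       -       84433708
190 Airflow_Temperature_Cel 0x0022   038   028   045    Old_age   Always   FAILING_NOW 62 (0 47 67 31)
195 Hardware_ECC_Recovered  0x001a   019   009   000    Old_age   Always       -       281470766177068
"""

print(failing_attributes(sample))  # ['Airflow_Temperature_Cel']
```

Note that Raw_Read_Error_Rate sits at 115 against a threshold of 6, so by the drive's own scale it is nowhere near failing despite the large raw number.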

Re: Help with interpreting SMART data

According to all my reading, SMART data is next to worthless for predicting disk failures. The only SMART data that seems to be of use is when the disk subsystem gets a warning that a HDD is about to fail.

I'm with you on backups, and I've pre-emptively replaced very old HDDs in an array in the last 6 months. I asked on my blog how long people use HDDs: http://blog.jdpfu.com/2012/08/20/how...se-hard-drives

The key that we know is to "replace every HDD just before it fails." Not much help, I know.

Re: Help with interpreting SMART data

Thanks for the input. That's pretty much the conclusion I've come to as well. Most of the stuff I have seen has been "Well, the drive could be failing, but have you tried it in another machine?"

So far the array is working well. The verify completed with no problems and, from what I can tell, there have been no read errors. The verify would have picked up on that, wouldn't it?

I'm thinking of just running with the drives I have now until SMART shows a "FAIL." That might not be the smartest thing to do, but it's a home server and I have it set up to back up via CrashPlan every 15 minutes and back up to external media daily, so I wouldn't be losing data in the event more than one drive goes out.

Do you have any experience with those new WD Red drives? I'm thinking if/when I get around to replacing the disks, I'll replace the RAID card and/or start using mdadm at the same time.

Re: Help with interpreting SMART data

Quote:

Originally Posted by CharlesA

Do you have any experience with those new WD Red drives? I'm thinking if/when I get around to replacing the disks, I'll replace the RAID card and/or start using mdadm at the same time.

I **knew** you'd have a solid backup method.

Yes, my old array holds 4 drives - 2 are WD-Red drives. Added those about 2-3 months ago. As far as I can tell, there hasn't been any difference between those or the Seagates mentioned in the blog article. In theory, non-Black and non-Red HDDs should have issues when put into a RAID set. I'm not seeing that, but time will prove ... at least for my environment.

Code:

md2 : active raid1 sdf2[1] sde2[0]
      1338985536 blocks [2/2] [UU]

md1 : active raid1 sdc3[0] sdd3[1]
      1943010816 blocks super 1.2 [2/2] [UU]

I do weekly mdadm verify runs on each array just after backups, but still very early in the morning. So far, nothing bad has been reported.

I haven't run disk performance tests with these new drives, but I did tune the read-ahead values for both the individual disks AND the array after testing. Also went with a larger array stripe (256) than my old-school docs suggested (64/128). The box running this storage is a Core i5 with plenty of RAM and multiple GigE NICs.

BTW, I was burned by a RAID card a few years ago and couldn't get a replacement for anything less than 4x the cost. Switched to software-RAID and never looked back. I've moved the disk array between 3 different systems, never lost any data. Performance is not great, but I can't tell where the slowdown is. It is more than fast enough for my long running batch jobs.

Re: Help with interpreting SMART data

Quote:

Originally Posted by TheFu

I **knew** you'd have a solid backup method.

Guess I am a bit insane when it comes to making sure I have backups of my data. I've lost data a few times before, mostly from bad CDs and ZIP disks and *gasp* floppy disks but so far I've had no issues when I moved to using HDD to back stuff up.

Quote:

Also went with a larger array stripe (256) than my old-school docs suggested (64/128). The box running this storage is a Core i5 with plenty of RAM and multiple GigE NICs.

What are the benefits of using a larger stripe? My array defaulted to a 64K stripe (I think it's the stripe... the management software calls it "block size")

Quote:

BTW, I was burned by a RAID card a few years ago and couldn't get a replacement for anything less than 4x the cost. Switched to software-RAID and never looked back. I've moved the disk array between 3 different systems, never lost any data. Performance is not great, but I can't tell where the slowdown is. It is more than fast enough for my long running batch jobs.

That is one of my worries as well, but not a huge one as I am totally anal about backups. Of course that might also be cuz I'm running a super cheap RAID card and I could just restore from backups if the card went out and a replacement didn't see the array.

I'm really considering software RAID. I originally went for hardware RAID cuz it was supposedly faster, but so far it has been a pain in the butt. The card I have (RocketRaid 2640x1) needs a proprietary driver and I've had issues with it compiling correctly via DKMS. The management software is pretty basic and doesn't give you as much info from SMART as I would like, but it works. The funny thing is I didn't want to use mdadm because it looked super complex to me, but that was also 3-4 years ago and I've learned a lot since then. :lolflag:

Re: Help with interpreting SMART data

Quote:

Originally Posted by CharlesA

What are the benefits of using a larger stripe? My array defaulted to a 64K stripe (I think it's the stripe... the management software calls it "block size")

64k chunk size is fine. That is a good value, and is the default for a reason. Going with a larger chunk size can be beneficial if you are dealing primarily with large, sequential reads.
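As a rough illustration of what the chunk size means for a 4-drive RAID5 like the one in this thread: each stripe holds one chunk per drive, one of which is parity, so a full stripe carries chunk size times (drives minus one) of data. The helper name `raid5_full_stripe_kib` is made up for this example.

```python
def raid5_full_stripe_kib(chunk_kib, num_drives):
    """Data carried by one full RAID5 stripe, in KiB (parity chunk excluded)."""
    if num_drives < 3:
        raise ValueError("RAID5 needs at least 3 drives")
    # One chunk per drive, minus the single parity chunk in each stripe.
    return chunk_kib * (num_drives - 1)


for chunk in (64, 128, 256):
    data = raid5_full_stripe_kib(chunk, 4)
    print(f"{chunk} KiB chunk -> {data} KiB of data per full stripe")
# 64 KiB chunk -> 192 KiB of data per full stripe
# 128 KiB chunk -> 384 KiB of data per full stripe
# 256 KiB chunk -> 768 KiB of data per full stripe
```

With a bigger chunk, a long sequential read stays on each disk for longer before moving to the next, which is why large chunks tend to suit large sequential workloads.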

Also, I wouldn't say SMART data is completely worthless. If you set up smartmontools to monitor and test your disks periodically (with email alerts set up), it can be a decent early warning system. If raw counts such as reallocated or pending sectors start increasing, that can be an indicator of a pre-fail condition. SMART data will never give you a definitive alert like, "This disk is going to fail tomorrow", but if you monitor your disks, you can often see a failing disk (and replace it) prior to it actually failing.
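The "watch for values that start increasing" idea can be sketched as a comparison between two saved snapshots of raw values. This is only an illustration: the function name `raw_increases` and the dictionary layout are assumptions, and the numbers are Drive 1's reallocation counters from this thread (0 in the 6/3/12 log, 20 in the current output).

```python
def raw_increases(old, new):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value grew.

    `old` and `new` map attribute names to integer raw values taken from
    two smartctl runs of the same disk. Attributes missing from either
    snapshot are ignored.
    """
    changes = {}
    for name in sorted(old):
        if name in new and new[name] > old[name]:
            changes[name] = (old[name], new[name])
    return changes


# Drive 1: earlier log versus current output.
earlier = {
    "Reallocated_Sector_Ct": 0,
    "Reallocated_Event_Count": 0,
    "Current_Pending_Sector": 0,
    "Offline_Uncorrectable": 0,
}
current = {
    "Reallocated_Sector_Ct": 20,
    "Reallocated_Event_Count": 20,
    "Current_Pending_Sector": 0,
    "Offline_Uncorrectable": 0,
}

for name, (before, after) in raw_increases(earlier, current).items():
    print(f"{name}: {before} -> {after}")
# Reallocated_Event_Count: 0 -> 20
# Reallocated_Sector_Ct: 0 -> 20
```

A weekly job could save each snapshot and mail only the attributes this returns, instead of the full table.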

Re: Help with interpreting SMART data

I was just looking at your blog post on mdadm... :)

I totally agree with you there. So far I have had a script set up to check the SMART data every week on Sunday morning, the software does a verify every two weeks, and I haven't run into any issues. I still have the full SMART logs instead of the snipped down logs I get via email, so I can look them over if I feel like it.

Thanks for helping clear up some of my confusion.

EDIT: I looked thru my logs from 6/3/12 and found that RAID#1 was sitting at a Raw_Read_Error_Rate of 262144:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   098   098   016    Pre-fail  Always       -       262144
  2 Throughput_Performance  0x0005   134   134   054    Pre-fail  Offline      -       97
  3 Spin_Up_Time            0x0007   141   141   024    Pre-fail  Always       -       477 (Average 545)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       174
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   110   110   020    Pre-fail  Offline      -       40
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23255
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       174
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       884
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       884
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 21/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

RAID#2:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   100   016    Pre-fail  Always       -       0
  2 Throughput_Performance  0x0005   131   131   054    Pre-fail  Offline      -       107
  3 Spin_Up_Time            0x0007   151   151   024    Pre-fail  Always       -       395 (Average 560)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       172
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       8
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   112   112   020    Pre-fail  Offline      -       39
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23258
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       172
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       872
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       872
194 Temperature_Celsius     0x0002   181   181   000    Old_age   Always       -       33 (Min/Max 22/50)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       11
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

RAID#3:

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   094   094   016    Pre-fail  Always       -       65555
  2 Throughput_Performance  0x0005   132   132   054    Pre-fail  Offline      -       103
  3 Spin_Up_Time            0x0007   151   151   024    Pre-fail  Always       -       395 (Average 563)
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       172
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       3
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always       -       0
  8 Seek_Time_Performance   0x0005   107   107   020    Pre-fail  Offline      -       41
  9 Power_On_Hours          0x0012   097   097   000    Old_age   Always       -       23255
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       172
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       901
193 Load_Cycle_Count        0x0012   100   100   000    Old_age   Always       -       901
194 Temperature_Celsius     0x0002   187   187   000    Old_age   Always       -       32 (Min/Max 22/54)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always       -       0

RAID #3 makes me go lolwut.
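One way to make sense of numbers like 262144, 65536 and 65555: a SMART raw value is a 48-bit field, and splitting it into 16-bit words shows that these are small counts sitting in different words, not one large counter. Whether Hitachi really packs separate counters into Raw_Read_Error_Rate this way is an assumption on my part and not something confirmed in this thread; the function name `split_words` is made up for the sketch.

```python
def split_words(raw):
    """Split a 48-bit SMART raw value into three 16-bit words (high, mid, low)."""
    return ((raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF)


# Raw_Read_Error_Rate values posted in this thread: 6/3/12 log versus now.
readings = [
    ("RAID#1 then", 262144),
    ("RAID#1 now", 262145),
    ("RAID#2 then", 0),
    ("RAID#2 now", 65536),
    ("RAID#3 then", 65555),
    ("RAID#3 now", 5),
]

for label, raw in readings:
    print(f"{label:12} {raw:>7} -> {split_words(raw)}")
# RAID#1 then   262144 -> (0, 4, 0)
# RAID#1 now    262145 -> (0, 4, 1)
# RAID#2 then        0 -> (0, 0, 0)
# RAID#2 now     65536 -> (0, 1, 0)
# RAID#3 then    65555 -> (0, 1, 19)
# RAID#3 now         5 -> (0, 0, 5)
```

Seen this way, RAID#3 going from 65555 to 5 looks like the middle word dropping from 1 to 0 and the low word from 19 to 5, which is less alarming than the decimal numbers suggest, though what each word counts is still vendor-specific.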

Re: Help with interpreting SMART data

I hope that mdadm article helps you out. I love mdadm and still use it at work every day, but for my home use, I've actually switched over to SnapRAID for my media storage and ZFS (raidz2) for all my really important stuff. I still back up to my colocated fileserver and to Amazon Glacier for my irreplaceable stuff (pictures, home movies, and documents).

Also, my Seagate drives have always had ridiculously high Raw_Read_Error_Rates as well. Here are two of my Seagate drives. Both of these drives are healthy and work great.

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   097   006    Pre-fail  Always       -       73247840
  3 Spin_Up_Time            0x0003   093   074   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   098   098   020    Old_age   Always       -       2190
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   058   056   030    Pre-fail  Always       -       283516086817
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       17969
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       95
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   001   001   000    Old_age   Always       -       477
188 Command_Timeout         0x0032   100   092   000    Old_age   Always       -       4295032858
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always       -       0
190 Airflow_Temperature_Cel 0x0022   077   062   045    Old_age   Always       -       23 (Min/Max 21/31)
191 G-Sense_Error_Rate      0x0032   100   100   000    Old_age   Always       -       0
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       81
193 Load_Cycle_Count        0x0032   099   099   000    Old_age   Always       -       3499
194 Temperature_Celsius     0x0022   023   040   000    Old_age   Always       -       23 (0 16 0 0)
195 Hardware_ECC_Recovered  0x001a   024   007   000    Old_age   Always       -       73247840
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       216603790677821
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       494625907
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       491619127

Code:

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   118   099   006    Pre-fail  Always       -       194365396
  3 Spin_Up_Time            0x0003   100   100   000    Pre-fail  Always       -       0
  4 Start_Stop_Count        0x0032   097   097   020    Old_age   Always       -       3338
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000f   072   060   030    Pre-fail  Always       -       8621789498
  9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       15641
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always       -       58
183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
184 End-to-End_Error        0x0032   100   100   099    Old_age   Always       -       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
188 Command_Timeout         0x0032   100   086   000    Old_age   Always       -       68720787483
189 High_Fly_Writes         0x003a   095   095   000    Old_age   Always       -       5
190 Airflow_Temperature_Cel 0x0022   071   060   045    Old_age   Always       -       29 (Min/Max 20/33)
194 Temperature_Celsius     0x0022   029   040   000    Old_age   Always       -       29 (0 15 0 0)
195 Hardware_ECC_Recovered  0x001a   040   022   000    Old_age   Always       -       194365396
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline      -       240951960288185
241 Total_LBAs_Written      0x0000   100   253   000    Old_age   Offline      -       1527605445
242 Total_LBAs_Read         0x0000   100   253   000    Old_age   Offline      -       3684894397

Re: Help with interpreting SMART data

SnapRAID is actually pretty cool. It looks very similar to the setup for mdadm. Do you use it for storage only or for other stuff too? I'm currently using my array to handle storage for my VMs via KVM and OpenVZ.

I wonder if it could work with LVM. At first glance, I doubt it because the drives are pooled, but maybe.

Good to know my drives aren't the only ones that are kinda screwy. You would think there would be some standard for SMART.. but nope.

EDIT: The reason I am asking is because I am currently running Debian Wheezy with Proxmox on top of it on my server. LVM snapshots for backups sound handy.

Re: Help with interpreting SMART data

You could use SnapRAID for VMs, but that's not what it's designed for. SnapRAID is primarily for storage that doesn't change that often (movies, TV shows, pictures, etc.). Also, SnapRAID doesn't pool disks by itself; pooling is an option, but I don't cover it in my tutorial. I use AUFS to pool the disks.

I also use Proxmox (at home, at work, and for sites I host for others at the datacenter). At home, I use my ZFS array with an iSCSI share to Proxmox. At the datacenter, I use hardware LSI RAID cards for my (4) Proxmox hosts and then back up to a (10) drive mdadm RAID6 array. Both of these methods have worked very well for me.