Howdy,
I have been running 3 Hitachi DeskStar ("DeathStar") drives in a RAID5 on my home server for a while now and recently added a fourth drive to expand the array. In the end, I had to destroy the array and rebuild it with the new drive. Once that was done, the management software threw an error, "Array 'RAID5' data is not consistent," and the array started "rebuilding."
That was on 6/21. Fast forward to today. The rebuilding completed with no errors and I have run a verify on the entire array with all 4 drives. No errors. No errors when accessing files that I am aware of and so far backups are up-to-date and good.
Here's the SMART data:
Code:
Drive 1:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 098 098 016 Pre-fail Always - 262145
2 Throughput_Performance 0x0005 134 134 054 Pre-fail Offline - 96
3 Spin_Up_Time 0x0007 159 159 024 Pre-fail Always - 408 (Average 500)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 186
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 20
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 110 110 020 Pre-fail Offline - 40
9 Power_On_Hours 0x0012 096 096 000 Old_age Always - 32532
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 186
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 969
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 969
194 Temperature_Celsius 0x0002 162 162 000 Old_age Always - 37 (Min/Max 21/50)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 20
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Drive 2:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 65536
2 Throughput_Performance 0x0005 132 132 054 Pre-fail Offline - 106
3 Spin_Up_Time 0x0007 151 151 024 Pre-fail Always - 436 (Average 522)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 184
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 8
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 112 112 020 Pre-fail Offline - 39
9 Power_On_Hours 0x0012 096 096 000 Old_age Always - 32536
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 184
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 950
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 950
194 Temperature_Celsius 0x0002 150 150 000 Old_age Always - 40 (Min/Max 22/50)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 11
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Drive 3:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 098 098 016 Pre-fail Always - 5
2 Throughput_Performance 0x0005 133 133 054 Pre-fail Offline - 101
3 Spin_Up_Time 0x0007 150 150 024 Pre-fail Always - 439 (Average 524)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 184
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 3
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 112 112 020 Pre-fail Offline - 39
9 Power_On_Hours 0x0012 096 096 000 Old_age Always - 32532
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 184
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 998
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 998
194 Temperature_Celsius 0x0002 153 153 000 Old_age Always - 39 (Min/Max 22/54)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 4
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Drive 4:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000b 100 100 016 Pre-fail Always - 0
2 Throughput_Performance 0x0005 133 133 054 Pre-fail Offline - 99
3 Spin_Up_Time 0x0007 169 169 024 Pre-fail Always - 440 (Average 416)
4 Start_Stop_Count 0x0012 100 100 000 Old_age Always - 10
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 0
7 Seek_Error_Rate 0x000b 100 100 067 Pre-fail Always - 0
8 Seek_Time_Performance 0x0005 121 121 020 Pre-fail Offline - 35
9 Power_On_Hours 0x0012 100 100 000 Old_age Always - 201
10 Spin_Retry_Count 0x0013 100 100 060 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 10
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 12
193 Load_Cycle_Count 0x0012 100 100 000 Old_age Always - 12
194 Temperature_Celsius 0x0002 166 166 000 Old_age Always - 36 (Min/Max 25/44)
196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 0
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0008 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x000a 200 200 000 Old_age Always - 0
Now, I have read a couple of different threads saying that read errors and UDMA CRC errors do not necessarily mean the drive is going out, but that the cable might be going bad or is loose. So far I have reseated the cables on all the drives, but I have not replaced them.
Can I get some opinions on this? In the past I have only really been watching Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector, and Offline_Uncorrectable, because I read on Wikipedia that those were the important ones to keep an eye on (insert lulz here).
Anyway, up until this totally random rebuild on the 21st, I hadn't had any issues with the array since I put it into place. It has been moved from my old server running 10.04, to a new server running 12.04, and now to Debian 7.0, all with no issues. I only started having problems when I added another drive. I have been running a verify every two weeks, and those have caught the current bad sectors. I'm now thinking of setting it up to run a verify every week and then a SMART test before emailing me the results, so I can keep an eye on them.
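For the weekly check, something like the following might be enough to pick out the attributes worth reading in the email. This is only a rough sketch: it assumes the SMART table has the same ten-column layout as the tables pasted in this post, and the names `parse_smart_table`, `flag_attributes`, and `WATCHED` are made up for illustration. It does not run the verify, run the SMART test, or send any mail.

```python
# Hypothetical helpers; the column layout is assumed to match the tables in this post:
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
WATCHED = (
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_smart_table(text):
    """Return {attribute_name: {"value", "thresh", "when_failed", "raw"}}."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # headers and drive-information lines are skipped.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            row = {
                "value": int(fields[3]),
                "thresh": int(fields[5]),
                "when_failed": fields[8],
                # Only the first token of RAW_VALUE is kept, e.g. "37" from
                # "37 (Min/Max 21/50)".
                "raw": int(fields[9]),
            }
        except ValueError:
            continue
        attrs[fields[1]] = row
    return attrs


def flag_attributes(attrs, watched=WATCHED):
    """List attributes that are watched with a nonzero raw value,
    or that report anything other than "-" in WHEN_FAILED."""
    flagged = []
    for name, row in attrs.items():
        if (name in watched and row["raw"] > 0) or row["when_failed"] != "-":
            flagged.append(name)
    return flagged


SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 005 Pre-fail Always - 20
197 Current_Pending_Sector 0x0022 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 038 028 045 Old_age Always FAILING_NOW 62 (0 47 67 31)
"""

flagged = flag_attributes(parse_smart_table(SAMPLE))
```

With the three sample rows (taken from the tables in this post), `flagged` contains Reallocated_Sector_Ct and Airflow_Temperature_Cel. Comparing each week's raw values against the previous week's would probably be more useful than a plain nonzero check, since the reallocated counts on the older drives are already above zero.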
Here's the main reason for this post: Should I worry about replacing the two drives with a high Raw_Read_Error_Rate, or leave them be, since I have yet to run into problems and I have good backups?
I currently have 1 spare 2TB drive that I can replace one of the drives with, but I would need to order another one to get everything "in the green" again. Thoughts?
Thanks in advance.
EDIT: With all that being said, I am seeing similar things on both my OS drive and the external backup drive, but so far it seems like it is a Seagate thing. Any opinions would be welcome.
Code:
OS Drive:
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.12
Device Model: ST3500418AS
Serial Number: 9VM6WZ62
LU WWN Device Id: 5 000c50 019f6e395
Firmware Version: CC38
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 117 099 006 Pre-fail Always - 143554816
3 Spin_Up_Time 0x0003 097 097 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 175
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 077 060 030 Pre-fail Always - 56698229
9 Power_On_Hours 0x0032 076 076 000 Old_age Always - 21786
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 87
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 096 000 Old_age Always - 157
189 High_Fly_Writes 0x003a 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 069 056 045 Old_age Always - 31 (Min/Max 28/33)
194 Temperature_Celsius 0x0022 031 044 000 Old_age Always - 31 (0 21 0 0)
195 Hardware_ECC_Recovered 0x001a 034 024 000 Old_age Always - 143554816
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 105398497465792
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 465972698
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 4216579979
External backup drive:
=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda XT
Device Model: ST33000651AS
Serial Number: 9XK06WXD
LU WWN Device Id: 5 000c50 02d0a77dd
Firmware Version: CC43
User Capacity: 3,000,592,982,016 bytes [3.00 TB]
Sector Size: 512 bytes logical/physical
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x000f 115 099 006 Pre-fail Always - 84433708
3 Spin_Up_Time 0x0003 089 089 000 Pre-fail Always - 0
4 Start_Stop_Count 0x0032 100 100 020 Old_age Always - 546
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 0
7 Seek_Error_Rate 0x000f 072 060 030 Pre-fail Always - 18697178
9 Power_On_Hours 0x0032 100 100 000 Old_age Always - 870
10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0
12 Power_Cycle_Count 0x0032 100 100 020 Old_age Always - 17
183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always - 0
184 End-to-End_Error 0x0032 100 100 099 Old_age Always - 0
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
188 Command_Timeout 0x0032 100 100 000 Old_age Always - 0
189 High_Fly_Writes 0x003a 095 095 000 Old_age Always - 5
190 Airflow_Temperature_Cel 0x0022 038 028 045 Old_age Always FAILING_NOW 62 (0 47 67 31)
191 G-Sense_Error_Rate 0x0032 100 100 000 Old_age Always - 0
192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 9
193 Load_Cycle_Count 0x0032 100 100 000 Old_age Always - 1027
194 Temperature_Celsius 0x0022 062 072 000 Old_age Always - 62 (0 15 0 0)
195 Hardware_ECC_Recovered 0x001a 019 009 000 Old_age Always - 281470766177068
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
199 UDMA_CRC_Error_Count 0x003e 200 200 000 Old_age Always - 0
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 82046760256032
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 4092398426
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3580989180
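One thing I noticed while staring at these numbers: the huge raw values look like several smaller fields packed into one 48-bit number instead of a plain count. The sketch below just does the bit arithmetic. How each vendor actually uses the fields is not something I can confirm, so treat any interpretation as an assumption; the function names are my own.

```python
def split_high16_low32(raw):
    """Split a 48-bit raw value into (upper 16 bits, lower 32 bits)."""
    return (raw >> 32) & 0xFFFF, raw & 0xFFFFFFFF


def split_words16(raw):
    """Split a 48-bit raw value into three 16-bit words, most significant first."""
    return (raw >> 32) & 0xFFFF, (raw >> 16) & 0xFFFF, raw & 0xFFFF


# Hardware_ECC_Recovered on the external backup drive.
ecc_high, ecc_low = split_high16_low32(281470766177068)

# Raw_Read_Error_Rate on RAID drives 1 and 2.
drive1_words = split_words16(262145)
drive2_words = split_words16(65536)
```

The lower 32 bits of Hardware_ECC_Recovered on the external drive come out as 84433708, which is exactly that drive's Raw_Read_Error_Rate raw value, with 65535 in the upper 16 bits. On the Hitachi drives, 262145 splits into the words (0, 4, 1) and 65536 into (0, 1, 0), so those "high" numbers may be small values sitting in a higher field.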