For more than a year I have occasionally been plagued by a hanging *buntu. Every time this happened, the problem could be traced to "FPDMA QUEUED" errors in the kernel log, as in this bug report: https://bugs.launchpad.net/ubuntu/+s...ux/+bug/550559 Since the problem is really annoying (it makes the computer unreliable), I want to take a closer look and determine whether it is the drive or just a Linux kernel issue.
I have run multiple SMART tests over the past year and the disk passed every time. The most recent extended test (more than 2 hours) does show read errors but no pending sectors, and the test ran to completion. Furthermore, I booted the Ultimate Boot CD, and the Western Digital diagnostics tool found no errors on a full scan.
Another weird thing is how Windows behaves in this regard. Vista ran on this machine for 2 years without any apparent HDD errors (at least no 'show stoppers' for the OS). I now have a Windows 7 / Kubuntu dual boot, and again there has never been a sign of hard drive trouble in W7.
This is (hard) driving me insane. Can anyone shed some light on this situation? What should I do?
Below I have provided some relevant lines from /var/log/kern.log and the results from a SMART test performed today.
This is one of the hangs of my *buntu install that I managed to capture in the log:
Code:
Sep 30 17:30:33 mycomputer kernel: [31175.824062] ata1.00: exception Emask 0x0 SAct 0x7 SErr 0x0 action 0x6 frozen
Sep 30 17:30:33 mycomputer kernel: [31175.824074] ata1.00: failed command: READ FPDMA QUEUED
Sep 30 17:30:33 mycomputer kernel: [31175.824086] ata1.00: cmd 60/40:00:00:15:7e/00:00:38:00:00/40 tag 0 ncq 32768 in
Sep 30 17:30:33 mycomputer kernel: [31175.824089] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 30 17:30:33 mycomputer kernel: [31175.824095] ata1.00: status: { DRDY }
Sep 30 17:30:33 mycomputer kernel: [31175.824100] ata1.00: failed command: WRITE FPDMA QUEUED
Sep 30 17:30:33 mycomputer kernel: [31175.824110] ata1.00: cmd 61/20:08:18:30:f9/00:00:34:00:00/40 tag 1 ncq 16384 out
Sep 30 17:30:33 mycomputer kernel: [31175.824113] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 30 17:30:33 mycomputer kernel: [31175.824119] ata1.00: status: { DRDY }
Sep 30 17:30:33 mycomputer kernel: [31175.824124] ata1.00: failed command: WRITE FPDMA QUEUED
Sep 30 17:30:33 mycomputer kernel: [31175.824133] ata1.00: cmd 61/18:10:20:1b:0b/00:00:32:00:00/40 tag 2 ncq 12288 out
Sep 30 17:30:33 mycomputer kernel: [31175.824136] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 30 17:30:33 mycomputer kernel: [31175.824140] ata1.00: status: { DRDY }
Sep 30 17:30:33 mycomputer kernel: [31175.824149] ata1: hard resetting link
Sep 30 17:30:34 mycomputer kernel: [31176.316050] ata1: softreset failed (device not ready)
Sep 30 17:30:34 mycomputer kernel: [31176.316059] ata1: applying PMP SRST workaround and retrying
Sep 30 17:30:34 mycomputer kernel: [31176.488053] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 30 17:30:34 mycomputer kernel: [31176.511416] ata1.00: configured for UDMA/133
Sep 30 17:30:34 mycomputer kernel: [31176.524026] ata1.00: device reported invalid CHS sector 0
Sep 30 17:30:34 mycomputer kernel: [31176.524033] ata1.00: device reported invalid CHS sector 0
Sep 30 17:30:34 mycomputer kernel: [31176.524038] ata1.00: device reported invalid CHS sector 0
Sep 30 17:30:34 mycomputer kernel: [31176.524052] ata1: EH complete
Sep 30 17:31:04 mycomputer kernel: [31206.880065] ata1.00: exception Emask 0x0 SAct 0x4 SErr 0x0 action 0x6 frozen
Sep 30 17:31:04 mycomputer kernel: [31206.880185] ata1.00: failed command: READ FPDMA QUEUED
Sep 30 17:31:04 mycomputer kernel: [31206.880267] ata1.00: cmd 60/40:10:00:15:7e/00:00:38:00:00/40 tag 2 ncq 32768 in
Sep 30 17:31:04 mycomputer kernel: [31206.880269] res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Sep 30 17:31:04 mycomputer kernel: [31206.880460] ata1.00: status: { DRDY }
Sep 30 17:31:04 mycomputer kernel: [31206.880520] ata1: hard resetting link
Sep 30 17:31:05 mycomputer kernel: [31207.372037] ata1: softreset failed (device not ready)
Sep 30 17:31:05 mycomputer kernel: [31207.372135] ata1: applying PMP SRST workaround and retrying
Sep 30 17:31:05 mycomputer kernel: [31207.544059] ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Sep 30 17:31:05 mycomputer kernel: [31207.574052] ata1.00: configured for UDMA/133
Sep 30 17:31:05 mycomputer kernel: [31207.588038] ata1.00: device reported invalid CHS sector 0
Sep 30 17:31:05 mycomputer kernel: [31207.588053] ata1: EH complete
Sep 30 17:31:35 mycomputer kernel: [31237.856070] ata1.00: exception Emask 0x0 SAct 0x1 SErr 0x0 action 0x6 frozen
Sep 30 17:31:35 mycomputer kernel: [31237.856191] ata1.00Sep 30 20:57:34 mycomputer kernel: imklog 5.8.6, log source = /proc/kmsg started.
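The "cmd" lines in the log above can be decoded to see which sectors each failed command was addressing. Below is a rough Python sketch, assuming the byte order that libata appears to use in this message (command, feature, sector count, three low LBA bytes, then the high-order bytes, then the device byte) and assuming that for NCQ commands the sector count travels in the feature field while the tag sits in the upper bits of the count field. The function name decode_ncq_cmd is made up for this example. Under those assumptions, the byte counts it computes match the "ncq 32768", "ncq 16384" and "ncq 12288" values in the log, and the READ that times out at 17:30:33 and again at 17:31:04 decodes to the same starting LBA.

```python
import re

# Assumed field order of the libata "cmd" dump (12 hex bytes):
# CMD/FEAT:NSECT:LBAL:LBAM:LBAH/HOB_FEAT:HOB_NSECT:HOB_LBAL:HOB_LBAM:HOB_LBAH/DEV
CMD_RE = re.compile(r"cmd ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")

OPCODES = {0x60: "READ FPDMA QUEUED", 0x61: "WRITE FPDMA QUEUED"}


def decode_ncq_cmd(line):
    """Return opcode name, NCQ tag, start LBA and sector count, or None."""
    match = CMD_RE.search(line)
    if match is None:
        return None
    (cmd, feat, nsect, lbal, lbam, lbah,
     hob_feat, _hob_nsect, hob_lbal, hob_lbam, hob_lbah, _dev) = (
        int(byte, 16) for byte in re.split(r"[/:]", match.group(1)))
    # 48-bit LBA: high-order ("hob") bytes first, then the low three bytes.
    lba = ((hob_lbah << 40) | (hob_lbam << 32) | (hob_lbal << 24)
           | (lbah << 16) | (lbam << 8) | lbal)
    return {
        "op": OPCODES.get(cmd, hex(cmd)),
        "tag": nsect >> 3,                 # tag is stored in bits 7:3
        "lba": lba,
        # A count of 0 would mean 65536 sectors; not handled in this sketch.
        "sectors": (hob_feat << 8) | feat,
    }


if __name__ == "__main__":
    # "cmd" fragments copied from the kernel log above.
    samples = [
        "cmd 60/40:00:00:15:7e/00:00:38:00:00/40 tag 0 ncq 32768 in",
        "cmd 61/20:08:18:30:f9/00:00:34:00:00/40 tag 1 ncq 16384 out",
        "cmd 60/40:10:00:15:7e/00:00:38:00:00/40 tag 2 ncq 32768 in",
    ]
    for sample in samples:
        info = decode_ncq_cmd(sample)
        print(info["op"], "tag", info["tag"], "LBA", info["lba"],
              "bytes", info["sectors"] * 512)
```

To run it over a whole log, feed each line of /var/log/kern.log to decode_ncq_cmd and skip the lines for which it returns None.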
Here are the results from
Code:
sudo smartctl -a /dev/sda
after an extended selftest:
Code:
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.0-31-generic] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green
Device Model: WDC WD5000AACS-00ZUB0
Serial Number: WD-WCASU2847586
LU WWN Device Id: 5 0014ee 2014e7ec9
Firmware Version: 01.01B01
User Capacity: 500,107,862,016 bytes [500 GB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Sat Oct 6 14:59:14 2012 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x84) Offline data collection activity
was suspended by an interrupting command from host.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 0) The previous self-test routine completed
without error or no self-test has ever
been run.
Total time to complete Offline
data collection: (13980) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 163) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x303f) SCT Status supported.
SCT Error Recovery Control supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   198   197   051    Pre-fail  Always       -       31671
  3 Spin_Up_Time            0x0003   167   161   021    Pre-fail  Always       -       4633
  4 Start_Stop_Count        0x0032   097   097   000    Old_age   Always       -       3304
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x000e   200   200   051    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   079   079   000    Old_age   Always       -       15556
 10 Spin_Retry_Count        0x0012   100   100   051    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0012   100   100   051    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   000    Old_age   Always       -       3300
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       134
193 Load_Cycle_Count        0x0032   016   016   000    Old_age   Always       -       552064
194 Temperature_Celsius     0x0022   112   100   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   200   200   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x003e   200   199   000    Old_age   Always       -       33
200 Multi_Zone_Error_Rate   0x0008   200   200   051    Old_age   Offline      -       0
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     15555         -
# 2  Extended offline    Completed without error       00%     15455         -
# 3  Short offline       Completed without error       00%     15452         -
# 4  Conveyance offline  Completed without error       00%     15439         -
# 5  Extended offline    Completed without error       00%     15255         -
# 6  Short offline       Completed without error       00%     15247         -
# 7  Short offline       Completed without error       00%     14841         -
# 8  Extended offline    Completed without error       00%     12729         -
# 9  Short offline       Completed without error       00%     12649         -
#10  Extended offline    Completed without error       00%     12266         -
#11  Short offline       Completed without error       00%     12264         -
#12  Short offline       Aborted by host               10%     12263         -
#13  Conveyance offline  Completed without error       00%     11071         -
#14  Extended offline    Completed without error       00%     11070         -
#15  Short offline       Completed without error       00%     11016         -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
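To make the attribute table in the smartctl output above easier to compare between runs, the raw values can be pulled out with a few lines of Python. This is only a sketch: the regular expression assumes the ten-column layout shown above (ID, name, flag, value, worst, threshold, type, updated, when-failed, raw), and parse_attributes is a name invented for this example. As a usage illustration it divides Load_Cycle_Count by Power_On_Hours, which for the numbers above comes to roughly 35 load cycles per power-on hour; whether that or the UDMA_CRC_Error_Count of 33 has anything to do with the hangs is not something the script can decide.

```python
import re

# One attribute row: ID, name, flag, value, worst, thresh, type, updated,
# when_failed, raw. Only the leading integer of the raw column is captured.
ATTR_RE = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+0x[0-9a-f]{4}\s+(\d+)\s+(\d+)\s+(\d+)"
    r"\s+\S+\s+\S+\s+\S+\s+(\d+)")


def parse_attributes(text):
    """Map attribute name to its id, normalized values and raw value."""
    attrs = {}
    for line in text.splitlines():
        match = ATTR_RE.match(line)
        if match:
            attr_id, name, value, worst, thresh, raw = match.groups()
            attrs[name] = {
                "id": int(attr_id),
                "value": int(value),
                "worst": int(worst),
                "thresh": int(thresh),
                "raw": int(raw),
            }
    return attrs


# Three rows copied from the smartctl output above.
SAMPLE = """\
  9 Power_On_Hours 0x0032 079 079 000 Old_age Always - 15556
193 Load_Cycle_Count 0x0032 016 016 000 Old_age Always - 552064
199 UDMA_CRC_Error_Count 0x003e 200 199 000 Old_age Always - 33
"""

if __name__ == "__main__":
    attrs = parse_attributes(SAMPLE)
    hours = attrs["Power_On_Hours"]["raw"]
    cycles = attrs["Load_Cycle_Count"]["raw"]
    print("load cycles per power-on hour: %.1f" % (cycles / hours))
    print("UDMA CRC errors so far:", attrs["UDMA_CRC_Error_Count"]["raw"])
```

Instead of SAMPLE, the text can come from saving the output of "sudo smartctl -A /dev/sda" to a file and reading that file in.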