Note to self:
Running hdparm -i /dev/sdb displays information about the drive, including its serial number.
I used this to match the serial number against the label on the physical drive in the case, which told me which cable needed to be switched.
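A minimal sketch of pulling just the serial number out of that output, so several drives can be compared quickly. It assumes the serial appears as "SerialNo=<value>" on the identification line; the helper name serial_from_hdparm and the sample line are illustrative, built from the model, firmware and serial shown later in this thread.

```shell
#!/bin/sh
# Print the serial number found in `hdparm -i`-style text read from stdin.
# Assumption: the serial appears as "SerialNo=<value>" on the identification line.
serial_from_hdparm() {
    sed -n 's/.*SerialNo=\([^, ]*\).*/\1/p'
}

# Sample line modeled on the drive in this thread (the layout is an assumption).
sample=' Model=WDC WD20EARS-00MVWB0, FwRev=51.0AB51, SerialNo=WD-WMAZA1970067'
serial=$(printf '%s\n' "$sample" | serial_from_hdparm)
echo "serial: $serial"
```

On a real system the same helper could be fed directly, for example hdparm -i /dev/sdb | serial_from_hdparm, run as root.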
Last edited by agentofcode; April 4th, 2013 at 02:40 PM.
Ran smartctl -s on -t long /dev/sdb with the drive on an alternate SATA cable and port.
smartctl -a /dev/sdb returns:
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Caviar Green (Adv. Format)
Device Model: WDC WD20EARS-00MVWB0
Serial Number: WD-WMAZA1970067
LU WWN Device Id: 5 0014ee 057d8ac55
Firmware Version: 51.0AB51
User Capacity: 2,000,398,934,016 bytes [2.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: 8
ATA Standard is: Exact ATA specification draft version not indicated
Local Time is: Thu Apr 4 07:01:09 2013 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
General SMART Values:
Offline data collection status: (0x82) Offline data collection activity
was completed without error.
Auto Offline Data Collection: Enabled.
Self-test execution status: ( 116) The previous self-test completed having
the read element of the test failed.
Total time to complete Offline
data collection: (39300) seconds.
Offline data collection
capabilities: (0x7b) SMART execute Offline immediate.
Auto Offline data collection on/off support.
Suspend Offline collection upon new
command.
Offline surface scan supported.
Self-test supported.
Conveyance Self-test supported.
Selective Self-test supported.
SMART capabilities: (0x0003) Saves SMART data before entering
power-saving mode.
Supports SMART auto save timer.
Error logging capability: (0x01) Error logging supported.
General Purpose Logging supported.
Short self-test routine
recommended polling time: ( 2) minutes.
Extended self-test routine
recommended polling time: ( 255) minutes.
Conveyance self-test routine
recommended polling time: ( 5) minutes.
SCT capabilities: (0x3035) SCT Status supported.
SCT Feature Control supported.
SCT Data Table supported.
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
3 Spin_Up_Time 0x0027 253 166 021 Pre-fail Always - 1050
4 Start_Stop_Count 0x0032 098 098 000 Old_age Always - 2312
5 Reallocated_Sector_Ct 0x0033 193 193 140 Pre-fail Always - 144
7 Seek_Error_Rate 0x002e 200 198 000 Old_age Always - 0
9 Power_On_Hours 0x0032 075 075 000 Old_age Always - 18429
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 165
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 102
193 Load_Cycle_Count 0x0032 189 189 000 Old_age Always - 33509
194 Temperature_Celsius 0x0022 121 108 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 195 195 000 Old_age Always - 5
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 27
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 19
199 UDMA_CRC_Error_Count 0x0032 200 137 000 Old_age Always - 26837
200 Multi_Zone_Error_Rate 0x0008 200 200 000 Old_age Offline - 19
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Extended offline Completed: read failure 40% 18424 2631941936
# 2 Extended offline Completed: read failure 40% 18375 2631941936
# 3 Extended offline Completed: read failure 40% 18349 2631941936
# 4 Extended offline Interrupted (host reset) 90% 18343 -
# 5 Extended offline Interrupted (host reset) 90% 18343 -
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
Last edited by agentofcode; April 5th, 2013 at 11:41 PM.
Correct me if I am wrong, but assuming I posted the most current log, it appears that sdb is bad. If this is still the same version of the log, I am unsure how to select an alternate "SMART Error Log Version". I also just discovered the dependability issues with using green drives in a RAID. I did not do enough preliminary research; rookie mistake. Since discovering this, I have ordered replacement drives to build a dependable RAID5.
Now my goal is to get this broken RAID5 back to a functional state so I can back up the data before setting up the new RAID5 with more dependable HDDs.
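For anyone reading the attribute table in the smartctl output above, here is a rough sketch that flags a few attributes when their RAW_VALUE is non-zero. The choice of IDs is an assumption about what is worth watching: 5, 197 and 198 count reallocated, pending and uncorrectable sectors, while 199 (UDMA CRC errors) is commonly associated with cabling or connection problems rather than the disk surface. The helper name flag_smart_attrs is hypothetical; the sample rows are copied from the table above.

```shell
#!/bin/sh
# Flag selected SMART attributes whose RAW_VALUE (10th column) is non-zero.
# Assumption: IDs 5, 197, 198 and 199 are the ones of interest here.
flag_smart_attrs() {
    awk 'NF >= 10 && ($1 == 5 || $1 == 197 || $1 == 198 || $1 == 199) && $10 > 0 {
        printf "%s raw=%s\n", $2, $10
    }'
}

# Rows copied from the attribute table posted in this thread.
sample='1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 0
5 Reallocated_Sector_Ct 0x0033 193 193 140 Pre-fail Always - 144
197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 27
198 Offline_Uncorrectable 0x0030 200 200 000 Old_age Offline - 19
199 UDMA_CRC_Error_Count 0x0032 200 137 000 Old_age Always - 26837'

flagged=$(printf '%s\n' "$sample" | flag_smart_attrs)
printf '%s\n' "$flagged"
```

On a live system the helper could read from smartctl -A /dev/sdb, which prints only the attribute table.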
Last edited by agentofcode; April 5th, 2013 at 11:45 PM.
Since I determined sdb is bad, I decided to try:
mdadm --stop /dev/md0
mdadm --assemble --force /dev/md0 /dev/sd[cd]
to reassemble the RAID without sdb.
Returns:
mdadm: Cannot assemble mbr metadata on /dev/sdc
mdadm: /dev/sdc has no superblock - assembly aborted
Where do I go from here?
Last edited by agentofcode; April 6th, 2013 at 12:15 AM.
You need to assemble with the partitions on those disks, like this:
Code:
mdadm --assemble --force /dev/md0 /dev/sd[cd]1
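The bracket pattern is expanded by the shell before mdadm ever sees it, so sd[cd] names the whole disks while sd[cd]1 names the first partition on each. A small sketch of that expansion, using empty stand-in files in a temporary directory (hypothetical names, not real block devices):

```shell
#!/bin/sh
# Show how sd[cd] and sd[cd]1 expand, using empty stand-in files
# in a temporary directory (these are not real devices).
tmp=$(mktemp -d)
for name in sdb sdb1 sdc sdc1 sdd sdd1; do
    : > "$tmp/$name"
done

cd "$tmp" || exit 1
whole_disks=$(echo sd[cd])    # matches the whole-disk names only
partitions=$(echo sd[cd]1)    # matches only the first partition on each disk
echo "sd[cd]  -> $whole_disks"
echo "sd[cd]1 -> $partitions"

# Clean up the stand-in files.
cd / && rm -rf "$tmp"
```

To confirm which device actually carries the RAID superblock, mdadm --examine /dev/sdc1 can be compared with mdadm --examine /dev/sdc.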
Just tried mdadm --assemble --force /dev/md0 /dev/sd[bcd]1
Returns:
mdadm: /dev/md0 has been started with 2 drives (out of 3) and 1 rebuilding.
Why would sdb1 begin working again? Does the output not state it is bad?
Last edited by agentofcode; April 6th, 2013 at 12:15 AM.
cat /proc/mdstat
Returns:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdb1[3] sdd1[2] sdc1[1]
3899417600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
[=>...................] recovery = 7.6% (149796000/1949708800) finish=258.9min speed=115848K/sec
unused devices: <none>
UPDATE: 4/5/13, 10:51 PM EST
The assembly came back with sdb1[F]. I understand this indicates that the drive failed, so I stopped the RAID and started it again with just sdc1 and sdd1, per the recommendation.
RAID is now assembled but still not accessible.
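As a reading aid for the /proc/mdstat output earlier in this post, the sketch below pulls out the "[3/2] [_UU]" status and re-derives the finish estimate from the recovery line. It is only an illustration using the lines posted in this thread; the variable names are made up, and the arithmetic assumes the block counts and the speed are both in units of 1 KiB, which reproduces the reported figure.

```shell
#!/bin/sh
# Sample copied from the /proc/mdstat output in this thread.
sample='md0 : active raid5 sdb1[3] sdd1[2] sdc1[1]
3899417600 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [_UU]
[=>...................] recovery = 7.6% (149796000/1949708800) finish=258.9min speed=115848K/sec'

# "[3/2]" = 3 member slots, 2 active; each "_" in "[_UU]" marks a slot not in sync.
counts=$(printf '%s\n' "$sample" |
    sed -n 's|.*\[\([0-9][0-9]*\)/\([0-9][0-9]*\)] \[[U_]*].*|\1 \2|p')
set -- $counts
slots=$1
active=$2
missing=$((slots - active))
echo "slots=$slots active=$active missing=$missing"

# Re-derive the estimate: (total - done) blocks / speed in K/sec, shown in minutes.
eta_min=$(printf '%s\n' "$sample" |
    sed -n 's|.*(\([0-9]*\)/\([0-9]*\)).*speed=\([0-9]*\)K/sec.*|\1 \2 \3|p' |
    LC_ALL=C awk '{ printf "%.1f", ($2 - $1) / $3 / 60 }')
echo "estimated minutes remaining: $eta_min"
```

With one slot missing, a three-disk RAID5 keeps running but has no redundancy left, which is why getting a backup made comes first.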
Last edited by agentofcode; April 6th, 2013 at 04:54 AM. Reason: added update on assembly of raid
When I start up a terminal I notice a message:
Could not chdir to home directory /home/username: No such file or directory
And when I try to access my files via webmin, nothing is in the home directory. It seems as though the RAID5 forgot which directory it was linked to.
Last edited by agentofcode; April 6th, 2013 at 04:20 AM.
fdisk -l returns:
Disk /dev/sda: 60.0 GB, 60021399040 bytes
255 heads, 63 sectors/track, 7297 cylinders, total 117229295 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0001d60b
Device Boot Start End Blocks Id System
/dev/sda1 * 2048 109893631 54945792 83 Linux
/dev/sda2 109895678 117227519 3665921 5 Extended
/dev/sda5 109895680 117227519 3665920 82 Linux swap / Solaris
Disk /dev/sdc: 2000.4 GB, 2000398934016 bytes
255 heads, 63 sectors/track, 243201 cylinders, total 3907029168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000edfb2
Device Boot Start End Blocks Id System
/dev/sdc1 2048 3899691007 1949844480 fd Linux raid autodetect
/dev/sdc2 3899693054 3907028991 3667969 5 Extended
/dev/sdc5 3899693056 3907028991 3667968 82 Linux swap / Solaris
Disk /dev/sdd: 2000.4 GB, 2000394706432 bytes
255 heads, 63 sectors/track, 243200 cylinders, total 3907020911 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x000740d4
Device Boot Start End Blocks Id System
/dev/sdd1 2048 3899682815 1949840384 fd Linux raid autodetect
/dev/sdd2 3899684862 3907018751 3666945 5 Extended
/dev/sdd5 3899684864 3907018751 3666944 82 Linux swap / Solaris
Disk /dev/md0: 3993.0 GB, 3993003622400 bytes
2 heads, 4 sectors/track, 974854400 cylinders, total 7798835200 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1048576 bytes
Disk identifier: 0x00000000
Disk /dev/md0 doesn't contain a valid partition table
I referred back to the original solved thread I mentioned in my first post and discovered I needed to mount it.
Code:
mount /dev/md0 /home
That did the trick.
I should be good until I get my new drives and can back up my data and rebuild a more dependable RAID, THANKS!
P.S. I intend to follow your APC UPS tutorial as well. Hope to prevent this from ever happening again.
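A mount made by hand like the one above does not survive a reboot, so it can help to check what is actually mounted at /home before assuming the array is in use. A minimal sketch, assuming /proc/mounts-style input ("device mountpoint fstype options ..."); the helper name device_mounted_at is hypothetical, and the ext4 filesystem type in the sample is a guess that this thread never states.

```shell
#!/bin/sh
# Print the device mounted at a given mount point, reading
# /proc/mounts-style text from stdin. If several entries share the
# mount point, the last one (the most recent mount) is printed.
device_mounted_at() {
    awk -v mp="$1" '$2 == mp { dev = $1 } END { if (dev != "") print dev }'
}

# Sample text; the filesystem type (ext4) is an assumption for illustration.
sample='/dev/sda1 / ext4 rw,errors=remount-ro 0 0
/dev/md0 /home ext4 rw 0 0'

home_dev=$(printf '%s\n' "$sample" | device_mounted_at /home)
echo "device at /home: ${home_dev:-none}"
```

On a live system the input would be the real table: device_mounted_at /home < /proc/mounts.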
Last edited by agentofcode; April 6th, 2013 at 05:17 AM.