Page 3 of 4
Results 21 to 30 of 33

Thread: RAID failure

  1. #21
    Join Date
    Oct 2009
    Beans
    Hidden!
    Distro
    Ubuntu 22.04 Jammy Jellyfish

    Re: RAID failure

    I don't get it. Have you tried the same drives in a different machine? mdadm should detect them if you do a scan (for example, mdadm --assemble --scan).
    Come to #ubuntuforums! We have cookies! | Basic Ubuntu Security Guide

    Tomorrow's an illusion and yesterday's a dream, today is a solution...

  2. #22
    Join Date
    Aug 2010
    Beans
    16

    Re: RAID failure

    No, I don't have a spare Ubuntu machine to test them in. However, I wonder if you are on the right track with a hardware issue. The center drive, sdc1, seems to fail first: it has dropped out every time the array has crashed, and it was the one that died this time.

    Is there an easy way to determine which port of the motherboard sdc is plugged into?

    EDIT:

    So, just to keep you updated on the progress of the array: I stopped it, unmounted it and tried to reassemble it, and it started with two disks... not quite what I was hoping for.
    I then ran mdadm --manage /dev/md0 --add /dev/sdc1 and it succeeded; the device is now being re-added to the array.

    Code:
    Every 2.0s: cat /proc/mdstat                                              Mon Feb 10 17:49:44 2014
    
    Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
    md0 : active raid5 sdc1[4] sdb1[0] sdd1[3]
          3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
          [>....................]  recovery =  1.3% (25940492/1953381888) finish=234.9min speed=136720K/sec
    
    unused devices: <none>
    I'll try swapping the cables around once the array is rebuilt and see what happens.
    Last edited by knight8524; February 10th, 2014 at 07:51 AM. Reason: more information
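    The [3/2] [U_U] in the status above means the array expects three members and has two active, so it is running degraded while sdc1 rebuilds. A minimal sketch of checking that automatically, assuming the /proc/mdstat layout shown in this post (the function names are made up for illustration, and other kernel versions may format the file differently):

```python
import re

# Sample copied from the /proc/mdstat output in this post.
SAMPLE = """\
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc1[4] sdb1[0] sdd1[3]
      3906763776 blocks super 1.2 level 5, 512k chunk, algorithm 2 [3/2] [U_U]
      [>....................]  recovery =  1.3% (25940492/1953381888) finish=234.9min speed=136720K/sec

unused devices: <none>
"""


def parse_mdstat(text):
    """Return {array_name: info} parsed from /proc/mdstat-style text."""
    arrays = {}
    current = None
    for line in text.splitlines():
        header = re.match(r"^(md\w+)\s*:\s*(\w+)\s+(.*)$", line.strip())
        if header:
            rest = header.group(3)
            tokens = rest.split()
            current = {
                "state": header.group(2),
                # The RAID level is only present for active arrays.
                "level": tokens[0] if tokens and "[" not in tokens[0] else None,
                "members": re.findall(r"(\w+)\[\d+\]", rest),
                "expected": None,
                "active": None,
                "slots": None,
                "recovery_pct": None,
            }
            arrays[header.group(1)] = current
            continue
        if current is None:
            continue
        # "[3/2] [U_U]" -> 3 members expected, 2 active, middle slot missing.
        counts = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if counts:
            current["expected"] = int(counts.group(1))
            current["active"] = int(counts.group(2))
            current["slots"] = counts.group(3)
        progress = re.search(r"(?:recovery|resync)\s*=\s*([\d.]+)%", line)
        if progress:
            current["recovery_pct"] = float(progress.group(1))
    return arrays


def is_degraded(info):
    """True when fewer members are active than the array expects."""
    return info["active"] is not None and info["active"] < info["expected"]


for name, info in parse_mdstat(SAMPLE).items():
    print(name, info["slots"], "degraded" if is_degraded(info) else "ok",
          info["recovery_pct"])
```

    On a live system the same function could be fed the contents of /proc/mdstat instead of SAMPLE.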

  3. #23
    Join Date
    Oct 2009
    Beans
    Hidden!
    Distro
    Ubuntu 22.04 Jammy Jellyfish

    Re: RAID failure

    You could try getting the serial number and comparing it to the physical drive.

    See here:
    http://www.cyberciti.biz/faq/linux-g...k-information/
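    On systems that use udev, the symlinks under /dev/disk/by-id usually embed the drive model and serial number, and those under /dev/disk/by-path name the controller port, so listing both can help match sdc to a physical drive and socket. A rough sketch, assuming those directories exist on the machine (links_by_device is a made-up helper name):

```python
import os


def links_by_device(link_dir):
    """Map a kernel device name (e.g. 'sdc') to the udev link names pointing at it."""
    mapping = {}
    for name in sorted(os.listdir(link_dir)):
        path = os.path.join(link_dir, name)
        if not os.path.islink(path):
            continue
        # Links look like "<name> -> ../../sdc"; keep only the final component.
        device = os.path.basename(os.readlink(path))
        mapping.setdefault(device, []).append(name)
    return mapping


if __name__ == "__main__":
    for directory in ("/dev/disk/by-id", "/dev/disk/by-path"):
        if os.path.isdir(directory):
            print(directory)
            for device, names in sorted(links_by_device(directory).items()):
                print("  ", device, names)
```

    The serial printed on the drive label can then be compared with the by-id name for sdc, and the by-path name hints at which motherboard port it is on.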

  4. #24
    Join Date
    Aug 2010
    Beans
    16

    Re: RAID failure

    Well, I got the serial number and changed things around... all to no avail. Unfortunately, the array failed again this morning, badly.

    Both sdc and sdd failed out of the array and, no matter what I tried, I was completely unable to rebuild it.

    I've zeroed everything and am now following rubylaser's setup tutorial to the letter. Fortunately, all the important stuff (photos and so forth) is backed up on other machines. Most of the media is gone, but that's an annoyance, not a serious problem.

    I rang the computer store I originally got the HDD from. Since it has been more than 7 days, they won't just swap it for a new one; it has to go off to the manufacturer for testing, which, given the numbers it is showing, will probably say it's fine. For the moment I'm rebuilding with what I have on hand, and we will see if it is any more stable after this.

    Thanks for the help thus far, and I'll keep you informed of my progress.

  5. #25
    Join Date
    Oct 2009
    Beans
    Hidden!
    Distro
    Ubuntu 22.04 Jammy Jellyfish

    Re: RAID failure

    Good luck. For what it is worth, I used Ruby's guide when I built the array in my backup server, so it should work fine.

    I'm not really sure if the problem is the drives or something else, which was why I suggested trying another machine.

    Hopefully the new build won't give you any problems.

  6. #26
    Join Date
    Jul 2010
    Location
    Michigan, USA
    Beans
    2,136
    Distro
    Ubuntu 18.04 Bionic Beaver

    Re: RAID failure

    Two things before wasting any more of your time:
    1. Replace the SATA cable on the drive with serial 83CWVNUKS. Based on your previous SMART report, you have UDMA CRC errors (the UDMA_CRC_Error_Count attribute), which almost always point to a bad SATA cable or, less likely, a bad SATA port on your motherboard. Try replacing the cable on that drive with a known good one.
    2. Before you rebuild the array again, I would stress test each disk so you can confirm that they all work and that the SMART values stay the same throughout the tests. It takes a long time on large disks, but it's worth it to weed out any problems upfront. I have a modified version of UnRAID's preclear script that does a good job of testing disks. You can view it here.
    Last edited by rubylaser; February 11th, 2014 at 01:59 AM.
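    Checking that SMART values stay the same across a stress test can be done by saving the attribute table before and after and comparing the raw values. A minimal sketch, assuming the ten-column attribute table that smartctl -A prints for ATA drives (both function names are hypothetical, and this is not part of the preclear script):

```python
def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} from `smartctl -A` style table text."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID# NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        raw = fields[9]
        attrs[fields[1]] = int(raw) if raw.isdigit() else raw
    return attrs


def changed_attributes(before, after):
    """Return {name: (old_raw, new_raw)} for attributes whose raw value differs."""
    return {
        name: (before.get(name), value)
        for name, value in after.items()
        if before.get(name) != value
    }
```

    A UDMA_CRC_Error_Count that rises between the two snapshots would suggest the cable or port is still causing trouble; counters such as power-on hours are expected to change and can be ignored.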

  7. #27
    Join Date
    Aug 2010
    Beans
    16

    Re: RAID failure

    And I'm back again.

    For an update of where I currently am:
    I rebuilt the array before rubylaser posted, and as a result DIDN'T run the stress test.
    I did however get three new SATA cables and replaced all of the cables.
    I then commenced copying my backups (patchy as they are, but at least I've got something): something in the order of 900 GB of videos, photos, music, etc.

    Throughout the day I switched the monitor on and off a few times to check, and it seemed to be going well. I went to bed and got up the next morning to find that, at about 90% of the copy (give or take), the RAID array had failed out in the same way as before.

    I got rather frustrated at this point, so I wiped the data, deleted the array, and ran rubylaser's stress test on all 3 of the drives. 30 hours later the results are in... and all three drives passed. I decided to check whether this was an error and ran it again... all three drives passed again (I've included the emails with the results below, just for completeness).

    I'm now rather confused: why would it fail while copying and then pass the stress tests? Is there a known issue with copying large amounts of data? My current plan is to build the array and then copy the files in smaller chunks (i.e. it got past 500 GB, so I'll try copying the data in 500 GB chunks. To be honest, I don't expect this to make a difference).

    I'm going to start the array build as soon as I hit post, but if you have any more ideas, I'm all ears. (I'm wondering if this is an intermittent issue with the motherboard itself and whether it might be worth replacing that.)

    Thanks for the help thus far; you have stopped me throwing the server through a window.

    [EDIT: The array is currently rebuilding, we will see how it goes]
    Last edited by knight8524; April 12th, 2014 at 08:43 AM.

  8. #28
    Join Date
    May 2010
    Beans
    135

    Re: RAID failure

    Quote: Originally Posted by knight8524
    I'm now rather confused, why would it fail copying and then pass the stress tests.
    Well, why did it fail? When something like this happens, the kernel is usually verbose about it. So check your kernel logs and dmesg. Without logs you're left with blind guesswork and that's not particularly useful.

  9. #29
    Join Date
    Aug 2010
    Beans
    16

    Re: RAID failure

    As requested, my kernel logs have been uploaded here (on Dropbox due to their size).

    I believe the failure occurred around 23:08 on April 5, as the logs show the array being remounted as a read-only file system at that point, along with a lot of I/O failures.

    I don't know specifically what I'm looking for, so I haven't been able to find the exact line that explains why it failed. I don't think my skills give me much more than blind guesswork even with the logs, but thank you for everyone's help so far. I have learnt a lot through this thread, even if we haven't been able to solve the problem yet.
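    One way to narrow a large kernel log down is to find the first line of each kind of failure message, since the earliest one is usually closest to the cause. A rough sketch under stated assumptions: the patterns below are a small illustrative selection of substrings that kernel storage messages commonly contain, not an exhaustive list, and first_matches is a made-up name.

```python
import re

# Substrings that often accompany an md array failure in kernel logs.
# This selection is an assumption for illustration, not a complete catalogue.
PATTERNS = {
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
    "md_disk_failure": re.compile(r"Disk failure on (\w+)"),
    "remount_read_only": re.compile(r"read-only", re.IGNORECASE),
}


def first_matches(log_text):
    """Return {label: (line_number, line)} for the first line matching each pattern."""
    found = {}
    for number, line in enumerate(log_text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if label not in found and pattern.search(line):
                found[label] = (number, line)
    return found
```

    Reading a few dozen lines before the earliest hit is often more informative than the flood of errors that follows it.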

  10. #30
    Join Date
    May 2010
    Beans
    135

    Re: RAID failure

    My guess is that you have a bad SATA controller on that board, or a bad PSU, or bad cables or something. You can't get reliable storage with all those link resets going on. There should be no link resets in normal operation...
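    To see how many link resets there are and whether they cluster on one port, the resets in the kernel log can be tallied per ATA port. A minimal sketch, assuming the log uses the usual libata wording "ataN: hard resetting link" (count_link_resets is a hypothetical helper):

```python
import re
from collections import Counter

# Matches e.g. "ata3: hard resetting link" and captures the port name.
RESET = re.compile(r"\b(ata\d+)(?:\.\d+)?: (?:hard|soft) resetting link")


def count_link_resets(log_text):
    """Return a Counter of link resets keyed by ATA port name."""
    return Counter(match.group(1) for match in RESET.finditer(log_text))
```

    Resets spread across several ports would point more toward the controller or power supply, while resets on a single port would point toward that port, its cable, or its drive. Note that ataN is the kernel's port numbering and does not by itself say which sdX device is attached.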
