
Thread: 9.10 upgrade says I have failing hard drive

  1. #81
    Join Date
    May 2009
    Location
    North West England
    Beans
    2,674
    Distro
    Ubuntu Development Release

    Re: 9.10 upgrade says I have failing hard drive

    I am 100% with you there.

    But my manufacturer's diagnostics say it is okay.

    Now, once again, I am getting errors reported, and the Ubuntu 'test your disk' is not working. I tried the short/quick one (about 10 minutes), and 30 minutes later it had 'hung', so I had to cancel it.

    But as I'm also running the 10.04 alpha, I have plenty of backups. You can never have too many backups.

    Phill.

  2. #82
    Join Date
    Apr 2008
    Beans
    5
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: 9.10 upgrade says I have failing hard drive

    I recently upgraded from Intrepid -> Jaunty -> Karmic. As soon as Karmic booted, I got the warning that my drive was bad and that I should replace it.

    I have a Hitachi Deskstar 7K160 / HDS721616PLA380 / 160GB drive.

    Palimpsest's overall assessment is 'DISK HAS MANY BAD SECTORS - back up data and replace disk'.

    I wasn't convinced, as my drive has been running fine for years, so I spent several hours investigating this problem.

    I've learnt quite a bit about SMART in the last 24 hours, so I'll post what I know here so that people can make an informed decision before replacing drives. I fear that some people have already replaced perfectly healthy drives because of this false error.


    The palimpsest (very stupid name) utility reports that my disk drive has 196,619 bad sectors. THIS IS NOT CORRECT: my drive ACTUALLY has *3* reallocated sectors, which is perfectly fine for a modern disk drive.

    In my case, SMART attribute 5 (Reallocated Sector Count) has a raw value** of 0x0B0003000000. Palimpsest assumes that this is a single 48-bit integer and converts it to 196,619 (0x00000003000B, because the byte sequence is stored lowest byte first and has to be reversed). The format and meaning of the raw value are entirely up to the manufacturers: they can put what they like in here and don't have to release what it means. Some treat it as a 'trade secret'.
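    To make the arithmetic concrete, here is a small Python sketch (my own illustration, not palimpsest's actual code). It reads the six raw bytes the way palimpsest appears to, as one little-endian 48-bit integer, and also splits them into three 16-bit words. The word split is only an assumption about one possible vendor layout, since the manufacturer doesn't document it. Under that assumption one word comes out as 3, which lines up with the 3 reallocated sectors (and the Reallocated_Event_Count of 3), while what the 11 means is unknown.

```python
# Raw bytes of attribute 5 as palimpsest showed them (lowest byte first).
RAW_BYTES = bytes([0x0B, 0x00, 0x03, 0x00, 0x00, 0x00])


def as_48bit_le(raw: bytes) -> int:
    """Read six raw bytes as ONE little-endian 48-bit integer (palimpsest's apparent reading)."""
    if len(raw) != 6:
        raise ValueError("SMART attribute raw values are 6 bytes long")
    return int.from_bytes(raw, byteorder="little")


def as_16bit_words(raw: bytes) -> list[int]:
    """Split the six bytes into three little-endian 16-bit words.

    Assumption: this is just ONE possible vendor layout; the real
    meaning of each word is up to the drive manufacturer.
    """
    return [int.from_bytes(raw[i:i + 2], byteorder="little") for i in range(0, 6, 2)]


if __name__ == "__main__":
    total = as_48bit_le(RAW_BYTES)
    print(f"single 48-bit value: {total:,} (0x{total:012X})")  # 196,619 (0x00000003000B)
    print("16-bit words:", as_16bit_words(RAW_BYTES))          # [11, 3, 0]
```

    Same six bytes, two very different answers, which is the whole point: without the vendor's documentation, the raw value can't be trusted as a sector count.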

    So, as far as SMART monitoring is concerned, what matters are the normalised VALUE and THRESHOLD values.

    Here are my drive SMART stats:

    Code:
    > sudo smartctl -A /dev/sda
    
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x000b   100   095   016    Pre-fail  Always       -       0
      2 Throughput_Performance  0x0005   100   100   050    Pre-fail  Offline      -       85
      3 Spin_Up_Time            0x0007   120   100   024    Pre-fail  Always       -       168 (Average 164)
      4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always       -       1768
      5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       196619
      7 Seek_Error_Rate         0x000b   100   099   067    Pre-fail  Always       -       0
      8 Seek_Time_Performance   0x0005   136   100   020    Pre-fail  Offline      -       31
      9 Power_On_Hours          0x0012   098   098   000    Old_age   Always       -       16416
     10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always       -       0
     12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       1767
    192 Power-Off_Retract_Count 0x0032   099   099   000    Old_age   Always       -       1887
    193 Load_Cycle_Count        0x0012   099   099   000    Old_age   Always       -       1887
    194 Temperature_Celsius     0x0002   166   130   000    Old_age   Always       -       36 (Lifetime Min/Max 13/47)
    196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       -       3
    197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       294

    For most attributes, the VALUE will start at either 200 or 100 when your drive is new; over time, some of these values will decrease towards the THRESHOLD value. If a VALUE reaches or drops below the THRESHOLD value, the attribute is flagged as FAILED and the health status of your drive may change.

    Earlier in this thread, someone assumed that the threshold value had a direct relation to the attribute count; this isn't necessarily true. The VALUE and THRESHOLD are calculated and updated by the hard drive's firmware, and only the manufacturer knows what these normalised values really mean (they're more likely to be percentages than actual counts).

    The WHEN_FAILED column shows whether an attribute has ever reached its THRESHOLD: smartctl prints 'FAILING_NOW' if the current VALUE is at or below the THRESHOLD, 'In_the_past' if the current VALUE is fine but the WORST value ever recorded was at or below it, and '-' if the attribute has never failed.

    As you can see from the smartctl output, NONE of my drive's attribute values have reached their threshold, and therefore every WHEN_FAILED entry is just '-'.

    I have a healthy drive.
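    For anyone who wants to check their own drive the same way, here is a rough Python sketch run against a few rows copied from my smartctl output above. The function names are hypothetical (mine, not from any tool). when_failed() follows the rule from the smartctl man page ('FAILING_NOW' when VALUE <= THRESH, 'In_the_past' when only WORST <= THRESH, '-' otherwise), and treating a THRESH of 000 as 'never fails' is my assumption about how smartmontools handles it. It only looks at the normalised columns and deliberately ignores RAW_VALUE.

```python
# A few attribute rows copied from the `smartctl -A` output above.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   100   095   016    Pre-fail  Always       -       0
  3 Spin_Up_Time            0x0007   120   100   024    Pre-fail  Always       -       168 (Average 164)
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       196619
  7 Seek_Error_Rate         0x000b   100   099   067    Pre-fail  Always       -       0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always       -       294
"""


def when_failed(value: int, worst: int, thresh: int) -> str:
    """Classify one attribute from its normalised VALUE, WORST and THRESH."""
    if thresh == 0:
        # Assumption: a threshold of 0 is treated as "always passing".
        return "-"
    if value <= thresh:
        return "FAILING_NOW"
    if worst <= thresh:
        return "In_the_past"
    return "-"


def check_attributes(table: str) -> list[tuple[int, str, str]]:
    """Return (id, name, status) for every attribute row; RAW_VALUE is ignored."""
    results = []
    for line in table.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID; skip the header and blank lines.
        if not fields or not fields[0].isdigit():
            continue
        value, worst, thresh = (int(f) for f in fields[3:6])
        results.append((int(fields[0]), fields[1], when_failed(value, worst, thresh)))
    return results


if __name__ == "__main__":
    for attr_id, name, status in check_attributes(SAMPLE):
        print(f"{attr_id:>3} {name:<24} {status}")
```

    For the rows above, every status comes back '-', including attribute 5, even though its raw value reads 196619.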

    Palimpsest should not be interpreting the raw values of some attributes and then making assumptions about them, and it certainly should not be suggesting I replace my drive based on a value that has NO AGREED FORMAT. The hard drive firmware is designed to indicate problems through the SMART attributes table; the important indicators are VALUE, THRESHOLD and WHEN_FAILED, and those are what I'll be paying careful attention to from now on.


    ** (to get a raw value from palimpsest, just hover the mouse pointer over the attribute)
    Last edited by Rick Deckard; March 8th, 2010 at 03:36 PM. Reason: Typos

  3. #83
    Join Date
    Apr 2010
    Beans
    1

    Re: 9.10 upgrade says I have failing hard drive

    My 2 cents:

    Unless bad sectors naturally account for nearly 100% of all drive failures, the situation we are in is highly improbable. You'd think at least one of us would find our drive failing for a different reason. I'm concluding it's a bug.

  4. #84
    Join Date
    Apr 2008
    Beans
    5
    Distro
    Ubuntu 9.10 Karmic Koala

    Re: 9.10 upgrade says I have failing hard drive

    Prompted by the last post, I decided to check on my drives again out of curiosity. Bizarrely, palimpsest is now reporting 'SMART unavailable' for both of my installed hard drives...

    ...oh well, I really wish I hadn't 'upgraded' from Jaunty to Karmic. Roll on 10.04...

  5. #85
    Join Date
    Aug 2009
    Beans
    19

    Re: 9.10 upgrade says I have failing hard drive

    !!!WARNING!!! THIS REALLY MAY NOT BE A BUG. I thought it was for the longest time. I'm a techie when it comes to hardware, though, and I tested this on more than 5 drives: all in the same box, in different boxes, with different jumpers and configs, you name it. For me, the report only appeared on the drives that eventually went bad. Every drive it said was failing did fail, whether a week later, 3 months later, etc.

    If you have this error, make sure to back up pretty much all the time. One minute your drive will work, the next it won't. Period. Until then, though, to my knowledge it will work perfectly, which is what makes it seem like a simple bug. I discounted this error over and over, and thank goodness I back up fairly religiously or I would have lost more data. So please back up your important stuff and stop thinking of this as a bug before you lose some data like I did. If your hard drive happens not to fail over the next year and a half, you have permission to call me a horse's ****.
