
Thread: Bad Sectors - CAN NOT save the data on the HDD

  1. #31
    Join Date
    Mar 2011
    Location
    U.K.
    Beans
    Hidden!
    Distro
    Ubuntu 22.04 Jammy Jellyfish

    Re: Bad Sectors - CAN NOT save the data on the HDD

    O.K., ditch that idea, although the author seems to know his stuff (but on old HDDs).

    But R-Linux is risk free.

  2. #32
    Join Date
    Mar 2010
    Location
    Squidbilly-Land
    Beans
    Hidden!
    Distro
    Ubuntu

    Re: Bad Sectors - CAN NOT save the data on the HDD

    Has the OP posted the system logs showing problems? For the last 3 weeks or so, my 18.04 VM host has had some stability issues. I've narrowed it down to either the kernel line (not just 1 kernel) or bad RAM.

    On my system, after a crash, I can see the stack trace in logs from the prior crashes using journalctl -b -2. The failures look like this:
    Code:
    Jul 26 23:43:05 hadar kernel: Hardware name: System manufacturer System Product Name/ROG STRIX B450-F 
    Jul 26 23:43:05 hadar kernel: Call Trace:
    Jul 26 23:43:05 hadar kernel:  dump_stack+0x6d/0x8b
    Jul 26 23:43:05 hadar kernel:  bad_page+0xcb/0x120
    Jul 26 23:43:05 hadar kernel:  free_pages_check_bad+0x5f/0x70
    Jul 26 23:43:05 hadar kernel:  free_pcppages_bulk+0x472/0x6d0
    Jul 26 23:43:05 hadar kernel:  ? page_counter_cancel+0x23/0x30
    Jul 26 23:43:05 hadar kernel:  free_unref_page_commit+0xb9/0xe0
    Jul 26 23:43:05 hadar kernel:  free_unref_page_list+0x108/0x190
    Jul 26 23:43:05 hadar kernel:  shrink_page_list+0x379/0xbb0
    Jul 26 23:43:05 hadar kernel:  shrink_inactive_list+0x204/0x3d0
    Jul 26 23:43:05 hadar kernel:  shrink_node_memcg+0x3b4/0x820
    Jul 26 23:43:05 hadar kernel:  shrink_node+0xb5/0x410
    Jul 26 23:43:05 hadar kernel:  ? shrink_node+0xb5/0x410
    Jul 26 23:43:05 hadar kernel:  balance_pgdat+0x293/0x5f0
    Jul 26 23:43:05 hadar kernel:  kswapd+0x156/0x3c0
    Jul 26 23:43:05 hadar kernel:  ? wait_woken+0x80/0x80
    Jul 26 23:43:05 hadar kernel:  kthread+0x121/0x140
    Jul 26 23:43:05 hadar kernel:  ? balance_pgdat+0x5f0/0x5f0
    Jul 26 23:43:05 hadar kernel:  ? kthread_park+0x90/0x90
    Jul 26 23:43:05 hadar kernel:  ret_from_fork+0x22/0x40
    So, I wrote a tiny script to see how often the issue happens on each boot:
    Code:
    $ ~/bin/crashing 
    Boot -0 : 1
    Boot -1 : 5
    Boot -2 : 8
    Boot -3 : 1
    Boot -4 : 93
    Boot -5 : 223
    Boot -6 : 596
    Boot -7 : 258
    Boot -8 : 383
    Boot -9 : 0
    0 or 1 means no problem. Higher numbers show stack problems. Here's the script, but I doubt it will work for others as is:
    Code:
    $ more ~/bin/crashing
    #!/bin/bash
    # Count kernel stack dumps logged during each of the last 10 boots.
    for i in {0..9}; do
       # grep -c counts matching lines directly, so wc -l is not needed
       RC=$(journalctl -b "-$i" | grep -c 'dump_stack')
       echo "Boot -$i : $RC"
    done
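    For anyone who wants to adapt the script, here is a slightly more general sketch. It assumes a persistent journal (otherwise older boots are not available) and uses journalctl's -k option to look at kernel messages only. The function names count_stack_dumps and report_boots are made up for this example.

```shell
#!/bin/bash
# Hypothetical helper: count the lines on stdin that mention dump_stack.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the
# exit status clean while still printing "0".
count_stack_dumps() {
    grep -c 'dump_stack' || true
}

# Hypothetical helper: print one count per boot for the last N boots
# (default 10). Boots that are missing from the journal show up as 0.
report_boots() {
    local boots=${1:-10} i
    for ((i = 0; i < boots; i++)); do
        printf 'Boot -%d : %s\n' "$i" \
            "$(journalctl -k -b "-$i" 2>/dev/null | count_stack_dumps)"
    done
}
```

    After sourcing the file, report_boots 5 would print the counts for the last five boots in the same format as the output above.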
    Three boots ago was when I lowered the DDR4 RAM speed from 2800 to 2733, and that ran without issue for about 10 days. Then a few days ago, even at 2733MHz, the system began crashing with a full lockup. Switching to a different TTY or connecting over ssh was not possible, so I had to use the reset button. Boots -2 and -1 were at the same 2733MHz, but at the last reboot I slowed the RAM again to 2666MHz (I think).
    Code:
    $ sudo inxi -m
    Memory:    Used/Total: 15719.5/32108.7MB
               Array-1 capacity: 128 GB devices: 4 EC: None
               Device-1: DIMM_A1 size: 8 GB speed: 2666 MT/s type: DDR4
               Device-2: DIMM_A2 size: 8 GB speed: 2666 MT/s type: DDR4
               Device-3: DIMM_B1 size: 8 GB speed: 2666 MT/s type: DDR4
               Device-4: DIMM_B2 size: 8 GB speed: 2666 MT/s type: DDR4
    The box has been very busy the last few days. It has been up about 22 hrs now, which is good. On Saturday, during a maintenance window, I'll reseat the RAM, swap some paired DIMMs around in the slots, put the speed back up to 2933MHz and see if that helps. If not, I'll take 2 of the sticks out, since the machine has 2 pairs which weren't bought as a matched set. 16G is a little tight for this system, but if I'm careful with RAM use, it should not be an issue.
    Code:
    $ free -m
                  total        used        free      shared  buff/cache   available
    Mem:          32108       15222        6836          41       10049       16382
    Swap:          4355         119        4236
    My current RAM use is just on the 16G cusp. I can halve the RAM allocated to one of the VMs and not power on another, which should keep it around 12G used. I can move one or two VMs to a different system too.
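    A quick way to check that kind of headroom is to compare the "used" column of free -m against a limit. This is only a sketch: fits_in is a made-up helper name, the limit is whatever you pass in MiB, and "used" here does not include buff/cache.

```shell
#!/bin/bash
# Hypothetical helper: read `free -m` output on stdin and report whether
# the "used" figure on the Mem: line is below a limit given in MiB.
fits_in() {
    local limit_mb=$1
    awk -v limit="$limit_mb" '$1 == "Mem:" {
        printf "used=%d limit=%d %s\n", $3, limit, ($3 < limit ? "fits" : "too big")
    }'
}
```

    For example, free -m | fits_in 16384 checks current use against 16G.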

    Should say, this VM host has been fairly stable for 2.5 yrs. SMART data for the connected storage is all fine. No bad blocks or any reallocated blocks at all. That was the first thing I checked. Weekly SMART tests run automatically and get logged on all my storage.
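    For the reallocated-block check, something like the following could pull the number out of smartctl. It is a minimal sketch that assumes an ATA/SATA drive, where smartctl -A prints an attribute table with the attribute name in column 2 and the raw value in column 10. NVMe and SAS drives report differently. reallocated_count is a made-up name and /dev/sda is only a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: read `smartctl -A` output on stdin and print the
# raw Reallocated_Sector_Ct value, or "n/a" if the attribute is absent.
# Assumes the ATA attribute table layout (name in $2, raw value in $10).
reallocated_count() {
    awk '$2 == "Reallocated_Sector_Ct" { print $10; found = 1 }
         END { if (!found) print "n/a" }'
}
```

    Usage would be along the lines of sudo smartctl -A /dev/sda | reallocated_count, where anything above 0 is worth a closer look.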

  3. #33
    Join Date
    Oct 2009
    Location
    Sydney
    Beans
    4,301
    Distro
    Ubuntu 20.04 Focal Fossa

    Re: Bad Sectors - CAN NOT save the data on the HDD

    Hi again,

    Originally Posted by TheFu
    Has the OP posted the system logs showing problems?
    1- No one has ever asked me to post any log of any kind AFAIK.
    2- Even if that happened, I can't login to the system anyway.
