My work computer has recently started freezing up. When I leave it running for a longer period of time (such as overnight) and return to it, it looks much like it went to sleep (the screens are turned off but don't show any "no signal" messages, the power button on the PC is still lit, and the fans are still running). However, I cannot wake it up, regardless of what I do (I've tried moving the mouse, pressing keys, pressing ctrl+alt+del, and even pressing the power button). I also can no longer ssh into it, and the system logs do not get recorded properly. So I have to force a reboot (by holding the power button).
In all this time it has never frozen/crashed on me while I was actively using it. Additionally, the idle time required for it to freeze up varies wildly. Once I left an rsync command running over lunch (roughly an hour) and it had frozen by the time I came back, while another time it was on for over a week (successfully executing daily rsync and rclone jobs via cron the whole time) with no freezes.
I tried some solutions I found online to turn off sleep/suspend, such as turning off "Automatic Suspend" in Power settings, turning off "Suspend when laptop lid is closed" in GNOME Tweaks, and adding intel_idle.max_cstate=1 to the kernel command line in /etc/default/grub. None of these prevented the issue. I've also changed several BIOS settings that could plausibly cause something like this (namely, turning off the XMP memory profile, adjusting RAM voltage to the exact recommended values I found on the seller's website, and changing Wake Up Event settings), to no avail.
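A sketch of what that kernel parameter change typically looks like in /etc/default/grub. The "quiet splash" part is an assumption (it is the stock Ubuntu value); only intel_idle.max_cstate=1 is taken from the description above.

```shell
# /etc/default/grub (excerpt)
# "quiet splash" is the stock Ubuntu value and is assumed here;
# only intel_idle.max_cstate=1 comes from the description above.
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1"
```

The edit only takes effect after running sudo update-grub and rebooting. Afterwards, cat /proc/cmdline shows whether the parameter actually reached the running kernel.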
I've also considered a hardware issue, so I ran memtest86 and got no errors. I also checked my SSD and HDD for errors as best I could (using SMART diagnostics) and got nothing.
The system logs aren't helpful, as they aren't written to when the freeze happens (as noted above). I'm attaching /var/log/syslog from the most recent freeze anyway. I found a workaround for this online, namely following the logs live over an SSH connection from a different computer (using ssh user@workstation journalctl -f). However, nothing in the logs jumped out at me as an obvious cause of the freeze. I'm attaching two such log files leading up to freezes here regardless.
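The live-logging approach can be sketched as a small helper that also saves the stream to a file on the second computer, so the last lines before a freeze survive. The function name follow_remote_journal, the log file path, and the REMOTE_SHELL override (only there so the function can be dry-run without a real host) are all hypothetical names for illustration.

```shell
# Follow the journal of a remote host and keep a local copy of everything seen.
# Hypothetical helper; run it on the *other* computer, not on the one that freezes.
follow_remote_journal() {
    host="$1"       # e.g. user@workstation
    logfile="$2"    # local file that the journal lines are appended to

    # REMOTE_SHELL defaults to ssh; overriding it is only meant for dry runs.
    # tee -a prints each line and appends it to the local log file.
    "${REMOTE_SHELL:-ssh}" "$host" journalctl -f | tee -a "$logfile"
}

# Usage (blocks until the connection drops, e.g. when the machine freezes):
#   follow_remote_journal user@workstation ~/workstation-journal.log
```

When the workstation freezes, the SSH session stalls or drops, and the end of the local file holds the last messages that made it over the network.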
I'm kind of at a loss on how to continue diagnosing this problem. Are there perhaps some other log files I could look at to get an idea of what's going on and whether it's a software or hardware issue? Some way to check for motherboard and CPU errors? Maybe even PSU errors?
Computer specs:
- OS: Ubuntu 20.04.2 LTS
- CPU: AMD Ryzen 7 1700
- MB: MSI X370 SLI PLUS
- RAM: G.Skill Ripjaws V 16GB DDR4-3200
- GPU: NVIDIA Titan V
- SSD: Samsung - 860 Evo 500GB M.2-2280
- HDD: Toshiba - P300 3TB 3.5" 7200RPM