I have an Intel NUC11TNHV5, which is an i5-1145G7, with 32 GB of good memory (G.Skill RipJaws) and a Samsung 980 Pro NVMe (500 GB) as the system disk, running Ubuntu 22.04.4 in a pretty vanilla setup. It serves as the brains of a player piano system, running a custom Qt application I wrote, as well as Kobe. The only devices connected are a monitor (with a USB touchscreen) and several USB serial devices used for MIDI and similar functions; there are no in-chassis devices. All of these devices were idle when it hung.
When the system is just sitting idle, it periodically stops. The screen saver is already on before this happens; there is nothing on the screen and it will not wake up. It is also not accessible from the network.
After I reboot/reset, the syslog shows nothing helpful. The last entry before the reboot is innocuous, as you can see in the syslog excerpt below. This has happened half a dozen times or so, and I never see anything interesting. The time between hangs is roughly 3-12 days. There's no hint of any related issue such as power problems (while this system is not on a UPS, I have several of them in the house and none of them show a power event at the time).
I have updated the UEFI firmware to the current version with no change in behavior. I've reviewed all the settings and see nothing unusual. There has been no attempt at overclocking or anything like that (I'm not sure this model even can).
The only consistent things are that the failure leaves no trail (at least none that I can find) and that it occurs when the system is idle. Though it's idle 98% of the time, so that's not terribly meaningful.
I have Zabbix (a network/system monitor) running on the system. Polling stops when it hangs, of course, but the last poll shows nothing unusual: negligible CPU, normal memory usage, etc.
I have used this hardware for a year or so. The only change I can think of that seems related in timing is moving from hard-wired Ethernet to Wi-Fi, because the piano is now in a different place where I do not have Ethernet readily available. However, I do not get any Wi-Fi errors, and it's in an area with a very strong signal (Wi-Fi 6, around -60 dBm). The AP shows a simple disconnect at the time of the hang, nothing unusual, with a good signal.
I'm looking for ideas. Getting Ethernet to it means running a wire down a hall and putting a switch somewhere in the middle, but that is probably my next step. Ugly (literally).
Anything else I can check? Any other setting I can put in that might yield more information when it fails again?
Linwood
Syslog excerpt:

    Jul 22 09:26:50 piano NetworkManager[550]: <info> [1721654810.0279] dhcp4 (wlo1): state changed new lease, address=192.168.130.55
    Jul 22 09:30:01 piano CRON[56491]: (root) CMD ([ -x /etc/init.d/anacron ] && if [ ! -d /run/systemd/system ]; then /usr/sbin/invoke-rc.d anacron start >/dev/null; fi)
    Jul 22 09:32:55 piano systemd[1]: Started Run anacron jobs.
    Jul 22 09:32:55 piano systemd[1]: anacron.service: Deactivated successfully.
    Jul 22 09:32:55 piano anacron[56529]: Anacron 2.3 started on 2024-07-22
    Jul 22 09:32:55 piano anacron[56529]: Normal exit (0 jobs run)
    Jul 22 17:00:37 piano systemd-modules-load[281]: Inserted module 'lp'
    Jul 22 17:00:37 piano systemd-modules-load[281]: Inserted module 'ppdev'
    Jul 22 17:00:37 piano systemd-modules-load[281]: Inserted module 'parport_pc'
    Jul 22 17:00:37 piano systemd[1]: Reached target Swaps.
    Jul 22 17:00:37 piano systemd[1]: Starting Flush Journal to Persistent Storage...