My Nextcloud server keeps hard locking on me, once every few days with no discernible pattern: it happens at all times of day, sometimes after a couple of days, sometimes after more than a week. Because the logs die along with the system, triage is proving especially challenging. For security/isolation, I feel that I must jail it within an LXD container (which is commonly done), but it's otherwise pretty much a default install. It is a component install: LAMP stack, then the latest NC directly from the official site. I did not use the NC snap. The things I've already tried are:
- Upgraded RAM to 16 GB (replaced all RAM modules with new ones, so the problem is unlikely to be memory related)
- Updated to most recent BIOS (so if it's CPU or MOBO, I have no further recourse other than new HW)
- Added 90 GB ZFS L2ARC cache
- Constrained the Nextcloud container to only 2 of 4 CPU threads
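For anyone wanting to try the same CPU constraint: the exact command isn't recorded above, but LXD exposes this kind of limit through its `limits.cpu` config key. A minimal sketch, assuming a container called `nextcloud` (a hypothetical name) and a dry-run default so nothing is changed unless `RUN=1` is set:

```shell
#!/bin/sh
# Sketch only: CONTAINER is a hypothetical name, not taken from the post.
CONTAINER="${CONTAINER:-nextcloud}"
# A plain number lets LXD pick that many threads; a range such as 0-1 pins specific ones.
CPU_LIMIT="${CPU_LIMIT:-2}"

# Build the LXD command as a string so it can be reviewed before it is run.
cpu_limit_cmd() {
    printf 'lxc config set %s limits.cpu %s' "$1" "$2"
}

cmd=$(cpu_limit_cmd "$CONTAINER" "$CPU_LIMIT")
echo "$cmd"

# Only touch LXD when explicitly asked to (RUN=1) and when lxc is installed.
if [ "${RUN:-0}" = "1" ] && command -v lxc >/dev/null 2>&1; then
    $cmd
    lxc config get "$CONTAINER" limits.cpu
fi
```

Run as-is it only prints the command; with `RUN=1` it applies the limit and reads it back.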
Constraining the CPU threads at least meant that the last failure did not freeze up the whole server. But I couldn't kill the LXD container, nor could I reach the Nextcloud instance. I could log into the container directly (SSH didn't work), but no commands would execute; each returned just a continually blinking cursor that had to be killed (via SSH into the server). Logs were again frustratingly unhelpful. Web searches, likewise. There are hints that the problem could be a combo of kernel bugs and ZFS. Or perhaps my server is just too Mickey Mouse to handle such infrastructure, though that doesn't seem likely since many successfully run their NC instances from an RPi.
I should add that NC is the only container running on it. Though at one point I had plans to multitask the server, given the above, it is not used for anything else.
I am thinking of replacing ZFS with LVM. Also considering physically isolating the server and running LAMP+NC on a base install with no containment but also sans fancy file system, though this would be far from ideal and only a last resort.
HW description:
Code:
duckhook@charon:~$ sudo lshw -sanitize
computer
description: Desktop Computer
product: To Be Filled By O.E.M. (To Be Filled By O.E.M.)
vendor: To Be Filled By O.E.M.
version: To Be Filled By O.E.M.
serial: [REMOVED]
width: 64 bits
capabilities: smbios-2.8 dmi-2.8 smp vsyscall32
configuration: boot=normal chassis=desktop family=To Be Filled By O.E.M. sku=To Be Filled By O.E.M. uuid=[REMOVED]
*-core
description: Motherboard
product: Q1900B-ITX
vendor: ASRock
physical id: 0
serial: [REMOVED]
*-firmware
description: BIOS
vendor: American Megatrends Inc.
physical id: 0
version: P2.20
date: 02/12/2019
size: 64KiB
capacity: 8MiB
capabilities: pci upgrade shadowing cdboot bootselect socketedrom edd int13floppy1200 int13floppy720 int13floppy2880 int5printscreen int9keyboard int14serial int17printer acpi usb biosbootspecification uefi
*-memory
description: System Memory
physical id: a
slot: System board or motherboard
size: 16GiB
*-bank:0
description: DIMM DDR3 1333 MHz (0.8 ns)
product: HMT41GS6AFR8A-PB
vendor: Hynix Semiconduc
physical id: 0
serial: [REMOVED]
slot: A1_DIMM0
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-bank:1
description: DIMM DDR3 1333 MHz (0.8 ns)
product: HMT41GS6AFR8A-PB
vendor: Hynix Semiconduc
physical id: 1
serial: [REMOVED]
slot: A1_DIMM1
size: 8GiB
width: 64 bits
clock: 1333MHz (0.8ns)
*-cache:0
description: L1 cache
physical id: 11
slot: CPU Internal L1
size: 224KiB
capacity: 224KiB
capabilities: internal write-back
configuration: level=1
*-cache:1
description: L2 cache
physical id: 12
slot: CPU Internal L2
size: 2MiB
capacity: 2MiB
capabilities: internal write-back unified
configuration: level=2
*-cpu
description: CPU
product: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
vendor: Intel Corp.
physical id: 13
bus info: cpu@0
version: Intel(R) Celeron(R) CPU J1900 @ 1.99GHz
slot: CPUSocket
size: 2317MHz
capacity: 2416MHz
width: 64 bits
clock: 83MHz
capabilities: lm fpu fpu_exception wp vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp x86-64 constant_tsc arch_perfmon pebs bts rep_good nopl xtopology tsc_reliable nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm sse4_1 sse4_2 movbe popcnt tsc_deadline_timer rdrand lahf_lm 3dnowprefetch epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid tsc_adjust smep erms dtherm ida arat cpufreq
configuration: cores=4 enabledcores=4 threads=4
*-pci
description: Host bridge
product: Atom Processor Z36xxx/Z37xxx Series SoC Transaction Register
vendor: Intel Corporation
physical id: 100
bus info: pci@0000:00:00.0
version: 0c
width: 32 bits
clock: 33MHz
configuration: driver=iosf_mbi_pci
resources: irq:0
*-display
description: VGA compatible controller
product: Atom Processor Z36xxx/Z37xxx Series Graphics & Display
vendor: Intel Corporation
physical id: 2
bus info: pci@0000:00:02.0
version: 0c
width: 32 bits
clock: 33MHz
capabilities: pm msi vga_controller bus_master cap_list rom
configuration: driver=i915 latency=0
resources: irq:94 memory:d0000000-d03fffff memory:c0000000-cfffffff ioport:f080(size=8) memory:c0000-dffff
*-sata
description: SATA controller
product: Atom Processor E3800 Series SATA AHCI Controller
vendor: Intel Corporation
physical id: 13
bus info: pci@0000:00:13.0
version: 0c
width: 32 bits
clock: 66MHz
capabilities: sata msi pm ahci_1.0 bus_master cap_list
configuration: driver=ahci latency=0
resources: irq:92 ioport:f070(size=8) ioport:f060(size=4) ioport:f050(size=8) ioport:f040(size=4) ioport:f020(size=32) memory:d0716000-d07167ff
*-usb
description: USB controller
product: Atom Processor Z36xxx/Z37xxx, Celeron N2000 Series USB xHCI
vendor: Intel Corporation
physical id: 14
bus info: pci@0000:00:14.0
version: 0c
width: 64 bits
clock: 33MHz
capabilities: pm msi xhci bus_master cap_list
configuration: driver=xhci_hcd latency=0
resources: irq:91 memory:d0700000-d070ffff
*-usbhost:0
product: xHCI Host Controller
vendor: Linux 5.8.0-44-generic xhci-hcd
physical id: 0
bus info: usb@1
logical name: usb1
version: 5.08
capabilities: usb-2.00
configuration: driver=hub slots=6 speed=480Mbit/s
*-usb
description: USB hub
product: USB2.0 Hub
vendor: Genesys Logic, Inc.
physical id: 2
bus info: usb@1:2
version: 85.37
capabilities: usb-2.00
configuration: driver=hub maxpower=100mA slots=3 speed=480Mbit/s
*-usbhost:1
product: xHCI Host Controller
vendor: Linux 5.8.0-44-generic xhci-hcd
physical id: 1
bus info: usb@2
logical name: usb2
version: 5.08
capabilities: usb-3.00
configuration: driver=hub slots=1 speed=5000Mbit/s
*-generic
description: Encryption controller
product: Atom Processor Z36xxx/Z37xxx Series Trusted Execution Engine
vendor: Intel Corporation
physical id: 1a
bus info: pci@0000:00:1a.0
version: 0c
width: 32 bits
clock: 33MHz
capabilities: pm msi bus_master cap_list
configuration: driver=mei_txe latency=0
resources: irq:95 memory:d0500000-d05fffff memory:d0400000-d04fffff
*-multimedia
description: Audio device
product: Atom Processor Z36xxx/Z37xxx Series High Definition Audio Controller
vendor: Intel Corporation
physical id: 1b
bus info: pci@0000:00:1b.0
version: 0c
width: 64 bits
clock: 33MHz
capabilities: pm msi bus_master cap_list
configuration: driver=snd_hda_intel latency=0
resources: irq:96 memory:d0710000-d0713fff
*-pci:0
description: PCI bridge
product: Atom Processor E3800 Series PCI Express Root Port 1
vendor: Intel Corporation
physical id: 1c
bus info: pci@0000:00:1c.0
version: 0c
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:87 ioport:1000(size=4096)
*-pci:1
description: PCI bridge
product: Atom Processor E3800 Series PCI Express Root Port 2
vendor: Intel Corporation
physical id: 1c.1
bus info: pci@0000:00:1c.1
version: 0c
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:88 ioport:e000(size=4096) memory:d0600000-d06fffff
*-network
description: Ethernet interface
product: RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
vendor: Realtek Semiconductor Co., Ltd.
physical id: 0
bus info: pci@0000:02:00.0
logical name: enp2s0
version: 11
serial: [REMOVED]
size: 1Gbit/s
capacity: 1Gbit/s
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress msix vpd bus_master cap_list ethernet physical tp mii 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
configuration: autonegotiation=on broadcast=yes driver=r8169 driverversion=5.8.0-44-generic duplex=full firmware=rtl8168g-2_0.0.1 02/06/13 ip=[REMOVED] latency=0 link=yes multicast=yes port=MII speed=1Gbit/s
resources: irq:17 ioport:e000(size=256) memory:d0604000-d0604fff memory:d0600000-d0603fff
*-pci:2
description: PCI bridge
product: Atom Processor E3800 Series PCI Express Root Port 3
vendor: Intel Corporation
physical id: 1c.2
bus info: pci@0000:00:1c.2
version: 0c
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:89 ioport:2000(size=4096)
*-pci:3
description: PCI bridge
product: Atom Processor E3800 Series PCI Express Root Port 4
vendor: Intel Corporation
physical id: 1c.3
bus info: pci@0000:00:1c.3
version: 0c
width: 32 bits
clock: 33MHz
capabilities: pci pciexpress msi pm normal_decode bus_master cap_list
configuration: driver=pcieport
resources: irq:90 ioport:3000(size=4096)
*-isa
description: ISA bridge
product: Atom Processor Z36xxx/Z37xxx Series Power Control Unit
vendor: Intel Corporation
physical id: 1f
bus info: pci@0000:00:1f.0
version: 0c
width: 32 bits
clock: 33MHz
capabilities: isa bus_master cap_list
configuration: driver=lpc_ich latency=0
resources: irq:0
*-serial
description: SMBus
product: Atom Processor E3800 Series SMBus Controller
vendor: Intel Corporation
physical id: 1f.3
bus info: pci@0000:00:1f.3
version: 0c
width: 32 bits
clock: 33MHz
capabilities: pm cap_list
configuration: driver=i801_smbus latency=0
resources: irq:18 memory:d0714000-d071401f ioport:f000(size=32)
*-pnp00:00
product: PnP device PNP0b00
physical id: 1
capabilities: pnp
configuration: driver=rtc_cmos
*-pnp00:01
product: PnP device PNP0c02
physical id: 2
capabilities: pnp
configuration: driver=system
*-pnp00:02
product: PnP device PNP0c02
physical id: 3
capabilities: pnp
configuration: driver=system
*-pnp00:03
product: PnP device PNP0400
physical id: 4
capabilities: pnp
configuration: driver=parport_pc
*-pnp00:04
product: PnP device PNP0501
physical id: 5
capabilities: pnp
configuration: driver=serial
*-pnp00:05
product: PnP device PNP0501
physical id: 6
capabilities: pnp
configuration: driver=serial
*-pnp00:06
product: PnP device PNP0c02
physical id: 7
capabilities: pnp
configuration: driver=system
*-scsi:0
physical id: 8
logical name: scsi0
capabilities: emulated
*-disk
description: ATA Disk
product: KINGSTON SV300S3
physical id: 0.0.0
bus info: scsi@0:0.0.0
logical name: /dev/sda
version: BBF0
serial: [REMOVED]
size: 111GiB (120GB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=81ee0c35-8d76-46cf-8046-2c6c24b48acb logicalsectorsize=512 sectorsize=512
*-volume:0
description: Windows FAT volume
vendor: mkfs.fat
physical id: 1
bus info: scsi@0:0.0.0,1
logical name: /dev/sda1
logical name: /boot/efi
version: FAT32
serial: [REMOVED]
size: 510MiB
capacity: 511MiB
capabilities: boot fat initialized
configuration: FATs=2 filesystem=fat label=UEFI mount.fstype=vfat mount.options=rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro name=EFI System Partition state=mounted
*-volume:1
description: EXT4 volume
vendor: Linux
physical id: 2
bus info: scsi@0:0.0.0,2
logical name: /dev/sda2
logical name: /
version: 1.0
serial: [REMOVED]
size: 30GiB
capabilities: journaled extended_attributes large_files huge_files dir_nlink recover 64bit extents ext4 ext2 initialized
configuration: created=2021-01-04 20:28:14 filesystem=ext4 lastmountpoint=/ modified=2021-01-04 20:59:45 mount.fstype=ext4 mount.options=rw,relatime mounted=2021-03-06 00:53:22 name=root state=mounted
*-volume:2
description: OS X ZFS partition or Solaris /usr partition
vendor: Solaris
physical id: 3
bus info: scsi@0:0.0.0,3
logical name: /dev/sda3
serial: [REMOVED]
capacity: 79GiB
configuration: name=Solaris /usr & Mac ZFS
*-scsi:1
physical id: 9
logical name: scsi1
capabilities: emulated
*-disk
description: ATA Disk
product: WDC WD30EZRX-00M
vendor: Western Digital
physical id: 0.0.0
bus info: scsi@1:0.0.0
logical name: /dev/sdb
version: 0A80
serial: [REMOVED]
size: 2794GiB (3TB)
capabilities: gpt-1.00 partitioned partitioned:gpt
configuration: ansiversion=5 guid=2b1d0253-3039-1240-b2bf-a3241670f547 logicalsectorsize=512 sectorsize=4096
*-volume:0
description: OS X ZFS partition or Solaris /usr partition
vendor: Solaris
physical id: 1
bus info: scsi@1:0.0.0,1
logical name: /dev/sdb1
serial: [REMOVED]
capacity: 2794GiB
configuration: name=zfs-2ac91fd99ba76c71
*-volume:1
description: reserved partition
vendor: Solaris
physical id: 9
bus info: scsi@1:0.0.0,9
logical name: /dev/sdb9
serial: [REMOVED]
capacity: 8191KiB
*-network
description: Ethernet interface
physical id: 1
logical name: veth8eaf1df1
serial: [REMOVED]
size: 10Gbit/s
capabilities: ethernet physical
configuration: autonegotiation=off broadcast=yes driver=veth driverversion=1.0 duplex=full link=yes multicast=yes port=twisted pair speed=10Gbit/s
I can pastebin the logs if anyone feels it to be necessary, but they are massive and unrevealing, with different and unrelated processes being called before each lockup. The logging daemons always die along with the kernel.
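Since the logging daemons die with the kernel, one thing that might at least narrow down the timing of a lockup is a small heartbeat script on the host that appends a line of basic state and flushes it to disk straight away. This is only a sketch of the idea, not something from my current setup; the log path and the `heartbeat` function name are made up for illustration:

```shell
#!/bin/sh
# Heartbeat sketch: append one line of basic state, then force it to disk.
# HEARTBEAT_LOG is a hypothetical path; pick any location on the host.
HEARTBEAT_LOG="${HEARTBEAT_LOG:-/tmp/heartbeat.log}"

heartbeat() {
    # 1-, 5- and 15-minute load averages, or "n/a" where /proc is unavailable.
    if [ -r /proc/loadavg ]; then
        load=$(cut -d ' ' -f 1-3 /proc/loadavg)
    else
        load="n/a"
    fi
    # MemAvailable in kB from /proc/meminfo; left empty if it cannot be read.
    mem=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
    printf '%s load=%s mem_available_kb=%s\n' \
        "$(date -u '+%Y-%m-%dT%H:%M:%SZ')" "$load" "${mem:-n/a}" >> "$1"
    # Flush buffers so the line has a chance of surviving a hard lock.
    sync
}

heartbeat "$HEARTBEAT_LOG"
```

Run from cron once a minute, the last line in the file would show roughly when the machine stopped responding and what load and free memory looked like just before.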
I welcome hearing from all of you about your NC triumphs/travails and especially your implementation strategies/recommendations.