Docker container crashes frequently

Hey there,
I recently moved my OH 3.3.0 setup from a native install on Debian to a dockerized setup. Since then, the container has been crashing regularly after a few hours.
The only slight complexity is a USB Z-Wave stick that needs to be passed into the container. The rest is more or less straightforward.
Here are some error messages I found:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f91f7f9f4e4, pid=29, tid=1283
#
# JRE version: OpenJDK Runtime Environment Temurin-11.0.15+10 (11.0.15+10) (build 11.0.15+10)
# Java VM: OpenJDK 64-Bit Server VM Temurin-11.0.15+10 (11.0.15+10, mixed mode, tiered, compressed oops, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x7974e4]  frame::frame(long*, long*, long*, unsigned char*)+0xc4
#
# Core dump will be written. Default location: /openhab/userdata/core
#
# An error report file with more information is saved as:
# /tmp/hs_err_pid29.log
#
# If you would like to submit a bug report, please visit:
#   https://github.com/adoptium/adoptium-support/issues
#

events.log

2022-09-22 03:58:00.088 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:serial_zstick:5ef9b14944' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Serial Error: Port /dev/ttyACM0 does not exist
2022-09-22 03:58:00.090 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:5ef9b14944:node10' changed from ONLINE to OFFLINE (BRIDGE_OFFLINE): Controller is offline
...

hs_err_pid29.log:

...
---------------  T H R E A D  ---------------

Current thread (0x00007fac7c2d1800):  JavaThread "RXTXPortMonitor(/dev/ttyACM1)" daemon [_thread_in_Java, id=680, stack(0x00007fac3698e000,0x00007fac36a8f000)]

Stack: [0x00007fac3698e000,0x00007fac36a8f000],  sp=0x00007fac36a8d738,  free space=1021k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
j  java.lang.Thread.join()V+2 java.base@11.0.15
j  org.openhab.binding.zwave.handler.ZWaveSerialHandler.disposeReceiveThread()V+30
j  org.openhab.binding.zwave.handler.ZWaveSerialHandler.onSerialPortError(Ljava/lang/String;)V+69
j  org.openhab.binding.zwave.handler.ZWaveSerialHandler$ZWaveReceiveThread.serialEvent(Lorg/openhab/core/io/transport/serial/SerialPortEvent;)V+17
j  org.openhab.core.io.transport.serial.rxtx.RxTxSerialPort$1.serialEvent(Lgnu/io/SerialPortEvent;)V+17
j  gnu.io.RXTXPort.sendEvent(IZ)Z+397
j  gnu.io.RXTXPort$MonitorThread.run()V+42
v  ~StubRoutines::call_stub
V  [libjvm.so+0x8d5dbb]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x39b
V  [libjvm.so+0x8d3d7d]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x1ed
V  [libjvm.so+0x98118c]  thread_entry(JavaThread*, Thread*)+0x6c
V  [libjvm.so+0xed4cfa]  JavaThread::thread_main_inner()+0x1ba
V  [libjvm.so+0xed18ff]  Thread::call_run()+0x14f
V  [libjvm.so+0xc719be]  thread_native_entry(Thread*)+0xee


siginfo: si_signo: 11 (SIGSEGV), si_code: 1 (SEGV_MAPERR), si_addr: 0x00007fac000009a0
....

It seems the Z-Wave stick disappears from the system and that somehow takes OH3 down with it. I will investigate in that direction.
Does anyone have other recommendations on how to troubleshoot and resolve this?
Thanks a lot for your hints.
Sebastian

Note to self: I updated the tty handling on the Docker host.
Updated the udev rules:

root@docker:/lib/udev/rules.d# cat 99-usb-serial.rules
# Zwave Stick
SUBSYSTEM=="tty", ATTRS{idVendor}=="0658", MODE="0770", GROUP="openhab", SYMLINK+="ttyZwave0"
# Conbee Stick
SUBSYSTEM=="tty", ATTRS{idVendor}=="1cf1", ATTRS{idProduct}=="0030", MODE="0770", GROUP="dialout", SYMLINK+="ttyConbee0"

root@docker:/lib/udev/rules.d#  udevadm control --reload-rules && udevadm trigger

so that my USB sticks are always addressed the same way.

root@docker:/lib/udev/rules.d# ls -al /dev/tty* |grep ACM
crwxrwx--- 1 root openhab 166,  0 Sep 22 08:41 /dev/ttyACM0
crwxrwx--- 1 root dialout 166,  1 Sep 22 08:41 /dev/ttyACM1
lrwxrwxrwx 1 root root          7 Sep 22 08:41 /dev/ttyConbee0 -> ttyACM1
lrwxrwxrwx 1 root root          7 Sep 22 08:41 /dev/ttyZwave0 -> ttyACM0

Then I mapped the new /dev/ttyZwave0 into the container:

docker run -d
...
  --device=/dev/ttyZwave0:/dev/ttyACM0:rwm \
...

I also moved from my own UID/GID for the openhab user to the default 9001 UID/GID setup, as proposed on the openHAB Docker Hub documentation page.
Let's see if this is more stable.
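Before starting the container, it may be worth confirming that a udev symlink such as /dev/ttyZwave0 really resolves to a character device on the host. The following is only a minimal sketch: the function name check_serial_link is made up for illustration and is not part of openHAB or Docker.

```shell
#!/bin/bash
# check_serial_link LINK
# Hypothetical helper: resolve LINK (for example a udev symlink) and
# succeed only if the final target is a character device.
check_serial_link() {
    local link="$1" target
    # readlink -f follows the symlink chain; it fails if the path cannot be resolved
    target=$(readlink -f -- "$link") || return 1
    if [ -c "$target" ]; then
        printf '%s -> %s\n' "$link" "$target"
    else
        printf '%s does not resolve to a character device\n' "$link" >&2
        return 1
    fi
}
```

On the Docker host this could be called as `check_serial_link /dev/ttyZwave0` right before `docker run`, so that a missing stick is noticed before the container starts.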

Updating your host OS (including Docker) might help if it is caused by some known and fixed kernel or Docker bug. Maybe some journalctl logging on your host explains why the device got disconnected?
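To follow up on the journalctl idea, a small filter like the one below could narrow the kernel log down to USB-related lines around the time of a crash. This is a sketch under assumptions: the function name usb_events and the pattern list are illustrative, and the host is assumed to use systemd-journald.

```shell
#!/bin/bash
# usb_events: read log lines on stdin and keep only those that mention
# the xHCI controller, a USB disconnect, or an ACM serial port.
# The pattern list is an assumption; extend it for other hardware.
usb_events() {
    grep -E -i 'xhci_hcd|usb disconnect|ttyACM'
}

# Example on the Docker host (adjust the time window to the crash time):
#   journalctl -k --since "2022-09-22 03:50" --until "2022-09-22 04:05" | usb_events
```

The time window in the comment is taken from the events.log timestamps above; `-k` limits the output to kernel messages.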

Thanks for your support @wborn.
I had already updated my Docker host and apps to the most recent versions. I also read some kernel posts suggesting that the latest kernel may have introduced some issues.
I optimized my setup a bit as described above, and the system has been stable for the last 6 hours.
Knock on wood…

OH crashed again tonight at 03:58. Syslog says that the USB host controller suddenly disappeared from the system.

Sep 23 03:58:33 docker kernel: [72558.634074] xhci_hcd 0000:00:06.0: xHCI host not responding to stop endpoint command.
Sep 23 03:58:33 docker kernel: [72558.634117] xhci_hcd 0000:00:06.0: USBSTS: 0x00000001 HCHalted
Sep 23 03:58:33 docker kernel: [72558.634121] xhci_hcd 0000:00:06.0: xHCI host controller not responding, assume dead
Sep 23 03:58:33 docker kernel: [72558.634125] xhci_hcd 0000:00:06.0: HC died; cleaning up
Sep 23 03:58:33 docker kernel: [72558.634672] usb 1-1: USB disconnect, device number 2
Sep 23 03:58:33 docker kernel: [72558.634672] usb 1-1.1: USB disconnect, device number 3
Sep 23 03:58:33 docker kernel: [72558.635226] usb 1-1.3: USB disconnect, device number 6

I’m running the XCP-ng virtualization stack and pass a whole USB PCI card through to the VM. Most likely the issue is somewhere in that area.

It would be nice if OH did not bail out completely when something like this happens, so that only the binding dies and the rest stays active.

In my experience it may help to create a virtual proxy device via socat. I use this for a serial device that I connect through the network and it runs very stable. Before that OH did not crash completely but had to be rebooted after the connection was lost. This may however also help with physically connected devices.

The following script is mounted in the /etc/cont-init.d/ folder of the OH docker container and creates the ttyNET0 device that connects to my actual ttyNET1 device (which is actually another socat instance that connects to the network device). OH is configured to use the ttyNET0 device.

#!/bin/bash
# https://community.openhab.org/t/cant-use-forwarded-socat-serial-port-in-lgtvserial-binding-ioexception/97965
# https://community.openhab.org/t/forwarding-of-serial-and-usb-ports-over-the-network-to-openhab/46597

# Restart socat whenever it exits (for example when the connection ends).
# /dev/ttyNET0 is the device OH opens; /dev/ttyNET1 is the other end of the proxy.
while true; do
    socat -d -d -s -lf /openhab/userdata/logs/socat_proxy.log pty,link=/dev/ttyNET0,raw,user=openhab,group=openhab,mode=777 pty,link=/dev/ttyNET1,raw,echo=0
    sleep 1
done &> /dev/null &
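Because the loop above runs in the background, the /dev/ttyNET0 link may not exist at the exact moment the init script returns. One possible addition is a short wait, sketched below. The helper name wait_for_path and the 10 second default are assumptions for illustration, not part of the original script.

```shell
#!/bin/bash
# wait_for_path PATH [TIMEOUT_SECONDS]
# Hypothetical helper: poll once per second until PATH exists,
# giving up with a non-zero status after TIMEOUT_SECONDS (default 10).
wait_for_path() {
    local path="$1" timeout="${2:-10}" waited=0
    until [ -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
```

It could be called at the end of the init script, for example `wait_for_path /dev/ttyNET0 10 || echo "socat proxy not ready" >&2`.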

Thanks for all your hints.
In the end, updating my hypervisor (XCP-ng) sorted out the issues. There have been no more disconnects over the last few days.
Closing here.
