JVM Crash and OH Restart Loop When Connecting to Serial Devices with Long Device Names

First of all thanks for the development of openHAB and the Zigbee binding, it’s a great piece of software :+1:

I just upgraded to OpenHAB 3.3 running on an Ubuntu 20.04 system (apt installation).

My system sometimes changes the assignments of the USB ports to the tty devices, i.e. sometimes the Zigbee stick will be at /dev/ttyUSB0 and sometimes /dev/ttyUSB1. So I changed the device to a symlink that always resolves to the right device: /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0. I also added those devices to the gnu.io.rxtx.SerialPorts setting in /etc/default/openhab like so:

EXTRA_JAVA_OPTS="-Dgnu.io.rxtx.SerialPorts=/dev/serial/by-id/usb-SHK_NANO_CUL_433-if00-port0:/dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0:/dev/ttyUSB0:/dev/ttyUSB1"

Since this change I have a major problem: when OH starts up, it only reaches the point where the Zigbee binding connects to the serial port:

2022-07-02 11:20:53.064 [DEBUG] [nding.zigbee.serial.ZigBeeSerialPort] - Connecting to serial port [/dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0] at 57600 baud, flow control FLOWCONTROL_OUT_XONOFF.

This is the last log entry I see, then the system seems to crash completely. Then there are no log entries anymore and the system starts up again from the beginning. I can’t do much on the UI or the console (only for a few seconds), then the serial connection kills openHAB completely and it goes into a restart loop.

Do you have any advice on how to fix this? A few specific questions:

  • Do I need to do anything to reload the configuration in /etc/default/openhab apart from systemctl daemon-reload and openhab-cli clean-cache? My openHAB runs as a systemd service.
  • Is there a known problem when I use symlinks as serial devices? The symlink itself belongs to root, but everyone can read, write and execute it (lrwxrwxrwx). The device itself (currently ttyUSB1) belongs to group dialout and the openhab user is in the dialout group. This part has been working without problems in openHAB 3.2.
  • I don’t have any chance to change the serial device setting back because the UI is only available for a few seconds before it is killed. Same for the console. Is there a way to change the setting very quickly with one single command on the console or can I change it back directly in the database? If so, can you point me to the right table?

Any other ideas on how I could fix this are highly appreciated.

Is that ( still ) required ? I did so too using OH2. I am using Conbee ( Zigbee ) and Zwave stick but for OH3 I am not using EXTRA_JAVA_OPTS to define gnu.io.rtx.SerialPorts.

I commented out the EXTRA_JAVA_OPTS line and don’t see a change in behavior, so this might not be needed anymore. I will confirm as soon as my system is up and running again.

In the mean time I managed to disable the zigbee binding with a bundle:stop command in the console within the few seconds before it connected to the stick. Now I can at least use the basic UI functionalities and the console again without being kicked out after a few seconds.

Unfortunately I can not change the setting of the Zigbee Stick on the UI while the bundle is not started. When I try to save it I get

java.lang.IllegalStateException: Thing with UID zigbee:coordinator_ember:01382705 has no handler attached.

Is there any way to change this setting while the bundle is not started? On the console or in the database maybe?

I also tried to delete the thing on the UI but it does not work either. On the next startup it is back again and kills OH when connecting to the stick. Is there a way to delete things on the console or in the database?

Thanks for your help :+1:

I haven’t tried doing that with stopped things but yes, the karaf console offers possibility to delete things/items. Alternative you may try the REST API ( Developer Tools => API Explorer ).


karaf console:
openhab> openhab:things
Usage: openhab:things list - lists all things
Usage: openhab:things show <thingUID>* - show details about one or more things; show details for all things if no thingUID provided
Usage: openhab:things clear - removes all managed things
Usage: openhab:things remove <thingUID> - removes a thing
Usage: openhab:things trigger <channelUID> [<event>] - triggers the <channelUID> with <event> (if given)
Usage: openhab:things disable <thingUID> - disables a thing
Usage: openhab:things enable <thingUID> - enables a thing

Thank you @Wolfgang_S, I was able to remove the thing using the things remove <thingUID> command and then created it again using the UI as usual.

EDIT: meanwhile I found out that it is better to just disable the thing instead of deleting it. When deleting and adding it again I also had to delete and re-discover all my Zigbee things again.

It still might be worth investigating why /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0 completely kills openhab and causes a restart loop :thinking:

The looping in the startup seems to be a general problem when using /dev/serial/by-id/.
I tried that with Enocean, but reverted it quickly.

Bridge enocean:bridge:FT1ZI6SV "EnOcean Gateway" @ "Flur EG"
[
path="/dev/ttyUSB0",
//path="/dev/serial/by-id/usb-EnOcean_GmbH_EnOcean_USB_300_DB_FT1ZI6SV-if00-port0",
enableSmack=true,
espVersion="ESP3"
]

Using /dev/serial/by-id/ seems to be more intuitive that dealing with udev rules.

Thanks for trying that out. I am debugging the issue remotely right now.

The problem occurs in the constructor of the class gnu.io.RXTXPort in the library nrjavaserial-5.2.1. Calling the native method open() crashes openHAB, but only under certain conditions. The call is

fd = open( name );

and the native method signature is

private native synchronized int open( String name ) throws PortInUseException;

The interesting thing is that this call is executed twice on my system, but only one fails. In the first call (which succeeds and is executed by the serial binding) the device name passed to the method is

/dev/serial/by-id/usb-SHK_NANO_CUL_433-if00-port0

and the second time (from the Zigbee binding) it is:

/dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0

The second call crashes openHAB. Does anyone have a clue why this happens the second time?
The file permissions of the symlinks look exactly the same:

Thanks for any help :+1: Meanwhile I will also keep looking.

EDIT: it might be related to threading issues in nrjavaserial such as this one.

EDIT 2: I noticed that the serial binding opens the serial port in a OH-safeCall thread, while the Zigbee code runs in a OH-thingHandler daemon thread. Do you guys know if that could be related to the issue?

Serial Binding Stack:

Thread [OH-safeCall-1] (Suspended (breakpoint at line 173 in RXTXPort))	
	RXTXPort.<init>(String) line: 173	
	RXTXCommDriver.getCommPort(String, int) line: 983	
	CommPortIdentifier.open(String, int) line: 467	
	SerialPortIdentifierImpl.open(String, int) line: 53	
	SerialBridgeHandler.initialize() line: 138	
	GeneratedMethodAccessor91.invoke(Object, Object[]) line: not available	
	DelegatingMethodAccessorImpl.invoke(Object, Object[]) line: 43	
	Method.invoke(Object, Object...) line: 566	
	InvocationHandlerSync<T>(AbstractInvocationHandler<T>).invokeDirect(Invocation) line: 154	
	Invocation.call() line: 52	
	FutureTask<V>.run() line: 264	
	QueueingThreadPoolExecutor(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1128	
	ThreadPoolExecutor$Worker.run() line: 628	
	Thread.run() line: 829

Zigbee Binding Stack:

Daemon Thread [OH-thingHandler-5] (Suspended (breakpoint at line 173 in RXTXPort))	
	owns: EmberHandler  (id=705)	
	RXTXPort.<init>(String) line: 173	
	RXTXCommDriver.getCommPort(String, int) line: 983	
	CommPortIdentifier.open(String, int) line: 467	
	SerialPortIdentifierImpl.open(String, int) line: 53	
	ZigBeeSerialPort.open(int, ZigBeePort$FlowControl) line: 157	
	ZigBeeSerialPort.open() line: 123	
	ZigBeeDongleEzsp.initialiseEzspProtocol() line: 1289	
	ZigBeeDongleEzsp.initialize() line: 432	
	ZigBeeNetworkManager.initialize() line: 418	
	EmberHandler(ZigBeeCoordinatorHandler).initialiseZigBee() line: 431	
	EmberHandler(ZigBeeCoordinatorHandler).lambda$2() line: 557	
	354614459.run() line: not available	
	Executors$RunnableAdapter<T>.call() line: 515	
	ScheduledThreadPoolExecutor$ScheduledFutureTask<V>(FutureTask<V>).run() line: 264	
	ScheduledThreadPoolExecutor$ScheduledFutureTask<V>.run() line: 304	
	WrappedScheduledExecutorService(ThreadPoolExecutor).runWorker(ThreadPoolExecutor$Worker) line: 1128	
	ThreadPoolExecutor$Worker.run() line: 628	
	Thread.run() line: 829

EDIT 3: presumably it’s not a concurrency issue because the JVM still crashes if I deactivate the other serial device. I opened a github issue for nrjavaserial, maybe they have an idea what the problem is.

EDIT 4: most likely we are dealing with buffer overflow issues. In the native code several log messages are constructed that are stored in character buffers of length 80. Those buffers will overflow when constructing messages with long device names like

open: locking worked for /dev/serial/by-id/usb-Silicon_Labs_BV_2010_10_01382705-if00-port0\n

This message has 91 characters and will thus overflow the buffer of 80 characters.

Since it turns out that this issue is not zigbee-specific, I will remove the tag in the topic name. If there is a more appropriate category, please let me know or move the topic accordingly.

There is a pending pull request now that will hopefully fix the buffer overflows.

Until this change makes it to an openHAB release, the problem can only be solved using udev rules (which is what I kind of wanted to avoid, but I guess this is the only solution currently).

I found very helpful instructions on stackexchange how this can be configured and will summarize it here quickly:

  1. Get infos about available attributes of your device: udevadm info --name=/dev/ttyUSBx --attribute-walk
  2. Find a unique combination of attributes for your device (or a single unique attribute)
  3. Create a file /etc/udev/rules.d/99-usb-serial.rules and edit it
  4. Enter one or more rules using unique attribute combinations and a SYMLINK instruction (see below)

My rules look like this:

SUBSYSTEM=="tty", ATTRS{product}=="NANO CUL", SYMLINK+="nanocul"
SUBSYSTEM=="tty", ATTRS{product}=="BV 2010/10", SYMLINK+="zigbeestick"

I only used one attribute per device because the product names are unique in my case. When the devices are plugged in, symlinks will be created at

/dev/nanocul
/dev/zigbeestick

that are symlinks to the actual tty devices:

lrwxrwxrwx  1 root root             7 Aug 11 10:17 nanocul -> ttyUSB2
lrwxrwxrwx  1 root root             7 Aug 11 07:00 zigbeestick -> ttyUSB0

Now I can use /dev/nanocul and /dev/zigbeestick in openHAB and will always end up with the correct device :+1: