Wierd USB / modbus problem

After upgrading to 4.1 i seem to have a wierd USB / modbus problem

USB 3 devices:

After reboot, everything seems to work very fine. But after some random hours (everything between 1 to 36 hours), all arduinos suddenly seems to all fail at the same time.
The other USB-devices also get communication problems (But comfoair and wavin things still shows as online)

I cant see any smoking gun. I suspect It might have to do with memory problems - but I cant see what is cause and effect.

I tried, without succes:

  • higher Xms, xmx settings in /etc/default/openhab
    (currently -Xms512m -Xmx1536m)
  • removing 7 arduino things (less traffic)
  • removing zmartify binding

sudo systemctl status openhab.service

openhabian@openhabian:~ $ sudo systemctl status openhab.service
● openhab.service - openHAB - empowering the smart home
     Loaded: loaded (/lib/systemd/system/openhab.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/openhab.service.d
             └─override.conf
     Active: active (running) since Sat 2024-02-17 17:28:38 CET; 17h ago
       Docs: https://www.openhab.org/docs/
             https://community.openhab.org
    Process: 755 ExecStartPre=/bin/rm -f /var/lock/LCK..ttyAMA0 /var/lock/LCK..ttyACM0 (code=exited, status=0/SUCCESS)
   Main PID: 764 (java)
      Tasks: 220 (limit: 4531)
        CPU: 1h 49min 44.882s
     CGroup: /system.slice/openhab.service
             └─764 /usr/bin/java -XX:-UsePerfData -Dopenhab.home=/usr/share/openhab -Dopenhab.conf=/etc/openhab -Dopenhab.runtime=/usr/share/openhab/runtime -Dopenhab.userdata=/var/lib/openhab -Dopenhab.log>

Feb 18 10:32:31 openhabian karaf[764]:   File "/var/lib/openhab/cache/org.eclipse.osgi/266/0/bundleFile/Lib/threading.py", line 353, in run
Feb 18 10:32:31 openhabian karaf[764]:     self.finished.wait(self.interval)
Feb 18 10:32:31 openhabian karaf[764]:   File "/var/lib/openhab/cache/org.eclipse.osgi/266/0/bundleFile/Lib/threading.py", line 463, in wait
Feb 18 10:32:31 openhabian karaf[764]:     self.__cond.release()
Feb 18 10:32:31 openhabian karaf[764]: IllegalMonitorStateException: java.lang.IllegalMonitorStateException
Feb 18 10:32:31 openhabian karaf[764]: Exception in thread Thread-215:Traceback (most recent call last):
Feb 18 10:32:31 openhabian karaf[764]:   File "/var/lib/openhab/cache/org.eclipse.osgi/266/0/bundleFile/Lib/threading.py", line 222, in _Thread__bootstrap
Feb 18 10:32:31 openhabian karaf[764]:     self.run()
Feb 18 10:32:31 openhabian karaf[764]:   File "/var/lib/openhab/cache/org.eclipse.osgi/266/0/bundleFile/Lib/threading.py", line 353, in run
Feb 18 10:32:31 openhabian karaf[764]:     self.finished.wait(self.interval)

openhab-cli console -p habopen shell:info

Karaf
  Karaf version               4.4.4
  Karaf home                  /usr/share/openhab/runtime
  Karaf base                  /var/lib/openhab
  OSGi Framework              org.eclipse.osgi-3.18.0.v20220516-2155

JVM
  Java Virtual Machine        OpenJDK Client VM version 17.0.9+9-Raspbian-1deb11u1rpt1
  Version                     17.0.9				Reboot:
  Vendor                      Raspbian
  Pid                         764
  Uptime                      17 hours 23 minutes
  Process CPU time            1 hour 49 minutes
  Process CPU load            0.00
  System CPU load             0.50
  Open file descriptors       237
  Max file descriptors        102,642
  Total compile time          1 minute
Threads
  Live threads                222
  Daemon threads              122
  Peak                        227
  Total started               19348				1181
Memory
  Current heap size           352,507 kbytes
  Maximum heap size           1,520,512 kbytes
  Committed heap size         507,264 kbytes
  Pending objects             0
  Garbage collector           Name = 'Copy', Collections = 957, Time = 25.611 seconds		Collections = 98
  Garbage collector           Name = 'MarkSweepCompact', Collections = 5, Time = 2.084 seconds
Classes
  Current classes loaded      38,903
  Total classes loaded        39,752
  Total classes unloaded      849
Operating system
  Name                        Linux version 6.1.21-v8+
  Architecture                arm
  Processors                  4
  Total physical memory       3,936,500 kbytes
  Free physical memory        1,417,252 kbytes

Problem seem to be related to jython and scripts you have. I’ve looked at stack trace shown in openhab service output and clearly it indicates python (Lib/threading.py), which has nothing to do with modbus binding. Unless you apply some transformations with it?

Looking at this:

Apparently you have a timer thread somewhere there which gets into a trouble with its own state.

I tried to remove all rules (i do have some timers there). The threading.py errors disapeared from sudo systemctl status openhab.service

  • but that didnt stop the errors with modbus.
    Hence I think the threading.py errors was an effect of the problem - not the causes.

The time before the problem arise, is somehow related to the amount of arduino things (in the things file). If i only have 1 thing, openhab will run for 2-3 days, before the USB-problem arise. Otherwise it will only run for some hours.

Logging shows when OK:

2024-03-10 12:35:14.376 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0c 03 00 00 00 1d 84 de 
2024-03-10 12:35:15.993 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response (CRC OK): 0c 03 3a 00 0c 00 1e d8 e4 d8 e5 d8 d8 d8 d8 d8 d8 00 d1 d8 d8 d8 d8 00 da d8 d8 00 ce d8 f0 d8 e4 d8 e5 d8 d8 d8 d8 d8 d8 02 54 d8 d8 d8 d8 03 e7 d8 d8 03 e7 00 cc d8 e2 d8 e2 d8 e2 f6 c5 
2024-03-10 12:35:16.061 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 12 03 00 00 00 1d 87 60 
2024-03-10 12:35:16.137 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response (CRC OK): 12 03 3a 00 12 01 cd d8 d8 02 3b d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 00 b6 01 10 04 26 d8 d8 d8 d4 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 01 da 01 06 d8 f0 d8 f0 d8 f0 d8 f0 c6 ed 
2024-03-10 12:35:16.197 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0d 03 00 00 00 1a c4 cd 
2024-03-10 12:35:16.268 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response (CRC OK): 0d 03 34 00 0d 00 28 d8 d8 d8 d4 01 07 d8 d8 01 05 01 03 d8 d8 d8 d8 d8 d8 d8 d8 00 d4 04 26 d8 d8 d8 d4 01 76 d8 d8 01 c5 01 8c d8 d8 d8 d8 d8 d8 d8 d8 03 4c d8 f0 86 9f

to suddently everything goes EOF:

2024-03-10 13:51:51.336 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 06 03 00 00 00 14 44 72
2024-03-10 13:51:51.390 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response (CRC OK): 06 03 28 00 06 01 c1 d8 d8 d8 d4 01 18 d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 ff 25 00 b8 d8 f0 04 27 d8 d8 d8 d4 01 40 d8 d8 d8 d8 d8 d8 42 64
2024-03-10 13:51:51.448 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 09 03 00 00 00 14 44 8d
2024-03-10 13:51:51.505 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response (CRC OK): 09 03 28 00 09 00 28 d8 d8 d8 e5 d8 d8 d8 d8 00 ef 01 1a d8 d8 d8 d8 d8 d8 d8 d8 d8 d8 04 27 d8 d8 d8 e5 d8 d8 d8 d8 01 b2 01 41 ba 19
2024-03-10 13:52:03.926 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:11.936 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:11.937 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: Error reading response (EOF)
2024-03-10 13:52:11.938 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 1 out of 5 failed when executing request (ModbusReadRequestBlueprint [slaveId=15, functionCode=READ_MULTIPLE_REGISTERS, start=36, length=12, maxTries=5]). Will try again soon. Error was I/O error, so resetting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: IOException Error reading response (EOF) [operation ID 394374e8-2aa7-40f8-94a9-de1eb1538592]
2024-03-10 13:52:11.985 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:19.994 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:19.995 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: Error reading response (EOF)
2024-03-10 13:52:19.995 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 2 out of 5 failed when executing request (ModbusReadRequestBlueprint [slaveId=15, functionCode=READ_MULTIPLE_REGISTERS, start=36, length=12, maxTries=5]). Will try again soon. Error was I/O error, so resetting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: IOException Error reading response (EOF) [operation ID 394374e8-2aa7-40f8-94a9-de1eb1538592]
2024-03-10 13:52:20.041 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:28.050 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:28.051 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: Error reading response (EOF)
2024-03-10 13:52:28.052 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 3 out of 5 failed when executing request (ModbusReadRequestBlueprint [slaveId=15, functionCode=READ_MULTIPLE_REGISTERS, start=36, length=12, maxTries=5]). Will try again soon. Error was I/O error, so resetting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: IOException Error reading response (EOF) [operation ID 394374e8-2aa7-40f8-94a9-de1eb1538592]
2024-03-10 13:52:28.097 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:36.103 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:36.104 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: Error reading response (EOF)
2024-03-10 13:52:36.105 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 4 out of 5 failed when executing request (ModbusReadRequestBlueprint [slaveId=15, functionCode=READ_MULTIPLE_REGISTERS, start=36, length=12, maxTries=5]). Will try again soon. Error was I/O error, so resetting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: IOException Error reading response (EOF) [operation ID 394374e8-2aa7-40f8-94a9-de1eb1538592]
2024-03-10 13:52:36.153 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:44.159 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 0f 03 00 24 00 0c 04 ea
2024-03-10 13:52:44.160 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: Error reading response (EOF)
2024-03-10 13:52:44.161 [ERROR] [rt.modbus.internal.ModbusManagerImpl] - Last try 5 failed when executing request (ModbusReadRequestBlueprint [slaveId=15, functionCode=READ_MULTIPLE_REGISTERS, start=36, length=12, maxTries=5]). Aborting. Error was I/O error, so resetting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: IOException Error reading response (EOF) [operation ID 394374e8-2aa7-40f8-94a9-de1eb1538592]
2024-03-10 13:52:44.230 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 0b 03 00 00 00 1d 85 69
2024-03-10 13:52:52.239 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 0b 03 00 00 00 1d 85 69
2024-03-10 13:52:52.240 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: Error reading response (EOF)

This suggests a failure of communication at operating system level or on arduino side. Have a look on ‘dmesg’ output, sometimes failures with client devices will produce system level log entry. That was a cass (AFAIR) for some people who run smartmeter binding.

Interesting!
I tried to reset arduino - that didnt change anything.

Did dmesg after comm start to fail:

[ 3479.915592] cp210x ttyUSB1: usb_serial_generic_read_bulk_callback - urb stopped: -32
[ 3479.915742] cp210x ttyUSB1: failed to get comm status: -32
[ 4746.281459] cp210x ttyUSB2: usb_serial_generic_read_bulk_callback - urb stopped: -32
[ 4746.281646] ftdi_sio ttyUSB0: usb_serial_generic_read_bulk_callback - urb stopped: -32
[ 4746.281738] cp210x ttyUSB1: failed to get comm status: -32
[ 4746.281771] cp210x ttyUSB1: usb_serial_generic_read_bulk_callback - urb stopped: -32

At 3479s there are 2 CRC errors, that didnt “kill” the communication.

But about 20 min later (4746.281771) the comm dies (EOF)

ttyUSB0 → /dev/ttyUSBarduino
ttyUSB1 → /dev/ttyUSBwavin (Zmartify addon)
ttyUSB2 → /dev/ttyUSBcomfoair

Guess you might be facing situation related to cp210x with specific devices: https://forums.raspberrypi.com/viewtopic.php?p=1954047&sid=b59b962d07fd45739c9e8c71fe4c3899#p1954047

I’ve seen some sort of these issues also with USB-CAN converters which caused freeze of hub. Yet, in my case it was proven to be caused by improper termination of CAN line (missing 120 ohm resistor). Do you have a terminators on your RS485 lines? It might plan role to the issue, the longer line and faster data rate you have, the higher probability of issues is.

hmm …
I DO actually have

  • a bad wiring (star connection instead of bus. cat5e). Not possible to change
  • removed 120R (i tried different solutions, 0,1,multiple Resistors, caps to stop powerdrain.) Removing120R seemed on oscilloscope to give way better square comm signal/noise.

I thoughed bad hardware could give CRC/EOF-errors … but actually freeze the hub ?? Doe to hardware, like power issues?
(In my mind, software should be way more forgiving). Do you know excactly what causes the freeze?

And why is The “uptime” is roughly proportional to the number of software things.
limiting to 1 arduino still freezes (after way longer uptime) but then it cant be the comm somehow being jammed

I am not a trained automation engineer, hence I can only echo some stuff I’ve heard from them. Without terminator you will face reflections of signal which will keep going from one end of line to other, interfering with signal generated by your adapter. With star topology you will have even higher problem.
You can see relation between number of software things and hangs, because each thing you have in openHAB will cause communication. The more traffic you have on the physical interface, the more noise you have and, the higher likelihood of triggering issue with USB adapter.

Iny case of CAN/USB issues I’ve contacted author of it and spoke with him. He attempted to reproduce issue, but with his environment, with proper terminations and so on, issue did not exist. Eventually I’ve asked technician to double check cabling, this way hanging of CAN interface disappeared.
For CAN there is variety of basic repeaters/isolators which can also act as a splitter. You can find similar tools for RS485. They will turn your star topology into several buses, each with its own terminators.