Hello, I’m trying to debug a problem that I see after migrating from a OH 3.x setup to 4.0.4.
Some information:
- The original install, and recent update, was done via apt-get on Debian on an x86 NUC
- System information:
openhab> shell:info
Karaf
Karaf version 4.4.3
Karaf home /usr/share/openhab/runtime
Karaf base /var/lib/openhab
OSGi Framework org.eclipse.osgi-3.18.0.v20220516-2155
JVM
Java Virtual Machine OpenJDK 64-Bit Server VM version 17.0.9+8-LTS
Version 17.0.9
Vendor Azul Systems, Inc.
Pid 8080
Uptime 22 hours 31 minutes
Process CPU time 2 hours 7 minutes
Process CPU load 0.03
System CPU load 0.43
Open file descriptors 2,291
Max file descriptors 102,642
Total compile time 8 minutes
Threads
Live threads 4887
Daemon threads 123
Peak 4897
Total started 45495
Memory
Current heap size 461,405 kbytes
Maximum heap size 2,097,152 kbytes
Committed heap size 1,048,576 kbytes
Pending objects 0
Garbage collector Name = 'G1 Young Generation', Collections = 1547, Time = 37.442 seconds
Garbage collector Name = 'G1 Old Generation', Collections = 0, Time = 0.000 seconds
Classes
Current classes loaded 32,847
Total classes loaded 44,225
Total classes unloaded 11,378
Operating system
Name Linux version 4.19.0-25-amd64
Architecture amd64
Processors 4
Total physical memory 8,052,296 kbytes
Free physical memory 278,068 kbytes
The system starts normally, but the problem occurs at some point (hours) where one (of two) ZWave sticks stops working. Disabling/enabling the binding, via UI or console, does not restore functionality. Stopping/starting the service seems to work.
2023-11-13 09:18:40.204 [DEBUG] [wave.discovery.ZWaveDiscoveryService] - Creating ZWave discovery service for zwave:serial_zstick:zwavesecure with scan time of 60
2023-11-13 09:18:40.204 [DEBUG] [wave.discovery.ZWaveDiscoveryService] - ZWave discovery: Active zwave:serial_zstick:zwavesecure
2023-11-13 09:18:40.205 [INFO ] [zwave.handler.ZWaveControllerHandler] - Attempting to add listener when controller is null
2023-11-13 09:18:40.355 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - Initializing ZWave serial controller.
2023-11-13 09:18:40.355 [DEBUG] [zwave.handler.ZWaveControllerHandler] - Initializing ZWave Controller zwave:serial_zstick:zwavesecure.
2023-11-13 09:18:40.368 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - Initializing ZWave thing handler zwave:device:zwavesecure:node2.
2023-11-13 09:18:40.369 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - Initializing ZWave thing handler zwave:device:zwavesecure:node6.
2023-11-13 09:18:45.390 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - Connecting to serial port '/dev/ttyACM0'
2023-11-13 09:18:45.399 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - Starting receive thread
2023-11-13 09:18:45.400 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - Serial port is initialized
2023-11-13 09:18:45.400 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Starting ZWave thread: Receive
2023-11-13 09:18:45.400 [DEBUG] [zwave.handler.ZWaveControllerHandler] - Initialising ZWave controller
2023-11-13 09:18:45.402 [INFO ] [ve.internal.protocol.ZWaveController] - Starting ZWave controller
2023-11-13 09:18:45.405 [INFO ] [ve.internal.protocol.ZWaveController] - ZWave timeout is set to 5000ms. Soft reset is false.
2023-11-13 09:18:45.406 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.lang.Thread.start0(Native Method) ~[?:?]
at java.lang.Thread.start(Thread.java:809) ~[?:?]
at java.util.Timer.<init>(Timer.java:188) ~[?:?]
at java.util.Timer.<init>(Timer.java:170) ~[?:?]
at java.util.Timer.<init>(Timer.java:143) ~[?:?]
at org.openhab.binding.zwave.internal.protocol.ZWaveController.<init>(ZWaveController.java:168) ~[?:?]
at org.openhab.binding.zwave.handler.ZWaveControllerHandler.initializeNetwork(ZWaveControllerHandler.java:190) ~[?:?]
at org.openhab.binding.zwave.handler.ZWaveSerialHandler.watchSerialPort(ZWaveSerialHandler.java:142) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:840) ~[?:?]
2023-11-13 09:18:45.408 [DEBUG] [nal.protocol.ZWaveTransactionManager] - ZWaveReceiveThread queue empty
2023-11-13 09:18:45.409 [DEBUG] [nal.protocol.ZWaveTransactionManager] - Transaction SendNextMessage 0 out at start. Holdoff false.
2023-11-13 09:18:50.401 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@3b415270' takes more than 5000ms.
2023-11-13 09:18:50.403 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@3b415270' takes more than 5000ms.
2023-11-13 09:18:50.404 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@3b415270' takes more than 5000ms.
I graphed my heap usage over time, and it stays within expected (and generous) limits.
What I did see increasing was the thread count. Looking at the shell:info I pasted above, you can see where the threads are in the 4887-4897 range; when it gets here, OH starts doing bad things.
Looking into related threads regarding out or memory/threads, I found a thread where dev:dump-create is used. Looking into the threads.txt, I see literally thousands of entries like this:
"HttpClient@66c1ab89-6296" Id=6296 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@29589826
at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@29589826
at java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
at java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
"HttpClient@66c1ab89-6297" Id=6297 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@29589826
at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@29589826
at java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
at java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
"HttpClient@66c1ab89-6298" Id=6298 TIMED_WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@29589826
at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@29589826
at java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252)
at java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
"HttpClient@66c1ab89-scheduler-1" Id=6299 WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5275657a
at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method)
- waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@5275657a
at java.base@17.0.9/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341)
at java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:506)
at java.base@17.0.9/java.util.concurrent.ForkJoinPool.unmanagedBlock(ForkJoinPool.java:3465)
at java.base@17.0.9/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3436)
at java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1623)
at java.base@17.0.9/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1170)
at java.base@17.0.9/java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:899)
at java.base@17.0.9/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1062)
at java.base@17.0.9/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1122)
at java.base@17.0.9/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base@17.0.9/java.lang.Thread.run(Thread.java:840)
Now I’m stuck… I can’t figure out what is creating all those threads that are waiting on a network “something” to happen?
Any suggestions of how to proceed from here with debug?
[Edited to correct OH version of 4.0.4 in the text, rather than Karaf version]