I’m on a Raspberry pi 3 with 1gb RAM. Openhab 4.1.0.M1. The main addons I have are Homematic, MQTT, BigAssFans, TPLink, GPIO, Shelly.
My install has been pretty stable for years. In the last few months I’ve added the Shelly binding, but not a lot else has changed.
In the last couple of weeks I’ve started getting out of memory errors. These broadly look like:
2023-10-06 02:24:22.775 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@ddaba4' takes more than 5000ms.
2023-10-06 02:24:32.708 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@ddaba4' takes more than 5000ms.
2023-10-06 02:24:52.968 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@ddaba4' takes more than 5000ms.
2023-10-06 02:25:21.839 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
at java.lang.Thread.start0(Native Method) ~[?:?]
at java.lang.Thread.start(Thread.java:802) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1593) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor.reExecutePeriodic(ScheduledThreadPoolExecutor.java:360) ~[?:?]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:307) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[?:?]
at java.lang.Thread.run(Thread.java:833) ~[?:?]
I’ve used jstack and jmap to see what’s going on. In broad terms it looks to me like my heap use is pretty consistent through the day - I’ve looked a few times and it seems to be using around 40-60MB, more at startup, a bit less once the garbage collector has been running.
However, what I see is that when these errors started today I had an awful lot of threads - 2,228 in total. A lot of them looked to be websockets that were waiting on a concurrent lock - 1,721 of them in total. Further below is a grep of the threads from the dump, and some examples of what those threads appear to be doing.
I’m interested in any hints as to what might be going on (have people seen something like this before), or what my next step in diagnostics would be to see what’s happening. None of my rules have explicit concurrent locks, so this must be something in the code itself. Given Shelly is the latest thing I added, that might suggest a problem with the Shelly binding, but it could be coincidental.
The list of threads is too long to include (I can attach it if it’s useful, but I suspect it’s not), most of the volume of threads are as follows:
pi@raspberrypi2:~ $ fgrep prio= threaddump.txt
...
"HttpClient@18ab912-scheduler-1" #23913 prio=5 os_prio=0 cpu=0.41ms elapsed=15018.15s tid=0x58244098 nid=0x28bc waiting on condition [0x2ed1e000]
"WebSocketClient@29182530-23942" #23942 daemon prio=5 os_prio=0 cpu=0.59ms elapsed=14958.41s tid=0x2ef10650 nid=0x28de runnable [0x2eaee000]
"WebSocketClient@29182530-23943" #23943 daemon prio=5 os_prio=0 cpu=0.43ms elapsed=14958.40s tid=0x31ef6d10 nid=0x28df runnable [0x2eb3e000]
"WebSocketClient@29182530-23944" #23944 daemon prio=5 os_prio=0 cpu=17.28ms elapsed=14958.40s tid=0x31ef5d58 nid=0x28e0 waiting on condition [0x2ea4e000]
"WebSocketClient@29182530-23945" #23945 daemon prio=5 os_prio=0 cpu=15.95ms elapsed=14958.40s tid=0x2ef044a8 nid=0x28e1 waiting on condition [0x2e9fe000]
"WebSocketClient@29182530-23946" #23946 daemon prio=5 os_prio=0 cpu=12.41ms elapsed=14958.40s tid=0x2ef0e158 nid=0x28e2 waiting on condition [0x2e9ae000]
"WebSocketClient@29182530-23947" #23947 daemon prio=5 os_prio=0 cpu=13.30ms elapsed=14958.40s tid=0x31ef47e8 nid=0x28e3 waiting on condition [0x2e95e000]
"WebSocketClient@29182530-23948" #23948 daemon prio=5 os_prio=0 cpu=14.86ms elapsed=14958.40s tid=0x2ef126c8 nid=0x28e4 waiting on condition [0x2e90e000]
"WebSocketClient@29182530-23949" #23949 daemon prio=5 os_prio=0 cpu=13.74ms elapsed=14958.40s tid=0x2ef15648 nid=0x28e5 waiting on condition [0x2e8be000]
"HttpClient@1bd4a42-scheduler-1" #23950 prio=5 os_prio=0 cpu=0.40ms elapsed=14958.39s tid=0x00862bf0 nid=0x28e6 waiting on condition [0x2e86e000]
"WebSocketClient@7372344-23960" #23960 daemon prio=5 os_prio=0 cpu=0.34ms elapsed=14958.09s tid=0x2ef22a58 nid=0x28f0 runnable [0x2e77e000]
"WebSocketClient@7372344-23961" #23961 daemon prio=5 os_prio=0 cpu=0.24ms elapsed=14958.09s tid=0x2ef207a8 nid=0x28f1 runnable [0x2e68e000]
"WebSocketClient@7372344-23962" #23962 daemon prio=5 os_prio=0 cpu=15.79ms elapsed=14958.09s tid=0x2ef21760 nid=0x28f2 waiting on condition [0x2e6de000]
"WebSocketClient@7372344-23963" #23963 daemon prio=5 os_prio=0 cpu=13.96ms elapsed=14958.09s tid=0x2ef13880 nid=0x28f3 waiting on condition [0x2e81e000]
"WebSocketClient@7372344-23964" #23964 daemon prio=5 os_prio=0 cpu=13.03ms elapsed=14958.09s tid=0x2ef17a60 nid=0x28f4 waiting on condition [0x2e7ce000]
...
Examples of a few of them with full log are as follows:
"WebSocketClient@24187389-829" #829 daemon prio=5 os_prio=0 cpu=303.16ms elapsed=76894.48s tid=0x59256148 nid=0x254e waiting on condition [0x54c64000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
- parking to wait for <0x6f2f0488> (a java.util.concurrent.SynchronousQueue$TransferStack)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.7/LockSupport.java:252)
at java.util.concurrent.SynchronousQueue$TransferStack.transfer(java.base@17.0.7/SynchronousQueue.java:401)
at java.util.concurrent.SynchronousQueue.poll(java.base@17.0.7/SynchronousQueue.java:903)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.reservedWait(ReservedThreadExecutor.java:324)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:399)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
at java.lang.Thread.run(java.base@17.0.7/Thread.java:833)
"WebSocketClient@24187389-830" #830 daemon prio=5 os_prio=0 cpu=378.20ms elapsed=76894.48s tid=0x59256d80 nid=0x254f waiting on condition [0x54c14000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
- parking to wait for <0x6f2f0698> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.7/LockSupport.java:252)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@17.0.7/AbstractQueuedSynchronizer.java:1672)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
at java.lang.Thread.run(java.base@17.0.7/Thread.java:833)
"WebSocketClient@24187389-831" #831 daemon prio=5 os_prio=0 cpu=132.94ms elapsed=76894.48s tid=0x59257db8 nid=0x2550 waiting on condition [0x54bc4000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
- parking to wait for <0x6f2f0698> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.7/LockSupport.java:252)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@17.0.7/AbstractQueuedSynchronizer.java:1672)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
at java.lang.Thread.run(java.base@17.0.7/Thread.java:833)
"WebSocketClient@24187389-832" #832 daemon prio=5 os_prio=0 cpu=253.63ms elapsed=76894.48s tid=0x59259050 nid=0x2551 waiting on condition [0x54b74000]
java.lang.Thread.State: TIMED_WAITING (parking)
at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
- parking to wait for <0x6f2f0698> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.parkNanos(java.base@17.0.7/LockSupport.java:252)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@17.0.7/AbstractQueuedSynchronizer.java:1672)
at org.eclipse.jetty.util.BlockingArrayQueue.poll(BlockingArrayQueue.java:382)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.idleJobPoll(QueuedThreadPool.java:974)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1018)
at java.lang.Thread.run(java.base@17.0.7/Thread.java:833)