Queue Full Messages in Log

Hi

I need some help in identifying an issue that regularly requires a reboot of OH.

The issue started after upgrading to OH 4.2 and has continued into OH 4.3.

Hardware: I was previously running OH on an RPi 3 B+ with 1GB RAM. I thought it was time to upgrade to more RAM after these issues started, so I’m currently running an RPi 4 with 8GB RAM. The problem started on the former and has continued on the latter.

JVM version: openjdk 17.0.13

This may be a symptom, but in the logs I start getting the following every second:

2024-12-29 07:07:40.421 [ERROR] [e.ember.internal.ash.AshFrameHandler] - AshReceiveParserThread Exception: 
java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:329) ~[?:?]
	at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler.handleIncomingFrame(AshFrameHandler.java:242) ~[bundleFile:?]
	at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler.access$700(AshFrameHandler.java:59) ~[bundleFile:?]
	at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler$AshReceiveParserThread.run(AshFrameHandler.java:319) [bundleFile:?]

and

2025-01-05 03:31:38.943 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
	at java.lang.Thread.start0(Native Method) ~[?:?]
	at java.lang.Thread.start(Thread.java:809) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:945) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.ensurePrestart(ThreadPoolExecutor.java:1593) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor.reExecutePeriodic(ScheduledThreadPoolExecutor.java:360) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:307) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]
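
From what I understand, "unable to create native thread" usually means the process is hitting a thread or OS resource limit rather than running out of Java heap. As a rough cross-check (just a sketch using the standard ThreadMXBean API, nothing openHAB-specific), something like this will log live and peak thread counts so they can be lined up against when the errors appear:

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Rough sketch: log live/peak JVM thread counts once a minute so the
// numbers can be compared against when the errors start appearing.
public class ThreadCountLogger {
    public static void main(String[] args) throws InterruptedException {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        while (true) {
            System.out.printf("live=%d peak=%d daemon=%d%n",
                    threads.getThreadCount(),
                    threads.getPeakThreadCount(),
                    threads.getDaemonThreadCount());
            Thread.sleep(60_000);
        }
    }
}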

When these errors happen, I get different behaviours. Sometimes no rules will run at all. Other times rules run but I get no notifications, or rules run but the actions aren’t completed (such as updating the state of an item). Sometimes rules will only work after 2-3 attempts.

I came across the following (old) topic, Zigbee devices offline after migration to openhabian 1.7 - #3 by domofl, and although it didn’t offer any real solution, I:

  • removed the two Zigbee devices I have;
  • removed the Zigbee Controller;
  • re-added the Zigbee Controller;
  • hard reset the two Zigbee devices;
  • re-added the two Zigbee devices to the Controller.

None of this has made a difference.

So, to try and get to the bottom of it, I set up Zabbix with additional JVM monitoring, and I’m still confused.

I am far from a Java expert; however, nothing stands out to me in terms of resource constraints. I’ve had the issue twice in the last 7 days, and during that time:

  • CPU utilisation has averaged 2.5%, with a maximum of 5.7%.
  • RAM utilisation has averaged 17%, with a maximum of 19%.
  • Committed heap memory is 250MB; the JVM’s average heap usage is 210MB, with a maximum of 216MB.

What I have seen in Zabbix following the latest incident, which may or may not be expected (I’m not sure):

  • Garbage collections per second also climb at JVM start; however, 3 days before I start seeing the errors, the rate drops to 0 and stays there until the next reboot.
  • Threads max out at approx. 4,900.

I also added the following line to the Java options file:

EXTRA_JAVA_OPTS="-Xms192m -Xmx768m -XX:+ExitOnOutOfMemoryError"

however, the JVM does not restart, which fits with the data: it doesn’t look like a memory issue, despite what the errors in the logs say.

Any help is much appreciated.

My guess is this is a symptom of other issues, and not the cause.

The ASH queue is only 20 frames long, and each frame is around 200 bytes at most (most will be only 50 or 60 bytes). So, unless you have less than 4kB of memory, this is unlikely to be the cause of a memory problem.

This queue is the receive queue for Zigbee, and normally it is processed in real time. As above, the queue is short, so if it is overrunning, then something somewhere is probably blocking and preventing the system from running.
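
Just to illustrate what that stack trace shows (this is standard library behaviour, not the binding’s actual code): add() on a bounded ArrayBlockingQueue throws IllegalStateException("Queue full") the moment the consumer stops draining it.

import java.util.concurrent.ArrayBlockingQueue;

// Illustration only: a bounded queue with no consumer. The 21st add()
// throws java.lang.IllegalStateException: Queue full - the same
// exception seen in the AshReceiveParserThread stack trace.
public class QueueFullDemo {
    public static void main(String[] args) {
        ArrayBlockingQueue<byte[]> rxQueue = new ArrayBlockingQueue<>(20); // 20 frames, as in ASH

        for (int frame = 0; frame < 25; frame++) {
            // Nothing is taking frames off the queue (a "blocked" consumer),
            // so the queue fills up and add() eventually throws.
            rxQueue.add(new byte[60]); // a typical frame is ~50-60 bytes
        }
    }
}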

Thanks. Thought as much.

Any ideas where to even begin, given that Zabbix doesn’t appear to indicate any memory issues? I’m conscious I may be interpreting it incorrectly, or that some important metric is missing from my monitoring.

The Doctor Binding helps find issues with your system [4.0.0;4.9.9] is an add-on that will let you monitor the JVM heap. That might be more informative.

Any further help is much appreciated as I continue to struggle with this.

I installed The Doctor Binding and although it didn’t tell me anything different to what Zabbix is telling me, it helped me validate some of the data that I am seeing in Zabbix.

So I had the issue again last night, and since then my log has been filled with errors like the one below. But as we’ve established, this is likely a symptom and not the problem:

2025-01-09 10:53:23.467 [ERROR] [e.ember.internal.ash.AshFrameHandler] - AshReceiveParserThread Exception:
java.lang.IllegalStateException: Queue full
        at java.util.AbstractQueue.add(AbstractQueue.java:98) ~[?:?]
        at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:329) ~[?:?]
        at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler.handleIncomingFrame(AshFrameHandler.java:242) ~[bundleFile:?]
        at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler.access$700(AshFrameHandler.java:59) ~[bundleFile:?]
        at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler$AshReceiveParserThread.run(AshFrameHandler.java:319) [bundleFile:?]

I have 8GB of RAM, and OH and Zabbix are the only things running on it, so I thought I’d increase the max heap size to 2GB.
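
For reference, the change was just to the -Xmx value in the same EXTRA_JAVA_OPTS line as before (assuming the other options stay as they were), roughly:

EXTRA_JAVA_OPTS="-Xms192m -Xmx2g -XX:+ExitOnOutOfMemoryError"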

However, at the time of the issue, only 15-20% of the JVM heap was being utilised, so on the face of it it doesn’t appear to be a memory issue, which is really strange.

I don’t know whether anyone can shed any more light on the below, but this line is the first sign of any issues, and the errors come flooding in after it:

2025-01-08 22:28:03.066 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - QueuedThreadPool[WebSocketClient@15065109]@1dafba7{STOPPING,8<=0<=200,i=7,r=-1,q=0}[NO_TRY] Couldn't stop Thread[WebSocketClient@15065109-116037,5,main]

Just an update on here if it helps anyone in the future…

I think I may have got to the bottom of it, but it’s still early days and there are still some unanswered questions.

In Zabbix, I could see that the thread count is constantly climbing. The plateau in the thread-count graph is where I see the issues, as the JVM is not able to create any more threads.

However, upon further research, I came to the conclusion that the number of threads is only limited by RAM, heap size and CPU. Given that my RAM is only 20% used, my heap is only 15% used and my CPU average is 4%, I didn’t think this would be the cause. My Java knowledge is rather sketchy, so I thought this might possibly be expected. (In hindsight, each thread also reserves its own native stack outside the heap and is subject to OS per-process limits, which is why thread creation can fail even when the heap and CPU look healthy.)

However, in the absence of any other avenue of investigation, I thought I’d look into what the threads are doing and why they’re climbing, so I used a JVM profiler to get visibility of them. When I came back in the morning, there were hundreds of parked threads with the name

:coapEndpoint-UDP-224.0.1.187:5683
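
For anyone who wants a similar view without a full profiler, a rough sketch like the one below (plain JDK APIs, nothing openHAB-specific) counts live threads grouped by name pattern, which makes a pile-up like those coapEndpoint threads stand out. It has to run inside the same JVM, so treat it as an illustration of the idea rather than a drop-in tool.

import java.util.Map;
import java.util.TreeMap;

// Rough sketch: count live threads grouped by name, with runs of digits
// collapsed so threads belonging to the same pool/component group together
// (e.g. "coapEndpoint-UDP-224.0.1.187:5683" becomes "coapEndpoint-UDP-N.N.N.N:N").
public class ThreadNameHistogram {
    public static void main(String[] args) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            String key = t.getName().replaceAll("\\d+", "N");
            counts.merge(key, 1, Integer::sum);
        }
        counts.forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}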

Then I remembered an error that I have seen intermittently in the logs since the OH 4.2 upgrade:

Stopped without executing or closing null

which a previous search told me comes from the Shelly Binding, but as no one else had reported any major issues, I didn’t think too much of it.

So, I deleted the Shelly Binding (and switched all my Shelly devices to MQTT) and, lo and behold, the thread count is now totally stable.

Like I say, it’s early days and I can’t say yet that the issue is fixed. I was seeing memory issues every 3-4 days, but since the Shelly Binding was removed the thread count has stopped climbing.

So it looks like the Shelly Binding is not disposing of threads correctly.
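
To be clear, I haven’t looked at the binding’s source; this is just the classic pattern that produces the symptom, sketched here with made-up names: an executor (or endpoint) is created when a handler initialises but never shut down when it is disposed, so every re-initialisation leaves more parked threads behind.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

// Purely illustrative (made-up names, not the Shelly Binding's code):
// the typical way a handler leaks threads across re-initialisations.
public class LeakyHandler {
    private ScheduledExecutorService poller;

    public void initialize() {
        // A new thread is created each time the handler starts...
        poller = Executors.newSingleThreadScheduledExecutor();
    }

    public void dispose() {
        // ...and if this shutdown is missing, each re-initialisation
        // (e.g. after a reconnect) leaves another idle, parked thread behind.
        if (poller != null) {
            poller.shutdownNow();
            poller = null;
        }
    }
}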

@markus7017 FYI

EDIT: Just spotted OpenHab 4.2.1 stopping after 3 days - OutOfMemoryError. That user also has the Shelly Binding installed.
