Java CPU 100% on 1 core

oppo1967p · December 29, 2022, 10:18am

Hi,
I am using OH3 version 3.3 on openhabian platform (RPi3).
Using only few bindings from original source.

I found recently that the system was not responsive and was very slow especially in the webpage display, iOS app, etc.

So I checked the CPU usage using top and htop and found that the RPi was using only 1 core and using it at 100% (even over 100% sometimes). The process using this CPU time is “java” with user “openhab”. But the core used is not always the same one. It changes over time.

Due to the high CPU usage I am not even able to access the karaf console.

I also noticed in the past that after a while the unit just freezes and stop responding at all.
So, I don’t want to reboot it as I want to find the root cause.

How can I track down what is causing this CPU usage?
How can I see how much CPU is using a specific binding?
Why is openhab using only 1 core?

Any help will be appreciated.

oppo1967p · December 29, 2022, 10:52am

Now I am getting these error messages in the log:

2022-12-29 11:30:00.467 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@ca3cb0' takes more than 5000ms.

2022-12-29 11:30:18.026 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@ca3cb0' takes more than 5000ms.

2022-12-29 11:30:49.015 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@12c1a04' takes more than 5000ms.

2022-12-29 11:31:46.338 [INFO ] [ternal.dhcp.DHCPPacketListenerServer] - DHCP request packet listener online

2022-12-29 11:31:46.342 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.automation.internal.module.handler.GroupStateTriggerHandler@1c6836e' takes more than 5000ms.

2022-12-29 11:32:17.136 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@12c1a04' takes more than 5000ms.

2022-12-29 11:32:34.537 [WARN ] [WaveSerialHandler$ZWaveReceiveThread] - Exception during ZWave thread. 

java.lang.IllegalStateException: Queue full

	at java.util.AbstractQueue.add(AbstractQueue.java:98) ~[?:?]

	at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:326) ~[?:?]

	at org.openhab.binding.zwave.internal.protocol.ZWaveTransactionManager.processReceiveMessage(ZWaveTransactionManager.java:447) ~[bundleFile:?]

	at org.openhab.binding.zwave.internal.protocol.ZWaveController.incomingPacket(ZWaveController.java:1094) ~[bundleFile:?]

	at org.openhab.binding.zwave.handler.ZWaveControllerHandler.incomingMessage(ZWaveControllerHandler.java:462) ~[bundleFile:?]

	at org.openhab.binding.zwave.handler.ZWaveSerialHandler$ZWaveReceiveThread.run(ZWaveSerialHandler.java:384) [bundleFile:?]

2022-12-29 11:33:00.780 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.automation.internal.module.handler.GroupStateTriggerHandler@1b8b098' takes more than 5000ms.

oppo1967p · December 29, 2022, 11:15am

Here more recent error messages:

2022-12-29 12:10:16.103 [INFO ] [openhab.event.ItemStateChangedEvent ] - [FAILED toString()]

==> /var/log/openhab/openhab.log <==

2022-12-29 12:02:28.228 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 

java.util.concurrent.RejectedExecutionException: Queue capacity exceeded

	at java.util.concurrent.ForkJoinPool$WorkQueue.growArray(ForkJoinPool.java:916) ~[?:?]

	at java.util.concurrent.ForkJoinPool$WorkQueue.lockedPush(ForkJoinPool.java:871) ~[?:?]

	at java.util.concurrent.ForkJoinPool.externalPush(ForkJoinPool.java:1902) ~[?:?]

	at java.util.concurrent.ForkJoinPool.externalSubmit(ForkJoinPool.java:1921) ~[?:?]

	at java.util.concurrent.ForkJoinPool.execute(ForkJoinPool.java:2453) ~[?:?]

	at java.util.concurrent.CompletableFuture.asyncRunStage(CompletableFuture.java:1750) ~[?:?]

	at java.util.concurrent.CompletableFuture.runAsync(CompletableFuture.java:1944) ~[?:?]

	at com.qubular.openhab.binding.vicare.internal.VicareDeviceThingHandler.handleCommand(VicareDeviceThingHandler.java:293) ~[?:?]

	at com.qubular.openhab.binding.vicare.internal.VicareBridgeHandler.lambda$featurePoller$0(VicareBridgeHandler.java:118) ~[?:?]

	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]

	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[?:?]

	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[?:?]

	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]

	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]

	at java.lang.Thread.run(Thread.java:829) [?:?]

2022-12-29 12:11:17.546 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 

java.lang.OutOfMemoryError: Java heap space

2022-12-29 12:11:21.932 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.metrics.ThingStateMetric@b3ee26' takes more than 5000ms.

2022-12-29 12:11:35.006 [INFO ] [ve.internal.protocol.ZWaveController] - Starting ZWave controller

2022-12-29 12:11:35.010 [INFO ] [ve.internal.protocol.ZWaveController] - ZWave timeout is set to 5000ms. Soft reset is false.

2022-12-29 12:12:05.493 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@12c1a04' takes more than 5000ms.

oppo1967p · December 29, 2022, 3:06pm

Anyone?

I had to reboot the RPi and magically the CPU load decreased, the system returned responsive, etc etc.
As it happens frequently (every month or so) I need to understand what is going on as this behaviour in a domotic system is dangerous.

Any one can help?

Lionello_Marrelli · December 29, 2022, 3:15pm

Something seems to be causing a memory leak. Once the memory is full many strange things happen. I use the systeminfo binding to monitor the memory allocation, both total memory and java heap space. You may be able to guess what’s the cause by removing some items or bindings until the memory leak does not happen anymore.

oppo1967p · December 29, 2022, 3:22pm

Hi Lionello,
a memory leak should not happen in the certified bindings activated from inside OH, as they are checked by the developers, right?

So where the leak could happen?

Lionello_Marrelli · December 29, 2022, 3:34pm

Developers cannot test all possible user’s configuration. I’ve seen in the past several threads about memory leaks for some bindings.
There have been threads reporting problems with zwave (e.g. this one)

oppo1967p · December 29, 2022, 3:45pm

I understand, but my OH3 fails sometime after a month, sometimes after 3 months.
How can I check all the bindings with this timeframe?
Any suggestion?

oppo1967p · December 29, 2022, 6:04pm

Hi Lionello,
after restart, I have

usedHeap=60%-91% (working smoothly)
available heap=30-120MB
memory available=189MB

what should be the threshold for an alarm on the usedHeap?

hilbrand · December 29, 2022, 6:25pm

The out of memory also could be collateral damage. If data is received but can’t be processed queues can increase in size and cause the system to run out of memory. At least that is what I’m reading in your logs. It can be binding locks the shared scheduler causing other bindings not be able to process data, or even the binding itself.
In your log I see an unknown binding com.qubular.openhab.binding.vicare that also reports it’s queue is full. It could be this binding is blocked, either due to an other binding blocking or the binding blocking itself, while still receiving data and eventually causing it to run out of memory. (Or the queue running full is happening in the zwavel binding). A way to check if threads are blocked can be done in the karaf Console :

threads --monitors --locks

oppo1967p · December 29, 2022, 6:35pm

thanks for your reply.

Karaf was not accessible unfortunately.
When I tried to login Karaf, there was a timeout.