openHAB 3 runs out of memory / Java heap space errors, CPU 100%+ after a few hours

I’m not sure if this is related or not, but the RRD4j service seems to be using an enormous amount of CPU. I compared against my own OH3 instance (which is also running RRD4j) and I don’t see any of these threads under load there. I’ve also tuned my RRD4j down to persist only a small number of items instead of everything, which is what the default configuration does (see the persist sketch after the thread dump below). As a test, can you disable/remove the RRD4j service and see if this problem calms down?

"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-1]" #250 daemon prio=5 os_prio=0 cpu=26610.80ms elapsed=141491.92s tid=0xe109b400 nid=0x265 waiting on condition  [0xd7ebd000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-2]" #251 daemon prio=5 os_prio=0 cpu=28106.98ms elapsed=141491.89s tid=0xe35af400 nid=0x266 waiting on condition  [0xd7e6d000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-3]" #252 daemon prio=5 os_prio=0 cpu=27348.65ms elapsed=141491.88s tid=0xe229c800 nid=0x267 waiting on condition  [0xd7e1d000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-4]" #253 daemon prio=5 os_prio=0 cpu=26975.74ms elapsed=141491.88s tid=0xe229e400 nid=0x268 waiting on condition  [0xd7dcd000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-5]" #254 daemon prio=5 os_prio=0 cpu=27351.98ms elapsed=141491.87s tid=0xe35bf000 nid=0x26a waiting on condition  [0xd7d7d000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-6]" #255 daemon prio=5 os_prio=0 cpu=27822.25ms elapsed=141491.86s tid=0xe0e2e800 nid=0x26b waiting on condition  [0xd7d2d000]
"OH-RRD4j-1" #259 prio=5 os_prio=0 cpu=90183.52ms elapsed=141490.48s tid=0x002c8800 nid=0x26f waiting on condition  [0xd7c24000]
"OH-RRD4j-2" #261 prio=5 os_prio=0 cpu=90188.39ms elapsed=141490.42s tid=0xda74a400 nid=0x272 waiting on condition  [0xd99be000]
"OH-RRD4j-3" #263 prio=5 os_prio=0 cpu=90222.19ms elapsed=141488.81s tid=0xe45be000 nid=0x275 waiting on condition  [0xd7bd4000]
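For reference, restricting what RRD4j persists looks roughly like the sketch below, placed in `$OPENHAB_CONF/persistence/rrd4j.persist` (the item names are hypothetical; substitute your own):

```
// rrd4j.persist -- minimal sketch, item names are hypothetical
Strategies {
    // rrd4j needs equidistant samples, so persist on a fixed schedule
    everyMinute : "0 * * * * ?"
}

Items {
    // persist only the items you actually chart, instead of everything
    Temperature_LivingRoom, Humidity_LivingRoom : strategy = everyMinute, restoreOnStartup
}
```

Once this file exists it replaces the built-in default, so only the listed items are written to disk.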

Just for what it’s worth: I am not (and have never been) using RRD4j, I’m using jdbc persistence.

Also, for me Java is currently still using less than 1% CPU two days after the last restart of OH. So in my case this is something that happens very suddenly when it happens, not something that builds up over time.

Now I’m experiencing the heap space exception too… it seems related to the scheduler:

2021-02-04 05:04:37.680 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@166d93e' takes more than 5000ms.
2021-02-04 05:05:04.064 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space

My openHAB is stable version 3.0.1-2 on a Raspberry Pi 4 -_-

RRD4j is on by default in OH3. It must be explicitly disabled if you don’t want it.
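If it was installed through the UI you can remove it from the add-on settings in the Main UI. On a package-based install you can alternatively manage it via `addons.cfg`; a sketch, assuming the standard apt file locations, and note that once you use this file it takes over add-on management:

```
# /etc/openhab/services/addons.cfg -- sketch, path assumes an apt-based install
# list the persistence services you want; anything not listed gets uninstalled
persistence = jdbc
```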

I’m not entirely sure exactly what happens during an upgrade, but at least in my case the jdbc add-on that was installed on 2.5 was still installed after upgrading to 3.0, and no other persistence add-ons have ever been installed.

That is not necessarily the case; it is just where the memory happened to run out.
Java will spend more and more time garbage collecting as it approaches the heap limit, which explains the high load on the system.
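One way to test this theory is to turn on GC logging and watch whether collections become near-continuous before the crash. A sketch, assuming a Debian-style package install (where `/etc/default/openhab` is read at startup) and Java 11’s unified logging:

```
# /etc/default/openhab -- sketch, path assumes a Debian-style package install
EXTRA_JAVA_OPTS="-Xlog:gc*:file=/var/log/openhab/gc.log:time,uptime:filecount=5,filesize=10m"
```

If the tail of `gc.log` shows back-to-back full collections that reclaim almost nothing, the CPU is being burned by the garbage collector rather than by any particular binding.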

I had the same problems, even with OH 2.5, and this was the reason I migrated to Groovy rules and started to develop a metrics add-on.
Using those metrics I could see that memory usage would build up over a couple of days.
With an RPi 4 (at least the 4 GB+ versions) the problem isn’t as present any more, but with less memory I saw this a lot.

Different causes can lead to the same symptoms here, so we should not mingle everything together. I think removing all DSL rules and seeing whether this still occurs is the first important step.


In openHAB 3 the default strategy is to store the state of all items in an rrd4j database. As this is done by a scheduled process, it can explain the high number of scheduler threads. I don’t know whether this causes the out-of-memory problem.
For more on the default persistence strategy see: rrd4j - Persistence Services | openHAB
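Conceptually, that built-in default behaves roughly like the persist file below would; this is only an approximation for illustration, as the real default is applied internally rather than shipped as a file:

```
// approximate equivalent of the OH3 rrd4j default -- not an actual shipped file
Strategies {
    everyMinute : "0 * * * * ?"
}

Items {
    // every item, written once a minute and on every change, restored at startup
    * : strategy = everyChange, everyMinute, restoreOnStartup
}
```

With hundreds of items, that is a steady stream of scheduled writes, which matches the busy RRD4j thread pool in the thread dump above.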

I think you are on to something. I have a test cluster with one RPi 3 that is not running zram. It runs out of memory, and I can see a lot of rrd4j activity. I bet disabling rrd4j will make things a lot better.

Same here. I was running 3.0 with file-based rules without problems. Now with 3.0.1 stable I am running into Java heap space exceptions every one or two days. My first suspicion was the Modbus binding: after I added it and set up some pollers for my Pluggit ventilation unit, the problem occurred. But now I have disabled all Modbus Things, and it still happens.

From openhab.log:

2021-02-04 17:15:48.748 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:16:05.475 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:16:39.171 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:17:57.072 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:18:32.636 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:20:49.560 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@85831e' takes more than 5000ms.
2021-02-04 17:21:14.167 [WARN ] [com.zaxxer.hikari.pool.HikariPool   ] - 1m51s816ms328μs101ns - Thread starvation or clock leap detected (housekeeper delta=yank-default).
2021-02-04 17:23:28.837 [WARN ] [hannel.AbstractChannelHandlerContext] - An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:29.785 [WARN ] [netty.util.concurrent.DefaultPromise] - An exception was thrown by org.openhab.binding.ipcamera.internal.handler.IpCameraHandler$2.operationComplete()
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:29.809 [WARN ] [org.eclipse.jetty.io.ManagedSelector] - java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:14.074 [ERROR] [se.californium.elements.UDPConnector] - Exception in network stage thread [UDP-Receiver-0.0.0.0/0.0.0.0:5683[0]]:
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:22:45.863 [ERROR] [org.apache.felix.fileinstall        ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:30.836 [ERROR] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance
2021-02-04 17:23:32.098 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 710d4355-00d2-412a-a507-adca5dec9cce, base URL = http://localhost:8080)
2021-02-04 17:23:32.112 [INFO ] [y.internal.handler.ShellyBaseHandler] - shellyswitch25-10a5fa: Thing goes OFFLINE: message.offline.status-error-watchdog
2021-02-04 17:23:33.210 [INFO ] [y.internal.handler.ShellyBaseHandler] - shellyswitch25-f3fb9b: Thing goes OFFLINE: message.offline.status-error-watchdog
2021-02-04 17:23:44.767 [INFO ] [.reconnect.PeriodicReconnectStrategy] - Try to restore connection to '127.0.0.1'. Next attempt in 60000ms
2021-02-04 17:23:47.925 [INFO ] [.transport.mqtt.MqttBrokerConnection] - Starting MQTT broker connection to '127.0.0.1' with clientid 2e54a7bd-7dc3-44f0-9075-1fc7940ef0d5
2021-02-04 17:23:59.092 [WARN ] [era.internal.handler.IpCameraHandler] - !!!! Camera possibly closed the channel on the binding, cause reported is: Java heap space
2021-02-04 17:24:02.184 [WARN ] [com.zaxxer.hikari.pool.HikariPool   ] - 3m41s182ms62μs223ns - Thread starvation or clock leap detected (housekeeper delta=yank-default).
2021-02-04 17:26:26.303 [WARN ] [com.zaxxer.hikari.pool.HikariPool   ] - 1m37s459ms62μs834ns - Thread starvation or clock leap detected (housekeeper delta=yank-default).
2021-02-04 17:25:19.174 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 710d4355-00d2-412a-a507-adca5dec9cce, base URL = http://localhost:8080)
2021-02-04 17:29:26.470 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
java.lang.OutOfMemoryError: Java heap space

And so on…
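With crashes like these it can help to make the JVM write a heap dump automatically at the moment of the OutOfMemoryError, so there is something to analyze afterwards. A sketch, again assuming a Debian-style install and enough free disk space for a dump roughly the size of the heap:

```
# /etc/default/openhab -- sketch, assumes a package install that reads this file
# (append to any EXTRA_JAVA_OPTS you already set)
EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/openhab/heapdump.hprof"
```

The resulting `.hprof` file can be opened in Eclipse MAT or VisualVM to see which objects are filling the heap.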

Quick poll: of the people having this issue, are you on Pi 2/Pi 3 class systems, or on systems with more memory/CPU?

My system is a Proxmox VM with Buster. I gave it 2 CPUs (Celeron) and 2 GB of memory.

Maybe a wild guess, but I currently believe that disabling Things does not solve the issue… in my case, the old 2.5.9 system became even more unstable after I disabled every “Thing” that I had already migrated to a new OH3 instance. I recommend uninstalling the binding itself after the Things are disabled. But I have no idea whether that will have an effect on your problem :slight_smile:

@papaPhil

Actually, yesterday I had the same idea, since I noticed the entry “Modbus Manager starting” in the log, which I think means the binding is still active even with all of its Things disabled. I will follow up on this and uninstall the binding to see what effect it has.
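For what it’s worth, uninstalling can also be done from the Karaf console instead of the UI; a sketch, with the feature name assumed from the usual `openhab-binding-<id>` convention:

```
# connect to the openHAB console (default credentials unless you changed them)
ssh -p 8101 openhab@localhost

openhab> feature:list --installed | grep -i modbus
openhab> feature:uninstall openhab-binding-modbus
```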

Hello community,
I upgraded my OH 2.5 installation, which had very little content, to OH3 nearly on release day. After that I started playing around with Blockly-based rules and DSL rules. Soon the number of rules grew and the system got more and more unstable. The Java heap space errors occurred nearly daily.

The log entries from the systeminfo binding showed memory consumption at a quite stable level. The problems really seem to be related to the GUI-based rules. I moved all my rules to file-based rules, and from that moment on openHAB doesn’t seem to crash anymore.

Maybe the problems are also related to the mixture of OpenJDK 8 and OpenJDK 11; both are still present because of the upgrade. I should give it a shot and try a fresh, clean install of plain OH3.
Cheers …
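Before reinstalling, it may be worth checking which JDK the running process actually uses; a quick sketch for a Debian-based system:

```
# system default java, and the alternatives still installed
java -version
update-alternatives --config java

# the binary the running openHAB process was started from
# (assumes openHAB is the only java process on the box)
ls -l /proc/$(pidof java)/exe
```

If that path points at a Java 11 binary, the leftover OpenJDK 8 is probably not the culprit.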


I think it has nothing to do with the different JDK versions. I started with OH3 (actually OH 3.1 M1) on a fresh system with openHABian. But over time the number of rules has grown, currently to 52 UI DSL rules. And from time to time I get Java heap space errors, but mostly when I am playing with the rules (e.g. running a rule manually to test something).

I use a Raspberry Pi 3B+.

Proxmox VM with 2 CPUs and 4 GB of memory, Ubuntu 20.04.

Hi all,
I ran into the same problem with a fresh install of OH 3.0 on a new Raspberry Pi 4 (4 GB).
I got some heap space crashes after upgrading to OH 3.0.1 (note that I may not have run 3.0.0 long enough to see this bug).
Hint: could it be related to Visual Studio Code? I noticed that before the definitive crash and the 100% CPU loop, my VS Code (LSP) connection to OH broke (no more code/error checking…). Do you use VS Code as well? Perhaps it is just a side effect of the Java memory leak at that moment…

No, I don’t use VS Code…

Hello @morph166955,

Due to the forum restore, can you please describe again how to generate the heap dump file for the Java process? The next time I hit a heap space error I will try to generate one.
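For reference until the original instructions are restored: with a stock JDK, a heap dump of a running process can be taken with `jcmd` or `jmap`. A sketch; run it as the user that owns the openHAB process, and note that `<pid>` is a placeholder:

```
# find the openHAB java process id
pidof java

# write a heap dump of the live process to an .hprof file
jcmd <pid> GC.heap_dump /var/lib/openhab/heapdump.hprof

# or, with the older tool:
jmap -dump:live,format=b,file=/var/lib/openhab/heapdump.hprof <pid>
```

Combined with the `-XX:+HeapDumpOnOutOfMemoryError` option sketched earlier in the thread, this covers both live inspection and post-mortem capture.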