openHAB 3 runs out of memory / java heap space errors, CPU 100%+ after a few hours

I modified the Java settings. No freeze for four days now. openHAB is using up to 90% of the 2 GB of memory in the VM, and no other programs are running in it. For the moment it is fine.

I have set it in my docker-compose and will report back.
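For reference, the official openhab/openhab Docker image accepts the heap settings via the EXTRA_JAVA_OPTS environment variable; a minimal sketch (the image tag and heap sizes below are only example values, not a recommendation):

services:
  openhab:
    image: "openhab/openhab:3.0.1"
    environment:
      # example heap sizes only - size them to the host/VM
      EXTRA_JAVA_OPTS: "-Xms192m -Xmx768m"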

1 Like

Count me in.

I use MQTT, ZWave, Enocean, Netatmo, Squeezebox, HABot.

Just had it crash this morning. I saw MQTT issues in the log, but I think what actually happened is that a motion sensor caused a (DSL, UI-created) rule to fire.

To add insult to injury, the Docker container does not exit on such an exception, meaning that Docker's restart policies cannot step in. I created a ticket for the latter:

I'm pretty certain that, because of the slowdown of the entire system, all communication via IP is impacted as well. The Linux CLI is very slow in those moments.
So far I have observed this morning that the memory usage increased by 6% from day to day.
I pulled a dump this morning. Please find it attached.

Also, there was a post on this topic mentioning something about multi-threading and single-threading with some setting. Due to the community issues yesterday, that entry seems to have disappeared.

my-dump.txt (210.2 KB)
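In case anyone else wants to pull a comparable dump, something like this with the standard JDK tools should work (replace the placeholder PID with the openHAB Java process id):

# thread dump (this is what my-dump.txt above contains)
jstack <openhab-pid> > thread-dump.txt

# full heap dump, e.g. for analysis in Eclipse MAT
jmap -dump:live,format=b,file=openhab-heap.hprof <openhab-pid>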

Actually, I have no clue what exactly to look for, but I don't see any indication of a huge number of locked threads. HTTP and LSP connections are waiting for a connection, and everything else seems to have a status of either waiting or runnable.

Hey all,

I had another one. After a restart I noticed the memory usage increasing, so I disabled all rules (first yellow box from the left), leading to a NEARLY constant line that still increases slowly (up to the second box).

At that moment I enabled my frequently running rules again, and this led to a corresponding increase of the usage up to approx. 65% (third box). Without any intervention, the usage then dropped and slowly increased again.

The last box is the current behaviour: it reaches a "top point" and then sinks again.

I suspect the rules and/or the management of the rules / rules engine to be the cause.
Or maybe it's the item update handler, because my rules change a few items on every run.

Let's keep on watching,
Sascha

Nice picture :smiley:
I'll interpret it as meaning that rules do use memory, but it gets periodically garbage collected (your downward step). All seems to be as it should be.

Underlying that, something else is slowly grabbing memory (the gentle slope), and that is never recovered.

Look at the bit in the middle: turning on the rules increases the upward slope, but most of that is recovered at the step down. The difference between the step and the start looks to be a continuation of the gentle line.

1 Like

I've ruled out rrd4j, which I have uninstalled. I don't have any rules in my instance but a fair amount of item updates, and it runs out of memory in 48 hours or so.

Trying different memory settings for the JVM to see if it helps.

So the item update handler is not out of the picture yet, I guess.

Did you install no bindings and no rules, or did you use some bindings and/or "just disable" the rules?

I'm sorry, I haven't been following this thread very closely. So perhaps I'm misinterpreting what you mean by "item update handler". If so, please disregard what I posted below.

A while back I wrote a binding that can generate a high rate of state updates, which I use occasionally to check OH memory utilization and CPU consumption. In the image below, I'm running 220 state updates per second. While on the low end of what I'm able to test, you can see the memory utilization is very consistent over the 20+ minutes it's been running. I'd run the generator at a higher rate (I can get over 2000 state updates per second), but I currently have mapdb persistence turned on for all state updates, so it's tasking the box a wee bit. :wink:
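For anyone without such a binding, a rough way to approximate a similar burst of state updates is a UI rule with a fast cron trigger (e.g. "0/5 * * * * ?") and an ECMAScript 5.1 script action along these lines; this is only a sketch, not the generator binding mentioned above, and Test_Number is a placeholder Number item:

// post a burst of updates to a test item each time the cron trigger fires
for (var i = 0; i < 500; i++) {
    events.postUpdate("Test_Number", (Math.random() * 100).toFixed(2));
}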

1 Like

Hello,

The problem still exists with the new Xms and Xmx settings.

Here are my logs
openhab.log (343.8 KB)
events.log (738.2 KB)

I am on 3.0.1 and I just want to share my experiences.

Last weekend I migrated 42 of my text-based rules to the GUI. All of these rules were DSL rules. Before the migration I didn't have any java.lang.OutOfMemoryError problems. After the migration, an OutOfMemoryError happened 3-4 hours after a restart of openHAB. This was reproducible several times. Then I disabled all GUI rules and activated the text-based rules again. After that, no more OutOfMemoryErrors have happened (for 48 hours so far).

I also have the TR064 binding running and I can say that it is capable of running without OutOfMemoryErrors. In my system, version 3.1.0.202101260427 of the TR064 binding is enabled.

My experience with Java-based systems is that you won't succeed by raising the Java heap space in case of memory leaks. Normally it just takes longer until OutOfMemoryErrors happen. Therefore adjusting EXTRA_JAVA_OPTS isn't a sustainable solution.

3 Likes

I had done exactly the same, with the same conclusion. DSL rules had worked without any problem; since the migration this error occurs.

Edit: yesterday I changed the rules that trigger on item changes and set them to cron mode, running every 10 minutes, and so far everything is up and running.
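Roughly, in DSL terms, the trigger changes from something like "Item Some_Motion_Sensor changed" to a cron trigger (rule name and body here are placeholders):

rule "Check sensors"
when
    Time cron "0 0/10 * * * ?"
then
    // unchanged rule body goes here
end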

This seems to be a serious issue. Are any core developers looking into this? I guess there should be a GitHub issue.
Currently I only see this on my test-cluster RPi 3, not on my Pi 4, but they run quite different environments and bindings. On the RPi 3 I run the remote openHAB binding, but I don't think it is leaking memory.

So before I go dumpster diving into the new rules engine code for a leak, is everyone that's having this issue using the new rules UI, or are some people having this problem using strictly the older text-based rules files? I have a PR open to deal with some of this, but knowing where to look will help.

1 Like

I have a mix of rules files, UI DSL rules and UI JS rules.

I also have a mix, and I have started to move my DSL rules to ECMA; from day to day it gets better and better. I think the DSL UI rules are the problem…
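As an illustration of the kind of migration meant here (item names are made up), a simple DSL action like Light_Kitchen.sendCommand(ON) becomes, in a UI rule with an "ECMAScript 5.1" script action:

// send a command to an item from the script action
events.sendCommand("Light_Kitchen", "ON");

// reading a state works via the item registry
var temp = itemRegistry.getItem("Outside_Temperature").getState();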

I only have text-based rules files and have this issue too.

Interestingly, I often see this issue occur when I open VS Code (which connects to the OH REST API?).

Raspberry Pi 4 with 4 GB RAM.

I was on the 3.0 milestones for quite a while without any problem (with old text-based rules), and it wasn't until I converted them over to UI-based (DSL) rules that my problems started, so in my case I'm quite certain this is the cause. I'm still not quite sure everyone else in this thread is affected by the same problem; I think we are dealing with multiple different problems…

1 Like

For all those getting OOM errors with GUI DSL rules: are you familiar with DSL-rules via UI cause "out of memory" errors (java heap space) · Issue #2031 · openhab/openhab-core · GitHub?

I also have this problem. I'm on OH 3.0.1, where the mentioned bug #2031 should be resolved.
It takes around 16 hours for the system to first become very slow and unresponsive before it completely stops working, with lots of OOM messages in the log.
It often starts with timeout messages from the "tr064" binding, which then cannot get data from its channels.

2021-02-13 07:40:26.159 [INFO ] [ng.tr064.internal.soap.SOAPConnector] - Failed to get Tr064ChannelConfig{channelType=wanTotalBytesSent, getAction=GetTotalBytesSent, dataType='ui4, parameter='null'}: java.util.concurrent.TimeoutException: Total timeout 2000 ms elapsed

followed by messages from the ItemUpdater:

2021-02-13 07:45:51.964 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@19b9384' takes more than 5000ms.

and finally results in "Java heap space" errors like the following:

2021-02-13 07:56:31.835 [WARN ] [era.internal.handler.IpCameraHandler] - !!!! Camera possibly closed the channel on the binding, cause reported is: Java heap space
...

2021-02-13 08:29:18.276 [WARN ] [org.eclipse.jetty.io.ManagedSelector] - java.lang.OutOfMemoryError: Java heap space
2021-02-13 08:29:09.150 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space

I converted all my text-based DSL rules to UI-based ECMA rules.

Because of the last message above from the "WrappedScheduledExecutorService", I started thinking about which rules or other things I have configured that are scheduled in my openHAB environment.
I have only one rule (ECMA) that runs at midnight to reset some item states.
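Roughly, its script action looks like this (the item names are placeholders), triggered by a midnight cron expression such as "0 0 0 * * ?":

// reset daily counters at midnight
events.postUpdate("Rain_Today", "0");
events.postUpdate("Energy_Today", "0");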

The only other thing that might be scheduled is the image polling of the "ipCamera" binding.

So I disabled the image polling for now and will see if the OOM errors disappear.

I will keep you informed.