openHAB 3 runs out of memory / java heap space errors, CPU 100%+ after a few hours

Raspbian GNU/Linux 10 (buster)
Linux 5.10.17-v7l+ armv7l
Raspberry Pi 4 Model B Rev 1.4 8GB
Openhab 3.1.0-2253

For what feels like the last 50 snapshots, openHAB 3.1.x has become more and more sluggish after running for approx. 8 hours, and then it stops working completely.
Logging stays active, but Habpanel, HappApp and the DSL rules are no longer processed.
After a restart, OH runs perfectly again for about 6-8 hours.

The first warning messages in the logbook are:
2021-03-08 06:52:41.939 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:55:39.494 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:58:23.832 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:58:46.834 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:59:20.055 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.

These are followed by timeouts in all bindings, rule failures, etc.
Eventually “OutOfMemoryError: Java heap space” also shows up in the log!
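
For anyone who wants to see when the slowdown starts, here is a minimal Python sketch that counts the "takes more than 5000ms" warnings per hour and subscriber and picks out the first OutOfMemoryError line. It assumes the log format quoted above; the function name `summarize` is made up for illustration and is not part of openHAB.

```python
import re
from collections import Counter

# Matches the EventHandler warnings quoted in this thread. Both straight and
# typographic quotes are accepted, because forum software may have converted them.
SLOW_DISPATCH = re.compile(
    r"^(\d{4}-\d{2}-\d{2}) (\d{2}):\d{2}:\d{2}\.\d{3} \[WARN\s*\] .*"
    r"Dispatching event to subscriber ['‘]([^'’@]+)@[0-9a-f]+['’] takes more than"
)


def summarize(lines):
    """Count slow-dispatch warnings per (date, hour, subscriber class).

    Also returns the first line that mentions OutOfMemoryError, or None.
    """
    slow = Counter()
    first_oom = None
    for line in lines:
        match = SLOW_DISPATCH.match(line)
        if match:
            slow[match.groups()] += 1
        elif first_oom is None and "OutOfMemoryError" in line:
            first_oom = line.rstrip("\n")
    return slow, first_oom
```

Fed with the lines of /var/log/openhab/openhab.log, the counter shows in which hour the warnings begin and which subscribers are hit first. That can then be compared with the time of the first OutOfMemoryError.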

I haven’t made any major changes to the system in weeks.

I only switched from 3.0.0/1 to the 3.1.x snapshots because of the “Java Heap Space” problem with DSL rules via MainUI in OH3.0, which unfortunately was only fixed in the snapshots.

I can’t find the cause; it must be due to a significant change in the OH3.1.x snapshots.

I hope this bug doesn’t make it into the 3.1 final release.

Unfortunately, I can no longer say exactly which snapshot version first showed this problem.

Can someone help me or confirm a similar problem?

To everyone who has problems that no one else has, that nobody can make sense of, and that no developer can explain:

The error described was eliminated by a complete new installation (new image installed and backup imported).

This means that the error was not due to the above snapshots, but to the environment around them (Linux, Java, etc.), possibly because of corrupted files.

→ Just as a tip, if in doubt, always reinstall the entire Openhab environment first to rule out such errors!

A request to the openHAB developers: couldn’t openHAB check its environment at startup to see whether everything around it is OK?

build 2262 here, and still 100% CPU (java process) :frowning:

edit: it seems 2267 is OK, CPU is normal

Just out of curiosity, and it probably won’t fix this, can you shut OH down, clear the cache, and see if it fixes the issue?

Clearing the cache didn’t fix this issue - after 3 days, the same thing happened again.

I am not sure, but I figured I only have problems when utilizing the Z-Wave network:

I had a problem with the stable 3.0 version: it ran out of memory very quickly, within just a few hours, and CPU consumption was also high. Since I upgraded to 3.1.0.M2 I have not encountered an OOME; more than 10 days have passed without an issue. I think my issue was fixed by these two changes: Cache script and Invalid dsl rule.

I had openHAB 3.1.0 Build #2317 installed, and the error is still there after a few hours.
Is my build the M3?

and again…

2021-04-11 19:35:13.065 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@21e08cfd' takes more than 5000ms.

2021-04-11 19:35:15.273 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@38b703b7' takes more than 5000ms.

==> /var/log/openhab/events.log <==

2021-04-11 19:35:15.299 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'fenstersensoroffice_lastupdate' changed from 2021-04-11T18:43:52.137+0200 to 2021-04-11T19:34:07.728+0200

2021-04-11 19:35:15.302 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 1450 to 1449

==> /var/log/openhab/openhab.log <==

2021-04-11 19:35:21.728 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = bb38dcdb-82d6-4ae4-a732-23fea397de7f, base URL = http://localhost:8080)
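
Since the events.log lines above include a Memory_Used item, its history can be pulled out of the log to check whether memory climbs steadily before the warnings start. This is a minimal Python sketch, assuming the events.log format quoted above. The helper names `item_history` and `growth_per_hour` are hypothetical, and the unit is whatever the item reports.

```python
import re
from datetime import datetime

# Matches ItemStateChangedEvent lines such as:
# 2021-04-11 19:35:15.302 [INFO ] [...] - Item 'Memory_Used' changed from 1450 to 1449
CHANGE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Item '([^']+)' changed from (.*) to (.*)$"
)


def item_history(lines, item_name):
    """Return (timestamp, new_value) pairs for numeric state changes of one item."""
    history = []
    for line in lines:
        match = CHANGE.match(line)
        if not match or match.group(2) != item_name:
            continue
        try:
            # Take the leading number so states with a unit ("1449 MB") also work.
            value = float(match.group(4).split()[0])
        except (ValueError, IndexError):
            continue  # non-numeric states such as NULL, UNDEF or date-times
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        history.append((stamp, value))
    return history


def growth_per_hour(history):
    """Average change per hour between first and last sample, or None if undefined."""
    if len(history) < 2:
        return None
    (t0, v0), (t1, v1) = history[0], history[-1]
    hours = (t1 - t0).total_seconds() / 3600
    return (v1 - v0) / hours if hours > 0 else None
```

A clearly positive growth rate over many hours would point towards a leak. A flat curve would suggest that the sluggishness has another cause.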


CPU load is at 99.5% after 18 hours of uptime.

Can one of the developers comment on this?
That’s a really big problem!

openHAB 3 is definitely not a production-ready system; openHAB 2.5 runs really smoothly and without any problems.

and again…

:exploding_head:

Most people in this thread have had their version of this symptom sorted out by the rule related fixes. You have not.
It is only a symptom, and the situation could arise from many potential root cause sources.

To try to find the source of your problem, you are going to have to carefully go through elimination and info gathering steps that previous posters did.

Sorry, rossko57, but IMHO this issue is not solved yet.

To me it is obvious that this is a memory leak of some sort. And no matter which component is causing it, I can’t really imagine that re-installing openHAB would solve it, nor that some weird leftover in legacy configuration from pre-migration times is causing it.

For my part, this issue is clearly reproducible if I leave VS Code open for a longer time, although it is unpredictable how long it takes until the issue appears. Most of the time, the Z-Wave binding is the first to suffer, but this seems to be just a symptom; it is basically the first victim.

Since I upgraded to V3, the VS Code plugin that should report coding issues in rules no longer works reliably: some files are flagged as having errors, others are not. Because of that, I first thought the problem might be due to compilation/coding issues in my rules (the migration from openHAB 2 to 3 unfortunately made many changes necessary), and with the faulty plugin it took me weeks to find all rules that still had issues.
I believe I have now found and fixed all faulty rules. However, if I leave VS Code running for a while (e.g. 4 hours; my machine has 8GB RAM), the issues appear again. I can’t rule out that there is still some rarely triggered rule with issues, or that it is just due to the VS Code plugin running. Whatever it is, this is definitely a bug that was introduced with openHAB V3, and I’m very concerned that apparently no one is taking it seriously.

Regards,
stedon81

P.S.: My environment: openHAB running on a virtual Debian machine, VS Code running under Windows, accessing the openHAB files via Samba

As rossko57 indicated, most of the people in this thread have had this issue solved by bug fixes that were added to the core. The fact that these bug fixes did not fix your problem means that your problem is caused by something different. You therefore need to go through the thread above and apply the steps that were used to identify the problem discussed here to your own problem, which has the same symptoms but a different cause.

It seems like you may have identified a potential source for your problem so please open a new thread and post all the relevant information you can gather related to the problem including some of the steps above to identify what’s using up the memory, specific version of OH you are running, how it was installed, version of VSCode and version of the OH extension, etc.
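
One way to collect part of that information is to sample the resident memory of the Java process at intervals, once with VS Code connected and once without. This is a minimal Python sketch, assuming a Linux host with /proc; the function names are made up for illustration. Note that it reports the resident set size of the whole JVM, not heap usage, so it only gives a rough trend.

```python
import re
from pathlib import Path


def parse_vmrss_kb(status_text):
    """Extract the VmRSS value (in kB) from the text of /proc/<pid>/status."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def java_processes():
    """Yield (pid, rss_kb) for every process whose command name is 'java'."""
    for status in Path("/proc").glob("[0-9]*/status"):
        try:
            text = status.read_text()
        except OSError:
            continue  # the process ended, or we may not read it
        name = re.search(r"^Name:\s+(.+)$", text, re.MULTILINE)
        if name and name.group(1).strip() == "java":
            yield int(status.parent.name), parse_vmrss_kb(text)


if __name__ == "__main__":
    for pid, rss_kb in java_processes():
        print(f"java pid {pid}: resident memory {rss_kb} kB")
```

Running this every few minutes (for example from cron) and posting the numbers together with the openHAB version and the extension version would give the new thread something concrete to work with.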

It’s not that no one is taking it seriously. It’s that you are the first and as far as I know only person who has reported such a problem. Your specific problem is not related to the problems in this thread. And even with this post we don’t have nearly enough detail to help with the problem.

I noticed the same problem with VS Code: high CPU load while VS Code is running, and eventually openHAB 3 crashes because it runs out of memory, according to the logs. The problem exists on both of my openHAB 3 Raspberry Pis. To me it looked like the problem came back after an update of VS Code / the openHAB plugin! I had noticed this earlier with openHAB 2.x.

My solution was simple: don’t use the Languageserver option in the openHAB plugin. In the past with 2.x, enabling it made CPU usage shoot to 100% and the load became very high (2 - 3); turning the option off calmed the CPU and the problem was gone. In my current setup the CPU also shoots to 100% but never returns to normal (I waited 10 minutes and watched openHAB 3 crash). Just changing the value from ON to OFF or OFF to ON produces this behaviour.

So my tip: don’t use the language server if you have it enabled. I vaguely remember that the RPi isn’t capable of handling it, but I could be wrong.

I can confirm I noticed the same symptoms with VS Code OpenHAB extension and OpenHAB stability.

The past week I have not used VS Code, and OpenHAB was very stable and responsive. The weeks before, I did use VS Code to define the model via files. Then I had a lot of problems with OpenHAB stability: CPU saturation, a lot of lag, and finally java.lang.OutOfMemoryError: Java heap space. Sometimes I used VS Code simultaneously on 2 computers and then the problems occurred even faster.

Now I have disabled the Languageserver option and will evaluate if this helps.

I’m running OpenHAB 3.1.0.M3 on RPi4. VS Code OpenHAB extension v1.0.0.
I have about 250 things, 1300 items and 110 rules.

Man, this was my solution: the VS Code extension with language server enabled. Over the last year or so I fought this; my OH install, both 2.x and 3.x, would run out of memory “randomly” after weeks of perfect operation, usually after I made a very minor change or something. I would make the change and minimize VS Code, within hours OH was dead, and I would piss around with it for a few days, rebooting and locking up and trying to figure out WHY things changed. I would get frustrated, give up and close VS Code, and then boom, it worked. Disable the language server option, and no more problems.

This needs to be shouted louder; I dunno how many other nerds are playing with the VS Code extension and also chasing this frustrating mystery bug.

Presumably just closing your VSC window when finished with it is just as effective?

In my case, just closing VS Code when finished is not a solution. Once I have used VS Code with Languageserver enabled, it looks like OpenHAB will eventually run into problems. I used to code in the evening and close VS Code when going to sleep. Then I noticed that the “put house in sleep mode” rules took a while to process. The next morning, the “house wakeup” rules sometimes failed. To prevent this, I restarted my RPi every time after I used VS Code. After that, OpenHAB runs fine and rules are processed immediately when triggered.

Good info. So whatever gets messed up, stays messed up. On the face of it, a suspected memory leak in LSP. Given you’ve got more than just vague suspicions, would you be able to open a github issue on core for this?

For me it’s even weirder. I dunno if it is related to the bug discussed here, but since OH3 many of my rules files go crazy when I edit them (in VS Code). E.g. files with statically initialized variables like Maps lose their initialization, and hence the rules don’t work anymore until I restart openHAB. It is very hard to experiment with rule changes with behavior like this. Or log outputs are just not printed, and “System started” rules are not executed.
I have the feeling that the whole rule DSL became very fragile with OH3.
And no, rossko57, I can’t be more specific, as the behavior is very erratic and I haven’t managed to find a reliable way to reproduce it; otherwise I would have filed a bug.
But maybe others have experienced this too; it seems that I’m not alone with these issues.