openHAB 3 runs out of memory / java heap space errors, CPU 100%+ after a few hours

I set it up as a test system. I personally wouldn’t use a snapshot in production, but if it solves a more or less critical memory problem, it’s worth a shot.

I would never recommend using snapshot releases, but I will share my own personal experience.

FWIW, I’ve been using snapshot versions for the past 5 years (2.x and 3.x) on two production systems. In fact, I’ve never used anything other than a snapshot release. I don’t like waiting for bug fixes. And I always take a backup before upgrading, so there’s always a fallback. You do have to be a bit careful about which snapshots you use, as some of them (in practice, a very small percentage) have severe problems.

I agree with @mhilbush on normally never running snapshots but making exceptions here. I’ve been running the OH3 snapshot since it became “mostly stable” months ago (pre-release), and on the whole I really don’t have any issues with it. Once or twice a commit caused problems that no one caught that day: I loaded the snapshot, had a horrible result, and just backed it out. That said, that’s rare. One thing to definitely note, though: this isn’t a 3.0.x-SNAPSHOT, this is a 3.1-SNAPSHOT. So you are pulling in a whole bunch of new stuff, and you need to be aware of possible conflicts that may not have been formally documented yet. Just back up your system first.

Thanks Mark and Morph for running this down and finding a fix. I followed the progress on GitHub and this was a nasty bug :+1: :+1: :+1:

I’d rather hold off on the snapshot, so I’m still on 3.0.1. From what I understand, the plan is to release a 3.0.2 with these critical fixes? Until then, my system works fine if I just keep an eye on the free memory and restart OH when it’s getting full, which has been around once a week for a while now.
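
The "keep an eye on the free memory" workaround can be partly automated. Below is a minimal sketch, assuming a Linux host that exposes `/proc/meminfo`. The function names and the 10% threshold are hypothetical choices for illustration, and system-wide memory is only a rough proxy for the Java heap that actually fills up.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Key:   12345 kB" style lines with a numeric value.
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def memory_is_low(text, threshold=0.10):
    """Return True when MemAvailable is below `threshold` (a fraction) of MemTotal."""
    info = parse_meminfo(text)
    return info["MemAvailable"] / info["MemTotal"] < threshold


# Illustrative input only; on a real system, pass the contents of /proc/meminfo.
sample = "MemTotal:        8000000 kB\nMemAvailable:     400000 kB\n"
print(memory_is_low(sample))
```

Run from a cron job or timer, a check like this could trigger a notification before OH needs its weekly restart. It does not address the underlying leak.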

Raspbian GNU/Linux 10 (buster)
Linux 5.10.17-v7l+ x86
Raspberry Pi 4 Model B Rev 1.4 8GB
Openhab 3.1.0-2253

For what feels like the last 50 snapshots, openHAB 3.1.x has become more and more sluggish after running for approx. 8 hours, and then stops working completely.
Logging is still active.
But HABPanel, HappApp and also the DSL rules are no longer processed.
After a restart, OH runs perfectly again for about 6-8 hours.

The first warning messages in the log are:
2021-03-08 06:52:41.939 [WARN] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:55:39.494 [WARN] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:58:23.832 [WARN] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:58:46.834 [WARN] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.
2021-03-08 06:59:20.055 [WARN] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@a0ac64' takes more than 5000ms.

These are followed by timeouts in all bindings, rule failures, etc.
Eventually “OutOfMemoryError: Java heap space” can also be found in the log!
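
To see which subscribers are affected and how often, the slow-dispatch warnings can be tallied from the log. This is a rough sketch, assuming the warning text has exactly the shape quoted above. The function name `count_slow_subscribers` is hypothetical, and the pattern accepts both straight and typographic quotes because pasted logs often have the latter.

```python
import re
from collections import Counter

# Matches the EventHandler warning text; the "@hash" instance suffix is dropped
# so that restarts (which change the hash) still group under one class name.
SLOW_DISPATCH = re.compile(
    r"Dispatching event to subscriber ['‘]([^'’@]+)(?:@[0-9a-fA-F]+)?['’] "
    r"takes more than (\d+)ms"
)


def count_slow_subscribers(lines):
    """Count slow-dispatch warnings per subscriber class in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = SLOW_DISPATCH.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open log file works directly, since file objects iterate line by line. A subscriber that dominates the counts is only a hint about where events are backing up, not proof of the root cause.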

I haven’t made any major changes to the system in weeks.

I only switched from 3.0.0/1 to the 3.1.x snapshots because of the “Java Heap Space” problem with DSL rules via MainUI in OH 3.0, which unfortunately was only fixed in the snapshots.

I can’t find the problem; it must be due to a significant change in the OH 3.1.x snapshots.

I hope this bug does not make it into the 3.1 final release.

Unfortunately, I can no longer say exactly which snapshot version the problem started with.

Can someone help me or confirm a similar problem?

To everyone who has problems that no one else has, that cannot be understood, and that no developer can explain:

The error described was eliminated by a completely new installation (new image installed and backup imported).

This means that the error was not due to the snapshots mentioned above but to the environment around them (Linux, Java, etc.), possibly caused by file errors.

→ Just as a tip: if in doubt, always reinstall the entire openHAB environment first to rule out such errors!

A request to the openHAB developers: could openHAB check its environment at startup to see whether everything around it is OK?

build 2262 here, and still 100% CPU (java process) :frowning:

edit: it seems 2267 is OK, CPU is normal

Just out of curiosity, and it probably won’t fix this, can you shut OH down, clear the cache, and see if it fixes the issue?

Clearing the cache didn’t fix this issue; after 3 days it was the same again.

I am not sure, but I figured I only have problems when utilizing the Z-Wave network:

I had a problem with the stable 3.0 version. It ran out of memory very quickly, within just a few hours, and CPU consumption was also high. But since I upgraded to 3.1.0.M2 I have not encountered an OOME. More than 10 days have passed without an issue. I think my issue was fixed by these two changes: Cache script and Invalid dsl rule

I installed
openHAB 3.1.0
Build #2317

and the error is still there after a few hours.
Is my build the M3?

and again…

2021-04-11 19:35:13.065 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@21e08cfd' takes more than 5000ms.

2021-04-11 19:35:15.273 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@38b703b7' takes more than 5000ms.

==> /var/log/openhab/events.log <==

2021-04-11 19:35:15.299 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'fenstersensoroffice_lastupdate' changed from 2021-04-11T18:43:52.137+0200 to 2021-04-11T19:34:07.728+0200

2021-04-11 19:35:15.302 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Memory_Used' changed from 1450 to 1449

==> /var/log/openhab/openhab.log <==

2021-04-11 19:35:21.728 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = bb38dcdb-82d6-4ae4-a732-23fea397de7f, base URL = http://localhost:8080)


CPU LOAD 99.5%

after 18hr uptime.

Can one of the developers comment on this?
That’s a really big problem!

openHAB 3 is definitely not a production-ready system; openHAB 2.5 runs really smoothly and without any problems.

and again…

:exploding_head:

Most people in this thread have had their version of this symptom sorted out by the rule related fixes. You have not.
It is only a symptom, and the situation could arise from many potential root cause sources.

To try to find the source of your problem, you are going to have to carefully go through the elimination and info-gathering steps that previous posters did.

Sorry, rossko57, but IMHO this issue is not solved yet.

To me it is obvious that this is a memory leak of some sort. And no matter which component is causing it, I can’t really imagine that reinstalling openHAB can solve it, nor that a weird issue in legacy configuration from pre-migration times is causing it.

For my part, this issue is clearly reproducible if I leave VS Code open for a longer time, although it is unpredictable how long it takes until the issue appears. Most of the time, the Z-Wave binding is the first to suffer, but this seems to be just a symptom, the first victim, basically.

Since I upgraded to V3, the VS Code plugin that should report coding issues in rules has not been working reliably: some files are flagged as having errors, others are not. Because of that, I also first thought it might be due to compilation/coding issues in my rules (the migration from openHAB 2 to 3 unfortunately made many changes necessary), and with the faulty plugin it took me weeks to find all the rules that still had issues.
I believe I have now found and fixed all faulty rules. However, if I leave VS Code running for a while (e.g. 4 hours; my machine has 8GB RAM), issues appear again. I can’t rule out that there might still be some rarely triggered rule with issues, or whether it is just due to the VS Code plugin running. Whatever it is, this is definitely a bug that was introduced with openHAB V3, and I’m very concerned that apparently no one is taking it seriously.

Regards,
stedon81

P.S.: My environment: openHAB running on a virtual Debian machine, VS Code running under Windows, accessing the openHAB files via Samba

As rossko57 indicated, most of the people in this thread have had this issue solved for them by bug fixes that have been added to the core. The fact that these bug fixes did not fix your problem means that it is caused by something different. Therefore you need to go through the thread above and follow the steps that were taken to identify the problem discussed here in order to identify yours, which has the same symptoms even though it has a different cause.

It seems like you may have identified a potential source for your problem so please open a new thread and post all the relevant information you can gather related to the problem including some of the steps above to identify what’s using up the memory, specific version of OH you are running, how it was installed, version of VSCode and version of the OH extension, etc.

It’s not that no one is taking it seriously. It’s that you are the first and as far as I know only person who has reported such a problem. Your specific problem is not related to the problems in this thread. And even with this post we don’t have nearly enough detail to help with the problem.

I noticed the same problem with VS Code: high CPU load while VS Code is running, and eventually openHAB 3 crashes because it runs out of memory, according to the logs. The problem exists on both of my openHAB 3 Raspberry Pis. To me it looked like the problem came back after an update of VS Code / the openHAB plugin! I noticed this earlier with openHAB 2.x.

My solution was simple: don’t use the Languageserver option in the openHAB plugin. In the past with 2.x, when this was enabled, CPU usage shot to 100% and the load became very high (2 - 3); turning the option off calmed the CPU and the problem was gone. In my current situation the CPU also shoots to 100% but never returns to normal (I waited 10 minutes and saw openHAB 3 crash). Just changing the value from ON to OFF or OFF to ON produces this behaviour.

So my tip: don’t use the language server if you have it enabled. I vaguely remember the RPi isn’t capable of it, but I could be wrong.

I can confirm I noticed the same symptoms with VS Code OpenHAB extension and OpenHAB stability.

The past week I have not used VS Code, and OpenHAB was very stable and responsive. The weeks before, I did use VS Code to define the model via files. Then I had a lot of problems with OpenHAB stability: CPU saturation, a lot of lag, and finally java.lang.OutOfMemoryError: Java heap space. Sometimes I used VS Code simultaneously on 2 computers and then the problems occurred even faster.

Now I have disabled the Languageserver option and will evaluate if this helps.

I’m running OpenHAB 3.1.0.M3 on RPi4. VS Code OpenHAB extension v1.0.0.
I have about 250 things, 1300 items and 110 rules.
