openHAB 3 runs out of memory / Java heap space errors, CPU 100%+ after a few hours

@morph166955 I installed that jar, but OH wouldn’t finish loading the rules at startup. I confirmed the jar was installed and active. I tried to restart a couple times with the same result.

Ok. I’ll try to rebuild and repost.

Ok. When you repost, can you include the output from the console so that I can confirm the version number?

Will do. Not sure what’s getting stuck on yours; it loads on my test VM, but that only has the one test rule in it. I’ll pull a clean branch from git and change just the two relevant lines to make sure it’s compiled cleanly.

Ok, I have about 10 rules files that load, along with several dozen Things and a few thousand Items.

Spoiler alert, it didn’t help…

Going to try a few other things.

2021-02-15 02:22:34.305 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.306 [WARN ] [e.jetty.util.thread.QueuedThreadPool] -
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.308 [ERROR] [org.apache.felix.fileinstall        ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:26:05.103 [WARN ] [e.jetty.util.thread.QueuedThreadPool] -
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:26:05.103 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space

@mhilbush New version updated. This reverts both [automation] Cache parsed script in order to improve performance by kaikreuzer · Pull Request #2057 · openhab/openhab-core · GitHub and [automation] Correctly return the evaluation result of DSL scripts by kaikreuzer · Pull Request #1952 · openhab/openhab-core · GitHub

I’m still not 100% sure that either of these is the cause, but I can’t find any other changes to the rules engine that could be causing this, so I’m crossing my fingers. Worst case, we know what isn’t the problem.

openhab> bundle:update 177 file:///home/openhab/org.openhab.core.model.script.runtime-3.1.0-SNAPSHOT-threadpool23.jar
openhab> bundle:list -s | grep model.script.runtime
177 │ Active │ 80 │ 3.1.0.202102150239 │ org.openhab.core.model.script.runtime

org.openhab.core.model.script.runtime-3.1.0-SNAPSHOT-threadpool23.jar

EDIT: Ignore RRD4J, it’s not the problem either…

EDIT 2: Opened [automation] memory leak caused by bad rule in UI · Issue #2200 · openhab/openhab-core · GitHub to track all of this.

@morph166955 After installing your bundle, the problem still occurs.

Installing your bundle freed up the memory that was being held somewhere, but enabling the bad rule still resulted in the continuous memory increase.

That’s very interesting. It also gives me some direction, as it looks like restarting org.openhab.core.model.script.runtime caused the memory to be freed, which I hope means the leak is somewhere in there (or in something associated with it). Thank you!

EDIT: For anyone following all of this: if you get to a point where you see the memory increasing but the system has not yet had an OOM crash, what happens when you do a bundle:restart on org.openhab.core.model.script.runtime in the Karaf console? Does the memory get released?
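
If it helps, this is roughly the sequence I have in mind in the Karaf console. The bundle ID (177 here) is just what bundle:list reports on my install, and shell:info is an assumption on my part for reading the JVM heap figures before and after the restart:

openhab> shell:info
openhab> bundle:list -s | grep model.script.runtime
177 │ Active │ 80 │ 3.1.0.202102150239 │ org.openhab.core.model.script.runtime
openhab> bundle:restart 177
openhab> shell:info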

BTW, I see why you get the stack trace and I don’t. You must have debug enabled on the bundle.
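
If anyone else wants the same stack traces, it should be something along these lines in the Karaf console; the exact logger name is my best guess, so check with log:get and adjust if yours differs:

openhab> log:set DEBUG org.openhab.core.model.script
openhab> log:get org.openhab.core.model.script

log:set DEFAULT org.openhab.core.model.script puts it back to normal afterwards.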

Good work by @mhilbush and @morph166955 on trying to fix this issue. I would like to see a comment from one of the maintainers on this problem; it seems to be a big issue for a lot of people.

Regards, S

You can see the latest status on the issue here.

I submitted a fix for the invalid UI DSL rule memory leak. I’m aware of one other memory leak, but it occurs much less frequently, so it should not be as critical. I’ll open a separate bug report for that issue shortly.

Looks like there’s work ongoing for most of these problems. Is there any more info needed?

My memory usage for the last week looks like this:

(I started measuring this Monday and restarted OH shortly thereafter.)
I’m planning to do an OH restart again soon to get more free RAM. Is there anything I could check before doing that to provide more information about what’s leaking? I’m on 3.0.1.
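
Not one of the maintainers, but what usually helps narrow a leak down is a heap dump taken while the usage is high, so whoever analyses it can see which objects are holding the memory. On a typical Linux install something like the following should work from a shell, assuming the JDK tools are installed and you run it as the same user the JVM runs under (the PID lookup is only an example; adjust it for your setup):

pid=$(pgrep -f openhab | head -n 1)
sudo -u openhab jmap -dump:live,format=b,file=/tmp/openhab-heap.hprof "$pid"

The resulting .hprof file can then be opened in a tool such as Eclipse MAT or VisualVM to see what is piling up.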

Looks like it was committed, and @Kai has cherry-picked it for the next 3.0.x release.

I’ve deployed snapshot S2217 to my production system today at 10:33 am, and memory still looks good to me; the graph is only slightly increasing:

Thank you all very much for this :slight_smile:

I opened a separate issue for another memory leak. This one is much less severe than the one caused by an invalid rule. I’m not clear yet on how this one should be fixed.

For anyone looking to test this, 3.1.0-S2216 and later should include the fix committed yesterday. I’d be very curious to hear from anyone who upgrades to the snapshot whether it fixes the issues.
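
In case it saves someone a search: on a Debian/Ubuntu style install, switching to the snapshot line is roughly the following. The repository line and file name are from memory, so double-check them against the current installation docs, and take a backup first:

sudo openhab-cli backup
echo 'deb https://openhab.jfrog.io/artifactory/openhab-linuxpkg unstable main' | sudo tee /etc/apt/sources.list.d/openhab.list
sudo apt-get update
sudo apt-get install --only-upgrade openhab openhab-addons

openHABian users should be able to do the same through the branch/release selection in openhabian-config.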

I set up my Pi 4 with the latest snapshot and it seems to solve the issue. I ran it for a few hours with the rules that used to break the other one, and it runs at normal memory usage with no errors so far. Thanks to everyone.

Is the latest snapshot stable enough to use on my production system? I’m currently on 3.1 M1 without problems, apart from the heap space issue. Have you had any problems with the snapshot build?