openHAB 3 runs out of memory / Java heap space errors, CPU 100%+ after a few hours

@morph166955 I installed that jar, but OH wouldn’t finish loading the rules at startup. I confirmed the jar was installed and active. I tried to restart a couple times with the same result.

Ok. I’ll try to rebuild and repost.

Ok. When you repost, can you include the output from the console so that I can confirm the version number?

Will do. Not sure what’s getting stuck on yours; it loads on my test VM, but that only has the one test rule in it. I’ll pull a clean branch from git and change just the two relevant lines to make sure it’s compiled cleanly.

Ok, I have about 10 rules files that load, along with several dozen Things and a few thousand Items.

Spoiler alert, it didn’t help…

Going to try a few other things.

2021-02-15 02:22:34.305 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.307 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.306 [WARN ] [e.jetty.util.thread.QueuedThreadPool] -
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:22:34.308 [ERROR] [org.apache.felix.fileinstall        ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:26:05.103 [WARN ] [e.jetty.util.thread.QueuedThreadPool] -
java.lang.OutOfMemoryError: Java heap space
2021-02-15 02:26:05.103 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space

@mhilbush New version updated. This reverts both [automation] Cache parsed script in order to improve performance by kaikreuzer · Pull Request #2057 · openhab/openhab-core · GitHub and [automation] Correctly return the evaluation result of DSL scripts by kaikreuzer · Pull Request #1952 · openhab/openhab-core · GitHub

I’m still not 100% sure that either of these is the cause, but I can’t find any other changes to the rules engine that could be causing this, so I’m crossing my fingers. Worst case, we know what isn’t the problem.

openhab> bundle:update 177 file:///home/openhab/org.openhab.core.model.script.runtime-3.1.0-SNAPSHOT-threadpool23.jar
openhab> bundle:list -s | grep model.script.runtime
177 │ Active │ 80 │ 3.1.0.202102150239 │ org.openhab.core.model.script.runtime

org.openhab.core.model.script.runtime-3.1.0-SNAPSHOT-threadpool23.jar

EDIT: Ignore RRD4J, it’s not the problem either…

EDIT 2: Opened [automation] memory leak caused by bad rule in UI · Issue #2200 · openhab/openhab-core · GitHub to track all of this.

@morph166955 After installing your bundle, the problem still occurs.

Installing your bundle freed up the memory that was being held somewhere, but enabling the bad rule still resulted in the continuous memory increase.

That’s very interesting. It also gives me some direction, as it looks like restarting org.openhab.core.model.script.runtime caused the memory to be freed, which I hope means the leak is somewhere in there (or in something associated with it). Thank you!

EDIT: For anyone following all of this: if you get to a point where you see the memory increasing but the system has not yet had an OOM crash, what happens when you do a bundle:restart on org.openhab.core.model.script.runtime in the Karaf console? Does the memory get released?
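
If it helps, this is roughly the sequence I have in mind in the Karaf console. The bundle ID (177 here) is just what bundle:list reports on my install, and shell:info is an assumption on my part for reading the JVM heap figures before and after the restart:

openhab> shell:info
openhab> bundle:list -s | grep model.script.runtime
177 │ Active │ 80 │ 3.1.0.202102150239 │ org.openhab.core.model.script.runtime
openhab> bundle:restart 177
openhab> shell:info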

BTW, I see why you get the stack trace and I don’t. You must have debug enabled on the bundle.
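
If anyone else wants the same stack traces, it should be something along these lines in the Karaf console; the exact logger name is my best guess, so check with log:get and adjust if yours differs:

openhab> log:set DEBUG org.openhab.core.model.script
openhab> log:get org.openhab.core.model.script

log:set DEFAULT org.openhab.core.model.script puts it back to normal afterwards.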

Good work by @mhilbush and @morph166955 on trying to fix this issue. I would like to see a comment from one of the maintainers on this problem; it seems to be a big issue for a lot of people.

Regards, S

You can see the latest status on the issue here.

I submitted a fix for the invalid UI DSL rule memory leak. I’m aware of one other memory leak, but it occurs much less frequently, so it should not be as critical. I’ll open a separate bug report for that issue shortly.

Looks like there’s work ongoing for most of these problems. Is there any more info needed?

My memory usage for the last week looks like this:

(I started measuring this Monday and restarted OH shortly thereafter.)
I’m planning to do an OH restart again soon to get more free RAM. Is there anything I could check before doing that to provide more information about what’s leaking? I’m on 3.0.1.
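
Not one of the maintainers, but what usually helps narrow a leak down is a heap dump taken while the usage is high, so whoever analyses it can see which objects are holding the memory. On a typical Linux install something like the following should work from a shell, assuming the JDK tools are installed and you run it as the same user the JVM runs under (the PID lookup is only an example; adjust it for your setup):

pid=$(pgrep -f openhab | head -n 1)
sudo -u openhab jmap -dump:live,format=b,file=/tmp/openhab-heap.hprof "$pid"

The resulting .hprof file can then be opened in a tool such as Eclipse MAT or VisualVM to see what is piling up.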

Looks like it was committed, and @Kai has cherry-picked it for the next 3.0.x release.

I’ve deployed snapshot S2217 to my production system today at 10:33 am, and memory still looks good to me; the graph is only slightly increasing:

Thank you all very much for this :slight_smile:

I opened a separate issue for another memory leak. This one is much less severe than the one caused by an invalid rule. I’m not clear yet on how this one should be fixed.

For anyone looking to test this, 3.1.0-S2216 and later should include the fix committed yesterday. I’d be very curious to hear from anyone who upgrades to the snapshot whether it fixes the issues.
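
In case it saves someone a search: on a Debian/Ubuntu style install, switching to the snapshot line is roughly the following. The repository line and file name are from memory, so double-check them against the current installation docs, and take a backup first:

sudo openhab-cli backup
echo 'deb https://openhab.jfrog.io/artifactory/openhab-linuxpkg unstable main' | sudo tee /etc/apt/sources.list.d/openhab.list
sudo apt-get update
sudo apt-get install --only-upgrade openhab openhab-addons

openHABian users should be able to do the same through the branch/release selection in openhabian-config.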

I set up my Pi 4 with the latest snapshot and it seems to solve the issue. I ran it for a few hours with the rules that used to break the other one, and it runs at normal memory usage with no errors so far. Thanks to everyone.

Is the latest snapshot stable enough to use on my production system? I’m currently on 3.1 M1 without problems, apart from the heap space issue. Have you had any problems with the snapshot build?