openHAB 3 runs out of memory (Java heap space errors), CPU at 100%+ after a few hours

Don’t clear the cache. It will just force your bindings to be reinstalled and make OH that much slower to come up next time, especially since it isn’t doing any good.

If you’re on OH 3, you have rrd4j by default, with a strategy of every minute, every change, and restore on startup for all supported Items.

Anything is possible but I think that’s unlikely.

Nothing stands out as problematic except…

This is mainly a problem with loading and parsing .rules files; I don’t think it makes the rules run slower. Still, it would be a really good idea to remove the primitives. Never use a primitive, or even specify the type of something, unless you have to. And if you have to, wait until the last possible moment to convert it to a primitive. Leave everything as Numbers instead.

It may not be slowing down your execution, but it could be adding minutes or more to how long it takes to load and parse your rules. And if OH 3 works like OH 2, every time there is a change to one of your .items or .rules files, it will reload them all. If it’s already reloading the files, it will wait for that to finish and then reload them again! If you edit a bunch of files in sequence, you can bring your machine to its knees for a couple of hours.
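To put numbers on that, here is a rough model of the reload behavior described above. It assumes (as the paragraph does, and only if OH 3 behaves like OH 2) that every edit queues one full reload of all files and that queued reloads run back to back instead of being merged. The function name and the reload duration are hypothetical; this is arithmetic, not openHAB code.

```python
def busy_until(edit_times, reload_minutes):
    """Return the minute at which the last queued full reload finishes.

    Assumption: each edit queues one full reload of every .items/.rules file,
    and a reload requested while another one is running waits for it to finish.
    """
    free_at = 0
    for t in sorted(edit_times):
        start = max(t, free_at)  # wait for the reload already in progress
        free_at = start + reload_minutes
    return free_at


# Ten files saved one minute apart, each full reload taking 5 minutes.
print(busy_until(list(range(10)), 5))  # 50
```

Under that assumption the machine stays busy for roughly edits × reload time, which is how a few minutes per reload can add up to hours. A loader that merged pending requests into a single reload would keep the queue from growing with every save; that is a hypothetical alternative, not something this thread says openHAB does.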

I don’t know if this is still a problem for OH 3 or not but it’s something to watch for.

With those scripts disabled, it continued to run fine all day. I cleaned them up a bit to use fewer primitives (I think), turned them back on, and the same thing happened. I enabled the thermostat and indoor environmental scripts (not the weather station).

So the problem is definitely with these scripts.

That seems to be good evidence. But apart from the primitives, nothing in those scripts appears capable of causing a memory leak or sending the CPU out of control. There are no loops, and nothing creates a lot of variables that don’t get thrown away.

I think your best bet at this point is to file an issue.

One more process of elimination to try here. I deleted the thermostat and indoor environment scripts that were written in the old DSL and recreated them in ECMA. I have the weather station DSL script disabled for now.

So I’m going to see if it can get through the day without exploding. If the problem seems to go away, I think that points to a memory issue with OH3 and DSL scripts?

Are those DSL scripts created via the UI or from files? I seem to have similar problems with those from the UI…

The UI. And I’m quickly realizing there is a potentially huge problem with this. These two DSL scripts, composed in the UI, took 4 to 7 seconds to execute, and during those 4 to 7 seconds the CPU was over 100%. After several hours of normal operation (with the scripts running every 5 to 10 minutes), the whole thing would just keel over dead, out of memory with the CPU pegged.

The exact same operation performed by an ECMA script composed in the UI executes so fast the indicator doesn’t even change to running. This has been running for hours now, and honestly this is the best I’ve ever seen my install of OH3 operate. Everything it does is instant and nothing bogs down.

What this problem is, or why it exists, is well over my head, but process of elimination here seems to suggest it exists.

You posted only parts of the rules, which triggers are you using and with what settings?

That’s what I saw too. In my case, even the simplest DSL rules created in the UI take up to 10 seconds with high CPU load, but not necessarily high memory usage. I reduced how often these rules run, but I think I only increased the time between crashes.

Those are scripts, in their entirety. There are separate rules that trigger them. Those triggers are either item state changes, or an every 5 minutes cron.


I’m experiencing the same problem. I created a new topic, but I think it can be merged into this one.
[OH3] high cpu load, unresponsive OH

Since I moved all my DSL rules back into files, my CPU load looks stable and extremely low! I think that did the trick in my case!

(same rules, same triggers)

A GitHub issue should be filed if one doesn’t already exist.


I thought I had my problems tackled, but I still have high CPU load.
Can someone run

shell:threads --list

in the Karaf console and see which threads are the top CPU time consumers? For me, DirWatcher and RuleRefresher are suspiciously high!
I use file-only DSL rules, but I also use Jython with the (fixed) helper libraries…
When watching the event stream in the debug sidebar, I sometimes see rules get reloaded even though I did not touch the configuration…
I also still see some org.eclipse.x bundles when I run bundle:list -s. Shouldn’t they be gone?
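If you save the output of that thread listing to a file, a small script can sort it by CPU time. This is a minimal sketch that assumes the listing is a table with a header row containing `Name` and `CPU time` and cells separated by `│` (or `|`); that column layout is an assumption, so check it against what your Karaf console actually prints.

```python
import re
import sys


def top_cpu_threads(table_text, n=5):
    """Return (name, cpu_time) pairs for the n threads with the most CPU time.

    Assumption: table_text is a saved thread listing whose header row contains
    "Name" and "CPU time" and whose cells are separated by '│' or '|'.
    Rows whose CPU time is not a plain integer are skipped.
    """
    header = None
    rows = []
    for line in table_text.splitlines():
        cells = [c.strip() for c in re.split(r"[│|]", line)]
        if header is None:
            if "Name" in cells and "CPU time" in cells:
                header = cells
                name_i = header.index("Name")
                cpu_i = header.index("CPU time")
            continue
        if len(cells) < len(header):
            continue  # divider line such as '───┼───' or an empty line
        try:
            cpu = int(cells[cpu_i])
        except ValueError:
            continue
        rows.append((cells[name_i], cpu))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python top_threads.py threads.txt
    with open(sys.argv[1], encoding="utf-8") as f:
        for name, cpu in top_cpu_threads(f.read()):
            print(f"{cpu:>12}  {name}")
```

The script and file names are placeholders. Parsing by header name rather than by fixed column position is a deliberate choice, so the sketch keeps working if the table gains or reorders columns.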

I tried one more thing and it looks like it helped:

sudo apt purge openhab2

Followed by a reboot.
I upgraded with the openHABian config tool, but somehow there were still leftovers… Now it’s running like a charm.

Unfortunately this doesn’t apply to me, because I run OH3 in a Docker container, so I have a “clean” environment.

In my case, RXTXPortMonitor is the highest, but none of the threads are suspiciously high.

@Pedals2Paddles: Did you already create a GitHub issue?

I’m observing a similar issue: after roughly 24 hours of running openHAB 3 in a Docker container (clean install, set up from scratch), the VM throws an out-of-memory error, and it becomes slow beforehand. At the moment I suspect heavy rule execution or the Exec binding, and I will disable them one by one to hopefully narrow it down a bit…

Same question: did you set up DSL rules via the UI? If so, put them in files like you did in OH2. That did the trick in my case, and since then I have extremely low CPU usage and everything is stable.

Please post a link to the GitHub issue so we can play along at home.
Thanks
