OH 3.4 to 4.0 on RP4 = massive performance issues

Dear all,

I migrated my OH 3.4 to the latest 4.0 stable version using openhabian. I have a RP4 8 GB version. No other changes since 3 months (besides widgets and one page). I still use (and prefer) file based config where possible.

While OH 3.4 run quite fast, OH 4.0 is not able to fully start with my massive amount of things, items and rules. I read trough the forum already and extended zram by factor 3 as RAM (max 8 GB) and memory (max 128GB) shpould not be a limit anyhow. which just little improvement.

Before rebooting log shows me especially something like:

2023-10-24 20:35:00.754 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@182bd8c' takes more than 5000ms.

Some infos that might help:

sudo systemctl status openhab

Gives:

● openhab.service - openHAB - empowering the smart home
     Loaded: loaded (/lib/systemd/system/openhab.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/openhab.service.d
             └─override.conf
     Active: active (running) since Tue 2023-10-24 20:35:38 CEST; 7min ago
       Docs: https://www.openhab.org/docs/
             https://community.openhab.org
   Main PID: 10472 (java)
      Tasks: 343 (limit: 4915)
        CPU: 8min 58.356s
     CGroup: /system.slice/openhab.service
             └─10472 /usr/bin/java -XX:-UsePerfData -Dopenhab.home=/usr/share/openhab -Dopenhab.conf=/etc/openhab -Dopenhab.runtime=/usr/share>

For me it looks like openhab uses one core and this is full.

Loading all things and items but zero rules causes CPU below 10%.

I am still wondering why / what such major differences are caused by from OH 3.4 to OH 4.0 and what to do!

Thanks and best
Matthias

I tried loading one rule after the other. This works but loading the rules takes EXTREMLY long!
I also found this post.

As an example the following file takes about 20 sec until CPU load goes down.

rule "Statistik Fahrradschuppen"
when
    Item TuerschlossantriebFahrradschuppen1LOCKSTATE changed
then
    if (BootFinished.state != ON) {
        return;
    }
    if (newState.toString == "LOCKED") {
        Statistik_Fahrradschuppen_Verriegeln.postUpdate((Statistik_Fahrradschuppen_Verriegeln.state as Number)+ 1 )
    }    
    if (newState.toString == "UNLOCKED") {
        Statistik_Fahrradschuppen_Entriegeln.postUpdate((Statistik_Fahrradschuppen_Entriegeln.state as Number)+ 1 )
    }
end

rule "Statistik Gartenschuppen"
when
    Item TuerschlossantriebGartenschuppen1LOCKSTATE changed
then
    if (BootFinished.state != ON) {
        return;
    }
    if (newState.toString == "LOCKED") {
        Statistik_Gartenschuppen_Verriegeln.postUpdate((Statistik_Gartenschuppen_Verriegeln.state as Number)+ 1 )
    }    
    if (newState.toString == "UNLOCKED") {
        Statistik_Gartenschuppen_Entriegeln.postUpdate((Statistik_Gartenschuppen_Entriegeln.state as Number)+ 1 )
    }
end

rule "Statistik Garagentor"
when
    Item Garagentor1DOORSTATE changed
then
    if (BootFinished.state != ON) {
        return;
    }
    
    if (newState.toString == "OPEN") {
        Statistik_Garagentor_Oeffnen.postUpdate((Statistik_Garagentor_Oeffnen.state as Number)+ 1 )
    }
    if (newState.toString == "CLOSED") {
        Statistik_Garagentor_Schliessen.postUpdate((Statistik_Garagentor_Schliessen.state as Number)+ 1 )
    }    
    if (newState.toString == "VENTILATION_POSITION") {
        Statistik_Garagentor_Lueften.postUpdate((Statistik_Garagentor_Lueften.state as Number)+ 1 )
    }    
end


You have plenty of RAM so the usual culpret is excluded.

You are using Rules DSL so the known problem with JS Scripting on 32-bit ARM is not the problem.

The fact that OH is pegging the CPU on a single thread is suspicious. I’m pretty certain that Java allows use of multiple threads (i.e. it will spread execution across multiple CPUs so the fact that only one core is pegged implies that there is only one thread in OH that has run amok.

First thing I would do is set the logging level to trace level logging and see if you can see something churning over and over again.

If that doesn’t show anything try eliminating add-ons one by one until the behavior stops or you run out of add-ons.

If you do run out of add-ons, review all your rules, maybe even add some logging, so see if you have a single rule that is running amok.

That’s about all I can offer given the information available.

Thanks for the answer!

Logging of what component exactly?

As I wrote - it is loading the rules files. I can repeat this for every rule file I have and I see the same behaviour. I guess it is parsing the rule file and not the rule logic itself.

Any glue?

Thanks

Everything! Set the logging level of org.openhab to TRACE and everything will get set to TRACE unless otherwise set to something else.

It’s always been slow to parse .rules files. And nothing changed in that regard between OH 3 and OH 4. There’s nothing to explain why that makes a difference.

I took the change to fully setup a fresh openhabian 4 system. Things got better a lot, nevertheless especially rule paring is still much below “old” OH3. I am kind of lost!

Took a time to start analysing. Trinstalling OH 4 completely fresh caused improvement and more cores bing used. Nevertheless, the system is extremely instable as soon as I start changing ANYTHING on a file. This causes often OH to restart itself. The identical files have worked fine on OH 3 => very unlogic for me.

I activated TRACE but then generate log files that long that they are not readable any more. What is a good idea here to continue?

I created an issue as well: Instable system with heavy file based config when migrating to OH 4 · Issue #3871 · openhab/openhab-core · GitHub
With a fresh setup this should not be the case anyhow.

Thanks

It could really depend on many things. Can you compare number of threads started by OH 3 vs OH 4? Maybe some of bindings changed threading model.
As far I remember there were some optimizations to filesystem watches to limit resource consumption.

The OH 3.4 system is gone as I upgraded / setup up freshly.

If you run top -H it will show the thread names which may help with figuring out what causes the CPU load.

You can also get some nice details about what threads cause the CPU load in the Karaf console using:

ttop --stats=tid,name,state,user_time,cpu_time,user_time_perc,cpu_time_perc --order=cpu_time --millis=300

Executing the threads command on the Karaf console shows what part of the code is being executed by the threads.

Thank you for hints.

When I activated the console, I got this but openhab has NOT restarted! By the way: I have a lot of debug output when opening openhabian-config even though I never activated this and use a fresh setup!?

When system is running since more then 1h:

Reloaded a items file 5 seconds before:

When system starts to slow down, before I it somehow decides to restaart I get logs like this:

2023-11-16 09:03:38.900 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@1bb48e8' takes more than 5000ms.
2023-11-16 09:03:38.901 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.thing.internal.CommunicationManager@672ed5' takes more than 5000ms.
2023-11-16 09:03:38.922 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@128af4' takes more than 5000ms.
==> /var/log/openhab/events.log <==
2023-11-16 09:03:38.940 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_lastchange' changed from 2023-11-16T08:57:02.064739567+0100 to 2023-11-16T08:57:11.292204997+0100
2023-11-16 09:03:38.942 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_Wirkleistung' changed from 385 W to 384 W
2023-11-16 09:03:38.949 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Hausstrom_LeistungVerbrauch' changed from 389 W to 388 W
2023-11-16 09:03:47.730 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_SpannungL1' changed from 224.2 V to 225.1 V
2023-11-16 09:03:52.076 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_SpannungL2' changed from 226.4 V to 226 V
==> /var/log/openhab/openhab.log <==
2023-11-16 09:03:57.489 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@128af4' takes more than 5000ms.
2023-11-16 09:03:57.496 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.thing.internal.CommunicationManager@672ed5' takes more than 5000ms.
==> /var/log/openhab/events.log <==
2023-11-16 09:04:02.030 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_Zaehlerstand' changed from 11704.1632 kWh to 11704.1643 kWh
2023-11-16 09:04:06.414 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_lastchange' changed from 2023-11-16T08:57:11.292204997+0100 to 2023-11-16T08:57:21.221106968+0100
2023-11-16 09:04:06.416 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'BadezimmerHandtuchheizungAbsoluteLuftfeuchtigkeit' changed from 10.022488637672952 to 9.904323914080639
==> /var/log/openhab/openhab.log <==
2023-11-16 09:04:10.718 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@128af4' takes more than 5000ms.
2023-11-16 09:04:16.384 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.thing.internal.CommunicationManager@672ed5' takes more than 5000ms.
==> /var/log/openhab/events.log <==
2023-11-16 09:04:16.410 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'BadezimmerHandtuchheizungThermostat1ACTUALTEMPERATURE' changed from 20.4 °C to 20.5 °C
2023-11-16 09:04:16.418 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_StromI1' changed from 1.14 to 1.15
2023-11-16 09:04:25.152 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_Wirkleistung' changed from 384 W to 388 W
==> /var/log/openhab/openhab.log <==
2023-11-16 09:04:34.947 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@128af4' takes more than 5000ms.
2023-11-16 09:04:34.948 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.thing.internal.CommunicationManager@672ed5' takes more than 5000ms.
==> /var/log/openhab/events.log <==
2023-11-16 09:04:34.993 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'EsszimmerAirqualityLastseen' changed from 2023-11-16T07:56:40.401+0000 to 2023-11-16T07:57:28.278+0000
2023-11-16 09:04:34.996 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'EsszimmerAirqualityLinkquality' changed from 18 to 14
==> /var/log/openhab/openhab.log <==
2023-11-16 09:04:43.693 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@128af4' takes more than 5000ms.
==> /var/log/openhab/events.log <==
2023-11-16 09:04:43.709 [INFO ] [openhab.event.ItemCommandEvent      ] - Item 'BootStopRule' received command ON
2023-11-16 09:04:43.789 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_SpannungL1' changed from 225.1 V to 224.9 V
2023-11-16 09:04:48.099 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_SpannungL2' changed from 226 V to 225.9 V
2023-11-16 09:04:48.152 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_SpannungL3' changed from 226.4 V to 226.1 V
2023-11-16 09:04:48.153 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Stromzaehler_StromI2' changed from 1.34 to 1.32
==> /var/log/openhab/openhab.log <==
2023-11-16 09:04:53.342 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@128af4' takes more than 5000ms.
2023-11-16 09:04:53.343 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.thing.internal.CommunicationManager@672ed5' takes more than 5000ms.

When system performance is bad, meaning OH stops responding, I can actually even not connect to the OH console. Is this normal in this case or indicating something?

Most of the cases when JVM process becomes unresponsive is due to memory leak. Then most of CPU time is spent on memory management and not on actual program execution. However, then you should see GC threads in the listing.

Thank you for the hint. I posted also screenshot and memory looks not exhausted at all!

Can you double check memory limit for openHAB process? I see that your operating system has enough of memory and most of it is free.

How do I do this?

You can look at command line options of Java process (cat /proc/$PID/cmdline), you can verify /etc/default/openhab and setenv, oh2_dir_layout scripts within your local installation.
Anything which indicates -Xmx or similar is explicit setting of memory limit. If you haven’t changed that then you run on defaults which were fine until you switched to 4.0.

BTW. are you using javascript rules?

Hi,
thank you for the ideas.

cat /proc/$PID/cmdline
coherent_pool=1M 8250.nr_uarts=0 snd_bcm2835.enable_headphones=0 snd_bcm2835.enable_headphones=1 snd_bcm2835.enable_hdmi=1 bcm2708_fb.fbwidth=0 bcm2708_fb.fbheight=0 bcm2708_fb.fbdepth=16 bcm2708_fb.fbswap=1 smsc95xx.macaddr=E4:5F:01:35:78:B4 vc_mem.mem_base=0x3f000000 vc_mem.mem_size=0x3f600000  console=ttyS0,115200 console=tty1 root=PARTUUID=45d866a6-02 rootfstype=ext4 fsck.repair=yes rootwait

In /etc/default/openhab I find

EXTRA_JAVA_OPTS="-Xms192m -Xmx768m -XX:+ExitOnOutOfMemoryError"

I can just guess about this numbers and no idea what they really mean or what a better value would be. What are the OH4 values here?

Regarding setenv , oh2_dir_layout not sure what you mean: I checked xmx in the openhab-conf folder and nothing found.

I am not using javascript rules yet. The future improvements will probably switch from rules DSL to javascript.

Thanks and best

@Matthias_Kaufmann I’m facing very similar issues. Did you make any progress in the meantime? I’ll try to debug this using JMX. Let’s hope we’ll find the culprit soon


Edit: Just found Instable system with heavy file based config when migrating to OH 4 · Issue #3871 · openhab/openhab-core · GitHub, I think that answers it unfortunately.

No. That’s the issue he opened and mentioned here in the thread.