openHAB 5.0.1 locking up every month + my temporary remote OH4 solution

My remote cabin’s OH5 server is down, and I had to spin up a backup OH4 server at home. I’m writing down what happened and asking why OH5 locks up after about a month. So what is the solution?

  • Platform information:
    • Hardware: Intel N150 / 8 GB RAM / 240 GB SP SSD
    • OS: Peppermint OS 12 (Debian 12 based)
    • Java Runtime Environment: probably 21.x? Installed via openHABian on August 27th, 2025
    • openHAB version: 5.0.1
  • Issue of the topic: The server runs fine for about 3 weeks to a month with a normal CPU load average of about 0.8, then it suddenly bogs down with openHAB consuming loads of CPU (load average around 68). RAM usage stays normal at about 40%, when I was even able to SSH in at all under such high load.
  • Please post configurations (if applicable):
    • Items - about 400; MQTT-bound items only, with a Mosquitto broker
    • Sitemap - 1; it seems to work fine, no errors
    • Rules - about 50 classic Rules DSL rules; some use rrd4j to gather stats, some control devices and some ingest strings
    • Services - openHAB Cloud, JSONPath, RTL433
  • If logs were generated please post these here using code fences:
  • I may fetch logs when I get access to the machine. It is located in the remote cabin, it freaking ain’t working right now, it’s dead silent, and we have some snow in the way.

OH5 install

So I installed openHAB 5.0.1 on August 27th, 2025; before that I had the same config running on OH 4.1 for at least a year. When migrating I fixed some errors and left it running until it started acting up after a month. I restarted it and it seemed fine, like new, but after 3 weeks again: no access, and the PC was hot to the touch.

I have seen OH3 and OH4 crash on high RAM usage, but with OH5 it seems to be CPU overload, triggered by some kind of internal error. I did look at the logs in autumn, but nothing obvious stood out, like a rule that would cause it or whatnot.

Anyway, I don’t know the 5.0 beast yet, but it seems to have some ghost inside; others have posted that it won’t terminate some scripts when they hang. I suppose it is missing some kind of watchdog, as some say it won’t start again after a restart in certain situations.

My OH4 Experience

I do like openHAB, and I am used to it running all the time, as I have 3 more systems running OH4.

The only reason I thought OH5 was fine to install back then is that I just let openHABian install what was recommended.

Now I am running a backup OH4 server at home on a chintzy Zotac box, and at least I have VPN access to the Pi 4 MQTT server I dropped in before winter.

I did suspect RTL433 of causing the drama, but it isn’t, as it didn’t seem to consume CPU the way openHAB did. Now I only miss the data from the wireless sensors, which are offline because the remote server is down.

Temporary OH4 remote server

I see that a spare backup server is the only thing I can depend on. I’m also looking at keeping the MQTT, RTL433 and VPN server separate from OH from now on.

I really plan to just roll back to OH4 and sit on it, because it’s a bit more reliable, until I can see OH5 run fine on a staging setup.

Verdict

OH5 locking up may well be related to my config, but who has time to tinker with a remote cabin just because an OH5 upgrade brings in a new squirrel that eats at my system whenever it decides to?

It needs some type of watchdog.

Cheers Matej

Well, you’re throwing quite a number of components and potential reasons for your issue into the mix.
Generally speaking, openHAB 5 is no less reliable, and no more prone to lockups, memory leaks or similar errors, than OH4.
Forget about the idea of downgrading; that won’t get you anywhere in the long run.
You’re just shooting in the dark then, because you don’t know where the target is.

Getting to the bottom of it is never easy or fast. Most of the time, in the end, it’s some specific binding or hardware that few other users use, so your 433 MHz stuff sounds like a prime suspect, but to find out, you have to analyze issues systematically when they occur.
Load-caused lockups are almost always related to your system setup, for example whenever OH consumes more memory than the HW/OS can provide at that time. Debugging that is a story of its own, but not really OH related. If you lose access, that’s a strong sign of a system-related rather than an OH-related cause. Using a standard, reliability-proven setup like an openHABian RasPi is much more reliable than using some homebrew HW+OS+system config setup, particularly for remote installations.
Having a watchdog doesn’t hurt, but it’s not a solution, let alone THE solution. It’s a workaround at best;
unfortunately quite a few people use this or similar procedures rather than investigating and fixing the root cause.

Long story short: analyze your box while it is in the lockup state.
A very useful tool for that is documented here: Runtime Commands | openHAB. It often gives a hint as to which rule or binding to look into.

Hi Markus, thanks for the insight.

The tools are essential. I remember using some, but it used to be quite hard to track things down.

I see the thread info has been expanded, as per the docs; I’m looking forward to it showing some hints.

Well, I am glad that OH4 is running fine with the same config. I admit that there are some rule errors here and there, due to devices publishing JSON messages that change, or devices going offline. I don’t have all the edge cases ironed out yet as the device count grew, but OH4 copes fine with JSON and Item NULL errors.

OH4 has currently been running for months, so it’s as if the OH4 engine is protected differently and errors don’t backfire into the system in my case. 😅

Talking about edge cases: when all goes wrong, I know that OH4 is configured to restart itself when it runs out of memory, but I suspect OH5 isn’t, right?

So how do I set that up natively, via javaopts, or has that changed?

I’d rather have both: self-healing and fixing the cause of the error. Just like every robust server restarts on error, OH should too.

I don’t see Debian 12 with openHABian on it as too much homebrew. It’s debatable, sure, but I have had trouble with the Pi 4 and USB storage, so I somehow outgrew the Pi. The Pi 5 is the first with real SSD support, but I don’t have one.

Anyhow, what would be the recommended way to go? As I said, OH4 is running just fine, and I just want the new setup on the more capable N150 mini PC to be a performance upgrade for future expansion.

For context, I use OH to monitor garden watering, well water, outdoor lighting, electric backup heating, and power shedding via smart plugs, as we only have a 20 A supply. I also built a 12 V DC UPS that I monitor. Mainly I only have my own ESP8266 modules and Tasmota smart plugs.

And I like to do lots of data analysis, so the server is working hard there: real-time power analysis for turning plugs on and off, predicting future load and such, all cooking in rules 😆. So the CPU isn’t idle, although rrd4j is light compared to the InfluxDB I run at home.

And I possibly have over 50 charts for those 30 devices 😅

Oh man, it would be great to just have it working. I’m glad that the stupid things I did on OH4 rarely break it, as handling data in rules is not as self-explanatory as it is in Arduino, for example, where everything is defined by default.

I do enjoy building and mining the data, but chasing ghosts not so much. Seeing what is silently breaking inside OH after weeks is a bit much without serious tools and experience. Although I have used OH for 6 years, it’s still a grey box under the lid; I grasp some of it.

For reference, my openhab.log is silent for hours, and I disabled the event log to save on disk writes.

So what steps do you recommend, with minimal fuss, so that OH5 survives my torture?

Well, much of this, like what happens when OH crashes or hangs and whether the system restarts, is not about OH itself but about your system setup, which is homegrown, so I can’t comment on that.

My recommendation is always to go with the mainstream standard: Raspi, openHABian.
Because then you will have the very same HW+OS as everyone else using it, and all of them can potentially help you debug it. That’s the benefit of going mainstream.
Meanwhile, as soon as you homebrew anything, only you know what you did, so (essentially) only you can effectively debug it.

It’s certainly the way to go for a remote unit. You can even have it mirror SD cards for resilience.

On your local main system you can use an x86 Debian + openHABian system, or attach some USB SSD to a RPi and move the DB there if you want to improve data mining speed.
But I also do a lot of that sort of computation on various RPi installations of mine. It’s just a question of programming efficiency; one really barely needs anything bigger in terms of CPU and I/O power.
And openHABian takes care of optimizing the system upfront (DB and logging on ZRAM etc.).

I might add two suspects:

  • Heavy logging: this can cause ZRAM partitions to run out of space, so logging should never remain on DEBUG or TRACE for a long time. But that has nothing to do with versions, so my first bet for your case is:
  • MQTT: postCommand: true. This can cause trouble by creating loops, even if it worked in v4.x environments. In general it should almost always be false.
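To make the loop mechanism concrete, here is a toy model in Python. It is not openHAB code and not how the MQTT binding is implemented internally. It assumes a hypothetical device that reports its new state on stateTopic after every command it receives, and that with postCommand=true each received state is forwarded to the Item as a command, which the binding then publishes on commandTopic. The `max_messages` cap is an invented stand-in for "the CPU melts".

```python
from collections import deque


def count_messages(post_command: bool, max_messages: int = 100) -> int:
    """Toy model of one MQTT channel wired to a device that echoes its state.

    Hypothetical assumptions (not openHAB internals):
    - the device publishes its new state on stateTopic after every command;
    - with post_command=True, each incoming state is posted to the Item as a
      command, which then gets published on commandTopic.
    Returns how many messages flow before things settle or the cap is hit.
    """
    queue = deque([("state", "ON")])  # the device reports once, e.g. after boot
    sent = 0
    while queue and sent < max_messages:
        kind, value = queue.popleft()
        sent += 1
        if kind == "state" and post_command:
            # Incoming state treated as a command -> published to commandTopic
            queue.append(("command", value))
        elif kind == "command":
            # The device obeys and reports its state back on stateTopic
            queue.append(("state", value))
    return sent


print(count_messages(post_command=False))  # 1: the state update just lands
print(count_messages(post_command=True))   # 100: only the cap stops the loop
```

With postCommand off, a state update simply lands on the Item; with it on, every echo turns into a new command, so the traffic never settles by itself.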

Further info here:
https://community.openhab.org/t/openhab-5-1-release-discussion/167620/278

👆

I’m willing to bet this is the root cause of the CPU usage.

As for the rest 🤷: nothing mentioned actually changed in OH 5 from OH 4. OH 5 is configured to restart when it crashes, but that’s not OH, that’s systemd doing it. I bet OH 5 isn’t crashing, though; it’s just not responding.

You turned off the one thing that would tell us anything useful here. Is OH processing lots of events, and is that what’s using the CPU, or is something else using the CPU? events.log would tell us that. If you had OH 5.1, events.log would even tell you the source of all those events.
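If events.log is switched back on, a quick way to see whether an event storm is eating the CPU is to count events per Item. The sketch below is a hypothetical helper, not an openHAB tool; it assumes the usual events.log line layout shown in the comment, so check a few lines of your own file first. The Item names in the sample are made up.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed events.log line shape (verify against your own file), e.g.:
# 2025-10-01 12:00:00.001 [INFO ] [openhab.event.ItemCommandEvent] - Item 'X' received command ON
LINE_RE = re.compile(r"\[openhab\.event\.(\w+)\s*\] - Item '([^']+)'")


def top_talkers(lines: Iterable[str], n: int = 10) -> list[tuple[tuple[str, str], int]]:
    """Count (event type, Item name) pairs and return the n most frequent."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts.most_common(n)


sample = [
    "2025-10-01 12:00:00.001 [INFO ] [openhab.event.ItemCommandEvent    ] - Item 'Example_Switch' received command ON",
    "2025-10-01 12:00:00.002 [INFO ] [openhab.event.ItemCommandEvent    ] - Item 'Example_Switch' received command ON",
    "2025-10-01 12:00:00.003 [INFO ] [openhab.event.ItemStateChangedEvent] - Item 'Example_Signal' changed from -60 to -61",
]
print(top_talkers(sample))
# [(('ItemCommandEvent', 'Example_Switch'), 2), (('ItemStateChangedEvent', 'Example_Signal'), 1)]
```

Passing an open file object also works, because `top_talkers` only iterates over lines; on a default package install the file is usually /var/log/openhab/events.log (an assumption, adjust for your setup). An Item that dominates the list by orders of magnitude is a good first suspect.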

OH 4 doesn’t have a watchdog either. One could be implemented through systemd or other external services if you want. But OH has never had the ability to restart itself.
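As noted earlier in the thread, a watchdog is a workaround rather than a fix, but for a remote box an external one can at least bring OH back. A minimal sketch, assuming Python 3 on a Unix host, root rights (or a matching sudo rule) to run `systemctl restart openhab.service`, and made-up thresholds that you would tune yourself:

```python
import os
import subprocess
import time

# Hypothetical thresholds: tune for your box (this one normally sits around 0.8).
LOAD_LIMIT = 10.0          # 1-minute load average considered "stuck"
BAD_SAMPLES_NEEDED = 5     # consecutive bad samples before acting
INTERVAL_S = 60            # seconds between samples


def should_restart(samples: list[float], limit: float = LOAD_LIMIT,
                   needed: int = BAD_SAMPLES_NEEDED) -> bool:
    """True if the last `needed` load samples all exceed `limit`."""
    recent = samples[-needed:]
    return len(recent) == needed and all(s > limit for s in recent)


def watch() -> None:
    """Sample the load average forever and restart openHAB when it stays high."""
    samples: list[float] = []
    while True:
        samples.append(os.getloadavg()[0])  # Unix only
        samples = samples[-BAD_SAMPLES_NEEDED:]
        if should_restart(samples):
            # Restarts the systemd unit; needs root or a matching sudo rule.
            subprocess.run(["systemctl", "restart", "openhab.service"], check=False)
            samples.clear()
        time.sleep(INTERVAL_S)
```

The decision logic lives in `should_restart` so it can be tested without touching the service; `watch()` would be started from its own systemd unit or similar. Two caveats: if the box is so overloaded that nothing gets scheduled, this script may stall too, and restarting throws away the evidence, so it is worth saving events.log or the runtime thread info first.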

Can you point me to these posts?

But if it’s generating tons of events…

You don’t set that up via javaopts; it’s not a Java option. It’s handled by the systemd unit that starts OH in the first place: openhab.service, somewhere in /etc/systemd or /var/lib/systemd. But this file didn’t change between OH 4 and OH 5.

If it’s your config, it probably will never get fixed. If you don’t have time to help figure out whether that’s the case or whether there’s a problem, perhaps a FOSS system isn’t the way to go. You need to find a platform where you can pay someone to spend the time to figure it out.

Nothing changed between OH 4 and 5 in this regard. What probably happened is that you’ve encountered a new bug in OH 5, or something else changed around the same time as the upgrade. But OH 5 has no fewer protections than OH 4. Neither really has any external protections.

Me too, but OH has never claimed to offer that.

A quick log:set debug org.openhab and/or the aforementioned ttop console command should give you an idea of what it’s spending most of its time on, if it’s not hung.
If it’s in MQTT, I bet Rich is right.

PS: log:set default org.openhab to reset

Bridge mqtt:broker:MQTT_Bridge [ host="192.168.1.100", secure=false ]
{
    Thing mqtt:topic:Vodnjak "Vodnjak" {
    Channels:
        Type number : Razdalja_Do_Vode "Razdalja Do Vode" [ stateTopic="/Vodnjak/Razdalja" ]
        Type number : Volumen_Vode "Volumen Vode" [ stateTopic="/Vodnjak/Volumen" ]
        Type number : Globina_Vode "Globina Vode" [ stateTopic="/Vodnjak/Globina" ]
        Type number : Signal "Signal" [ stateTopic="/Vodnjak/Signal" ] 
        Type number : Stevec "Stevec" [ stateTopic="/Vodnjak/Stevec" ]
    }

    Thing mqtt:topic:Sonoff4Pro_Terasa "Sonoff Terasa" {
    Channels:
        Type switch : Rele_1 "Luc Terasa" [ stateTopic="/S1_Sonoff4_Dule/Status/K1", commandTopic="/S1_Sonoff4_Dule/vklop/K1", on="1", off="0"]
        Type switch : Rele_2 "Luc Delavnica" [ stateTopic="/S1_Sonoff4_Dule/Status/K2", commandTopic="/S1_Sonoff4_Dule/vklop/K2", on="1", off="0"]
        Type switch : Rele_3 "Luc Miza" [ stateTopic="/S1_Sonoff4_Dule/Status/K3", commandTopic="/S1_Sonoff4_Dule/vklop/K3", on="1", off="0"]
        Type switch : Rele_4 "Luc Dovoz" [ stateTopic="/S1_Sonoff4_Dule/Status/K4", commandTopic="/S1_Sonoff4_Dule/vklop/K4", on="1", off="0"] 

        Type string : Status "Status" [stateTopic="/S1_Sonoff4_Dule/Status"] 
    }

// more Things below

}

My bad for not including some config. I still use the legacy text-file Thing configuration, so I don’t see where postCommand: true would even be configured.

I can stage the OH5 system at home, load it with the same config and let it cook, then inspect where it starts smoking.

It would be a property on an MQTT Channel, somewhere alongside stateTopic and commandTopic.
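To check this quickly across text-based Things files, a small scanner like the one below would do. It is a hypothetical helper, assuming the files sit in the default /etc/openhab/things directory of a package install and that the parameter is written as postCommand=true (spaces and quotes tolerated). It also flags commented-out lines, and it does not see Things created in the UI, which are stored elsewhere.

```python
import re
from pathlib import Path

# Matches postCommand=true inside a channel's [ ... ] parameter list,
# tolerating spaces and an optionally quoted value (formatting assumption).
POST_COMMAND_RE = re.compile(r'postCommand\s*=\s*"?true\b', re.IGNORECASE)


def find_post_command(text: str) -> list[int]:
    """Return the 1-based line numbers in `text` that set postCommand=true."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if POST_COMMAND_RE.search(line)
    ]


def scan_things_dir(directory: str) -> dict[str, list[int]]:
    """Scan every *.things file in `directory`, e.g. /etc/openhab/things."""
    hits = {}
    for path in sorted(Path(directory).glob("*.things")):
        lines = find_post_command(path.read_text(encoding="utf-8"))
        if lines:
            hits[str(path)] = lines
    return hits
```

For example, `print(scan_things_dir("/etc/openhab/things"))`; an empty dict means no text-defined Channel sets it.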

If it’s not on any Channel, then that’s not the problem and we don’t have any other information to go on. This is the only thread I’m aware of with CPU problems on OH 5 that aren’t caused by an infinite MQTT loop resulting from incorrect use of postCommand.