I keep the original title here for search purposes, but it has been updated to match the real issue:
“Upgrade to 4.3.1 => OH down”. I don’t think the upgrade to 4.3.1 per se was the issue, but rather another issue (see below) that was made visible by the restart after the upgrade.
I was on 4.3.0 so I did not expect much of an issue upgrading to 4.3.1… If I’d known…
I am running into several issues:
Yesterday, I tried uninstalling and reinstalling the MQTT binding. I also rebooted a few times. Things looked better after a while: I could turn lights on and off, and I saw through HABPanel that my door state was picked up by OH. However, it seems that NONE of the rules worked, as if the rule engine simply was not running.
I checked old and new logger, both remained very silent.
I checked updates and disk space, everything looks good.
The new logger has almost everything set to the error level. I have now switched a few items to info or debug.
Uninstalling the MQTT binding triggers a waterfall of logs
After uninstalling MQTT, it shows (I refreshed a few times and gave it some time):
Seeing 300% CPU so it sounds like something is bloating OH. Installing the MQTT binding seems to never complete…
Everything around OH is running fine, mosquitto is running fine, Z2M does its job.
Have you cleared the cache yet? Any time there’s an add-on installation issue, that often clears the problem up.
The changes in 4.3.1 (see Release openHAB 4.3.1 · openhab/openhab-distro · GitHub) are pretty minimal and not related to anything shown here, so downgrading is unlikely to solve the problem. If it does, it’s because it cleared the cache or did some similar operation.
The “HANDLER_MISSING_ERROR” is what happens to a Thing whose binding is not installed or otherwise in an error state. The “handler” in this case is the binding.
In MainUI if you go to Help & About → Technical Info under “systemInfo” you’ll see a line for the runlevel. What runlevel is your OH at? Are you using Rules DSL or an automation add-on for rules? If you are using an add-on perhaps that add-on didn’t install properly also.
After 38 minutes of uptime, my things are online, so clearing the cache did some good.
The OH Java process, however, remains stuck at 100%.
Not being a Java expert by any means, I am trying to troubleshoot which part is “annoying” OH.
Suggestions welcome
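One common way to narrow down a busy Java process is to list per-thread CPU usage with `top -H -p <pid>`, take a thread dump with `jstack <pid>`, and look up the hot thread's id in the dump, where it appears as `nid=`. The Python sketch below only does the matching step. The sample dump and its thread names are made up for illustration, and depending on the JDK version the `nid` may be printed in hex or decimal, so the sketch checks both.

```python
import re

# Hypothetical thread dump excerpt, for illustration only (not real openHAB output).
SAMPLE_DUMP = """
"hypothetical-worker-1" #42 daemon prio=5 os_prio=0 tid=0x00007f0001 nid=0x3039 runnable
   java.lang.Thread.State: RUNNABLE
    at example.Hypothetical.busyLoop(Hypothetical.java:10)

"hypothetical-idle" #43 daemon prio=5 os_prio=0 tid=0x00007f0002 nid=0x303a waiting on condition
   java.lang.Thread.State: WAITING
"""


def find_thread(dump: str, tid: int):
    """Return the dump entry whose nid matches the decimal thread id from `top -H`, or None."""
    # Accept the id in hex (e.g. 0x3039) or in decimal (e.g. 12345).
    candidates = {hex(tid), str(tid)}
    # Entries in a thread dump are separated by blank lines.
    for block in re.split(r"\n\s*\n", dump.strip()):
        match = re.search(r"\bnid=(0x[0-9a-fA-F]+|\d+)\b", block)
        if match and match.group(1).lower() in candidates:
            return block
    return None


if __name__ == "__main__":
    # 12345 is the hypothetical thread id that `top -H` would have shown as the busy one.
    print(find_thread(SAMPLE_DUMP, 12345))
```

The stack trace of the matching entry usually hints at which bundle or component is doing the work.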
startLevel 30 means the rules engine hasn’t started yet. Rules are loaded at start level 40 and the engine starts at 50. If you are still in start level 30 that means something went wrong or OH is still loading your rules.
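As a rough illustration of the thresholds just described, here is a small Python sketch that turns a start level into a status message. The two threshold values come from the explanation above. The function name and the shape of the `system_info` dictionary (a `startLevel` key under `systemInfo`) are assumptions made for this example, not a documented API.

```python
# Thresholds as described in the post above.
RULES_LOADED_LEVEL = 40
RULE_ENGINE_STARTED_LEVEL = 50


def rule_engine_status(start_level: int) -> str:
    """Describe what a given start level implies for the rule engine."""
    if start_level >= RULE_ENGINE_STARTED_LEVEL:
        return "rule engine started"
    if start_level >= RULES_LOADED_LEVEL:
        return "rules loaded, engine not started yet"
    return "rules not loaded yet"


if __name__ == "__main__":
    # Assumed structure, mimicking the "systemInfo" section shown in MainUI.
    system_info = {"systemInfo": {"startLevel": 30}}
    level = system_info["systemInfo"]["startLevel"]
    print(f"startLevel {level}: {rule_engine_status(level)}")
```

With a start level of 30, as in this thread, the result is "rules not loaded yet", which matches rules never firing.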
Since 4.3.0 OH also parses the rules at start time rather than when they are run for the first time. This adds a lot of load to the boot time and extends the boot time but in exchange we get better error checking at load time and more responsive rules. But that could account for the initial boot load.
It is relevant though: what language are you using, and are you using file-based rules or managed rules?
Is it still at startlevel 30?
If you use file-based rules, do you see in the logs that it’s loading them?
I switched org.openhab to debug (otherwise the logs show nothing).
I see lots of logs but nothing really relevant.
I did see ONE .rules file loading, with OH complaining that the file was erroneous or empty, but that makes sense since this one file is entirely empty.
I don’t recall seeing the items/rules being loaded as I usually do see in the logs.
I am still “booting”, still high CPU, still startLevel 30. So I think that the issue with not seeing the loading of the items/rules is due to new defaults for the logger.
Validation issues found in configuration model 'alarm.rules', using it anyway:
There is no context to infer the closure's argument types from. Consider typing the arguments or put the closures into a typed context.
There is no context to infer the closure's argument types from. Consider typing the arguments or put the closures into a typed context.
There is no context to infer the closure's argument types from. Consider typing the arguments or put the closures into a typed context.
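To get an overview of which files produce warnings like the ones above, the log can be tallied per model file. The Python sketch below is only an illustration: it assumes each warning sits on its own line after the "Validation issues found" header, and that other log entries start with a `YYYY-MM-DD ` timestamp. Both assumptions may need adjusting to the actual log format.

```python
import re
from collections import Counter

# Header text as it appears in the log excerpt above.
HEADER = re.compile(r"Validation issues found in configuration model '([^']+)'")
# Assumption: unrelated log entries begin with a date such as "2025-01-01 ".
ENTRY_START = re.compile(r"^\d{4}-\d{2}-\d{2} ")


def count_validation_issues(log_text: str) -> Counter:
    """Count the warning lines listed under each 'Validation issues found' header."""
    counts = Counter()
    current = None
    for line in log_text.splitlines():
        header = HEADER.search(line)
        if header:
            current = header.group(1)
        elif ENTRY_START.match(line) or not line.strip():
            # A new log entry or a blank line ends the current group.
            current = None
        elif current is not None:
            counts[current] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "Validation issues found in configuration model 'alarm.rules', using it anyway:\n"
        "There is no context to infer the closure's argument types from.\n"
        "There is no context to infer the closure's argument types from.\n"
        "There is no context to infer the closure's argument types from.\n"
    )
    for model, issues in count_validation_issues(sample).items():
        print(f"{model}: {issues} issue(s)")
```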
It is progressing to the rest and appears to reach the end.
I fixed a few of the warnings but the outcome so far is the same:
100% cpu
stuck at startLevel 30
I see a few warnings but nothing that appears critical to running OH itself.
I am now a bit tired of starting and stopping the service, so I am checking how to start OH manually, hoping I can spot in the logs where it gets stuck.
Yes, I downgraded to 4.3.0 hoping for a “quick fix” for now but got the exact same behavior.
So I went back to 4.3.1. I am now wondering if I was really on 4.3.0 before.
I was on a recent version, but that could have been a 4.2.x, so I want to check the release notes and see if there are caveats going from 4.2 to 4.3.