OH3.2 startup messier than before

This is more of an observation, I guess, although I am more than happy to listen to any suggestions to improve my system’s behavior.

I have had quite a few new errors from my DSL rules since upgrading to OH3.2 which I did not observe previously. Many of these appear to come from bindings loading more slowly, so that rules start to run before persistence has restored values or before values could be read from the bindings. In some cases a rule was triggered by an “item changed” event for an item from one binding while another item in the same rule, coming from a different binding, was still NULL. Again, that has not happened before. It has also brought up the issue with publishMQTT discussed here, which I had not come across before. Again, the likely reason is slow startup of the binding.

I have been able to address all of these errors and my rules files are now more robust, as a result. So I am good.

I just feel that, given what a great piece of software OH generally is, the startup process feels quite messy. I am probably not grasping the complexity of all this, so sorry in advance for my ignorance, but I was wondering, e.g., why rules start to run before restore from persistence has concluded, or before bindings have finished loading.

You were probably just lucky. This sort of startup behavior has been there forever. If you’ve not seen it before, you’ve been lucky more than anything. Nothing changed in rules DSL nor the startup pieces between 3.1 and 3.2.

Two processes are started in parallel in two completely separate parts of OH. There is no way for one to inform the other that it is done.
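To picture why neither side can wait for the other, here is a toy Python model. It is not openHAB code: `restore` stands in for restoreOnStartup, `rule` for a rule fired by a binding's first update, and the Item name and value are made up. Whichever task happens to run first decides whether the rule sees a real value or NULL; in a real system that order is chosen by the scheduler, not by you.

```python
# Toy model of two uncoordinated startup tasks (not openHAB code).
from typing import Callable, Dict, List

NULL = "NULL"  # the state every Item starts with before anything sets it


def run_startup(order: List[str]) -> str:
    """Run both tasks in the given order and return the state the rule saw."""
    items: Dict[str, str] = {"Temperature": NULL}  # hypothetical Item
    seen: List[str] = []

    def restore() -> None:
        # Stand-in for restoreOnStartup writing the last persisted value.
        items["Temperature"] = "21.5"

    def rule() -> None:
        # Stand-in for a rule that reads the Item; it has no way to know
        # whether restore() has already run.
        seen.append(items["Temperature"])

    tasks: Dict[str, Callable[[], None]] = {"restore": restore, "rule": rule}
    for name in order:
        tasks[name]()
    return seen[0]


print(run_startup(["restore", "rule"]))  # 21.5
print(run_startup(["rule", "restore"]))  # NULL
```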

Honestly, OH 3’s startup is way more regulated and deterministic than OH 2’s ever was.

Also, consider the MQTT action. It doesn't exist until the broker Thing goes online. What if the broker is down? Should rules never start because Mosquitto is down?
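A minimal sketch of the defensive pattern this implies, assuming a simplified model in which the actions object only exists while the broker Thing is ONLINE. `BrokerActions`, `get_actions`, `publish_safely` and the Thing UID are hypothetical stand-ins, not the real openHAB API; the point is only that the rule checks for a missing action and skips instead of failing.

```python
# Hypothetical model: the MQTT "action" only exists while the broker Thing is ONLINE.
from typing import Dict, List, Optional, Tuple


class BrokerActions:
    """Stand-in for an actions object; records what would be published."""

    def __init__(self) -> None:
        self.sent: List[Tuple[str, str]] = []

    def publish(self, topic: str, payload: str) -> None:
        self.sent.append((topic, payload))


def get_actions(status: Dict[str, str], thing_uid: str,
                actions: BrokerActions) -> Optional[BrokerActions]:
    # No action is available until the broker Thing reports ONLINE.
    return actions if status.get(thing_uid) == "ONLINE" else None


def publish_safely(status: Dict[str, str], thing_uid: str,
                   actions: BrokerActions, topic: str, payload: str) -> bool:
    mqtt = get_actions(status, thing_uid, actions)
    if mqtt is None:
        return False  # broker down or still starting: skip, do not crash
    mqtt.publish(topic, payload)
    return True


status = {"mqtt:broker:home": "OFFLINE"}  # hypothetical Thing UID
actions = BrokerActions()
print(publish_safely(status, "mqtt:broker:home", actions, "home/test", "on"))  # False
status["mqtt:broker:home"] = "ONLINE"
print(publish_safely(status, "mqtt:broker:home", actions, "home/test", "on"))  # True
```

A rule written this way keeps working whether Mosquitto is up or not; it just does nothing for the MQTT part until the broker is back.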

Would we be able to set the script to a higher start level, so it starts later in the chain of things? I have never looked into it.

Comment - an upgrade clears a system cache. Because it cannot use this cache at startup, and has extra work to rebuild it, the first reboot after upgrade is always messy. Take little notice of that, the important part is subsequent reboots.

Would we be able to set the script to a higher start level, so it starts later in the chain of things?

Had been thinking about that, as well. But text-based DSL rules currently cannot set triggers by run level. Is that an enhancement that might still come? I am also planning to start looking into JS scripting more seriously now (which is why I was so eager to get onto 3.2, despite all the bumps in the road on my personal upgrade path). I would hope run level triggers work there (including for text-based rules).

Take little notice of that, the important part is subsequent reboots.

Yes, that makes sense. But my experience has been pretty consistent across multiple service restarts with OH3.2 (although with different errors popping up at different times). But again, got that fixed and have better code now, so would hope for me this will not be an issue in the future anymore.

Sure, I get that, and I guess it is just something you’ve got to accept and live with. But still, at my last restart I even got the error below, which I read as the rule having run even before the mentioned item was initialized.

Script execution of rule with UID 'wetter-1' failed: The name 'WetterstationRainThisMonth' cannot be resolved to an item or type; line 35, column 16, length 26 in wetter

I can’t even wrap my head around that error. This is a calculated item without its own channel, and the error is thrown by postUpdate(). Before the rule arrives at that line there are a number of similar calculated items that are processed in exactly the same way, apparently just fine. And all these items are defined at the end of the same items file.

Bottom line, and this is an honest question from someone who is still learning a lot: does this mean I not only need to check every item state I read for NULL/UNDEF before I run a rule, but also need to check whether the items I want to write to have even been initialized? (For this one rule alone that would be ten items.) And if so, would you happen to have a tip for the most efficient way to implement that? Should I check for NULL/UNDEF the same way I check the items I read?

I think that’s the correct interpretation, yes. In OH2 at least, it was even possible to run rules before system ‘constants’ like ON or now were created.
There are some really complicated races going on at startup, and if they were easy to address, that would have been done back in OH1.
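For the write side of the question, one way to avoid the “cannot be resolved to an item” failure is to look each target up first and skip (or log) the missing ones. The sketch below models the Item registry as a plain Python dict; `post_updates` and `WetterstationRainToday` are hypothetical, and in a real rule you would use whatever registry lookup your rules language offers.

```python
# Hypothetical model: the Item registry is a dict of Item name -> state.
from typing import Dict, List


def post_updates(registry: Dict[str, str], updates: Dict[str, str]) -> List[str]:
    """Update only Items that already exist; return the names that were skipped."""
    missing: List[str] = []
    for name, value in updates.items():
        if name in registry:
            registry[name] = value
        else:
            missing.append(name)  # not created yet: log it instead of failing
    return missing


registry = {"WetterstationRainToday": "NULL"}
skipped = post_updates(registry, {"WetterstationRainToday": "1.2",
                                  "WetterstationRainThisMonth": "30.4"})
print(skipped)  # ['WetterstationRainThisMonth']
```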

I have an OH2 system that absolutely will never make a clean start without intervention. Besides the kinds of issues that produce error logs, there can be more insidious effects like Group membership completing after rules are initialised, meaning Group-based triggers are ruined (but error free!).

Using the technique described here made it 100% reliable for me, on my host box - this stuff is very individual.

I do not know of anybody following this method for OH3, it would require careful adaption.

I do know these problems are exacerbated by using config files. The loading and parsing process is more involved and less constrained because you’ll have Items and rules and Things all being loaded at the same time.

That’s always good to do anyway because there are plenty of other times when an Item can become NULL or UNDEF even after OH starts up. For example, if OH has been up for hours and you then reload a .items file, all those Items get deleted and recreated and come up as NULL again. restoreOnStartup takes some time, leaving chances for your rules to run with NULL states.

A binding can set an Item to UNDEF at any time if it doesn’t know what state the device is in.

You can also change the state of any Item yourself, in rules, via the expire binding, or manually.

You simply cannot trust that an item will always and forever have a non-NULL/UNDEF state in a rule. You either have to test for that or live with the errors when it occurs.

It’s different in different languages. For Rules DSL it’s often easiest to check whether the state instanceof UnDefType. In JS Scripting the Item has an isUninitialized member that returns true when the Item is NULL or UNDEF.
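Here is the same guard as a small, self-contained Python model. `UnDefType` is a stand-in enum for the two “no value” states, and `is_uninitialized` plays the role of `state instanceof UnDefType` in Rules DSL or `isUninitialized` in JS Scripting. `total_rain` is a hypothetical rule body that returns early rather than computing with NULL.

```python
# Stand-in model of NULL/UNDEF guarding; not the openHAB API.
from enum import Enum
from typing import Dict, List, Optional, Union


class UnDefType(Enum):
    NULL = "NULL"    # Item exists but has never received a value
    UNDEF = "UNDEF"  # binding does not know the device state


State = Union[UnDefType, float]


def is_uninitialized(state: State) -> bool:
    # Same idea as `state instanceof UnDefType` in Rules DSL.
    return isinstance(state, UnDefType)


def total_rain(states: Dict[str, State], names: List[str]) -> Optional[float]:
    """Return the sum of the inputs, or None (rule exits early) if any is NULL/UNDEF."""
    if any(is_uninitialized(states[n]) for n in names):
        return None
    return sum(float(states[n]) for n in names)


print(total_rain({"RainA": 1.5, "RainB": UnDefType.NULL}, ["RainA", "RainB"]))  # None
print(total_rain({"RainA": 1.5, "RainB": 2.0}, ["RainA", "RainB"]))  # 3.5
```

Checking all inputs up front keeps the rule body itself free of per-Item special cases.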

Another technique in OH 3 that some have found success with is to create a rule that triggers at runlevel 40 and disables all the problem rules, and another one that runs at runlevel 100 to re-enable them. However, beware that triggering rules at certain runlevels and disabling/enabling rules is not possible in Rules DSL. It’ll be easiest using UI rules.
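A rough model of that pattern, assuming a rule engine that simply ignores triggers for disabled rules. `RuleGate` and the rule UID `lights` are hypothetical; `wetter-1` is the rule from the error quoted earlier. In openHAB itself this would be two UI rules with start level triggers, not a Python class.

```python
# Hypothetical model of "disable problem rules at level 40, re-enable at 100".
from typing import Dict, Set


class RuleGate:
    def __init__(self, rules: Set[str], problem_rules: Set[str]) -> None:
        self.enabled: Dict[str, bool] = {r: True for r in rules}
        self.problem_rules = problem_rules

    def on_start_level(self, level: int) -> None:
        if level == 40:
            for r in self.problem_rules:
                self.enabled[r] = False  # keep them quiet during startup
        elif level == 100:
            for r in self.problem_rules:
                self.enabled[r] = True   # startup complete, let them run

    def fire(self, rule: str) -> bool:
        # A disabled rule simply does not run when its trigger fires.
        return self.enabled[rule]


gate = RuleGate({"wetter-1", "lights"}, {"wetter-1"})
gate.on_start_level(40)
print(gate.fire("wetter-1"), gate.fire("lights"))  # False True
gate.on_start_level(100)
print(gate.fire("wetter-1"))  # True
```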

@rossko57 @rlkoshak Thank you both for your indulgence, the explanations and the tips! New insights to chew on over the coming days. OpenHAB would not be the great piece of software it is without the extremely helpful community and the library of tips and solutions it has built over time.

Just like to add my experience with OH 3.2. What I see is that mapdb persistence is less reliable than it used to be. In about 40% of startups it does not restore the values before the startup rules are triggered. I also occasionally see script execution errors from the startup rules saying that items are not loaded, as Kai was reporting. The latter is new, while in the past, with 3.1 and 2.5, I did have very occasional issues with mapdb persistence. (I reinstalled my openHABian system from scratch on my RPi 3 two weeks ago, to be sure that this is not some issue related to upgrading the system, but the mapdb problem reappeared.)

That is possible now ;) @kstuken, you wanted to know whether that might still come (that’s why I tagged you).

Correct me if I am wrong, but I always thought that the concept of start levels was introduced to fix exactly this and ensure that the startup is done in an order that actually makes sense?

Yes, but it’s not going to work precisely as one might expect. As was mentioned above, run level 80, for example, does not mean that all Things are ONLINE. It just means they’ve been loaded and initialized. So you can’t expect all Things to be in a workable state when this run level is reached.
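A small sketch of the check this implies, assuming Thing statuses are available as plain strings. `not_ready` and the Thing UIDs are hypothetical: reaching level 80 only rules out Things that were never initialized, so the function still reports any Thing that is not ONLINE.

```python
# Hypothetical check: start level 80 means "initialized", not "ONLINE".
from typing import Dict, List

THINGS_INITIALIZED = 80  # start level mentioned above


def not_ready(start_level: int, status: Dict[str, str]) -> List[str]:
    """Return the Thing UIDs a rule should not rely on yet."""
    if start_level < THINGS_INITIALIZED:
        return sorted(status)  # Things not even initialized yet
    # Initialized Things can still be OFFLINE, UNKNOWN, ... so check each one.
    return sorted(uid for uid, s in status.items() if s != "ONLINE")


status = {"mqtt:broker:home": "OFFLINE", "astro:sun:local": "ONLINE"}
print(not_ready(80, status))  # ['mqtt:broker:home']
```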