Members of a group: what is the correct way in OH3?

Peter_Novotny · February 27, 2023, 6:49am

During migration from OH2 to OH3, groups do not work correctly. The forum mentions several possibilities: members, allMembers, and getMembers.

Somehow members does not work in my DSL; allMembers seems to work but returns an empty set; and getMembers works, but not in the startup section.

Can someone please explain the difference between these three, and how to modify my OH2 DSL (which used getMembers without problems) to get all members of a group?

rlkoshak · February 27, 2023, 2:03pm

There is no difference, so you’ll need to post some example rules and a detailed description of what’s happening. Also post logs.

Peter_Novotny · February 27, 2023, 9:18pm

For example, when reading values from a file (my simple persistence implementation without any DB, really primitive):

rule "RealStartup"
  when 
    Item LATE_STARTUP received command ON
  then
// other code skipped
    running_persistence_read = 1
    gPersistent.getMembers.forEach [p | 
      logInfo ("Persistence", "Persistence refreshing "+p.name+" in state "+p.state.toString())
      if ( p.state == NULL ) {
          persistence_lock.lock()
          PERSISTENCE_Args.sendCommand("-R -a "+p.name)
          PERSISTENCE.sendCommand(ON)
          while (PERSISTENCE.state == ON) {
            Thread::sleep (100)
          }
          Thread::sleep (1000)
          if ( PERSISTENCE_Out.state.toString.length > 0 ) {
            logInfo ("Persistence", "Persistence read for "+p.name+" as "+PERSISTENCE_Out.state+" ("+PERSISTENCE_Out.state.toString.length+")")
            postUpdate (p.name.toString, PERSISTENCE_Out.state.toString)
          }
          else {
            logInfo ("Persistence", "Persistence NOT read for "+p.name+" as "+PERSISTENCE_Out.state+" ("+PERSISTENCE_Out.state.toString.length+")")
          }
      }
    ]
// other code skipped
end

In this case behaviour is different:

getMembers: works
allMembers: nothing returned, never gets inside the loop
members: Java error Script execution of rule with UID 'rollershutter-2' failed: An error occurred during the script execution: Cannot invoke "org.eclipse.xtext.common.types.JvmType.eIsProxy()" because "type" is null in rollershutter

rlkoshak · February 27, 2023, 9:28pm

When does this rule get triggered? It seems like it’s at or around startup? Are you certain that all your Items are loaded and ready by this point?

Peter_Novotny · February 27, 2023, 9:54pm

The rule is triggered as described in my last comment in the thread “Binding not yet ready in startup sequence, how to detect”, which is based on nyholm’s solution in “OpenHAB3 Start Up Issues - Race Condition? How to Sequence Startup? - #13 by nyholm”, as it is the only solution that worked on my system.
I have read your comment (I don’t have the link right now) about how difficult it is to organize startup, as a lot of things happen in the background that openHAB has no control over and, even worse, no visibility into whether, in what order, and when they have started or finished loading.
I fully understand this is a significant limitation with high impact, but in the end it becomes a significant obstacle to the system working properly. I also read about the runlevels, which should help, but from my point of view they add another layer on the same “confusing” basis, and I’m not sure (personal, subjective opinion) whether they increase or decrease the level of randomness during system start, taking into account all the various errors I have seen (including Java garbage spanning 2 monitors).

rlkoshak · February 27, 2023, 10:20pm

The startup is deterministic. But the “randomness” comes from situations like this. Let’s say I unplug my Zwave controller. None of the Zwave Things will ever come online. Does that mean system runlevel 80 and above should never be reached? Is it really better to refuse to come up at all, or is it better to come up with those Things remaining OFFLINE?

Also, a deliberate choice was made to load the Rules before the Things. No matter what, your rules are going to start running before the Things are even loaded. But notice that there are two runlevels after the Things are loaded. If you use a runlevel trigger and trigger on runlevel 100, for example, maybe you don’t even need this hacky work around. (System started triggers the rule at runlevel 40, long before Things are loaded).
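
As a minimal sketch of the runlevel trigger described above, assuming the OH 3 Rules DSL `System reached start level` trigger: the rule name and log text are hypothetical, and the body is only a placeholder for whatever startup work is needed.

```
rule "Late startup work"
when
    // Start level 100 is reached after the Things have been loaded
    System reached start level 100
then
    logInfo("Startup", "Start level 100 reached, running late initialization")
    // startup-only work goes here
end
```

Note that reaching a start level does not guarantee every Thing is ONLINE, only that startup has progressed that far.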

Though any Thing can go offline for any reason. So it’s better to build a system that cares about the state of Thing rather than the system runlevel.
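
One way to care about Thing state rather than runlevel could look like the sketch below. It assumes a Thing trigger and the `getThingStatusInfo` action; the Thing UID, the Switch Item `Zwave_Online`, and the Item `Some_Trigger` are hypothetical placeholders, not names from this thread.

```
// Hypothetical names throughout: adjust the Thing UID and Items to your setup.
rule "Mirror Thing status into an Item"
when
    Thing "zwave:serial_zstick:controller" changed
then
    val info = getThingStatusInfo("zwave:serial_zstick:controller")
    // getThingStatusInfo returns null when the Thing is not known, so check first
    if (info !== null && info.getStatus().toString() == "ONLINE") {
        Zwave_Online.postUpdate(ON)
    } else {
        Zwave_Online.postUpdate(OFF)
    }
end

rule "Act only while the Thing is online"
when
    Item Some_Trigger received command
then
    if (Zwave_Online.state != ON) {
        logWarn("Zwave", "Controller not online, ignoring " + receivedCommand)
        return;
    }
    // interact with the device here
end
```

The status Item can be stale for a moment after the Thing changes, but the first rule corrects it as soon as the change event is processed.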

As for the Group behavior, I’m not sure where to look. I don’t think this is a case where the Items are still loading when the rule finally fires. GroupItem.getAllMembers() hasn’t changed since before 2019 and probably since long before even that so it’s not because of an obvious change in the code or behavior.

It’s not clear where the error is coming from though. It’s clearly not able to convert the type of something to a form it can use (that’s usually what null errors like that mean). Are you certain this rule is the second one in the file named “rollershutter.rules”?

Add logging statements each step of the way until you can identify exactly which line it’s failing on. The error may be related to why getAllMembers isn’t working as expected.
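
A minimal diagnostic sketch of that step-by-step logging, reusing the trigger and the Group name `gPersistent` from the rule posted above; the rule name and the logger name `GroupDebug` are hypothetical. It only logs and changes nothing, so the last line that appears in the log shows how far the rule got.

```
rule "Group member diagnostics"
when
    Item LATE_STARTUP received command ON
then
    logInfo("GroupDebug", "Rule entered")
    // In Rules DSL, members is property shorthand for getMembers()
    logInfo("GroupDebug", "direct members: " + gPersistent.members.size)
    // allMembers walks nested Groups and returns only the non-Group Items
    logInfo("GroupDebug", "all members: " + gPersistent.allMembers.size)
    gPersistent.members.forEach [ p |
        logInfo("GroupDebug", p.name + " is " + p.state.toString)
    ]
    logInfo("GroupDebug", "Rule finished")
end
```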

Peter_Novotny · February 27, 2023, 11:17pm

I tried exactly what you mention: check whether the Thing is ONLINE before any interaction starts, but I received only a lot of errors. Below are a few examples of the random errors seen recently. It is worth mentioning that the number after rollershutter- is purely random; the rules mentioned never ran, since they include a logInfo that was not printed. Also, for example, one night until 2:00 in the morning it kept repeating “ServiceLocatorImpl has been shut down”. It drove me crazy; then I went to sleep, and the next day there was another piece of Java garbage, and since that night I have never seen the error again (without changing any file).

2023-02-26 14:11:55.290 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'rollershutter-6' failed: sleep interrupted in rollershutter
2023-02-26 14:14:50.636 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'rollershutter-1' failed: An error occurred during the script execution: Couldn't invoke 'assignValueTo' for feature JvmVoid:  (eProxyURI: rollershutter.rules#|::0.2.0.2.0.3::0::/1) in rollershutter
2023-02-26 14:28:40.606 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'rollershutter-1' failed: cannot invoke method public org.openhab.core.thing.ThingStatus org.openhab.core.thing.ThingStatusInfo.getStatus() on null in rollershutter
2023-02-26 18:05:47.421 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'rollershutter-11' failed: The name 'S_light_southffxSLD' cannot be resolved to an item or type; line 583, column 50, length 19 in rollershutter //here the line and column number will not match to git as file was changed
2023-02-26 18:10:49.501 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'rollershutter-10' failed: An error occurred during the script execution: Cannot invoke "org.eclipse.xtext.common.types.JvmType.eIsProxy()" because "type" is null in rollershutter
2023-02-26 18:13:32.637 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'rollershutter-6' failed: An error occurred during the script execution: Cannot invoke "org.eclipse.emf.ecore.resource.Resource.getContents()" because "resource" is null in rollershutter
2023-02-26 18:13:32.653 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'rollershutter-6' failed: Script interpreter couldn't be obtained in rollershutter
java.lang.NullPointerException: Cannot invoke "org.apache.cxf.transport.MessageObserver.onMessage(org.apache.cxf.message.Message)" because "this.incomingObserver" is null
java.net.UnknownHostException: community.openhab.org
Ljava.lang.StackTraceElement;@1fe6d7e OTHER VERSION null
java.lang.IllegalStateException: ServiceLocatorImpl has been shut down
2023-02-26 22:19:46.821 [ERROR]: #011at sun.reflect.GeneratedConstructorAccessor153.newInstance(Unknown Source) // this is interesting because the java garbage intro is missing, first record in log looks like this, but the java lang intro is missing
#011at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) // same as above

You can see the order of rules and also the commented attempt to wait for Thing being online here HomeAutomation/src/OpenHabConfiguration/rules/rollershutter.rules at OH3_GW · IcePlanet/HomeAutomation · GitHub

It is worth mentioning that even the recommended approach is not safe from race conditions, because it is not an atomic operation. In other words, we know that at the time of the check the Thing was ONLINE, but we do not know its status at the time we work with it. The status is checked in one line of code, but by the time the next line executes the status can be different. For this reason a notification mechanism (like interrupts) or atomic operations should exist to prevent random race-condition errors.

In my code at the link above, wherever there is a risk of a race condition I use locks to prevent it. This introduces a level of atomicity, but also a lot of serial (instead of parallel) execution, which I decided to accept for my usage. Unfortunately, a lock strategy obviously cannot work for Things.

rlkoshak · February 28, 2023, 12:23am

Without the “garbage” :shrug. This seems to be associated with the connection between the client (i.e. web browser running BasicUI or MainUI) and the web server (i.e. Jetty) based on other threads that have posted about it. Probably what happened was the browser connection timed out or was closed or something.

No, it’s not random, it’s the ID of the rule that generated the error. The number indicates the rule’s position in the .rules file. rollershutter-2 means it’s the second rule defined in rollershutter.rules.

Depending on what the error is, the error might be the only thing that the rule was able to do. It never got a chance to actually log anything, even if the log statement is the first line of the body of the rule.

The sixth rule in rollershutter.rules has a sleep. For some reason that sleep was interrupted. Does this rule take longer than 20 seconds to run? If so, OH killed the rule assuming it was stuck. Did I mention that sleeps are and have always been a bad idea?

    for ( tmp_item : tmp_group ) {
      while ( (auto_runner_lock.isLocked()) || (auto_runner_lock.getQueueLength() > 0 )) { Thread::sleep (47) }
      tmp_item.sendCommand (ON)
      Thread::sleep (321)
    }

This looks like a bad idea if tmp_group has a lot of Items or you have a ton of rules waiting for the lock.
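
As a sketch of an alternative that does not block a rule thread, the same staggered commands could be driven by a Timer that reschedules itself. The Group `gShutters`, the Item `Start_Sequence`, and the variable `staggerTimer` are hypothetical names; the 321 ms gap is taken from the snippet above.

```
// Hypothetical names: gShutters (Group) and Start_Sequence (Switch) are placeholders.
var Timer staggerTimer = null

rule "Command group members without sleeping"
when
    Item Start_Sequence received command ON
then
    // Stop a sequence that is still in progress before starting a new one
    staggerTimer?.cancel
    val pending = gShutters.members.iterator
    staggerTimer = createTimer(now, [ |
        if (pending.hasNext) {
            pending.next.sendCommand(ON)
            // Wait 321 ms before commanding the next member
            staggerTimer.reschedule(now.plusNanos(321000000L))
        } else {
            staggerTimer = null
        }
    ])
end
```

If no gap between commands is needed, sending the command to the Group itself (`gShutters.sendCommand(ON)`) is simpler still.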

This one comes from the first rule in rollershutter.rules. It’s a little hard to tell without narrowing it down to the specific line but it looks like there was an attempt to initialize or assign a void to a variable.

This one might be the cause of the previous error (though that would be odd). It doesn’t look like you used the getThingStatusInfo but instead got the Thing from the ThingRegistry and something failed there? Or maybe you used the wrong Thing UID and it couldn’t find the Thing?

No logs, no code

This one’s easy enough. The Item S_light_southffxSLD doesn’t exist. You reference it in the 11th rule in rollershutter.rules.

Not sure what this one is. I’ve never seen eIsProxy() before but it looks like a type conversion error. But another huge rule. Without knowing which line :shrug.

The rest of the errors are weird. They are all coming from something in core. But with all these locks and sleeps it’s possible you’ve hit some limit that is unexpected and breaks something in core.

Keep in mind that the assumption has always been that rules only take a few hundred milliseconds to run. No more. Some of these rules could take minutes to run. OH isn’t designed for that.

In the recommended approach, if the Item changed away from ONLINE, the second rule will run. So even if the status changes after the check, all that will happen is the status Item might flicker ON and then, a hundred milliseconds later, back to OFF. On a fast system it’ll be tens of milliseconds.

For something whose primary intent is just to keep a bunch of errors out of your logs, that’s a pretty fair trade to avoid the inherent dangers in using reentrant locks in an attempt to force an event-driven system to behave like a sequential system. For example, are you aware that in Rules DSL the finally clause is not guaranteed to run, meaning you can easily hit a case where the lock never unlocks? In OH 2, if this happened more than 5 times all your rules would simply stop. At least in OH 3 only the one rule will stop.

And there is no way to make checking a Thing’s status an atomic operation from a rule. Even with locks, the rule can’t prevent the Thing from changing state after you check it. And in practice there is no need to. Things can’t actually change state that quickly and in general things don’t actually happen that fast in a home automation system either.

Finally, in OH 3, there really is no need to use locks for a given rule anymore either because only one instance of a rule can run at a time, unlike OH 2 where multiple instances of the same rule can run at a time. If the rule is running and gets triggered again, the triggers will be queued up and worked off in order in serial, not in parallel as threads become available.

Peter_Novotny · March 1, 2023, 8:25pm

Thanks for the info; I will try to comment on some items.

“Garbage” is my working nickname for an error message that takes 2 or more screens and contains no useful information that I can use to work on a fix. I understand that it is useful for professional programmers, but not for me.

From one point of view you are right that this is not a clean solution. Taking into account the low power of the Raspberry Pi 2B and the limit on parallel executions, my working conclusion was not to even start threads that would wait for the same lock in parallel (as they would only occupy an execution thread while waiting for the lock). Please see this as a nice example of a race condition: when this rule checks, the lock might be open, but before it sends the command another rule might have been triggered and taken the same lock.

This is new information for me. Are the 20 seconds hardcoded, or can they be configured?

I do not want to argue with you, but at least on startup there was absolutely no reason to start the rules that the number indicated. Taking into account the various other errors, it seems to me that during startup there is an uninitialized variable with random content (only my personal observation).

A few days ago we discussed logInfo as a good debugging tool. If the rule crashes without writing any logInfo, the benefit for debugging is limited.

It’s not that easy. What you write is a wish for how it should work; reality is different. The Item exists, as a simple search in the items files shows:

Number S_light_southffxSLD "Sun Light Dark" (gS_TRIGGER_ALL, gPersistent) // 0:Sun 1:Light 2:Dark

This is a problem of the troubled start that we are discussing in another thread: as soon as a bad start occurs, these random errors are thrown.

:shrug, but I’m in the same situation: this appears in the log without any line number, there are no logInfo entries from this rule, and according to the events the rule has no reason to trigger. So I can also only :shrug, reboot, and pray that next time it will start with an easier error.

Because of my situation at home, where some rollershutters no longer have manual overrides (my stupidity), I had to change back to the 2B (= OH2), and it will take me some time until I set up the 3B (= OH3) for testing again; sorry, no practical news this time. The reason is that I currently have only one physical set of radios (NRF + 433) with the correct pinout for the Raspberry Pi, and it was moved from the 3B back to the 2B. I need to prepare a new set, test it on the 2B, and then, when I’m sure the radios work, move them to the 3B for testing (I am not sure about possible interference between the 2B and 3B).

rlkoshak · March 1, 2023, 9:00pm

But you can’t control that in your rules. The threads start anyway. All the locks do here is guarantee that you have a bunch of threads up, active, running, and doing nothing but consuming memory and CPU cycles waiting for access to the lock. I’m really kind of astonished this didn’t completely lock up your OH 2.

Hard coded. OH has always been designed around the expectation that rules run fast and exit. I’m not sure if this timeout existed in OH 2, but there once you have five rules waiting for the lock none of your rules will be running anyway.

During startup, what ever trigger is defined on that rule is occurring. You can see that in events.log (I’d point it out if you ever post it). That’s how OH works. An event happens and rules that trigger on that event run.

Indeed, and with a logInfo there it tells us whether there is something wrong with triggering the rule at all or if there is a problem inside the body of the rule.

Every piece of information helps.

But did it exist when the rule was triggered? These errors occur during startup. You’ve mentioned an RPi 2 (incredibly useful information we could have used up front BTW). The recommended hardware in the docs for OH 3 is an RPi 4. Given this, it seems highly likely that Items are still being loaded after the rule engine is started.

Take your time and take the advice of @Max_G on the other thread and do this one rule at a time. And when you do, take a step back and reconsider your approach. You can’t force OH to become single threaded. Any attempt to do so is usually going to backfire.

Don’t worry about trying to prevent time-of-check to time-of-use (TOCTOU) problems. First of all, you can’t, because Item states etc. are outside of your control in rules. There’s nothing you can do to prevent OH from updating an Item even while your rule is running. Any attempt to force this will not only fail, it’s likely to backfire. And even if you could, this is a lot of effort to avoid an error that is very unlikely to occur. My OH system has three to 10 events per second and I’ve never seen it. Any individual Item simply isn’t going to be able to change fast enough for this to be likely. And if it does happen, the effects are going to be minor and corrected almost immediately thereafter.

Instead, what you should do in your rules:

Use the implicit variables. These tell you the state of the Item or the command that actually caused the rule to trigger. There are no TOCTOU problems possible when using these.
Understand and use the fact that in OH 3 any individual rule can only have one instance running at a time. This is different from OH 2 where if one single rule triggers five times, five copies of it are running in parallel. Use this to your advantage. For example, coupled with the implicit variables, it doesn’t matter if the Item changes state out from under you. You’ve got the state that triggered the rule, use that. The rule will likely be triggered again on the new state once this run exits.
Make your rules as independent as possible. Limit how much it depends on global variables. Definitely don’t share locks between rules (in OH 3 I’ve yet to see a case where a reentrant lock was required or even where one didn’t cause more problems than it solves).
Make your rules run fast. Avoid sleeps. Use Timers so your main rule can get in and get out as fast as possible.
Use events and/or timers to sequence events, not loops and sleeps.
Take advantage of the capabilities and features OH offers. Store stuff in Items instead of global variables. Take advantage of persistence’s restoreOnStartup to populate your Items with what ever state they had when OH stopped last before the rule engine starts. Use Timers to schedule things. Instead of waiting, trigger on the event. Etc. Put Items in a Group and use the “Member of Group” triggers or sendCommand to a Group to command a whole bunch of Items.
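
Several of the points above can be combined in one short sketch: a “Member of Group” trigger, the implicit variables, and a Timer instead of a sleep. The Group `gShutters`, the rule name, and the logger name are hypothetical; the one-second delay is only an illustration.

```
// Hypothetical names: gShutters is a Group of Rollershutter Items.
rule "React to any shutter command"
when
    Member of gShutters received command
then
    // triggeringItem and receivedCommand describe the event that fired this rule,
    // so there is no need to re-read the Item state afterwards
    logInfo("Shutters", triggeringItem.name + " received " + receivedCommand)

    // Defer follow-up work to a Timer so the rule itself exits quickly
    val itemName = triggeringItem.name
    createTimer(now.plusSeconds(1), [ |
        logInfo("Shutters", "Follow-up for " + itemName)
    ])
end
```

Because the rule returns almost immediately, later triggers are queued only briefly, and no lock is needed to keep runs of this rule from overlapping.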

Peter_Novotny · March 1, 2023, 9:42pm

Sorry, seems like I have not provided good information:

I have 2 physical Raspberry pi computers running openhab:

Raspberry PI 2B, Raspbian 9.11, OH 2.5.1-2 (working)
Raspberry Pi 3B+, Raspbian 11.6, OH 3.4.2-1 (troublemaker)

Using Basic UI in both cases. Both computers share one set of NRF and 433 communication modules connected to gpio. Now creating 2nd set. To stabilize power there is 47 uF capacitor added to NRF power input pins. 433 will not get it because it increases noise in my thermal sensors transmissions up to loss of signal.