[SOLVED] Rules don't trigger since 2.5M2

Hi all,

I’m running openHAB 2.5M2 on an Ubuntu Mini-ITX machine with a fairly high count of connected devices. The setup was quite reliable on 2.5M1, but has caused a lot of headaches since 2.5M2…

Rules which worked before do not trigger anymore, even when their contents are very simple.
An example of the latest rule not firing:
Alexa changes the Item “Vacuum” to ON:

Switch Vacuum   "Vacuum"     ["Switchable"]     { expire="1m,OFF" }

This can be observed in the logs:
2019-08-17 11:07:25.760 [ome.event.ItemCommandEvent] - Item 'Vacuum' received command ON
2019-08-17 11:07:25.774 [vent.ItemStateChangedEvent] - Vacuum changed from OFF to ON

The rule which listens for commands received by “Vacuum”, however, does not trigger. The logs do not show the logError output:

import java.util.concurrent.locks.ReentrantLock

val ReentrantLock lock_echoVacuum  = new ReentrantLock()

rule "Echo Vacuum"
when 
    Item Vacuum received command
then 
    logError("Vacuum", "Rule triggered")
    lock_echoVacuum.lock()
    try {
        if (receivedCommand == ON) {
            xiaomi_Robot_actionControl.sendCommand("vacuum")
        } else if (receivedCommand == OFF && xiaomi_Robot_actionControl.state != "pause") {
            xiaomi_Robot_actionControl.sendCommand("pause")
        } else if (receivedCommand == OFF && xiaomi_Robot_actionControl.state == "pause") {
            xiaomi_Robot_actionCommand.sendCommand("dock")
        }
    } finally {
        lock_echoVacuum.unlock()
    }
end

This, however, is just an example. After restarting openHAB, a random set of other rules may stop working, while rules that failed before the restart may be fine again. The general pattern is always the same: the item changes in openHAB, but the rule doesn’t react.

I’ve cleared the cache multiple times, and restarted openHAB and the machine multiple times: no change (except for a changing set of non-working rules).
During startup, the log shows the known error messages of mqtt binding not being able to parse the zigbee2mqtt messages correctly, which is not related to this issue. Besides that, nothing related to the rules.

As I’ve followed the recent bugs & workarounds for the Milestone build, I’ve even tried increasing org.quartz.threadPool.threadCount in quartz.properties to 30, to no avail.

Any help would be highly appreciated!

Thanks,
Chris

Can’t really help there other than to advise getting rid of the locks. The Rules DSL in OH isn’t really thread safe and never has been, so you were probably just lucky it worked in previous versions, and you cannot expect this to work in the future.
Check out the design patterns for how to organize rules that need locking.

Lock problems wouldn’t stop your example rule from triggering and logging.

A good idea, but note that others with more obvious boot sequence problems have found that a second reboot, after allowing the new cache to be built, sorts some things out.
Make sure openhab.log has entries like
- Loading model 'my.rules'

Thanks for your feedback.
The lock is not “required” to successfully execute this rule, but was intended to prevent multiple alexas executing the same rule.

I will check if I can remove the locks here and there, but as @rossko57 mentioned, this should not stop the rule from executing. I’d expect to see at least the logError output before the rule could even crash from the lock.

Edit:
I’ve removed all locks from the rule file, but the rule still does not get triggered.

Thanks, but I can rule that out as well. This morning was the second or third restart of OH since the reboot and cache clearing.

The startup log looks pretty fine tbh, it even loaded all items before loading the rules.
This rule file has been loaded as well:
2019-08-17 07:58:36.471 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'alexa.rules'

In the meantime, I’ve found out that other rules in the same file don’t trigger either. Changing the file contents so that OH reloads the rules doesn’t change anything, though the log says
2019-08-17 13:27:15.787 [INFO ] [el.core.internal.ModelRepositoryImpl] - Refreshing model 'alexa.rules'

Any more hints?
Have there been any changes wrt loading or triggering rules since 2.5M1?

Not necessarily but they could.
And assuming the OP has not changed anything about his rules that’s not too unlikely an assumption.

Logging might have changed. Try debug level for org.eclipse.smarthome.model.script.Vacuum

I tend to confirm this.
Going through the logs in more detail, I noticed a few things that made me curious:

  1. One rule of this file got executed once, afterwards all rules of this file were skipped.
  2. When searching for all files where ReentrantLocks were used, this immediately reminded me of all the rules that failed in the past days
  3. After removing all ReentrantLocks from all rules and restarting OH, everything seems to be working fine. However, it may just take some time to trigger a rule that won’t get executed…

I will keep an eye on this issue…

Thanks for pointing me in this direction @mstormi!

See “Why have my Rules stopped running? Why Thread::sleep is a bad idea” for a full discussion. The focus is more on Thread::sleep, but the problem is even more pronounced with locks, because if an error occurs and a lock doesn’t get unlocked, all your rules will stop.


Actually, I saw this thread a while ago already; that’s why I used try{} finally{} in all rules with locks.
TBH, I thought that the engine skips rules with active locks instead of waiting. I may try adding the early return statements.

Until now, everything is still running fine (even without the locks, I’ll probably keep it like that).

Thanks again for all your help!

There are no optional triggers, so the rule should(!) always start.
You might code for locking; whether your code waits or skips is up to you, your choice.
Example

A regular xxx.lock() always queues, hence the risk of a seizure when enough queued rules have grabbed threads.

Finally is not guaranteed to run under certain types of errors. Hence the inherent unsafeness with using locks.

It does not. If it needs to acquire a lock, it will block on that rule until the lock is available, consuming one of the Rule runtime threads.

It is possible to have a Rule ignore the event when the lock is held (i.e. check if the lock is available and if not return) but that approach doesn’t support all use cases.
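
To make the wait-versus-skip choice concrete, here is a minimal sketch in plain Java (not Rules DSL), using the same java.util.concurrent.locks.ReentrantLock class the rule above imports. The class and method names (LockChoice, runWaiting, runOrSkip, runOrSkipElsewhere) are made up for this illustration and are not part of openHAB.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class; none of these names come from openHAB.
final class LockChoice {
    static final ReentrantLock lock = new ReentrantLock();

    // Waiting variant: lock() blocks the calling thread until the lock is free.
    static String runWaiting() {
        lock.lock();
        try {
            return "ran";
        } finally {
            lock.unlock();
        }
    }

    // Skipping variant: tryLock() returns false at once if another thread holds the lock.
    static String runOrSkip() {
        if (!lock.tryLock()) {
            return "skipped";
        }
        try {
            return "ran";
        } finally {
            lock.unlock();
        }
    }

    // Calls runOrSkip() from a different thread and waits for its answer.
    static String runOrSkipElsewhere() {
        return CompletableFuture.supplyAsync(LockChoice::runOrSkip).join();
    }

    public static void main(String[] args) {
        System.out.println(runOrSkip());              // lock is free
        lock.lock();                                  // simulate a rule that is still busy
        try {
            System.out.println(runOrSkipElsewhere()); // second caller gives up
        } finally {
            lock.unlock();
        }
    }
}
```

Run as-is, this prints "ran" and then "skipped". With runWaiting() in place of runOrSkip(), the second caller would instead sit on its thread until the lock is released, which is the queuing behaviour described above.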

In truth, you really really shouldn’t be using locks. There is almost always a safer way to do it in Rules DSL.

I’d like to qualify that - you really really shouldn’t be using locks unless you code defensively.
I do have locks - e.g. a horrible sequence that grabs a camera snapshot and processes image file with captions, blah - and needs to act strictly “one at a time”.

But the code assumes every Item may be NULL or UNDEF, every variable nonsense or null, every transaction may fail, and deals gracefully with each case.
Think of everything so that try-catch is not needed. The root of the issue is not locks, it’s the error handling.

In that first example, I would be suspicious of
if (xiaomi_Robot_actionControl.state != "pause")
which might error if the state is NULL (which is not a string).
if (xiaomi_Robot_actionControl.state.toString != "pause")
will, as far as I know, always work.

If you cannot write bomb-proof code, do not use locks :wink: It requires some imagination and paranoia.

It’s impossible to code completely defensively. For example, if you have a type error inside a try block, your finally will never run. So, for example, if you forget to check for NULL and UNDEF before casting a Number Item’s state to a Number, the Rule will crash, finally will never be called and your lock will never get unlocked.

There are other cases I’ve encountered where the finally never gets called, but I haven’t had time to dig deep enough to see what those errors have in common. One that comes to mind is when the Rule triggers before OH is ready for that Rule to run. You can get an error like “No such symbol NULL” on your if statement, and if that is checked after the lock is locked, the lock is never released and your Rule will never run again. If the Rules block and wait to acquire the lock, none of your Rules will run after a very short while.

Given this, I posit that it is impossible to think of everything. OH’s startup has too many problems and the Rules Engine has too many problems that we as Rules developers have no control over. That is why I still maintain that ReentrantLocks should not be used in Rules under any circumstances.

If you need to process all of the events that trigger a Rule, use the Gatekeeper Design Pattern (the Queue example) which pushes the lock into the library where any errors are handled for you and the lock is guaranteed to become unlocked.
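
The sketch below is not the Gatekeeper Design Pattern code itself. It is a plain Java illustration, with a swapped-in technique (a single-thread executor from java.util.concurrent used as the queue), of the general idea that a library can serialize the work and absorb errors so that no hand-written lock is left held. SerialQueue and processAll are hypothetical names, and job 2 fails on purpose.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; not the openHAB Gatekeeper implementation.
final class SerialQueue {
    // Submits jobs 1..jobs to a single worker thread and returns the ids that completed.
    static List<Integer> processAll(int jobs) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        List<Integer> done = new CopyOnWriteArrayList<>();
        for (int i = 1; i <= jobs; i++) {
            final int id = i;
            worker.submit(() -> {
                if (id == 2) {
                    // Simulated failure; submit() captures it in the returned Future.
                    throw new IllegalStateException("job 2 failed");
                }
                done.add(id);
            });
        }
        worker.shutdown(); // accept no new jobs, but finish the queued ones
        try {
            worker.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return done;
    }

    public static void main(String[] args) {
        System.out.println(processAll(3)); // job 2 is missing, job 3 still ran
    }
}
```

This prints [1, 3]: the jobs run strictly one at a time, and the failing job does not stop the ones queued behind it, since there is no user-held lock for the error to strand.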

If you want to skip the Rule if there is an instance of the Rule already running, use an AtomicBoolean which lets you check and set the boolean flag in a thread safe transaction, again pushing the actual locking into the library code where the lock will be guaranteed to be unlocked.
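
As a rough sketch of that check-and-set, again in plain Java rather than Rules DSL: AtomicBoolean.compareAndSet(false, true) succeeds for exactly one caller at a time, so a second event arriving while the first is still being handled can simply return. SkipIfRunning and handleEvent are hypothetical names for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; none of these names come from openHAB.
final class SkipIfRunning {
    static final AtomicBoolean running = new AtomicBoolean(false);

    static String handleEvent(Runnable body) {
        // Atomically flips false -> true; returns false if the flag was already true.
        if (!running.compareAndSet(false, true)) {
            return "skipped";
        }
        try {
            body.run();
            return "ran";
        } finally {
            running.set(false); // clear the flag so the next event can run
        }
    }

    public static void main(String[] args) {
        String[] inner = new String[1];
        // The inner call arrives while the outer one is still in progress.
        String outer = handleEvent(() -> inner[0] = handleEvent(() -> {}));
        System.out.println(outer + ", " + inner[0]);
    }
}
```

This prints "ran, skipped". Unlike a blocking lock(), a skipped caller returns immediately and never holds a thread while waiting. The flag still has to be cleared by the caller, so if that reset were ever missed, this one handler would keep skipping, but nothing would queue up behind it.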
