Rules engine died, but came back to life

Hi OH community,

Running into a strange issue. Hoping someone can help?

My setup on OH 2.5M3 has been running fairly stably for several months. Suddenly, a few days ago, the rules stopped running. I’m not new to OH, so after restarts and general log reviews didn’t solve the problem, I cleared the cache and tmp folders. That didn’t solve the issue either. Finally, I resorted to my plan C, which was updating to the 2.5 release build.

After letting the system load for 30 minutes after the upgrade, the rules still didn’t come back and by then it was late, so I just left it out of frustration.

The next day, the rules worked!

Can anyone share ideas on what may have happened? Very unsettling…

Thanks!

What kind of rules? Vaguely similar symptoms were reported some time ago with NGRE rules, but that should be fixed.

How do you know the rules didn’t run? Could it be lack of trigger events? Could it be lack of actioning results?

The rules are just traditional .rules files, same as I’ve run since OH1.8.

I knew the rules weren’t running because lighting (zwave and hue) wasn’t working, even after several restarts, and even for the most basic rules. Karaf would show the triggers (e.g. zone1 changed to open), but the expected related actions never showed up in the logs as they normally would.

I also ruled out zwave and hue bindings because I was able to control them as normal from mobile apps.

Even more confusing: I have a cron rule in the lighting.rules file which prints to the log every 10 minutes, since I’ve had issues with the rules engine stopping before (but not permanently, like in this latest instance).
This part of the rules file was working, but none of the other rules were!

When you’re trying to analyze, you have to look deeper. That superficial effect could be because triggers aren’t happening, rules aren’t executing, or commands aren’t being actioned.

Massive clue. Rules triggered from cron run using a thread from the “timer” pool, not a thread from the usual “event triggered” pool.
I’d suspect one or more of your rules hanging up, waiting on something else forever.
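
To illustrate the idea (this is only a generic sketch in Python, not openHAB code): two separate, finite thread pools, where the "event" pool has all its workers stuck in tasks that wait forever. The pool size, the rule names and the `release` event are made up for the example; the real openHAB pool sizes are not taken from this thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError

# Hypothetical size, chosen only for the demo.
EVENT_POOL_SIZE = 2


def demo():
    # Stands in for "something else" that a hung rule waits on.
    release = threading.Event()
    event_pool = ThreadPoolExecutor(max_workers=EVENT_POOL_SIZE)
    timer_pool = ThreadPoolExecutor(max_workers=1)

    def hanging_rule():
        release.wait()  # blocks until somebody sets the event
        return "hung rule finished"

    def simple_rule(name):
        return f"{name} ran"

    # Occupy every worker in the event pool with a rule that never returns.
    hung = [event_pool.submit(hanging_rule) for _ in range(EVENT_POOL_SIZE)]

    # A new event-triggered rule is queued, but no worker is free to run it.
    light_rule = event_pool.submit(simple_rule, "light rule")
    # The cron-style rule uses the other pool and is unaffected.
    cron_rule = timer_pool.submit(simple_rule, "cron rule")

    cron_result = cron_rule.result(timeout=2)
    try:
        light_rule.result(timeout=0.5)
        light_blocked = False
    except TimeoutError:
        light_blocked = True

    # Once the hung rules are released, the queued rule finally runs.
    release.set()
    light_result = light_rule.result(timeout=2)
    for future in hung:
        future.result(timeout=2)

    event_pool.shutdown()
    timer_pool.shutdown()
    return cron_result, light_blocked, light_result


if __name__ == "__main__":
    print(demo())
```

The cron-style task completes while the event-style task sits in the queue, which matches the symptom described: the 10-minute log rule keeps printing while the lighting rules never fire.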

Ah, both very good points, thanks! Particularly the second one. I didn’t realize cron rules have a different thread pool.

Any suggestions on where to start debugging? I’m particularly confused as none of the rules had changed for months when the problem started. I won’t rule out an edge case problem that may have been hiding there for a long time, but I’m a little lost as to how to trace it.

Some things I tried already:

  • restarted openHAB
  • verified no hanging rules were open by running this in Karaf: shell:threads --list | grep "RuleEngine" | wc -l
  • deleted and recreated lighting.rules with just one simple rule (didn’t fix the problem)
  • created a new test.rules file with one simple rule (didn’t fix the problem)
  • deleted the /cache and /tmp folders (didn’t fix the problem)
  • upgraded openHAB (fixed the problem, but not until the next day)

See the massive thread for advice and tools

“It worked last week” is common enough, but it doesn’t help us.

You know the rule engine has not died, because it’s running your cron rule regularly, right?
Need to focus on what’s different about the rules you are expecting to run, but don’t. That would be -
(a) the events that should trigger them - you have a clear view of events in your events.log
(b) as said, they draw an execution thread from a different finite pool

Editing (reloading) rule files in-flight is not, I think, going to free any resources, so it won’t help identify suspects.

Thanks for the pointers, I’ll see what I can find (if I can replicate the problem).