Unreliable rule triggering (cron) with openHAB 2.3

rlkoshak · September 14, 2018, 4:18pm

I can’t say for certain whether the finally block can be relied upon. I have some anecdotal evidence dating back to the 2.0 time frame that it can’t, but I have no recent experience to say whether this is still the case, and back when I did see it I didn’t have the time or expertise to fully explore the problem.
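For reference, this is the lock/try/finally pattern in plain Java, which is what the Rules DSL code is modeled on. It is a minimal sketch, not Rules DSL: the class, the `ruleBody` method and the simulated failure are hypothetical stand-ins for the two Rules sharing a lock. In plain Java the `finally` block runs whether the `try` block exits normally or by exception, so the lock is released either way; whether the Rules DSL interpreter always honors this is exactly the open question above.

```java
import java.util.concurrent.locks.ReentrantLock;

public class LockFinallyDemo {
    // Hypothetical stand-in for the global lock shared by the cron-triggered Rules.
    static final ReentrantLock lock = new ReentrantLock();

    // Simulates a rule body that may hit unexpected data part way through.
    static void ruleBody(boolean fail) {
        lock.lock();
        try {
            if (fail) {
                throw new IllegalStateException("unexpected data");
            }
            // ... normal rule work would go here ...
        } finally {
            // Runs on normal exit and on exception, so the lock is always released.
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        try {
            ruleBody(true);
        } catch (IllegalStateException e) {
            System.out.println("rule failed: " + e.getMessage());
        }
        System.out.println("still locked after failure? " + lock.isLocked());
    }
}
```

If a lock stays held in openHAB even though the Rule code looks like this, that points either at the `finally` not running or at the Rule never reaching the end of its `try` block (for example, because it is stuck waiting on something).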

Anyway, I agree with Markus. You can program Fortran in any language, and this looks like programming Java in Rules DSL. I can’t say whether that is causing any problems per se, but it does stand out. Also, if you are running on an RPi, there is some evidence that using primitives (.intValue) greatly increases the time it takes to parse the .rules files for some reason (NOTE: I have more evidence for this than I do for finally not always running).

As I documented in [OH 1.x and OH 2.x Rules DSL only] Why have my Rules stopped running? Why Thread::sleep is a bad idea, any time I see Thread::sleep, locks, executeCommandLine, or sendHttp*Request calls, I immediately suspect the problem is running out of threads.

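Another possibility is that you have too many cron-triggered Rules trying to trigger at the same time. There are only two threads available for cron-triggered Rules, so if two of them are busy (sleeping, waiting on a lock, or waiting on a slow command), every other cron-triggered Rule has to wait until one of those threads frees up.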
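To illustrate why two threads can be a bottleneck, here is a plain Java sketch that models the cron Rule threads as a fixed-size thread pool. This is an assumption for illustration only: it does not use openHAB’s or Quartz’s actual scheduler, and the method name is hypothetical. Two "rules" block (standing in for Thread::sleep, a held lock, or a slow executeCommandLine), and the sketch checks whether a third "rule" gets a thread within half a second.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class PoolStarvationDemo {
    // Returns true if a third task ran while two blocking tasks occupied the pool.
    // poolSize must be at least 2, otherwise the second blocker never starts.
    static boolean thirdRuleRanWhileBlocked(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch blockersStarted = new CountDownLatch(2);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch thirdRan = new CountDownLatch(1);
        try {
            for (int i = 0; i < 2; i++) {
                pool.submit(() -> {
                    blockersStarted.countDown();
                    try {
                        release.await(); // hold the thread, like a long sleep
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            blockersStarted.await();           // both blockers now hold a thread
            pool.submit(thirdRan::countDown);  // the next cron trigger arrives
            return thirdRan.await(500, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            release.countDown(); // let the blockers finish
            pool.shutdown();     // queued work still runs, then threads exit
        }
    }

    public static void main(String[] args) {
        System.out.println("2 threads, third rule ran: " + thirdRuleRanWhileBlocked(2));
        System.out.println("3 threads, third rule ran: " + thirdRuleRanWhileBlocked(3));
    }
}
```

With two threads the third task is stuck in the queue until a blocker finishes; with three it runs immediately. In openHAB the symptom would look like cron Rules that simply don’t fire on time.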

But why are you using a lock in a cron triggered Rule in the first place? What problem are you trying to solve with this lock? Are these the only two Rules that use this lock?

It isn’t. There really is no way the lock could persist across reloads of the same .rules file, let alone a restart of OH. Well, actually there is one way I know of. I’ve seen reports that Timers created before a .rules file is reloaded sometimes stick around and still trigger afterwards. So if you had a reference to your lock inside the body of such a Timer, it could survive a reload of the .rules file, but not a restart of OH.

I suspect that in those cases some edge case or unexpected data in the Rules triggered by System started is preventing the lock from being unlocked.

Personally, were this my code, I’d focus on writing this in a way that either doesn’t require the locks in the first place or in a way that centralizes the lock in one Rule (see Design Pattern: Gate Keeper for example). The code will be less brittle and less complex overall as a result.
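The core idea of centralizing the work in one place can be sketched in plain Java with a single worker thread that every caller submits to, so no caller ever takes a lock itself. This is an analogy, not the openHAB Design Pattern: Gate Keeper implementation (that pattern is written in Rules DSL, typically with a queue and a Timer); the class name, `submit`, `drain`, and the command strings are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class GateKeeperDemo {
    // One worker thread: all commands funnel through it, one at a time, in order.
    private final ExecutorService gate = Executors.newSingleThreadExecutor();
    // Record of commands in the order the gate keeper actually executed them.
    private final List<String> executed = Collections.synchronizedList(new ArrayList<>());

    // Callers (e.g. two cron-triggered rules) enqueue work and return immediately,
    // so they never block and never need their own lock.
    public void submit(String command) {
        gate.submit(() -> {
            executed.add(command); // stand-in for the real, serialized work
        });
    }

    // Stop accepting work, wait for queued commands, and return what ran.
    public List<String> drain() {
        gate.shutdown();
        try {
            gate.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        synchronized (executed) {
            return new ArrayList<>(executed);
        }
    }

    public static void main(String[] args) {
        GateKeeperDemo keeper = new GateKeeperDemo();
        keeper.submit("A");
        keeper.submit("B");
        keeper.submit("C");
        System.out.println(keeper.drain());
    }
}
```

Because only the gate keeper touches the shared resource, there is no lock that can be left held if one caller fails part way through.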

If you do want to debug this problem, first check whether you are in fact running out of threads for cron-triggered Rules (you only get 2 by default). After doing what Markus recommends, you should be able to see whether Rules are trying to trigger or not. Then you can see how many of your threads are in use with the commands Scott posted here.

Why did this code work before and not now? I cannot say. Maybe Quartz had more threads in those earlier versions and 2 is just not enough for the way this code executes. Maybe the finally isn’t running (in which case I’d be very interested to know), so the lock isn’t being unlocked. Maybe it is just a change in timing: this has always been a latent problem, and later versions of OH run with different timing that exposes it.