Rules not completing randomly

Hi Everyone,

I am not sure if this is the best place to put this. I have a fairly large system (461 items, 148 rules, 156 things). I have noticed that some of my more complicated rules just stop processing partway through and never complete, so certain things never fire and the system becomes very unstable. This started after upgrading to OH3, and the only thing that stands out is the timers. Has anyone else noticed this, and do you have any way to solve it? For now I use expire, but that gets even messier.

Josh

Not without a whole lot more information.

The size of your system is not that large actually so I doubt the size has anything to do with this.

Do you have long running rules? You can see this in MainUI’s Rules page. When a rule is running it changes from IDLE to RUNNING.

Do the events on the Items continue to occur (see events.log) while the rules seem not to run? Or do the events stop happening?

Errors in openhab.log? In particular out of memory errors?

What’s the CPU/RAM situation on the machine when the problem occurs?

Thank you for your reply.

Just noticed my event log isn’t recording. Going to need to look into that one. Nothing in the logs stands out. It seems like the rule just stops processing and whatever tasks remain never run. The longest rule takes maybe 30 seconds, so I don’t think the run times are overly long. No out of memory issues that I can tell. I am on a Xubuntu mini PC with 4 GB of RAM and a 4-core Celeron.

That’s really long. Most rules should complete in less than a second. If a rule triggers while it’s already running, that trigger gets queued up. Over time you can have hundreds or thousands of triggers queued up waiting for their turn to run, effectively making that rule useless.

You’ll need to get events.log working because you need to correlate the rules and events that trigger them.

Without the rules and a whole ton of logs though :person_shrugging: . Typically rules don’t just stop working and they don’t just stop in the middle of the rule.

This does sound like threads getting used up.

In OH 2 I’d definitely agree. But in OH 3 each rule gets its own thread. So even if you spam a long running rule, only the one rule and therefore the one thread would be impacted. So if it is a thread problem, the source is outside of the rules.

So instead of using timers I should in fact be using expire or is there something else more elegant?
Here is an example of one of the rules I am running. I have to have delays as parts of the system do their thing physically.

rule "Living Room Cable"
when
	Item LR_CBL changed
then
	if (LR_CBL.state == ON){
		HDBTString.sendCommand('3B8.')
		createTimer(now.plusSeconds(2), [|
			LRhdmi.sendCommand('Cable3')
			if (LR_TV_Power.state != ON){
				LR_TVPower.sendCommand(ON)
				createTimer(now.plusSeconds(3), [|
					LR_TV_KeyCode.sendCommand('key_return')
					if (LR_SZPower.state != ON){
						LR_SZPower.sendCommand(ON)
						LR_SZSource.sendCommand('SAT/CBL')
					}
					else if (LR_SZSource.state != 'SAT/CBL'){
						LR_SZSource.sendCommand('SAT/CBL')
					}
				])
			}
			else {
				return;
			}
		])
	}
	else if (LR_CBL.state == OFF){
		LR_ArtTimer.sendCommand(ON)
	}
end

Is there a better way to handle this?

Additionally, these worked fine in OH2; it was only after upgrading that they began to fail, if that helps. Same for the event logs. The last event was logged before the upgrade; I never noticed at the time and I can’t figure out how to fix it.

No one is saying you should not use Timers. We are saying that you shouldn’t use sleeps or other long running commands (e.g. executeCommandLine) on a rule that triggers rapidly.
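
To make the difference concrete, here is a minimal Rules DSL sketch of the two approaches. It assumes OH 3 and borrows the Item names from the rule posted above; the rule names and the 30 second delay are made up for illustration, and the two rules are alternatives, not meant to be loaded together.

```xtend
// Variant A (the one to avoid): the rule thread is blocked for 30 seconds.
// Any trigger that arrives in the meantime waits in the queue for this rule.
rule "Cable delay with sleep (hypothetical)"
when
	Item LR_CBL changed to ON
then
	HDBTString.sendCommand('3B8.')
	Thread::sleep(30000)              // blocks until the delay is over
	LRhdmi.sendCommand('Cable3')
end

// Variant B: the same delay handed to a Timer.
// The rule body schedules the work and returns almost immediately.
rule "Cable delay with timer (hypothetical)"
when
	Item LR_CBL changed to ON
then
	HDBTString.sendCommand('3B8.')
	createTimer(now.plusSeconds(30), [|
		LRhdmi.sendCommand('Cable3')  // runs later, outside the rule
	])
end
```

With variant B the rule should show as RUNNING in MainUI only for an instant, no matter how long the delay is.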

But now that we know you are using timers that raises other questions. Is it the Timers that fail to run? Or is it the rules that fail to exit?

Who said that would help?

Great :slight_smile:

I don’t see any special problem with that. The base rule will be over in milliseconds.
If your triggering Item were changing every second, it would spawn more and more anonymous timers over time - but you’d know about that, I’m sure.
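
If the trigger ever did change faster than the timers fire, one common Rules DSL pattern is to hold the timer in a file-level variable and cancel the pending one before scheduling a new one. A rough sketch, assuming OH 3; `cableTimer` and the rule name are hypothetical, the Item names come from the rule posted above:

```xtend
// File-level variable (hypothetical name) so every run of the rule sees the same timer
var Timer cableTimer = null

rule "Living Room Cable, single pending timer (hypothetical)"
when
	Item LR_CBL changed to ON
then
	HDBTString.sendCommand('3B8.')

	// Drop a timer that is still waiting, so at most one is ever pending
	if (cableTimer !== null) {
		cableTimer.cancel
	}

	cableTimer = createTimer(now.plusSeconds(2), [|
		LRhdmi.sendCommand('Cable3')
		cableTimer = null             // mark the timer as done
	])
end
```

For a rule that only fires when someone switches the input this is probably overkill, but it keeps the number of scheduled timers bounded.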

The only oddity is the use of return; in the Timer code. It’s pointless and cannot work, as there is nowhere for a Timer to return to. But it’s just possible that it is messing up garbage collection, so I’d definitely get rid of that.

Sorry, I just heard that 30 seconds in a rule was a long time, so I thought maybe I was doing something wrong. I am still getting used to programming openHAB and some of my techniques may be a little long-winded, but it’s getting better as I go lol.

I have removed the return; from the rule.

I have seen a timer or 2 fail. I will do a few tests and see what happens when I notice a failure. Anything in particular I should be looking for?

The way the timers work -

Rule creates independent timer, scheduled for future.
Rule completes and exits.

Rule runtime = milliseconds.

Later, system scheduler says “aha, that task is due”
The timer code is now run, completes and exits within milliseconds.

Timer runtime = milliseconds.

The traps people fall into involve wait loops and such, where the code does not complete and exit. (Generally, they fall in the trap by trying to avoid using timers!)
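
The sequence above can be watched in openhab.log with a throwaway rule like this one. It is only a sketch, assuming OH 3; the rule name and the "timers" logger name are made up, and the Items are the ones from the rule posted earlier.

```xtend
rule "Timer lifecycle sketch (hypothetical)"
when
	Item LR_CBL changed to ON
then
	logInfo("timers", "Rule started")

	// Schedules the block for 2 seconds from now; createTimer itself returns at once
	createTimer(now.plusSeconds(2), [|
		logInfo("timers", "Timer body started")
		LRhdmi.sendCommand('Cable3')
		logInfo("timers", "Timer body finished")
	])

	// Logged before the timer body runs, because the rule does not wait for it
	logInfo("timers", "Rule finished, timer still pending")
end
```

The two "Rule" lines should appear practically together, and the two "Timer body" lines about two seconds later.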

It’s important to understand how timers work. A timer will schedule a chunk of code to execute later. So from a certain point of view it’s running outside the rule.

So, assuming the rule itself is triggering when it’s supposed to and running to completion every time it’s triggered, the problem is with the timers. That changes the whole focus of what to look at.

Add lots and lots and lots of logging. You want to see when the rule triggers. You want to see when the rule exits. You want to see when a timer triggers. You want to see when the code for a timer exits.

Make sure there are no reloads of this .rules file nor restarts of OH while testing. That will cause scheduled timers to fail and generate errors.

You also have cases where even if the code executes it won’t do anything. Add else clauses and log that out so you can tell the difference between the code not running and the code ran but the conditions made it do nothing.
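
As a starting point, the rule posted above could be instrumented roughly like this. Treat it as a sketch: the "lr_cable" logger name and the rule name are made up, the pointless return; is dropped, and the string comparison uses `.toString` to be explicit. The Item names are copied exactly as posted, which means the TV state is read from LR_TV_Power while the command goes to LR_TVPower; it is worth confirming that both Items really exist.

```xtend
rule "Living Room Cable (with logging)"
when
	Item LR_CBL changed
then
	logInfo("lr_cable", "Rule triggered, LR_CBL = " + LR_CBL.state)
	if (LR_CBL.state == ON) {
		HDBTString.sendCommand('3B8.')
		createTimer(now.plusSeconds(2), [|
			logInfo("lr_cable", "Timer 1 started, LR_TV_Power = " + LR_TV_Power.state)
			LRhdmi.sendCommand('Cable3')
			if (LR_TV_Power.state != ON) {
				LR_TVPower.sendCommand(ON)
				createTimer(now.plusSeconds(3), [|
					logInfo("lr_cable", "Timer 2 started, LR_SZPower = " + LR_SZPower.state)
					LR_TV_KeyCode.sendCommand('key_return')
					if (LR_SZPower.state != ON) {
						LR_SZPower.sendCommand(ON)
						LR_SZSource.sendCommand('SAT/CBL')
					} else if (LR_SZSource.state.toString != 'SAT/CBL') {
						LR_SZSource.sendCommand('SAT/CBL')
					} else {
						// Code ran, but there was nothing to change
						logInfo("lr_cable", "Timer 2: receiver already on SAT/CBL, nothing to do")
					}
					logInfo("lr_cable", "Timer 2 finished")
				])
			} else {
				// Code ran, but the TV was already on
				logInfo("lr_cable", "Timer 1: TV already on, skipping power-on steps")
			}
			logInfo("lr_cable", "Timer 1 finished")
		])
	} else if (LR_CBL.state == OFF) {
		LR_ArtTimer.sendCommand(ON)
	} else {
		logInfo("lr_cable", "LR_CBL is neither ON nor OFF: " + LR_CBL.state)
	}
	logInfo("lr_cable", "Rule finished")
end
```

A missing "Timer 1 started" line points at the timer never firing, while a "nothing to do" or "skipping" line means the code ran and the conditions simply did not match.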

Thank you guys for the help, I think I have mostly sorted my issue. I am still not sure of the root cause and I am still missing my event logs, but at least I have the rules working pretty consistently. I have combined timers with expires: the longer delays use expire and the shorter parts of the rules use timers. As you suggested, it seems like something may have been holding things up, though I am not sure why some items never triggered at all, not even late. At least it seems relatively stable now. I will keep an eye on it and report back if I notice any failures.