Rule engine silently stops running some rules after cron jobs

  • Platform information:
    • Hardware: x86_64/2G/120G
    • OS: Debian 8.10
    • Java Runtime Environment: openJDK 1.8.0.131
    • openHAB version: 2.2 Stable
  • Issue of the topic:
    This has cropped up since updating to 2.2; I haven’t had this issue before.
    It seems that sometimes, after a rule with a cron trigger fires, the rule engine silently stops running some rules. This state persists for hours and may recover on its own, but usually I have to kick it back into life by restarting the “Eclipse SmartHome Rule Runtime” bundle.
    There’s no useful log output I can show you, because there are no errors in the log file; it just stops processing. I have verified this by putting logInfo lines around the affected rules and seeing no output in the log file.

Here’s one of the rules that often gets dropped:

rule "House Power Calculation"
when
	Item ER_GridPower changed or
	Item ER_SolarPower changed
then
	var int HousePower = ER_GridPower.state as Number + ER_SolarPower.state as Number
	if(HousePower != null) {
		cER_HousePower.postUpdate(HousePower as Number)
	}
	if(cER_EnergyAvailable.state == ON && ER_SolarPower.state < 100) {
		cER_EnergyAvailable.postUpdate(OFF)
		if(AC_Lounge_Switch.state == ON) {
			AC_Lounge_Switch.sendCommand(OFF)
		}
	}
end

The event log shows that the trigger items are definitely receiving updates, but the cER_HousePower item no longer gets updated.

Your Java version says OpenJDK. Is it Zulu or really OpenJDK?
If it is not Zulu, please change to Zulu or Oracle, as the prerequisites clearly say that OpenJDK is not recommended due to known issues.

Hi - same issue here:
All rules that are triggered by events, such as “Item changed”, run fine. Those that are triggered by cron sometimes have issues and stop working. In my case the fix is a reboot of the system (which is certainly not a real solution).

  • My platform is a Raspberry Pi with the latest version of openHABian on it

The rules that stop are very simple, like this one:

import java.util.Locale

rule "Delegate Temp - Sensor_Lowerdeck_Livingroom_Temp"
when
	Item Sensor_Lowerdeck_Livingroom_Temp changed or
	Time cron "0 0/1 * * * ? *"
then
	var Number temp = (Sensor_Lowerdeck_Livingroom_Temp.state as DecimalType).floatValue
	var Number vCount = (Controme_Write_Counter.state as Number) + 1
	executeCommandLine("curl -X GET  http://192.168.10.233/set/28_FF_11_12_03_E2_66_16/" + String.format(Locale.ROOT, "%.2f/", temp))
	postUpdate(Controme_Write_Counter, vCount)
	logInfo("Controme Sensor_Lowerdeck_Livingroom", "Controme Sensor updated  Sensor_Lowerdeck_Livingroom_Temp " + String.format(Locale.ROOT, "%.2f/", temp) + "Count" + vCount)
end

(I have now added a counter to the rule so I can see when it stops…)
Overall it is a mystery: the log files do not show anything suspicious, and other rules keep working…

Others have reported similar issues (see the threads “cron-triggers-for-rules-not-working” and “Rules stop processing”), and a general discussion about a “Cron Scheduler Replacement” seems to be under way. Judging by my recent experience, a change would be welcome.

I have updated to Zulu 8.27.0.7 (1.8.0_162) and openHAB 2.3-SNAPSHOT, and the problem still persists. Many rules still silently fail to run.

Still on Snapshots and the rules engine still silently stops running some rules after a while. Maybe it’s leaking threads or something? Does anyone know where I should post to get some action on this?

(I have begun migrating the failing rules to Node-RED, so if this doesn’t get fixed it will eventually be moot. It’s just a shame to lose almost 8000 lines of code.)

Perhaps this helps:
https://community.openhab.org/t/rule-execution-delayed/39821/5?u=alex2016

This works for me: set the thread count to 20.

Thanks! I’ve modified it to 30 and I’ll see how it goes

I have about 3000 lines of rules and it runs smoothly. Do you use Thread::sleep? Sleeps are bad because each one keeps a rule thread occupied for its whole duration. Maybe that’s why there are none left for the other rules?
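As a minimal sketch of the usual alternative, a delay can be moved into a timer so the rule body returns immediately instead of sleeping. Example_Switch and Example_Light are hypothetical items used only for illustration, and the 500 ms delay is an arbitrary value; this is untested against any particular setup.

```
rule "Delayed action without Thread::sleep"
when
	Item Example_Switch received command ON   // hypothetical trigger item
then
	// Instead of: Thread::sleep(500) followed by the action,
	// schedule the action and let the rule finish right away.
	createTimer(now.plusMillis(500)) [ |
		Example_Light.sendCommand(ON)         // hypothetical target item
	]
end
```

The timer body should still be kept short, since it also runs on a shared scheduler thread.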

I did use Thread::sleep extensively in my rules, but as part of debugging this problem I removed almost all of them, and it didn’t help.
Even so, none of them were very long, usually between 100 and 500 ms.

Okay so the expanded thread pool has merely extended the time it takes for the rules engine to stop processing rules. So it seems clear that my rule set is still leaking threads at some point. Does anyone have some ideas about how to diagnose this? Is there a way to see how many threads are in use, or which rules they are being used by?

I have the same problem… My rules stop…

At the moment I am checking whether it happens when two items in the “when” clause change at the same time…
#######
Since I reduced each rule to a single trigger in the “when” clause, I have had no more problems.

I have been having the same problem for several weeks. My cron rules work, but event-triggered rules just silently stop triggering. The thread pool is set to 50.

I think the biggest problem is not that it happens as such, but that it happens silently - no errors, no messages, rules just - stop.

For what it’s worth I managed to find the rules that were causing my system to leak threads.

It was a “slow fade” thing that I’d cobbled together from some examples I’d found around the place. If I had to guess I’d say the problem was the nasty hack using the while loop to tick the dimmer up or down each time a timer expired. The faders worked but I’m guessing they left a thread open each time they ran, so after a couple of days I’d just run out of threads.

rule "Lounge Lamp Fader"
when
	Item fadeLounge_Lamp received command
then
	Thread::sleep(100)
	var Timer LL_Timer = null
	var Number fadeTarget = fadeLounge_Lamp.state as Number
	var Number Dimmer = Lounge_Lamp.state as Number
	var Number DimmerCheck = Dimmer
	var boolean LLcancel = false
	
	if (fadeTarget > 1 && fadeTarget < 12) fadeTarget = 12
	logInfo("Fader","Starting slow fade on Lounge Lamp. Current: " + Dimmer.toString + "%, Target: " + fadeTarget.toString + "%")
	
	while (Dimmer !== fadeTarget && !LLcancel) {
		if (Dimmer == DimmerCheck) {
			if (Lounge_Lamp.state as Number - Dimmer < 3 && Lounge_Lamp.state as Number - Dimmer > -3) {
				if (fadeTarget - Dimmer > 0) {
					Dimmer += 1
					if (Dimmer < 12) Dimmer = 12
				} else {
					Dimmer -= 1
					if (Dimmer < 12) Dimmer = 0
				}
				Lounge_Lamp.sendCommand(Dimmer)
				LL_Timer = createTimer(now.plusSeconds(fadeSpeed)) [ |
					DimmerCheck = Dimmer
				 ]
			} else {
				LLcancel = true
				logInfo("Fader","Lounge Lamp fade cancelled. Dimmer = " + Dimmer.toString + "% Lamp = " + Lounge_Lamp.state.toString + "%")
			}
		}
	}
	fadeLounge_Lamp.postUpdate(Lounge_Lamp.state.toString)
	logInfo("Fader","Lounge fade complete. Lamp = " + Lounge_Lamp.state.toString + "%")
end

I’ve replaced it with a flow in Node-RED, which works better anyway, but if anyone cares to rework it into something that wouldn’t leak threads, I’d be interested to see it.
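One possible timer-based rework of the fader, offered as an untested sketch rather than a verified fix. Instead of spinning in a while loop between timer ticks, the rule schedules a single timer that performs one step and then reschedules itself, so the rule body returns immediately and should not hold a rule thread for the length of the fade. Assumptions: fadeSpeed is the step interval in seconds, defined elsewhere in the original rules file; loungeFadeTimer and loungeFadeLevel are new, hypothetical globals; the command sent to fadeLounge_Lamp is numeric. Note one deliberate difference: targets from 1 to 11 are clamped to 12 (the original left a target of exactly 1 unclamped).

```
// Hypothetical globals; they must be declared at the top of the .rules file
var Timer loungeFadeTimer = null
var int loungeFadeLevel = 0

rule "Lounge Lamp Fader (timer based)"
when
	Item fadeLounge_Lamp received command
then
	// Ignore non-numeric commands and an uninitialised lamp state
	if (!(receivedCommand instanceof Number) || !(Lounge_Lamp.state instanceof Number)) return;

	// A new command replaces any fade that is still in progress
	loungeFadeTimer?.cancel

	var int requested = (receivedCommand as Number).intValue
	if (requested > 0 && requested < 12) requested = 12
	val int fadeTarget = requested
	loungeFadeLevel = (Lounge_Lamp.state as Number).intValue
	logInfo("Fader", "Starting slow fade on Lounge Lamp. Current: " + loungeFadeLevel + "%, Target: " + fadeTarget + "%")

	loungeFadeTimer = createTimer(now) [ |
		// Treat a missing state as a manual change so the fade stops
		val int lamp = if (Lounge_Lamp.state instanceof Number) (Lounge_Lamp.state as Number).intValue else -100
		if (Math::abs(lamp - loungeFadeLevel) >= 3) {
			// Lamp no longer matches our last step: someone changed it, so stop
			logInfo("Fader", "Lounge Lamp fade cancelled. Dimmer = " + loungeFadeLevel + "% Lamp = " + Lounge_Lamp.state.toString + "%")
			loungeFadeTimer = null
		} else if (loungeFadeLevel == fadeTarget) {
			fadeLounge_Lamp.postUpdate(Lounge_Lamp.state.toString)
			logInfo("Fader", "Lounge fade complete. Lamp = " + Lounge_Lamp.state.toString + "%")
			loungeFadeTimer = null
		} else {
			if (fadeTarget > loungeFadeLevel) {
				loungeFadeLevel = loungeFadeLevel + 1
				if (loungeFadeLevel < 12) loungeFadeLevel = 12   // skip the 1-11 range on the way up
			} else {
				loungeFadeLevel = loungeFadeLevel - 1
				if (loungeFadeLevel < 12) loungeFadeLevel = 0    // drop straight to off below 12
			}
			Lounge_Lamp.sendCommand(loungeFadeLevel)
			// Next step after fadeSpeed seconds (fadeSpeed is assumed to exist)
			loungeFadeTimer.reschedule(now.plusSeconds(fadeSpeed))
		}
	]
end
```

Each timer run does one small step and exits, so no thread waits between steps; the trade-off is that the fade state lives in globals, which means one pair of globals per faded lamp.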

To anyone else diagnosing this issue: I’d recommend looking for while loops in your rules and temporarily removing them to see whether your thread pool stabilises.

I still think we need better tools for diagnosing this problem beyond “remove a rule and wait to see if it’s fixed”
