Rules stop execution

dascrip · August 8, 2019, 11:47am

Platform information:
- Hardware: CPUArchitecture/RAM/storage: Raspberry Pi 3 B
- OS: what OS is used and which version: jessie
- Java Runtime Environment: which java platform is used and what version: openjdk version “1.8.0_152”
- openHAB version: openHAB 2.4.0-1
Issue of the topic: Rules stop executing after a while.

After running for months with a stable configuration, some rules have started to stop executing. This can happen hours or just minutes after a restart. Not all rules stop, only a few, and it always seems to be the same ones. The openHAB system itself stays operational and responsive the whole time.
There is no indication in the logs that would give me a starting point; the rules just stop.

I haven’t made major changes to my rules/items in the last few weeks because of the holiday season (never touch a running system), except for applying the latest updates via apt-get. My gut feeling is that it started after an apt-get update.

I already went through the article:

and eliminated all Thread::sleep, which in fact only concerned two rules.

I tried to find out what is overloaded or blocking by running some commands I took from different articles, but I have the feeling that the system is not overloaded:

shell:threads --list | grep "RuleEngine"
4842 │ pipe-grep "RuleEngine" │ RUNNABLE │ 74 │ 60

shell:threads --list | grep "safeCall"
4277 │ safeCall-88          │ TIMED_WAITING │ 4588 │ 4100
4345 │ safeCall-89          │ TIMED_WAITING │ 4364 │ 3870
4648 │ safeCall-94          │ TIMED_WAITING │ 2091 │ 1860
4761 │ safeCall-96          │ TIMED_WAITING │ 1280 │ 1190
4790 │ safeCall-97          │ TIMED_WAITING │ 840  │ 790
4877 │ safeCall-98          │ TIMED_WAITING │ 121  │ 120
4902 │ pipe-grep "safeCall" │ RUNNABLE      │ 14   │ 10

shell:threads --list | grep "thingHandler"
177  │ ESH-thingHandler-1       │ TIMED_WAITING │ 55458 │ 51400
178  │ ESH-thingHandler-2       │ TIMED_WAITING │ 59569 │ 55110
179  │ ESH-thingHandler-3       │ TIMED_WAITING │ 53985 │ 49780
180  │ ESH-thingHandler-4       │ TIMED_WAITING │ 51523 │ 47770
181  │ ESH-thingHandler-5       │ TIMED_WAITING │ 57060 │ 52880
4901 │ pipe-grep "thingHandler" │ RUNNABLE      │ 233   │ 190

Honestly, I have no idea where to start narrowing down the cause.
Could someone advise how I can at least find out why the rules stop working (e.g. an error code)?

BTW: I have also noticed that when I restart openHAB, persisted values sometimes revert to their former values (e.g. I have an item whose status I change manually, and after a restart it is always back in the status it had before I changed it).

Thanks in advance.

rlkoshak · August 8, 2019, 2:27pm

What is common about the Rules that stop? Are they all cron triggered or triggered by Astro Channel events? If so it’s important to realize that those Rules get executed out of a different thread pool.

Maybe put a logging statement at the start and end of these Rules that stop running. Then watch the logs. Do you see one or more start running but never finish?
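
With many rules, reading the log by eye for a start without a matching finish gets tedious. A minimal Java sketch of how that pairing could be automated, assuming (hypothetically) that every rule logs "<rule name>: start" as its first statement and "<rule name>: end" as its last, and that the log lines have the layout "<timestamp> [LEVEL] [logger] - <message>". The class name, the markers, and the sample lines are all made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class UnfinishedRuleFinder {

    // Hypothetical markers: each rule is assumed to log these as its first and last statement.
    static final String START = ": start";
    static final String END = ": end";

    /** Returns the names of rules that logged more starts than ends. */
    static List<String> unfinished(List<String> logLines) {
        Map<String, Integer> open = new LinkedHashMap<>();
        for (String line : logLines) {
            // Assumed log layout: "<timestamp> [LEVEL] [logger] - <message>"
            int sep = line.indexOf("] - ");
            String msg = (sep >= 0 ? line.substring(sep + 4) : line).trim();
            if (msg.endsWith(START)) {
                String name = msg.substring(0, msg.length() - START.length());
                open.merge(name, 1, Integer::sum);
            } else if (msg.endsWith(END)) {
                String name = msg.substring(0, msg.length() - END.length());
                // Drop the entry once every start has a matching end.
                open.computeIfPresent(name, (k, v) -> v > 1 ? v - 1 : null);
            }
        }
        return new ArrayList<>(open.keySet());
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "2019-08-08 14:27:00.001 [INFO ] [rules.trace] - Light rule: start",
            "2019-08-08 14:27:00.009 [INFO ] [rules.trace] - Light rule: end",
            "2019-08-08 14:27:01.000 [INFO ] [rules.trace] - Heating rule: start");
        System.out.println(unfinished(log)); // prints [Heating rule]
    }
}
```

A rule that shows up in the result started at least once without reaching its last statement, which makes it a candidate for the rule that is hanging.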

If you are running 2.4, there has been no change to OH since January of this year. The apt update would not have changed openHAB itself.

The persistence behavior you describe would be completely unrelated, and it is unclear what you mean. Are you saying that the restoreOnStartup strategy in your .persist files is not working?

dascrip · August 8, 2019, 7:45pm

What they have in common is that all of them are triggered by an item status. All the cron triggered rules run, as far as I can tell. I had some rules using the Astro binding, but I disabled those first, as they were the most recently implemented rules and I thought they caused the issue.

I will add the logging, even though it will be a lot of work, as I have many rules ;-> Is there any other way to see whether a rule got stuck during operation?

For sure, there was no update to openHAB, but there were updates to other packages. Could those be related to the issue?

rlkoshak · August 8, 2019, 10:02pm

There are some commands in the Why have my… post to see how many threads are running, but they won’t tell you which Rule(s) are stuck.

Anything is possible, but a link to the other package updates is unlikely.

NCO · August 10, 2019, 6:15am

I have a similar issue, which I had not seen for months.

In my case all rules work fine EXCEPT the cron rules.
I detect this with:

// Check whether the cron rules are working properly.
// Triggered by an Item change, so the check itself does not depend on cron.
rule "Does cron work?"
when
	Item Astro_Sun_Azimuth changed
then
	// Skip the check during the first 45 minutes after a system start.
	if(!System_started.changedSince(now.minusMinutes(45),"jdbc")) {
		// Alert if ActTime has a value and a persisted previous state but has not changed for 60 minutes.
		if(ActTime.state != NULL && ActTime.previousState(true,"jdbc").state != NULL && !ActTime.changedSince(now.minusMinutes(60),"jdbc")) {
			logInfo("system.rules", "ActTime (" + ActTime.state + ") did not change since 60 min - is there a cron problem?")
			sendTelegram("OH_TeleBot", "ActTime not updated for 60 min: " + ActTime.state.toString + "\nCron problems?")
		}
	}
end

I don’t use Thread::sleep at all (after reading “long running rules” from @rlkoshak months ago).

Anyway. I don’t have a clue what’s causing this.

rlkoshak · August 10, 2019, 3:20pm

Cron triggered rules run out of a different thread pool, one that only allows 2 cron triggered rules, Astro triggered rules, or Timers (in any combination) to run at a time, so it’s even easier to run into trouble. On the newly released 2.5 ME that has been increased to 10. But check your rules and Timers for potentially long running code or code that never returns.
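
To illustrate why a pool of 2 is so easy to exhaust, here is a minimal Java sketch. It uses a plain java.util.concurrent fixed pool as a stand-in; it is not openHAB's actual scheduler code, and the class and method names are made up. Every pool thread is occupied by a job that does not return, a short job is queued behind them, and the sketch reports whether the short job ran before and after the blocking jobs ended:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class PoolStarvationDemo {

    /**
     * Returns { ranWhileBlocked, ranAfterRelease } for a short job that is
     * queued behind poolSize blocking jobs.
     */
    static boolean[] run(int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);          // stands in for "the long job ends"
        CountDownLatch blockersStarted = new CountDownLatch(poolSize);
        try {
            for (int i = 0; i < poolSize; i++) {
                pool.execute(() -> {
                    blockersStarted.countDown();
                    try {
                        release.await();                         // e.g. a long sleep or an endless wait
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            blockersStarted.await();                             // every pool thread is now occupied

            CountDownLatch shortJobRan = new CountDownLatch(1);
            pool.execute(shortJobRan::countDown);                // queued, no free thread to run it

            boolean ranWhileBlocked = shortJobRan.await(200, TimeUnit.MILLISECONDS);
            release.countDown();                                 // the long jobs finally return
            boolean ranAfterRelease = shortJobRan.await(5, TimeUnit.SECONDS);
            return new boolean[] { ranWhileBlocked, ranAfterRelease };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = run(2);
        System.out.println("ran while pool was blocked: " + r[0]); // false
        System.out.println("ran after blockers ended:   " + r[1]); // true
    }
}
```

In this sketch the queued job is not lost; it simply waits until a thread becomes free. With two jobs that never return, that moment never comes.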

NCO · August 11, 2019, 6:07am

Thanks - I didn’t know that.
I will check whether I can use another approach instead of cron and will check your sensorReporter stuff.

So, if all long running rules, like my cron rules, ended, cron should (theoretically) start working again?
Wouldn’t it make sense to report that the number of allowed threads has been exceeded?

rossko57 · August 11, 2019, 10:15am

Mayyyybe - the disaster is not that you’ve run out of threads. That’s a normal enough event at busy times, it’s all nicely queued, and when one of the running jobs finishes in 5 or 10 milliseconds we’ll get a turn next.
The disaster is the job that doesn’t stop.
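
As an illustration of telling the two cases apart, here is a minimal Java sketch that puts a time limit on a single job and reports whether it finished in time. It uses only java.util.concurrent and hypothetical names; it shows the general idea and is not something openHAB does for Rules:

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class StuckJobWatchdog {

    /** Returns true if the job ended within the limit, false if it was still running. */
    static boolean finishesWithin(Runnable job, long limitMillis) {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Future<?> result = worker.submit(job);
        try {
            result.get(limitMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException e) {
            // Interrupt the job; this only helps if the job reacts to interrupts.
            result.cancel(true);
            return false;
        } catch (ExecutionException e) {
            return true;                        // the job ended, although with an exception
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // A job that returns at once.
        System.out.println(finishesWithin(() -> { }, 1000));   // true
        // A job that sleeps far longer than the limit.
        System.out.println(finishesWithin(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 200));                                               // false
    }
}
```

A job that merely had to wait its turn would pass such a check once it ran; only the job that does not stop gets flagged.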