Rules stop processing

bbubbat · March 15, 2017, 12:25pm

Don’t get me wrong. I’m not saying that my other rules do not stop. Just that last time it happens, it were the time based rules. Nevertheless also other rules do stop. But I don’t know when it happens and if all rules are stopping at the same time. If the rules stop, I also need to restart OH.

I just set the rule runtime to debug, so that at least I have a little insight, what happens.

log:set DEBUG org.eclipse.smarthome.model.rule.runtime

Don’t know if this is enough for tracing the problem, but at least it’s a beginning.

steve1 · March 15, 2017, 12:35pm

@Moxified and @bbubbat

@Kai will probably clarify what he meant, but my guess is that he meant the issue was created for the wrong repository. The rule engine is in the Eclipse SmartHome codebase but the issue was created in openhab-core.

bbubbat · March 15, 2017, 12:37pm

Yes, that was my fault. I wasn’t aware of this.

Kai · March 15, 2017, 12:49pm

Yes right.
And my second statement meant that an issue saying “rules stop working” won’t really help if there is no way to reproduce it in any way. The only report with a reproducible scenario is eclipse/smarthome#3132, which has been fixed - so my hope would be that your problem has the same root cause and thus will be fixed with this as well. So please try the very latest OH distro snapshot and see if it helps.

Moxified · March 15, 2017, 12:53pm

OH, gotcha. Thanks for clarifying.

@bbubbat I can’t definitively lock it down to cron rules. I have a cron rule that runes every 10 minutes so there is always one relatively close to it stopping. Especially because my rules aren’t overly busy.

I can see in at least one case, the mentioned script fires at say 7:20, then at 7:30 and then 7:40 it doesn’t fire but at 7:46 a different (non-cron) rule does fire as the last rule to fire.

I already have esh-model-rule-runtime as well as esh-model-script-runtime set as debug. They don’t offer that much more.

@Kai, thanks! I try to avoid snapshots but I’ll have to look into switching from stable. I fully get the need to reproduce… I just can’t find anything other than time that does it.

I looked at my cron and it does have a “?” but it seems in that issue they were having trouble with specific day of week? I don’t fully follow what they are saying.

My cron: Time cron “0 */10 * * * ?”

bbubbat · March 15, 2017, 1:48pm

@Kai I use the latest official docker 2.1.0-SNAPSHOT published two days ago. So I assume that the changes are already in it. All eclipse smart home bundles do have the version number 0.9.0.201703131829

Do you know how I can trace the problem? Which packages should I set to debug?

Kai · March 15, 2017, 1:55pm

I’d like to ask @kgoderis for help as he has implemented the CRON library - he should be the best person to tell, what might be wrong or how to best debug it.

kgoderis · March 15, 2017, 4:18pm

@bbubbat @Moxified There was a bug with Day Of Week in the cron expressions which got fixed with https://github.com/eclipse/smarthome/issues/3132. I fed “0 */10 * * * ?” into the test method related to that issue, and it seems to run fine over here

@bubbat Can you share the cron expression you are using?

Karel

Moxified · March 15, 2017, 4:29pm

List of mine:

Time cron "0 0 23 * * ?"
Time cron "0 0 5 * * ?"
Time cron "0 0 7,22 * * ?"
Time cron "0 */10 15-22 * * ?"
Time cron "0 10 23 * * ?"
Time cron "0 */10 * * * ?"
Time cron "0 */30 * * * ?"
Time cron "0 0 6 ? * 1"
Time cron "0 0 21 ? * 1"
Time cron "0 45 5 ? * MON-FRI"
Time cron "0 0 15 ? * MON-FRI"
Time cron "0 30 7 ? * 7"
Time cron "0 30 21 * * ?"
Time cron "0 0 0 * * ?"
Time cron “25 0/5 * * * ?”

Sorry, there were more than I anticipated… The only one that I have seen around a lockup was the Time cron “0 */10 * * * ?” BUT, that doesn’t mean one of the other ones wasn’t doing something in the background.

kgoderis · March 15, 2017, 4:31pm

@bbubbat @Moxified

If you want the full output with respect to the scheduling of the cron expressions, then the best thing you can do is to start up the runtime in debug modus, connect to the console (via ssh) and issue

log:set TRACE org.eclipse.smarthome.core.scheduler

Karel

kgoderis · March 15, 2017, 4:36pm

@Moxified I tried 5-6 of your expression and the generated execution times are correct over here

bbubbat · March 15, 2017, 4:43pm

ist of cron rules I use:

Time cron "0 * * * * ?"
Time cron "0 0 23 * * ?"
Time cron "00 15 06 ? * MON-FRI"
Time cron "00 30 06 ? * MON-FRI"
Time cron "0 0 1 * * ?"
Time cron "0 0 9 * * ?"
Time cron "0 30 5 ? * MON-FRI"
Time cron "0 0 7 * * ?"
Time cron "0 0 22 ? * SUN-THU"
Time cron "0 0 23 * * ?"

Moxified · March 15, 2017, 4:45pm

I appreciate it. They all worked fine in OH 1. I pretty much pulled the includes out of the rules files and dumped them on my OH2 vm. NOW, I’m still on stable. Would some of these rules fail if I was on stable?

I have upped the log level.

Edit: I watched when my 10 minute cron hit and I didn’t see anything new logged. Just a DEBUG from runtime.internal.engine.executerulejob that I have been getting all along. Is this to be expected?

Thanks!

bbubbat · March 15, 2017, 5:03pm

Okay, TRACE is up and running. As soon as I have a result, I will let you know.

bbubbat · March 15, 2017, 5:45pm

@kgoderis @Kai Right now no rule is working. Even though I get a trace output from my first cron rule

18:44:00.215 [TRACE] [cheduler.ExpressionThreadPoolManager] - Expression's '0 * * * * ?' next execution time is 2017-03-15 18:45:00.000
18:44:00.215 [TRACE] [cheduler.ExpressionThreadPoolManager] - Scheduling the task 'org.eclipse.smarthome.core.persistence.internal.PersistItemsJob@6a358e75' to execute in 59880 ms
18:44:00.216 [TRACE] [cheduler.ExpressionThreadPoolManager] - Task 'org.eclipse.smarthome.core.persistence.internal.PersistItemsJob@6a358e75' has now 1 Futures
18:44:00.216 [TRACE] [cheduler.ExpressionThreadPoolManager] - Putting the monitor thread 'ESH-persist-Monitor-1' to sleep for 59880 ms

So the scheduler seems to be working, but not the rules at all.

bbubbat · March 15, 2017, 5:51pm

I just added a new rule

rule "ping"
when
	Time cron "30 * * * * ?"
then
	logInfo("Ping", "Ping")
end

The output I got is

18:46:32.811 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'test.rules'
18:46:32.838 [DEBUG] [e.internal.engine.RuleTriggerManager] - Scheduled rule 'ping' with cron expression '30 * * * * ?'

That’s it. Nothing more. While the first cron is still running every minute this on is not.

Moxified · March 15, 2017, 5:54pm

I’m not getting ANY Trace entries… My rules are still working as of now.

bbubbat · March 15, 2017, 6:08pm

After restarting the docker container the rules seem to be up and working.

The weird thing is I do get the following output every minute:

19:06:00.402 [TRACE] [rthome.core.scheduler.CronExpression] - Expanded to 2 candidates
19:06:00.403 [TRACE] [rthome.core.scheduler.CronExpression] - Candidate 0 is 2017-03-15 19:06:00.000
19:06:00.403 [TRACE] [rthome.core.scheduler.CronExpression] - Candidate 1 is 2017-03-15 19:07:00.000
19:06:00.404 [TRACE] [rthome.core.scheduler.CronExpression] - Final candidate 0 is 2017-03-15 19:07:00.000
19:06:00.405 [TRACE] [cheduler.ExpressionThreadPoolManager] - Expression's '0 * * * * ?' next execution time is 2017-03-15 19:07:00.000
19:06:00.406 [TRACE] [cheduler.ExpressionThreadPoolManager] - Scheduling the task 'org.eclipse.smarthome.core.persistence.internal.PersistItemsJob@372f925a' to execute in 59833 ms
19:06:00.408 [TRACE] [cheduler.ExpressionThreadPoolManager] - Task 'org.eclipse.smarthome.core.persistence.internal.PersistItemsJob@372f925a' has now 1 Futures
19:06:00.409 [TRACE] [cheduler.ExpressionThreadPoolManager] - Putting the monitor thread 'ESH-persist-Monitor-1' to sleep for 59833 ms

which I guess is related to my first cron rule. But I don’t get any TRACE output from my ping cron rule. Just

19:07:30.004 [DEBUG] [ntime.internal.engine.ExecuteRuleJob] - Executing scheduled rule 'ping'
19:07:30.009 [INFO ] [.eclipse.smarthome.model.script.Ping] - Ping

But maybe I misinterpret the TRACE output.

steve1 · March 15, 2017, 6:12pm

I’d guess that lines related to persistence are for periodic item persistence and not related to your rule triggers.

bbubbat · March 15, 2017, 6:22pm

Oh, yes, you are right. I need to learn to interpret the output.

So why do I get the TRACE output of the persistence cron schedule but not from the other cron scheduler?