Cron triggered rules are executed twice

Tags: #<Tag:0x00007f2fbe57ae00> #<Tag:0x00007f2fbe41be10>

Hi,

today I created the following minimalistic test file (most imports are not needed, but for a first test I wanted to duplicate the imports from the files the error occurred). After that, I restarted openHAB completly:

from core.rules import rule
from core.triggers import when, CronTrigger
from org.eclipse.smarthome.core.types import UnDefType
from org.joda.time import DateTime
from org.slf4j import Logger, LoggerFactory
import math
from time import time

#import personal.mylib
#reload (personal.mylib)
#from personal.mylib import logTestVariable 

from personal.mylib import checkPostUpdate, checkSendCommand, checkSendCommandLog
from personal.mylib import getChangedSince, getHistoricState, getMinimumSince, getMinimumTimeSince, getMaximumSince, getMaximumTimeSince, getAverageSince
from personal.mylib import notifyMe


NULL  = UnDefType.NULL
UNDEF = UnDefType.UNDEF


log = LoggerFactory.getLogger("org.eclipse.smarthome.automation.measurement_rules.py")



@rule("Test cron trigger 1 - Jython")
@when("Time cron 41 7 0/2 * * ?") 
def testCronTrigger1(event):
    log.info("test cron trigger 1 was triggered ")


@rule("Test cron trigger 2 - Jython")
@when("Time cron 25 14 0/2 * * ?") 
def testCronTrigger1(event):
    log.info("test cron trigger 2 was triggered ")


@rule("Test cron trigger 3 - Jython")
@when("Time cron 30 21 0/1 * * ?") 
def testCronTrigger1(event):
    log.info("test cron trigger 3 was triggered ")

This resulted in the following log output:

Trigger 1:

2019-08-22 14:07:41.171 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 was triggered 
2019-08-22 16:07:41.170 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 was triggered 
2019-08-22 18:07:36.131 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 was triggered 
2019-08-22 18:07:41.132 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 was triggered 
2019-08-22 20:07:41.132 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 was triggered 

Trigger 2:

2019-08-22 14:14:25.197 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 2 was triggered 
2019-08-22 16:14:25.197 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 2 was triggered 
2019-08-22 18:14:20.157 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 2 was triggered 
2019-08-22 18:14:25.157 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 2 was triggered 
2019-08-22 20:14:25.157 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 2 was triggered 

Trigger 3:

2019-08-22 14:21:30.219 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 
2019-08-22 15:21:30.219 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 
2019-08-22 16:21:30.219 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 
2019-08-22 17:21:25.179 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 
2019-08-22 17:21:30.180 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 
2019-08-22 18:21:30.181 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 
2019-08-22 19:21:30.181 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 
2019-08-22 20:21:30.181 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 3 was triggered 

So, in the time the rules where active until now, each of the rules was triggered twice once.

It’s very hard to imagine what might (mis)fire a consistent five seconds before the appointed time, but it must be a clue.

I’d cut down to one test job and enable quartz debug logging, see if the setup is weird.
I’ll place a bet it’s when the next due time is calculated at execution time, it plants a five-second-short in error, then when that arrives calculates the next due correctly (i.e. 5 secs later)

I did a “log:set DEBUG org.quartz.core”, but no hint regarding an error in the logs… I assume, that is, what you meant by “enable quartz debug logging”?!

Interestingly the error does not occur on every trigger.

I wasn’t really expecting it to announce an error, you’d see that anyway. It doesn’t know it’s messed up.
If those logs show you the rescheduling activity at each execution, then I’d think you have enough to open a github issue.

This is an excerpt from the log at the time the 18:07 cron job was triggered twice (the messages regarding Enocean are from the triggered rule).

2019-08-24 18:07:27.285 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-24 18:07:27.286 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-24 18:07:32.285 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 0 triggers
2019-08-24 18:07:32.285 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-24 18:07:32.287 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-24 18:07:33.184 [INFO ] [smarthome.automation.system_rules.py] - Received KNX Watchdog reply
2019-08-24 18:07:37.247 [INFO ] [home.automation.measurement_rules.py] - Enocean I humidity sensor communication OK!
2019-08-24 18:07:37.286 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-24 18:07:37.286 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 0 triggers
2019-08-24 18:07:37.318 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-24 18:07:41.433 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 was triggered 
2019-08-24 18:07:42.050 [WARN ] [home.automation.measurement_rules.py] - Enocean I humidity sensor communication failure!
2019-08-24 18:07:42.285 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 0 triggers
2019-08-24 18:07:42.285 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-24 18:07:42.319 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-24 18:07:47.286 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 0 triggers
2019-08-24 18:07:47.286 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction

Disappointing log really; it doesn’t mention anything about executing your rule or scheduling the next run.
I expected at least to see Calling execute on "some rule" … the lack of that is odd.

TRACE logging level would give us more, but I think perhaps you have a busy system and this extra work might throw more side effects. I note there is some jumbling of order in the log snippet, suggesting a fair bit of traffic from time to time at least.

However, poking around in old similar github issues, quartz may be the wrong place to look. It’s being the “clock” but something else does the scheduling. All terribly complicated, lists of candidates for scheduling, beyond me.

That used to be org.eclipse.smarthome.core.scheduler - but I’m guessing that is renamed in 2.5M2?? Need some proper developer advice here I think.
@kgoderis ?

I will try trace logging, system load should not be a big issue.

Today I created a DSL rules file with a cron trigger (all my other rules are Jython), so far this trigger works as expected, so this might be Jython and not a openHAB core problem. I will monitor this in the next few days.

There’s nothing specific to Jython when it comes to the trigger in your rule, so JavaScript, Groovy, UI based rules, etc. would behave the same. This could be an issue in the NGRE, but I have a lot of cron based rules and have never observed this.

1 Like

I activated the log level “TRACE” for Quartz, but that does not give any additional information.

I expected at least to see Calling execute on "some rule" … the lack of that is odd.

In addition I added the same cron trigger, that shows the problem, as a trigger for a DSL rule. In the following log the message “Calling execute on…” is printed for the DSL rule, but not for the Jython rule.

The DSL rule is executed only once, the Jython rule twice with the usual 5s delay between the function calls (these are the lines with “test cron trigger 1…”):

2019-08-26 18:07:32.285 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:32.285 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-26 18:07:32.319 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:37.149 [INFO ] [home.automation.measurement_rules.py] - Enocean I humidity sensor communication OK!
2019-08-26 18:07:37.285 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-26 18:07:37.285 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:37.318 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:37.568 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 Jython was triggered 
2019-08-26 18:07:41.001 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:41.002 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job DEFAULT.test_cron_dsl.rules#Test cron trigger 1 - DSL#41 7 0/2 * * ?
2019-08-26 18:07:41.006 [INFO ] [smarthome.model.script.test_cron_dsl] - test cron trigger 1 DSL was triggered 
2019-08-26 18:07:41.568 [INFO ] [home.automation.measurement_rules.py] - test cron trigger 1 Jython was triggered 
2019-08-26 18:07:41.923 [WARN ] [home.automation.measurement_rules.py] - Enocean I humidity sensor communication failure!
2019-08-26 18:07:42.286 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 0 triggers
2019-08-26 18:07:42.286 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction

Okay, makes sense. Does however suggest that quartz is treating the jobs differently, which is a surprise.

I think you need the debug for org.eclipse.smarthome.core.scheduler - or whatever it might be called now - to see the scheduling activity.

5 seconds early is so damn weird, good job it seems to be repeatable! I’d lay money it is the 16:07:41 event that messes up, but cannot see any magic about that time.

1 Like

Thanks for the hint, I will do some more testing when I am back home!

1 Like

Yesterday I updated to 2.5M3 und initially had the impression, that this solved the issue, as the error did not occur at the first 18:07 trigger after the update. But today the rule in question was again triggered twice at 18:07.

I tried to set the log level to TRACE, but did not succeed in getting more information, probably because I used the wrong names (I tried “org.eclipse.smarthome.core.internal.scheduler”, “org.eclipse.smarthome.core.internal.scheduler”, “org.openhab.core.scheduler”, “org.eclipse.smarthome.core.scheduler” and “org.openhab.core.internal.scheduler”)

For the time being I will implement a workaround, that checks the last time a rule was triggered.

I think you should raise a Github issue for this, as it is repeatable. Need some expert advice on how/what to trace.

I would open an issue, but I am still not sure in which repository. The bug occurs only with Jython rules, but after a quick lock in the Jython library sources I am quite sure, that it is not in these sources.

Any advice would be appreciated!

It would be an issue for openhab-core, but if you’ll wait, I’ll take a look. I have not been able to reproduce this. I am wrapping up some major releases and should be done by this weekend.

1 Like

Thanks for your effort! If you need further details please let me know, especially because not all of my cron triggers are showing this behavior.

As a workaround I have written a small function, that checks the time since the rule was triggered last, so that is nor very pressing for me.

Thanks Scott for all you contribute!

1 Like

DSL uses quartz, jython uses new rule engine Cron trigger. Based on your examples it looks like the step part in the Cron expression could have a problem. Because all the ones with the step have the problem, the other not. I did some tests and from that it looks the calculation of the Cron expression is correct. Also for the steps. This seems also logical given the behavior. Because how it works is. When a Cron task is finished the next iteration is scheduled. It does this by calculating in how many seconds to schedule the next call. For some reason either this calculation goes wrong and it’s scheduled to early or the scheduler itself does something wrong. So because of this the task is run to earlier. But then it calculates the next task and that will be the time it should have actually run. Which results in the task run twice. The core schedule part doesn’t contain much logging options, so this doesn’t give us any options there. It could be the number of threads available for the scheduler is to low in this case. But this is just a wild guess. So one option might be to increase the number of threads. I don’t know out of my head where to configure this, but it might be worth a try. I’ll also will have a second look at the scheduler code itself, to see if I can find anything suspicious.

Is there a possibility that your system clock is being updated? What hardware are you using?

I am using a PC Engines APU4 Board with the AMD GX-412TC CPU and Debian 10, with all updates installed. The system has been rebootet several times after I first noticed the bug

The system time is synced via a ntp server. I thought about that, but it seems very unlikly, that this causes the same step of several seconds every day at the same time.

The error occured after updating from 2.4 release to 2.5M2, bevore that it never occured.