Cron triggered rules are executed twice

Tags: #<Tag:0x00007f7454d9f380> #<Tag:0x00007f7454d9f178>

Disappointing log really; it doesn’t mention anything about executing your rule or scheduling the next run.
I expected at least to see Calling execute on "some rule" … the lack of that is odd.

TRACE logging level would give us more, but I think perhaps you have a busy system and this extra work might throw more side effects. I note there is some jumbling of order in the log snippet, suggesting a fair bit of traffic from time to time at least.

However, poking around in old similar github issues, quartz may be the wrong place to look. It’s being the “clock” but something else does the scheduling. All terribly complicated, lists of candidates for scheduling, beyond me.

That used to be org.eclipse.smarthome.core.scheduler - but I’m guessing that is renamed in 2.5M2?? Need some proper developer advice here I think.
@kgoderis ?

I will try trace logging, system load should not be a big issue.

Today I created a DSL rules file with a cron trigger (all my other rules are Jython), so far this trigger works as expected, so this might be Jython and not a openHAB core problem. I will monitor this in the next few days.

There’s nothing specific to Jython when it comes to the trigger in your rule, so JavaScript, Groovy, UI based rules, etc. would behave the same. This could be an issue in the NGRE, but I have a lot of cron based rules and have never observed this.

1 Like

I activated the log level “TRACE” for Quartz, but that does not give any additional information.

I expected at least to see Calling execute on "some rule" … the lack of that is odd.

In addition I added the same cron trigger, that shows the problem, as a trigger for a DSL rule. In the following log the message “Calling execute on…” is printed for the DSL rule, but not for the Jython rule.

The DSL rule is executed only once, the Jython rule twice with the usual 5s delay between the function calls (these are the lines with “test cron trigger 1…”):

2019-08-26 18:07:32.285 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:32.285 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-26 18:07:32.319 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:37.149 [INFO ] [] - Enocean I humidity sensor communication OK!
2019-08-26 18:07:37.285 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction
2019-08-26 18:07:37.285 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:37.318 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:37.568 [INFO ] [] - test cron trigger 1 Jython was triggered 
2019-08-26 18:07:41.001 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 1 triggers
2019-08-26 18:07:41.002 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job DEFAULT.test_cron_dsl.rules#Test cron trigger 1 - DSL#41 7 0/2 * * ?
2019-08-26 18:07:41.006 [INFO ] [smarthome.model.script.test_cron_dsl] - test cron trigger 1 DSL was triggered 
2019-08-26 18:07:41.568 [INFO ] [] - test cron trigger 1 Jython was triggered 
2019-08-26 18:07:41.923 [WARN ] [] - Enocean I humidity sensor communication failure!
2019-08-26 18:07:42.286 [DEBUG] [rg.quartz.core.QuartzSchedulerThread] - batch acquisition of 0 triggers
2019-08-26 18:07:42.286 [DEBUG] [org.quartz.core.JobRunShell         ] - Calling execute on job MapDB_SchedulerGroup.Commit_Transaction

Okay, makes sense. Does however suggest that quartz is treating the jobs differently, which is a surprise.

I think you need the debug for org.eclipse.smarthome.core.scheduler - or whatever it might be called now - to see the scheduling activity.

5 seconds early is so damn weird, good job it seems to be repeatable! I’d lay money it is the 16:07:41 event that messes up, but cannot see any magic about that time.

1 Like

Thanks for the hint, I will do some more testing when I am back home!

1 Like

Yesterday I updated to 2.5M3 und initially had the impression, that this solved the issue, as the error did not occur at the first 18:07 trigger after the update. But today the rule in question was again triggered twice at 18:07.

I tried to set the log level to TRACE, but did not succeed in getting more information, probably because I used the wrong names (I tried “org.eclipse.smarthome.core.internal.scheduler”, “org.eclipse.smarthome.core.internal.scheduler”, “org.openhab.core.scheduler”, “org.eclipse.smarthome.core.scheduler” and “org.openhab.core.internal.scheduler”)

For the time being I will implement a workaround, that checks the last time a rule was triggered.

I think you should raise a Github issue for this, as it is repeatable. Need some expert advice on how/what to trace.

I would open an issue, but I am still not sure in which repository. The bug occurs only with Jython rules, but after a quick lock in the Jython library sources I am quite sure, that it is not in these sources.

Any advice would be appreciated!

It would be an issue for openhab-core, but if you’ll wait, I’ll take a look. I have not been able to reproduce this. I am wrapping up some major releases and should be done by this weekend.

1 Like

Thanks for your effort! If you need further details please let me know, especially because not all of my cron triggers are showing this behavior.

As a workaround I have written a small function, that checks the time since the rule was triggered last, so that is nor very pressing for me.

Thanks Scott for all you contribute!

1 Like

DSL uses quartz, jython uses new rule engine Cron trigger. Based on your examples it looks like the step part in the Cron expression could have a problem. Because all the ones with the step have the problem, the other not. I did some tests and from that it looks the calculation of the Cron expression is correct. Also for the steps. This seems also logical given the behavior. Because how it works is. When a Cron task is finished the next iteration is scheduled. It does this by calculating in how many seconds to schedule the next call. For some reason either this calculation goes wrong and it’s scheduled to early or the scheduler itself does something wrong. So because of this the task is run to earlier. But then it calculates the next task and that will be the time it should have actually run. Which results in the task run twice. The core schedule part doesn’t contain much logging options, so this doesn’t give us any options there. It could be the number of threads available for the scheduler is to low in this case. But this is just a wild guess. So one option might be to increase the number of threads. I don’t know out of my head where to configure this, but it might be worth a try. I’ll also will have a second look at the scheduler code itself, to see if I can find anything suspicious.

1 Like

Is there a possibility that your system clock is being updated? What hardware are you using?

I am using a PC Engines APU4 Board with the AMD GX-412TC CPU and Debian 10, with all updates installed. The system has been rebootet several times after I first noticed the bug

The system time is synced via a ntp server. I thought about that, but it seems very unlikly, that this causes the same step of several seconds every day at the same time.

The error occured after updating from 2.4 release to 2.5M2, bevore that it never occured.

I moved my code over to Jython and have this problem as well. I have rules set to run once a day at midnight, once a week on Monday at midnight, and once a month on the first at midnight.

They are all triggering roughly 15 to 30 seconds before midnight and then triggering again at midnight.

I have created a script to ignore the first one if it is not more than 12 hours away from midnight, and this has allowed me to use the correct one. But it is very odd that this is needed when it was not before I moved away from DSL to Jython.

from datetime import datetime

def MillisecondsUntilMidnight():
    n =
    return 86400000 - ((n.hour * 3600000) + (n.minute * 60000) + (n.second * 1000) + (n.microsecond / 1000))

@rule("Daily Cron")
@when("Time cron 0 0 0 ? * * *")
def DailyCron(event):
    if MillisecondsUntilMidnight() < 43200000: return
    # do something here

I am running it on a QNAP TS-251+ with 8GB RAM:

I am using openhab 2.4.

I very much like coding in Jython though. It is so very much better than coding in Xtend.

That does seem to be another confirmation that something is up with NGRE scheduler.

It might be helpful if you can narrow down more precisely the range of the “earliness” trigger. @juelicher is getting a fairly consistent 5 seconds - if you get something of a different order, it could be a clue.

From what @hilbrand says, the new scheduler is completely different, which I did not realize.

I have been logging the times when they execute, so that shouldn’t be an issue for me to get the times. The earliness does not seem to be consistent for me though. Although I suspect the longer interval schedules may trigger earlier then those with smaller intervals. But I’ll have to look at them again to be sure.

When I get around to it, I’ll post the times that they have been triggering.

Here are the numbers that I have gathered so far.

Daily: Time cron 0 0 0 ? * * *
2019-09-29 23:59:43.547
2019-09-30 00:00:00.017
2019-09-30 23:59:24.621
2019-10-01 00:00:00.014
2019-10-01 23:59:43.731
2019-10-02 00:00:00.011
2019-10-02 23:59:24.621
2019-10-03 00:00:00.109

Weekly: Time cron 0 0 0 ? * MON *
2019-09-29 23:59:43.551
2019-09-30 00:00:00.015

Monthly: Time cron 0 0 0 1 * ?
2019-09-30 23:59:08.145
2019-10-01 00:00:00.020

That is 16, 35, 16 and 16 seconds early for daily, 16 seconds early for weekly, and over 51 seconds early for monthly. The time stamps I am taking look like they are accurate within 200 milliseconds.

It never seems to not run early, at least with the three triggers I have set up.

1 Like

I’d recommend opening a github issue, you’ve got clear data and it seems repeatable.

1 Like