Scheduler Delayed // Openhab 3.0.2 and 3.1.0M5

Hi rossko57,

sorry for the delay, I did not any time in the last days. To answer your question: yes, realtime events show exactly the time expected. There is no problem.

In the meantime, I let the test rules run for a while. The result is a constant offset of around 50mins every 12h in case of the cron-based rule (test1) and the timer (rule2). In case of a sleep, the delay sums up (which makes sense, as you said the sleep will block the next execution.

Here are the logs:

2021-06-07 04:11:12.124 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model '_test_v33.rules'
2021-06-07 12:33:55.135 [INFO ] [org.openhab.core.model.script.TEST1 ] - 2021-06-07T12:33:55.130275 This is triggered every 12h. Output directly created at (2021-06-07T12:33:55.134921).
2021-06-08 00:50:41.881 [INFO ] [org.openhab.core.model.script.TEST1 ] - 2021-06-08T00:50:41.862912 This is triggered every 12h. Output directly created at (2021-06-08T00:50:41.879210).
2021-06-08 01:21:44.123 [INFO ] [org.openhab.core.model.script.TEST3 ] - 2021-06-07T12:33:55.131886 This is triggered every 12h. Output was created 12h -5mins ago by sleep (2021-06-08T01:21:44.119127).
2021-06-08 01:27:09.933 [INFO ] [org.openhab.core.model.script.TEST2 ] - 2021-06-07T12:33:55.131901This is triggered every 12h. Output was created 12h ago by timer (2021-06-08T01:27:09.932709).
2021-06-08 12:51:13.326 [INFO ] [org.openhab.core.model.script.TEST1 ] - 2021-06-08T12:51:13.319895 This is triggered every 12h. Output directly created at (2021-06-08T12:51:13.324269).
2021-06-08 13:45:46.417 [INFO ] [org.openhab.core.model.script.TEST2 ] - 2021-06-08T00:50:41.878486This is triggered every 12h. Output was created 12h ago by timer (2021-06-08T13:45:46.413538).
2021-06-08 14:11:31.719 [INFO ] [org.openhab.core.model.script.TEST3 ] - 2021-06-08T01:21:44.134761 This is triggered every 12h. Output was created 12h -5mins ago by sleep (2021-06-08T14:11:31.716040).
2021-06-09 00:50:49.176 [INFO ] [org.openhab.core.model.script.TEST1 ] - 2021-06-09T00:50:49.172903 This is triggered every 12h. Output directly created at (2021-06-09T00:50:49.175299).
2021-06-09 01:45:50.939 [INFO ] [org.openhab.core.model.script.TEST2 ] - 2021-06-08T12:51:13.320780This is triggered every 12h. Output was created 12h ago by timer (2021-06-09T01:45:50.936622).
2021-06-09 03:00:45.355 [INFO ] [org.openhab.core.model.script.TEST3 ] - 2021-06-08T14:11:31.727688 This is triggered every 12h. Output was created 12h -5mins ago by sleep (2021-06-09T03:00:45.352675).
2021-06-09 12:51:50.775 [INFO ] [org.openhab.core.model.script.TEST1 ] - 2021-06-09T12:51:50.763997 This is triggered every 12h. Output directly created at (2021-06-09T12:51:50.771649).
2021-06-09 13:46:33.560 [INFO ] [org.openhab.core.model.script.TEST2 ] - 2021-06-09T00:50:49.172833This is triggered every 12h. Output was created 12h ago by timer (2021-06-09T13:46:33.559511).
2021-06-09 15:51:34.847 [INFO ] [org.openhab.core.model.script.TEST3 ] - 2021-06-09T03:00:45.367316 This is triggered every 12h. Output was created 12h -5mins ago by sleep (2021-06-09T15:51:34.846909).
2021-06-10 00:52:28.921 [INFO ] [org.openhab.core.model.script.TEST1 ] - 2021-06-10T00:52:28.909731 This is triggered every 12h. Output directly created at (2021-06-10T00:52:28.918762).
2021-06-10 01:48:33.919 [INFO ] [org.openhab.core.model.script.TEST2 ] - 2021-06-09T12:51:50.767935This is triggered every 12h. Output was created 12h ago by timer (2021-06-10T01:48:33.916571).
2021-06-10 04:43:01.009 [INFO ] [org.openhab.core.model.script.TEST3 ] - 2021-06-09T15:51:34.850310 This is triggered every 12h. Output was created 12h -5mins ago by sleep (2021-06-10T04:43:01.006730).
2021-06-10 12:54:40.906 [INFO ] [org.openhab.core.model.script.TEST1 ] - 2021-06-10T12:54:40.895853 This is triggered every 12h. Output directly created at (2021-06-10T12:54:40.901546).
2021-06-10 13:51:21.483 [INFO ] [org.openhab.core.model.script.TEST2 ] - 2021-06-10T00:52:28.917020This is triggered every 12h. Output was created 12h ago by timer (2021-06-10T13:51:21.482107).

Hmm, I am running out of ideas… Could it be that other rules cause side effects when they block etc…?

That shouldn’t be the case in OH 3. Each rule has it’s own thread so anything short of your CPU, memory, or file IO causing a general slowdown on the machine for everything, which doesn’t seem to be the case here, other rules shouldn’t be able to block scheduled rules. And if that were the case, I wouldn’t expect the delay to be consistent.

At this point the best I have to offer is to file an issue on openhab-core and hopefully someone familiar with the code might have more ideas.

Let’s concentrate on the timer-creating TEST2 rule results

So the first timestamp represents the “now” apparent to the rule when triggered by cron

2021-06-07T12:33:55.131901
period 12h 17m
2021-06-08T00:50:41.878486
period 12h 1m
2021-06-08T12:51:13.320780
period 11h 59m
2021-06-09T00:50:49.172833
period 12h 1m
2021-06-09T12:51:50.767935
period 12h 1m
2021-06-10T00:52:28.917020

So after the first disaster, this looks like acceptable if poor performance.
However, I’m not convinced this is the right way to look at it.
What we asked for was second zero, minute zero, when hour = 00 or 12.
i.e. based on clock reading, not elapsed time.
In other words we’re consistently around 50 minutes late.

Second timestamp from createTimer now+12

2021-06-07T12:33:55.131901 2021-06-08T01:27:09.932709
12hr 32min
2021-06-08T00:50:41.878486 2021-06-08T13:45:46.413538
12hr 55min
2021-06-08T12:51:13.320780 2021-06-09T01:45:50.936622
12hr 55min
2021-06-09T00:50:49.172833 2021-06-09T13:46:33.559511
12hr 55min
2021-06-09T12:51:50.767935 2021-06-10T01:48:33.916571
12hr 55min
2021-06-10T00:52:28.917020 2021-06-10T13:51:21.482107.
12hr 51min

This is rubbish.
Note that we’re not really requesting a ‘duration’, with createTimer we set a future date/time.

There was previous weirdness with double-triggering of cron (it fired a few seconds early, so recheduled “next” just a few seconds ahead). I think the root of that was Pi with no RTC and poor Java substitute timekeeping.

The mitigations implemented for that look like they ought to take care of this new problem too, and should be in snapshots or milestones since March. Really sure you’re running 3.1.0M5?

@hilbrand

Hello rossko57,

thanks for checking the delays. I share you view: we ask for clock reading, so we are always late - not only one time. I am completely sure that I am using 3.1.0M5, I changed the repo accordingly and doubled checked in the UI >> “About & Help” (Über & Hilfe in German).

Where - according to my understating - do not completely agree is that the fix for earlier triggers #1976 should help. As far as I read what @hilbrand wrote, I would expect:

  1. If the scheduler triggers > 2 sec before expected end time it will reschedule.

That will not help, as my triggers are always late.

  1. The scheduler keeps track of the timestamp the job should run. For recurring schedulers, like cron, when the next job is run it offsets of the calculated time, and not the actual time. This guaranties that the next scheduled time is actual the next scheduled time and not the same end time in case the scheduler would trigger to early.

The fact that cron sets an offset to the calculated time helps to adjust a “one-time” difference in case the last event fires too early / late. However, when the clock runs slow, we should observe the same offset again and again. This is what I observe.

As rlkoshak said, I am quite sure that it must be something in my setup. Otherwise the forum would be full of reports describing that issue.

But I don’t know how to figure out: system CPU utilization is low (less than 10% as long as I don’t deploy large rules), I tried to assign more memory (EXTRA_JAVA_OPTS=“-Xms1024m -Xmx2560m”, real mom usage is around 1.4G/4G, no swap happens).

But you keep showing us your clock does not run slow. If your clock ran slow, the event would fire at an logged 12:00, while it was 12:50 in the real world.

It is curious. You’d expect a calculation error to be more consistent. It’s as though the scheduler runs its own clock (and something else steals cycles from that).

Hi rossko57,

I am confused. I think (/hope :grinning:)we mean the same but from a different perspective.

When it is 12hs until the rule should be executed and it takes 12hs and 50mins, an average second in my openhab environment is longer than in real-time. The clock in the environment runs slower - I agree, something seems to steal it cycles.

I don’t believe a calculation error. Could be of course (as I have no evidence for one or the other) but the growing offset looks for me rather like the loss of cycles.

Key question is now how to debug. Here, I struggle. I tried downgrades down to 3.0.0-1 today but I observed the same behavior. :cry:

So when you trigger some external event, it gets logged as though it happened at an earlier time than reality? And as real time progresses, the lag would get greater and greater. That would be a slow clock.

HI guys,

I applied a little plain java test program today to check decency of my problem on obenhab code:

import java.util.concurrent.TimeUnit;

public class Timetest {
    public static void main(String args[]) {
        long millis=System.currentTimeMillis();
        java.util.Date date = new java.util.Date(millis);

        System.out.println("Current Time is: " + date);
        for (int i = 0; i < 10; i++) {
            try {
                TimeUnit.SECONDS.sleep(60*60*12);
                millis = System.currentTimeMillis();
                date = new java.util.Date(millis);
                System.out.println("Run " + i + ": Current Time is :" + date);
            } catch (Exception e) {
                System.out.println("Error.");
            }
        }
    }
}

The result is the following output:

Current Time is: Tue Jun 15 23:07:05 CEST 2021
Run 0: Current Time is :Wed Jun 16 11:58:38 CEST 2021

That means my problem is indeed completely independent of openhab but somewhere in between Debian or Java. I will continue to search for the root cause.

Anyway, thanks for your ideas and time!

Best, Lui

The puzzler is why, if say the clock ticks slow so that scheduled events are “late” … why would the system clock not read slow as well.

A quick internet dive threw up some stuff beyond my pay grade, but suggesting that Java scheduler does some jiggery-pokery if the system clock is changed. This jiggery-pokery is flawed apparently, and errors can accrue.
Have you got NTP configured somewhere here, and if so how frequently is it tweaking system clock? It might work better to tweak less often.

Hi rossko57,

I think you post gave me the relevant hint! In order to check whether it works better when I “tweak less often” via NTP, I deactivated NTP entirely. The result of that is that after an hour, my system clock runs 4min late - a drift similar to what I observed in java.

Now it becomes logical: whenever a time is requested, openhab takes the system time from java which takes it from the NPN synced time of the system. Thus, timestamps are precise.
However, when the system sets a timer, waits, etc, the system clock is drifting, which causes the delay also in openhab/java.

This is leading to the question: why is the time of my PI running that imprecise? The normal drift of a pi seems to be around a few seconds per day - not hours.

That’s pants, about as accurate as drumming your fingers on the table :crazy_face: Like, 67,000ppm instead of expected 200ppm or so

The java jiggery-pokery I touched on is supposed to tweak scheduled tasks to keep in step with clock adjustments, but I would imagine it is deliberately limited to less drastic tweaks than would be needed to keep pace with that.

Think you might be closing in on a hardware problem, of all things.

To be precise, it is still a software issue. The pi has no hardware clock :joy:.

Anyway, thanks for your help, rossko57!

The Pi lacks a hardware real time clock to keep independent track of time of day. It has of course a hardware oscillator or clock to drive the chips, and system software can derive timekeeping from that. On Pi3/4 the chip clock is variable to control temperature (and add confusion) but its crystal regulated which should give around 100ppm accuracy. I believe the crystal works as a reference, to trim the master oscillator. Broke crystal, master oscillator runs significantly off-spec.

Thanks for the details, I am not that deep in building PC clocks. I will organize another Pi today and give an update today or tomorrow.

I am running out of ideas: I used another Pi with the same SD card and have exactly the same drift. It seems to be some kind of configuration but have no clue what. :cry:

Don’t rule out the potential of the card itself … weird stuff happens at low level storage access.

Try an openhabian setup for a standard environment.

I now have something different in mind. I still have an old line in my boot config that reduced the CPU speed dynamically:

arm_freq_min=200

I think that is not supported since a while anymore and somehow I don’t have a good feeling with it. Trying now…

I think the real thing runs at 250 or 500. Clock time is derived from this, in complicated fashion, so yes a source of timekeeping error.

Jupp, that’s the error. Now it works. Long process but solved. :ok_hand:

This topic was automatically closed 41 days after the last reply. New replies are no longer allowed.