massive time drift in scripts

Hi everyone,

I recognized that all my scripts that have a time trigger run with a huge delay. But first some words about my setup:

  • Windows 11
  • WSL2
  • Docker Desktop
  • openHAB 4.2.3

OpenHAB runs within a docker container on WSL2; Host system is Windows 11.

The following script shall be executed at 6:00, but according to the logs it was executed at about 7:30

rules.JSRule({
  name: "Weihnachtsbeleuchtung morgens anschalten",
  triggers: [triggers.TimeOfDayTrigger("6:00"))],
  execute: (event) => {
    console.info(`Rule 'Weihnachtsbeleuchtung morgens anschalten' triggered at ${time.ZonedDateTime.now()}`);
    items.Aussen_Hof_Steckdose.sendCommand("ON")
    items.Aussen_Shelly_1PM_Carport_Steckdosen.sendCommand("ON")
    items.Aussen_Shelly_1PM_Terrasse_Theke.sendCommand("ON")
  }
});

At first I checked the time of the host system (correct), as well as the time within WSL2 (also correct), and within the openHAB docker contain (also correct).
To further investigate this I created multiple scripts in the following way (the first runs at 00:00, the next at 00:05, then 00:10 and so on)

for (let h = 0; h < 24; h++) {
  for (let m = 0; m < 60; m += 5) {
    const hour = h < 10 ? `0${h}` : `${h}`
    const minute = m < 10 ? `0${m}` : `${m}`
    const triggerTime = `${hour}:${minute}`
    const name = `Time test ${triggerTime}`

    rules.JSRule({
      name: name,
      triggers: [triggers.TimeOfDayTrigger(triggerTime)],
      execute: () => {
        console.info(`Rule ${name} triggered at ${time.ZonedDateTime.now()}.`);
      }
    });
  }
}

This leads to the following output:

2024-12-12 09:20:08.756 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:20 triggered at 2024-12-12T09:20:00.894+01:00[SYSTEM].
2024-12-12 09:25:24.730 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:25 triggered at 2024-12-12T09:25:00.897+01:00[SYSTEM].
2024-12-12 09:30:40.912 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:30 triggered at 2024-12-12T09:30:00.898+01:00[SYSTEM].
2024-12-12 09:35:57.320 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:35 triggered at 2024-12-12T09:35:00.901+01:00[SYSTEM].
2024-12-12 09:41:13.777 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:40 triggered at 2024-12-12T09:40:00.903+01:00[SYSTEM].
2024-12-12 09:46:30.443 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:45 triggered at 2024-12-12T09:45:00.905+01:00[SYSTEM].
2024-12-12 09:51:47.318 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:50 triggered at 2024-12-12T09:50:00.907+01:00[SYSTEM].
2024-12-12 09:57:04.557 [INFO ] [.automation.script.file.time_test.js] - Rule Time test 09:55 triggered at 2024-12-12T09:55:00.907+01:00[SYSTEM].

As you can see the first trigger (at 9:20) is more or less in sync with the log time (8s delay), the delay increases with each execution. Interestingly the internal “time.ZonedDateTime.now)” also drifts away. This effect can also be seen using this script:

rules.JSRule({
  name: "Timer test",
  triggers: [triggers.SystemStartlevelTrigger(100)],
  execute: () => {
    setInterval(() => {
      console.log(`Now: ${time.ZonedDateTime.now()}`)
    }, 60 * 1000)
  }
});

Each time the “now” gets printed it is more and more behind the timestamp in the logs:

2024-12-12 09:20:12.913 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:20:05.051+01:00[SYSTEM]
2024-12-12 09:21:12.926 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:21:01.903+01:00[SYSTEM]
2024-12-12 09:22:12.935 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:21:58.742+01:00[SYSTEM]
2024-12-12 09:23:12.936 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:22:55.572+01:00[SYSTEM]
2024-12-12 09:24:12.938 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:23:52.401+01:00[SYSTEM]
2024-12-12 09:25:13.059 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:24:49.227+01:00[SYSTEM]
2024-12-12 09:26:12.927 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:25:45.931+01:00[SYSTEM]
2024-12-12 09:27:13.097 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:26:42.769+01:00[SYSTEM]
2024-12-12 09:28:12.927 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:27:39.436+01:00[SYSTEM]
2024-12-12 09:29:13.113 [INFO ] [automation.script.file.timer_test.js] - Now: 2024-12-12T09:28:36.274+01:00[SYSTEM]

It seems like openHAB uses its own clock (used for triggering scripts and time constants) that is far behind the actual time (the timestamps in the logs look good to me).

Any ideas on this?

See Linux time ok, Openhab time not

maybe something related to my past issue: https://github.com/openhab/openhab-js/issues/299#issuecomment-1914807691

How are the resources on the machine? It’s been awhile I’ve messed much with WSL, does the Linux container use swap? htop should tell you. The biggest thing to look for is whether your system load is > 1.

image

That number represents the number of processes that are waiting around for some resource instead of running. When the load is high, it means processes that want to run are stuck waiting for soemthing (file IO, pages being retrieved from swap, network, etc). If your load is high, that could lead to timing problems like this.

But ultimately threre are two clocks involved:

  • OS clock
  • Java clock

Everything running on OH from the logs to the rules use the Java clock.

Please do not create duplicate posts. I’ve merged the two.

This is the result of using “top” within the container (htop is not present in the standard openHAB image)

grafik

I see no problems regarding load or swapping.

Are you sure that OH and the logs (log4 ???) are using exactly the same clock? So where does this huge difference come from?

And sorry for the duplicated post, there went something wrong with the first post. May be related to my account first needed to be verified.

Thanks for your help!

Yes. Though in JS time.ZonedDateTime comes from the joda-js library. But even there it gets the clock from the same source.

:man_shrugging: That’s what we are trying to figure out.

To determine if there is a difference between joda-js and Java let’s see what Java’s ZDT has to say.

    var JavaZDT = Java.type("java.time.ZonedDateTime");
    console.info("JODA: " + time.toZDT() + " Java: " + JavaZDT.now());

Note: when not passed anything time.soZDT() returns now.

I will say that running in Docker on top of WSL2 on top of Windows is a whole lot of extra layers which we can’t eliminate as a possible source of problems too.

I tested you script and executed it multiple times, this is the result:

2024-12-12 17:10:00.023 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:10:00.021+01:00[SYSTEM] Java: 1734019800023
2024-12-12 17:10:11.085 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:10:11.083+01:00[SYSTEM] Java: 1734019811084
2024-12-12 17:10:12.714 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:10:12.713+01:00[SYSTEM] Java: 1734019812714
2024-12-12 17:10:16.171 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:10:16.170+01:00[SYSTEM] Java: 1734019816171
2024-12-12 17:10:19.061 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:10:19.060+01:00[SYSTEM] Java: 1734019819061
2024-12-12 17:10:41.203 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:10:38.287+01:00[SYSTEM] Java: 1734019841203
2024-12-12 17:10:45.395 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:10:42.480+01:00[SYSTEM] Java: 1734019845395
2024-12-12 17:11:13.854 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:11:08.006+01:00[SYSTEM] Java: 1734019873854
2024-12-12 17:11:15.560 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 2024-12-12T17:11:09.712+01:00[SYSTEM] Java: 1734019875559

As you can see the JODA time starts to drift away. I converted the last java timestamp into a readable form, it equals the log time (17:11:15).

Of course the multiple layers (Windows, WSL, Docker, openHAB) may be the source of the problem. But the basic setup worked fine for a long time, the problems came up after switching to a new host. So this might be related to other hardware or a newer version of docker. But the system time within the container is correct, so I assume there is a syncing problem within openHAB.
I found some old threads (like this) that speak of a syncing mechanism from the system clock to the clock used by OH. Is this true?

Hmmm, its weird that the Java logged out epoch instead of a formatted date time string. Explicitely call toString on it maybe.

 console.info("JODA: " + time.toZDT() + " Java: " + JavaZDT.now().toString());

We need to see the same output from each to easily compare them. Ideally I’d convert both to a loacl time to make it even easier. And it occurs to me that as written we might introduce unexpected delays that will change the results.

All I can suggest is to rerun logging out a formatted JavaZDT time instead of epoch and file an issue on the jsscripting add-on (openhab-addons repo)…

If you want to eliminate WSL2 or Docker as a potential source for the issue you can try running OH just on Windows alone or running it just on WSL alone without Docker.

It will be easier to compare the times with the following:

var JavaZDT = Java.type("java.time.ZonedDateTime");
var javaNow = JavaZDT.now().toLocalTime().toString();
var jodaNow = time.toZDT().toLocalTime().toString();

console.info("JODA: " + jodaNow + " Java: " + javaNow);

This doesn’t apply to your machine. A RPi doesn’t have a real time clock meaning when the machine is off it doesn’t keep time. It doesn’t know what the time is until it hits an NTP server to get the current time. You are on a machine with a real time clock and, as you have shown, the OS and Linux subsystem all have the correct times anyway.

The whole system follows the same clock pulses. What’s different is how each part of the system is starting from. In that thread, what happens is OH comes up before the OS gets the correct current time from NTP. The OS gets the correct time but OH has already started and doesn’t get the new starting point. For example, if the OS time says it’s 12:00:00 when OH starts up but later sees that it’s really 5 minutes slow and it was really 12:05:00, OH will still continue on being 5 minutes off the real time even after the OS’s clock is corrected.

But one second in OH would still be one second. It wouldn’t drift around like you are seeing here.

I updated the script to you proposal, after manually executing it multiple times this is the result:

2024-12-12 19:44:38.807 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 19:44:38.803 Java: 19:44:38.803561226
2024-12-12 19:45:11.837 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 19:45:08.598 Java: 19:45:11.836736995
2024-12-12 19:45:34.058 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 19:45:27.811 Java: 19:45:34.057953814
2024-12-12 19:45:50.495 [INFO ] [nhab.automation.script.ui.5e15a0112c] - JODA: 19:45:44.248 Java: 19:45:50.494897666

Note the first execution which is more or less in sync. But with each following execution the joda time is more behind.

I doubt that this is related to the jsscripting add-on. Initially my rules used the old java DSL syntax, these rules were delayed. After that I began investigating this issue using JS rules.

now in Rules DSL is exactly the same as using JavaZDT in this rule. It’s the exact same class and it’s core Java. And these logs are showing that the Java ZDT class doesn’t have the problem.

It’s the joda date time that’s getting off and that’s in the JS Scripting add-on. The JS Scripting add-on is where the problem needs to be reported.

If this is a problem in Rules DSL also, you need to produce some rules code and logs demonstrating it. And that would really not make any sense because Rules DSL uses the exactl same classes and code for time as we are using above which is showing as in sync.

Hey,

how might this be related to the cron trigger of DSL rules? I created a bunch of rules (one for each full hour) using the following pattern:

rule "Time Test Rule 01:00"
when
    Time cron "0 0 1 * * ? *"
then
    logInfo("rules", "Time Test Rule '01:00' triggered")
end

After restarting OH the first trigger was more or less in sync, but each following execution is delayed by 5 minutes compared to the one before:

2024-12-13 13:00:32.350 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '13:00' triggered
2024-12-13 14:05:07.712 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '14:00' triggered
2024-12-13 15:10:08.857 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '15:00' triggered
2024-12-13 16:15:36.083 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '16:00' triggered
2024-12-13 17:21:32.279 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '17:00' triggered
2024-12-13 18:28:00.260 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '18:00' triggered
2024-12-13 19:34:37.733 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '19:00' triggered
2024-12-13 20:41:17.793 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '20:00' triggered
2024-12-13 21:47:57.810 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '21:00' triggered
2024-12-13 22:54:38.105 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '22:00' triggered
2024-12-14 00:01:18.313 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '23:00' triggered
2024-12-14 01:07:58.463 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '00:00' triggered
2024-12-14 02:13:02.090 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '01:00' triggered
2024-12-14 03:13:17.335 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '02:00' triggered
2024-12-14 04:13:26.493 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '03:00' triggered
2024-12-14 05:13:44.621 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '04:00' triggered
2024-12-14 06:15:17.012 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '05:00' triggered
2024-12-14 07:17:18.297 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '06:00' triggered
2024-12-14 08:19:47.567 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '07:00' triggered
2024-12-14 09:22:45.574 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '08:00' triggered
2024-12-14 10:26:10.775 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '09:00' triggered
2024-12-14 11:29:59.335 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '10:00' triggered
2024-12-14 12:34:05.892 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '11:00' triggered
2024-12-14 13:38:31.451 [INFO ] [org.openhab.core.model.script.rules ] - Time Test Rule '12:00' triggered

(and so on)

This was my initial problem so I started to test the new Javascript rules. Right now I am using an external item that holds the current (and correct) time. If this receives an update I check if something should happen at the given (external) time. But using cron triggers would be the best for time related rules.

Have you / could you sift through these results to see if something is related?

https://www.google.com/search?q=java+in+wsl+time+drift

Unfortunately I found nothing regarding Java within WSL. Only threads that the system time in WSL drifts away, but that’s not the case for me.

It’s really unclear what the problem can be. The cron timers are absolutely independed from JS Scripting or any of the rules langauges specifically. So that kind of is the opposite of the test above where the built in Java Timers kept in sync but the JS timers did not. And this is all Rules DSL too in this last case.

:man_shrugging:

If it were me I’d start by eliminating stuff to see if anything changes. First I’d run on WSL2 without Docker. Then run on Windows without the WSL2.

There’s nothing you can do from user facing configuration that would affect this. This problem isn’t being more widely reported. So there has to be something about the environment that OH is running in that is causing this. If we can figure out what that is maybe we can narrow down to the root cause and see if there’s anything OH has control over that can fix it.