Cron triggering too late (NRE, Jython rules, OH in docker)

Since this does not seem to be affecting anyone else, it is unlikely that it is a bug.

I am afraid it seems this way to me too - boohoohoo…

while tailing the log file to see in real time if the logs are out of sync with the docker and/or host system clocks.

Maybe there is a slight misunderstanding here: the timestamps in the log, are perfectly in sync with NTP time (i.e. time in logfile = time within docker container = time on host = time on my desktop machine).
The time there is not off at all, but the NGRE scheduler seems to be doing something differing from the actual time.
Does it still make sense here, to watch log entries with times which are out of sync? Because even now, the time in the logs is always in sync.

What hardware, OS, docker version, and Java version are you using?

Hardware: Supermicro X9SCL/X9SCM / Intel® Xeon® CPU E31240 @ 3.30GHz / 16 GB RAM
Virtualization layer :ESXi 6.5
Host: Ubuntu Server 16.04.6 LTS
Docker: v 19.03.12
Docker image: openhab/openhab:latest-debian (currently latest is v2.5.9) + a modification using libcap, to allow Java in that docker to bind services to port 80 (allows me not to use a port when going to the web-UI)
Java version would be the one, bundled in the standard image I use (or isn’t it?)

(Yes, I know there are quite a few layers of virtualization. But I have been running OpenHab in exactly this setup for some years and did not have similar problems.)

How did you create your container?

I am not quite sure, what is relevant here. I am using a docker-compose to set network specifics, set environment vars and link volumes and devices.

The scheduler works by calculating the next time the task is to be run and then schedules the task. I can think of 2 possible causes this goes wrong 1) there is a delay between the moment the time is calculated and the actual moment the task is scheduled. However that would very unlikely be the case here as the time difference is large. 2) if the scheduler clock runs slower than the actual time. So for example if the scheduler would count seconds and each second takes 1.1 second in real time. This would cause the next time to be 1.1x to late. When the scheduler is than running the next time it will calculated from the actual time to the next time. Which would explain why it’s always the same amount of delay. And the slower time counting would also explain why the time shift is larger for longer periods.
Since you run in virtualization this could be a cause. Do you run any other processes? Which might explain why the scheduled time could be slower as it’s process might shared cpu clocks and thus be delayed. Although I don’t know enough of the internals if that actually is the case.
One way to test it is to schedule a task in a rule for a similar time period and log the start and when the scheduled task is run. I would expect a similar time delay in that case.
I don’t know if there is a simple solution to this issue l, if it’s about delayed time.

1 Like

[ reminder, Quartz scheduler does appear to be functioning normally, alongside all this. “slow time” would have to be derived from some different clock source. ]

Historically, there have been clock drift issues in VMWare. For example, you could not run a Windows Domain Controller in a VMWare VM due to clock drift causing issues with authentication. I do not know how the two schedulers could possibly react differently to this though. Have you tried running docker directly on the host to see if that isolates the issue? The layers of virtualization that you are using appear to be a significant difference between your setup and others who are not experiencing the issue you are reporting.

Thanks again to all of you for taking the time to heklp me here. I can totally see now, that this is most likely being produced by some layer of my virtualization setup and not a bug (if it were one, then it would be a total edge case).
@hilbrand’s comment

  1. if the scheduler clock runs slower than the actual time. …

Seems highly logical to me - this is fitting to what I am experiencing: the delays grow, when the time-period that is spanned by the cron gets bigger.
So I need to probe around what I can change in the realm of virutalization. As a first test, I just shut down the second virtual machine on my ESXi host - according to the logic, that should free up processor cyclees for OpenHab host and the delays should decrease.

Is there a logical explanation, why the scheduling used by the old (DSL) rules is not impacted, but NGRE scheduling is?
Can you name what technology/function/framework is being used for either, so I can search around the net and maybe find some info on the behaviour of on a virtualized platform?
Did I pick up correctly, that DSL scheduling is based on Quartz?

I just don’t hope I have to give up on the virutalization setup, it really makes my day a lot easier when it comes to trying out new things or doing major updates.

Yep, DSL rules and (I think all) bindings with a need e.g. Astro and polling techs, use Quartz. I cannot imagine that does anything other than getting the time from Java.
However there are several Jave datetime frameworks and the type used there - joda - is now out of favour, so I imagine the new code operates with a different type.

My gut is telling me this is a problem about calculating target e.g. maths-by-sixtieths, but its very hard to see how that affects only you.
Thinks; have you weird timezone? Do system/java/openhab timezones agree? Locale, affecting time formatting? Can’t blame variable type in the case of cron.

Hi all,
I did something which I know is absolutely counter-productive for bug-fixing: I just moved forward.
Now the bug is gone (meaning: NRE triggers are punctual - with only milliseconds in delay).

What did I do?
As I really got nowhere when trying to shrink and simplify my setup, I moved into the opposite direction, taking steps I anyhow had planned to take, and updated two of the layers involved:

  • VMWare ESXi from 6.5 to 7.0.1
  • the host OS from Ubunto 16.4 to 20.4 (here I did not do an update but I set up a fresh VM with the new OS)
1 Like