Some scripts failing to initialise on system restarts [Bug in timer function]

  • Platform information:

    • Hardware: RPi 5
    • OS: Raspberry Pi OS 64 bit
    • Java Runtime Environment:
      openjdk version “17.0.14” 2025-01-21
      OpenJDK Runtime Environment (build 17.0.14+7-Debian-1deb12u1)
      OpenJDK 64-Bit Server VM (build 17.0.14+7-Debian-1deb12u1, mixed mode, sharing)
  • OH Version: 4.3.2-1

Since updating OH, I’ve noticed that some of my scripts don’t work upon restarting the system. I have them as inline scripts inside rules. Going into settings > rules, once I open the rule and click save (no other changes), the rule will then work normally.

I’ve noticed the behaviour whether the script was a blockly layout or ECMAScript.

Most of my rules with inline scripts don’t have this bug. The unaffected ones are quite simple scripts (reading item states, sending updates or commands)

The ones that are affected are implementing timers and accessing private cache, that’s the only difference I can see.

Since Blockly basically is JS, that’s to be expected.

Do these rules remain in an INITIALIZING state? Anything relevant in the logs?

I’ve had the chance to do some more testing. Describing it as “not initalising” was the wrong way to describe it. All the rules go to an IDLE state, and I couldn’t see any issues in the logs.

I had some head scratching yesterday trying to gather more info, the scripts were working normally upon a warm reboot of the system, after days of not working properly upon cold boots. On a hunch, I left it overnight and tried again this morning, and surely enough the scripts go back to not working properly

I realised that the affected scripts were actually running, but the timer functions were not working properly. I’ve attached the code below.

Basically there are a couple of items which are motion sensor states (a boolean value). The script commands an item (also a bool value) so that it sets to 1 if any of the inputs are 1, or set
to 0 after a timeout period once all the inputs go to zero (a falling edge trigger essentially).

The timeout period is set to 5 seconds in this example for testing purposes. The timeout value is stored in an item called global_motion_sensor_timeout_01, which is read every time the script is run.

When the scripts are malfunctioning, it’s as if the timeout period is set to 0, ie the output goes to 0 as soon as all inputs go to 0.

I double checked that the value global_motion_sensor_timeout_01 was indeed 5 (I even manually set it to other values just to refresh it), but the script still wouldn’t operate normally. As mentioned, if I click save to update the rule, it starts working normally again.

I have other similar scripts exhibiting the same behaviour.

To summarise:

  1. Rule / Inline Script working perfectly for months
  2. Update to OH 4.3.2-1
  3. Script works fine on warm reboot but not cold boot
  4. The rules / inline scripts are running, but the timer functions within the script aren’t working.
  5. The timer functions are timing out immediately rather than waiting for 5 seconds
  6. When script isn’t working properly, saving/updating the rule makes it work fine again immediately.

One thing that comes to mind in a cold boot vs warm reboot is that the system time is going to very out of sync by 12 hours in this case. I’m just throwing ideas out there at this point. I don’t know how they may interact with the timer functions

var output_cmd_items_0, input_items_numbers, input_items_sw, sum_of_inputs, input_idx_num, input_idx_sw, input_temp_val;


output_cmd_items_0 = 'int_00_led_controller_ch_0_auto_motion_state';
input_items_numbers = ['int_01_led_controller_input_14', 'int_01_led_controller_input_15'];
input_items_sw = [];
sum_of_inputs = 0;
if (!!input_items_numbers.length) {
  for (var input_idx_num_index in input_items_numbers) {
    input_idx_num = input_items_numbers[input_idx_num_index];
    sum_of_inputs = sum_of_inputs | parseFloat(items.getItem(input_idx_num).state);
  }
}
if (!!input_items_sw.length) {
  for (var input_idx_sw_index in input_items_sw) {
    input_idx_sw = input_items_sw[input_idx_sw_index];
    if (items.getItem(input_idx_sw).state == 'ON') {
      input_temp_val = 1;
    } else {
      input_temp_val = 0;
    }
    sum_of_inputs = sum_of_inputs | input_temp_val;
  }
}
if ((cache.private.get('previous_command')) == 0 && sum_of_inputs == 1) {
  if (cache.private.exists('motion_timer')) { cache.private.remove('motion_timer').cancel(); };
  items.getItem(output_cmd_items_0).sendCommand(1);
} else if ((cache.private.get('previous_command')) == 1 && sum_of_inputs == 0) {
  if (cache.private.exists('motion_timer') === false || cache.private.get('motion_timer').hasTerminated()) {
    cache.private.put('motion_timer', actions.ScriptExecution.createTimer('motion_timer', time.ZonedDateTime.now().plusSeconds(parseFloat(items.getItem('global_motion_sensor_timeout_01').state)), function (timer_context) {
      items.getItem(output_cmd_items_0).sendCommand(0);
      }, undefined));
  } else {
    cache.private.get('motion_timer').reschedule(time.ZonedDateTime.now().plusSeconds(parseFloat(items.getItem('global_motion_sensor_timeout_01').state)));
  };
} else if (sum_of_inputs == 1) {
  if (cache.private.exists('motion_timer')) { cache.private.remove('motion_timer').cancel(); };
  items.getItem(output_cmd_items_0).sendCommand(1);
}
cache.private.put('previous_command', sum_of_inputs);

It seems more likely that the issue is related to the timers themselves or the stuff stored in the cache.

What do you specifically mean by a “warm reboot” and a “cold boot”?

Add logging so you know what everything being used in this rule is when it’s used.

Note:

This: time.ZonedDateTime.now().plusSeconds(parseFloat(items.getItem('global_motion_sensor_timeout_01').state)) van be replaced with time.toZDT(items.global_motion_sensor_timeout_01)

1 Like

Warm reboots are rebooting an already power on and running system. Cold boots are starting a completely powered off system.

I’ve had annoyances in some of my python programs where a cold booted system will still retain the previous system time for about ~10-20 seconds until it synchronises to a time server.

Just a hunch considering we are talking about timers.

I will try out using toZDT, however I’m unsure what is being passed into toZDT in time.toZDT(items.global_motion_sensor_timeout_01)

Should I be passing 5000 milliseconds into the function, ie time.toZDT(5000)?

Edit: I see that items.global_motion_sensor_timeout_01 contains many elements, the state being one (as a string).

I used the toZDT function by passing milliseconds, which is calculated anytime the script is run:

timeout = parseFloat(items.getItem(‘global_motion_sensor_timeout_01’).state)*1000;

Then replacing the instances of time.ZonedDateTime with time.toZDT(timeout)

As for whether it fixes the issue, I will have to report back in a day or so after a few boots/reboots

I know the general definition, but OH is a piece of software, not an operating system. OH doesn’t “boot”. And there really isn’t anything that OH is going to see or have control over that is going to make it distinguish between a war boot or a cold boot of the machine.

If you are on an RPi that’s probably caused by the fact that it doesn’t have a real time clock. So while it’s offline the clock doesn’t advance until the machine comes online and NTP fetches the current time.

But NTP usually fetches the time pretty early in the boot process, long before OH tries to come up, unless you’ve modified the configuration somehow.

But even if the time were off from the real clock, it should only cause a problem if somehow OH is storing a time using the old incorrect value, then NTP kicks in, and then the rule runs and the old incorrect value is now way in the past. I don’t see that happening here.

The Item. You can get the Item using any of the following:

items.getItem("MyItem")
items.MyItem
items["MyItem"]

For time.toZDT(), from the docs:

items.Item or org.openhab.core.types.Item | if the state is supported (see the Type rules in this table, e.g. DecimalType), the state is converted

Since it’s a Number it will follow the usual rule for numbers, but I now see that probably won’t work as expected because it will treat the time as milliseconds instead of seconds. I failed to notice it’s being added as seconds and thought it was a DateTime Item.

number, bingint, java.lang.Number, DecimalType | rounded to the nearest integer and added to now as milliseconds

But any of the following will work, all of which are less code over all

time.toZDT(items.global_motion_sensor_timeout_01.numericState * 1000)
time.toZDT("PT"+items.global_motion_sensor_timeout_01.state+"S")
time.toZDT(time.Duration.ofSeconds(items.global_motion_sensor_timeout_01.numericState))
time.toZDT(Quantity(items.global_motion_sensor_timeout_01.state + "s"))

And if this Item is a Number:Time Item then my original suggestion of just passing the Item to time.toZDT() will work because it’s unit aware.

You don’t need the parseFloat. Item will give you the state of the Item as a number using numericState or if it’s a Number:X with units using quantityState.

This was just an aside and unrelated to the original problem. We still need you to log out all variables and states of all Items before they are used.

So you’re probably going to hit your head against a wall as much as I am…

I added some logging when the script creates a timer, as well as checking the timeout variable. For example:



cache.private.put('motion_timer', actions.ScriptExecution.createTimer('motion_timer', time.toZDT(timeout), function (timer_context) {
      items.getItem(output_cmd_items_0).sendCommand(0);
      }, undefined));

console.log("DEBUG: Timer created at: ".concat(time.ZonedDateTime.now()).concat(", ending at time: ").concat(time.toZDT(timeout)));

Surely enough I can see in the logs that there’s a timestamp for when the timer is created, and a second timestamp showing an endtime of +5 seconds.

Unfortunately the debug messages are the same regardless of whether the script is exhibiting the bug or not. In short, the debug message shows that a timer function is being created, and scheduled to end 5 seconds into the future, however the timer ends immediately.

Here’s the thing. I powered down the RPi for ~20 minutes, then turned it back on. The script showed the malfunctioning behaviour, despite the debug messages showing correct values.

This time, instead of going into the script and clicking save, I simply restarted the RPi. Then the script works as normal on startup. :clown_face: :clown_face: :clown_face:

Do you have suggestions for investigating anything relating to the cache? Anything that could be persistent between boots?

what’s the time used for the timer relative to the timestamp on the log statements?

Sorry it took me a while to do some digging and experiment. Yes indeed I missed an obvious discrepancy before.

The timer is being created with the correct timeout value as per my previous post, however the time being logged is different to the time being read/created by the timer function.

Here is a log entry when the script isn’t working properly. Timezone/location has been blanked by me, but the important bit as you can see there is almost an hour difference between the logging timestamp and the timer’s timestamp. Since the timer’s expiration time has already passed compared to the actual system time, it would make sense that it times out immediately.

Note that the logging time matches the actual computer time (the timer is the only incorrect time)

09:45:48.076	INFO	org.openhab.automation.script.ui.6d1d5af2a4	SCRIPT DEBUG: Timer created at: 2025-02-11T08:43:05.522+XX:00[XXX/XXX], ending at time: 2025-02-11T08:43:10.523+XX:00[XXX/XXX]

When the script is working, both the logging time and timer will match:

08:29:03.184	INFO	org.openhab.automation.script.ui.6d1d5af2a4	SCRIPT DEBUG: Timer created at: 2025-02-11T08:29:03.179+XX:00[XXX/XXX], ending at time: 2025-02-11T08:29:08.181+XX:00[XXX/XXX]

I had to experiment a few times to see the circumstances in which the times do/don’t match.

With openhab.service enabled at boot:

  • Booting system from a powered off state (after being off for >20 mins or so) results in times NOT matching
  • Rebooting a system that is already running results in times matching again
  • If times aren’t matching from a cold boot, restarting openhab using “sudo systemctl restart openhab.service” will also fix the times so that they match again

With openhab.service disabled at boot, obviously it’s unknown whether times match or not. However I let the system idle for a few minutes so that the system clock will sync. I then re-enabled and started openhab.service, and the times match.

Hopefully that helps. It’s beyond my ability to know where the issue lies, but let me know if there’s other tests I can do. I don’t have the ability to submit bug reports. I don’t think you do that through the forum but rather through github?

While I mentioned that this behaviour has only occurred since updating openhab, I remember that the java runtime environment also recently updated on the RPi.

I also suspect that the timer is out of sync by the amount of time that the system is powered off for. I will need to test that tomorrow when I can do it in a controlled manner, eg power off the system for 40 minutes, then check to see if the timer is out of sync also by 40 minutes.

Do you have -Duser.timezone setting for your EXTRA_JAVA_OPTS?

Should I have an /etc/default/openhab directory? With EXTRA_JAVA_OPTS as a file?

That directory doesn’t exist on my system.

Also, it looks like the timer is retaining the last time value when the system was powered off, then being offset by that amount (while the logging time is the correct system time)

I think it’s pretty clear that the problem is OH is coming up before NTP has synced the clock on the machine and at least parts of it are stuck using the old time even after NTP has updated the clock.

Everything appears to work on a warm boot because the machine isn’t offline for long enough to notice a discrepancy.

You can configure OH to wait for NTP before trying to start n systemd. I’ve not done this myself but some googleing shows the following Howto: delay a systemd service until the clock is synchronized — Debiania.

Another alternative is to add a battery backed up real time clock to the RPi. There are several available.

Yet another alternative is to add a delay on the OH start script or a cron that runs a certain amount of time after boot so that NTP has time to sync the clocks. This is probably the most brittle of the options.

No, you should have an /etc/default/openhab file with EXTRA_JAVA_OPTS as a field in that file. But only if you are running an apt/yum installed instance of OH. If you’ve installed some other way, you need to add that somewhere else.

But from what I can tell, I don’t think this is a timezone issue.

1 Like

OK, I’ve created my own systemd units so I should be able to figure out that solution.

Quite interesting considering that the logging time, as well as any clock widgets show the correct time, but the timer function is affected. I wonder which update was the culprit.

It might be related to how GraalJS handles time and could be fixed by

You might now be seeing this issue due to OH 4.3 starting up quicker or changing to faster hardware.

Looks like OH 5 might contain the GraalJS version with the fix.

3 Likes

Out of interest, why is it that saving the script also fixes the problem (but only for that script)?

Does OH compile the script and retain some compiled code from last boot?

I’m reading through the GraalJS code trying to get my head around how the RPi is deriving a time.

I’m guessing there’s a counter similar to microcontrollers where it will start counting once power is applied, then offset to provide a human readable date format one or more system date/times, either synced or retained

Yes.

It loads and compiles the code only once and then reuses that for each execution of the rule. It does not recompile it for every run of the rule.

You probably won’t find the answer to that in the GraalJS code. That’s an operating system level question.

https://forums.raspberrypi.com/viewtopic.php?t=244216

When Java comes up it gets the time from the OS and GraalJS gets the time from Java. The problem @MikeJMajor points to is that GraalJS was not ensuring that its own internal representation of time is kept in sync with the OS. So if the clock on the OS changed after GraalJS started running, GraalJS would continue to use the old time.

1 Like