Astro / NTP odd behavior - "losing time" overnight

RipNCode · August 12, 2020, 3:28pm

I have recently run into an odd problem where overnight, the time update seems to “stop” for a period. I presume this is an NTP issue, but the behavior is really strange.

On any reboot of the Pi (openhabian) or restart of OH, time comes up correctly. More and frequently, notably since upgrading to 2.5.6 (and now 2.5.7, but I don’t know if the version update matters or not), overnight the time must “hang” at some point such that my morning, when the time should be (for example) 8:00 am, August 12, 2020, OH believes it is 2:48 am. As a result, the Astro bindings are all off for triggers. On the other hand, I have set up a morning alarm (ala the tutorials), and that will trigger just fine at the right time - only the Astro binding doesn’t recognize the right time and won’t trigger events correctly. For example, say the alarm is supposed to go off at 7:00 am tomorrow. At night, before bed, everything is running correctly. At 7:00 the alarm goes off correctly, but as part of the alarm rule, it should open blinds if it is daylight outside. The system believes it to be 2:48 am, so it’s not yet daylight, so the blinds don’t open. Yet the alarm still goes off. The NTP time is 2:48 am (even though the alarm went off correctly).

This is really strange, and I’m not sure where to start looking for the problem. It’s inconsistent, so it doesn’t happen every day, but it does happen frequently (3-4 times per week). Alarm functions work, but the system time is wrong. Astro won’t work because it believes the time is different than it is. It’s perplexing…

I found a similar topic on this some time ago, and checked for multiple, duplicate Things, but there’s nothing like that I can find that would create an obvious conflict.

Any ideas on where to begin checking this out?

rlkoshak · August 12, 2020, 4:34pm

If the system time is off than it makes sense that astro would be off too. Unfortunately, the problem and therefore solution will be outside of OH. Look on the raspberry pi forums as something is weird with the machine. It might work to configure the rpi as an NTP client to one of the publicly available ntp servers. But that would probably only mask the root problem.

The alarm works because it uses timers instead of system time.

rossko57 · August 12, 2020, 5:10pm

What are you actually looking at, to draw that conclusion?

The alarm timers so far as I know are not triggered relatively like “3 hours hence”, but by absolute time. The quartz scheduler must believe it is 07:00? Though that does depend when you set it, if you use now+N hours methods.

Astro binding should set events for the day ahead using the same method and same scheduler - but we know when that happened, 00:01
You might confirm that looking in your openhab.log for midnight.

As a random question, do you have Items changing from NULL in your events.log during the night?

RipNCode · August 13, 2020, 2:01pm

Sorry for the delay in responding - yesterday turned into “everything’s an emergency right now” day…

FYI, this has occurred again. I rebooted the Pi and everything was smooth all day yesterday. Time was picked up correctly.

This morning, pretty much exactly the same behavior as described before, but I’m picking up some more details. To address some of your questions @rossko57, on my sitemap, the time/date item displays “Wednesday, August 12, 2020 9:29 pm” even though it is actually Thursday, August 13, 2020 08:53 am as I type this. This is the core of the issue. The Astro binding show current phase as “Night”. The alarm went off correctly. Note that the time shown by the timedate item is not updating at all. It appears that “something” goes awry at some random night time period and the item stops updating at all.

Viewing the logs, I can’t get anything that far back (it’s about an hour old - I guessing I can probably change that) to get detailed information. However, the logs do have the current timestamp showing correctly.

Checking timedatectl on the Pi shows the correct time as well.

One question, what is Astro basing its time on? By all accounts, the system time is correct, OH time is correct (coming from pool.ntp.org), only the NTP-based time/date item is wrong. Maybe I need to reinstall the binding for some reason? While annoying, I have to admit it’s an interesting problem.

RipNCode · August 13, 2020, 2:27pm

BTW, it looks as if restarting the NTP bundle has no effect, and it is listed as Active.

RipNCode · August 13, 2020, 2:45pm

One more thing and I have to run for a few hours. Restarting the Astro binding from the console updated the Astro phase correctly (changed from Night to Daylight). The time/date item is still wrong.

rossko57 · August 13, 2020, 2:55pm

We have no idea how your date/time Item gets updated. While it might be hinting something about updates in general, this is not telling us anything about time the system thinks it is.
Your log review seems to confirm this?

Likewise, “night” condition - are you talking about some Item of yours that does not get the updates you expect? I’m trying to say there is likely no problem with bindings and clocks as such, but some part of your system is seizing up.

Only having logs going back an hour is obviously useless. Have you changed settings here, or is this part of the symptoms? By default you should get days, plus older archives.

mstormi · August 13, 2020, 3:09pm

setup a cron job to issue “timedatectl” at regular intervals.
Eventually use timedatectl set-ntp true as root to sync (or 1xfalse+1xtrue, this is from memory)

If you haven’t updated openHABian recently, do so now.

RipNCode · August 13, 2020, 7:48pm

Thanks @mstormi. I’ll set up a cron job and see if that works out. Openhabian is pretty much up-to-date (updated it a few days ago). Is there any risk or benefit to changing the repository to Buster (I’m still running Stretch). Most every machine I run is running Bullseye right now, but I just left openhabian at defaults…

RipNCode · August 13, 2020, 8:17pm

So far, as best as I can tell, the logs do confirm something about system time. I’ll see if I get the same behavior tonight after @mstormi 's recommendation. What appears to be happening is that the Astro event is relying on the same information that the item update is getting wrt correct time. I should be able to capture this with logs tonight as I’ve changed some logrotate scheduling (I don’t remember changing it from defaults, but that doesn’t mean I didn’t do something at some point and “fat finger” it). It’s just that the trigger event for Astro doesn’t occur because of something else in the system related to NTP. It’s odd that most all of the timestamps were correct except for the Time/Date item using NTP, and the phase for Astro was stuck there as well (in this case, the “night” phase).

rossko57 · August 13, 2020, 8:28pm

Just after midnight, Astro “does its thing”.
It calculates events for the day ahead, e.g. sunrise, and sets up a scheduled job to execute at that time.
The same calculated sunrise value is also used at midnight to populate any linked Item.
That’s it really, after that it’s all in the hands of the scheduler. The same scheduler that runs your 0700 cron rule.

That’s what I’m driving at with logs. Does the astro event in fact occur, but the consequences that you expect, like some Item getting updated by rule, are what is getting missed.

Astro doesn’t know or care about NTP, it just uses the “system clock”. I don’t know exacly what that is - I’d guess java’s clock with openHAB’s timezone applied.

Wolfgang_S · August 13, 2020, 8:44pm

Which logs did you check ?
OH log / events log and / or linux system logs ?

Cronjob starts normally are being logged in the systemlog. On my cubietruck the OS provided cronjob runs hourly:

fgrep -i cron syslog |grep hourl
Aug 10 15:17:01 cubietruck CRON[4176]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 10 16:17:01 cubietruck CRON[8893]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 10 17:17:01 cubietruck CRON[13342]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 10 18:17:01 cubietruck CRON[17828]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 10 19:17:01 cubietruck CRON[22538]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Aug 10 20:17:01 cubietruck CRON[27384]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)

mstormi · August 13, 2020, 9:56pm

not that I know. I’m not sure but I think NTP is what changed from stretch to buster so maybe that already makes a difference.

RipNCode · August 16, 2020, 1:50pm

I believe this may be solved - I’ve waited a few days to check the behavior before posting a solution - just in case someone else sees this. I do believe now that this is an issue with Stretch and timedatectl while using more updated OH on Openhabian. It may also be an intermittent ntp update issue with debian.ntp.pool servers as I changed those as well. YMMV, but I seem to now have a stable connection and correctly updated Astro and NTP data at the Pi system level and in OH.

Note that the OH logs don’t seem to help much, at least with just INFO. As you check the period of time where the time/date items stop updating, you’ll just see a gap in time where there’s little data logged at the time the item(s) stop updating. After a period (I saw about 4 minutes of no logs), logs resume but no data on the time/date items is ever recorded and no major problems seem to appear.

I ultimately opted to not set a cron job for timedatectl and try to manually check what was happening. Thus far, no cron job is set and it is holding time correctly for 2 days.

Just some more data on symptoms… I saw that the time/date Item based on the NTP binding was falling behind 10-12 seconds behind the actual NTP time as displayed by timedatectl (crudely measure by manually counting the time). I recognize that some of this is the update of the UI time, but over time, the time in the UI drifts further. This happened almost immediately after the time/date item was correctly set after a restart of OH. At some point, the time/date items in OH just stop updating at all. Checking time with timedatectl from the command line always shows correct time/date, but the time/date items in OH just stop updating.

If you see some similar behavior, check the systemd-timesyncd service:

sudo systemctl status systemd-timesyncd

If you see the following error, this fix should work:

Network configuration changed, trying to establish connection.

Even if the system time, as checked with

timedatectl status

is correct, and it shows syncing with NTP is ‘yes’ this should work.

Edit

/etc/systemd/timesyncd.conf

and uncomment the ntp and #FallbackNTP lines. I ultimately had to add a few non-debian ntp servers in the latter line (separate with a space) to make everything work. This is why it may just be an issue with their servers and may be temporary.

Exit the editor and reload the unit files:

sudo systemctl daemon-reload

Restart the service.

sudo systemctl restart systemd-timesyncd

At this point, it should update correctly. You can recheck the status and make sure it’s connecting. In my case, it is connected to the pool.ntp.org servers, sames as set in OH. It has now been running a couple of days with no drift and no failure in Astro.

This is openhabian running OH 2.5.7 on a Pi 3B+ running Stretch. I am looking at upgrading to a Pi 4 soon, and I will test moving from Stretch to Buster. Point being, it will work with Stretch.