RRD4J failing after Daylight Saving Time switch

mattegary · March 26, 2017, 2:10pm

Hi everyone, last night we switched to Daylight Saving Time (I’m from Italy), so we moved our clocks one hour ahead, from 2 am to 3 am.
As you know, all electronic devices do this automatically and instantaneously.
Well, at exactly 2 am my temperature charts, which are based on the rrd4j persistence service, stopped working.

I assume this is not because the temperature value wasn’t updated between 2 am and 3 am (that hour didn’t even exist), but because the rrd4j storage engine failed to insert one value per minute, as its specification requires.
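
To make the skipped hour concrete, here is a small Python sketch. It is not taken from rrd4j or openHAB; it assumes Python 3.9+ with the standard `zoneinfo` module and an installed tz database, and the helper name `local_labels` is made up for illustration. It renders four consecutive UTC minutes around the 2017 switch as Italian wall-clock time.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

ROME = ZoneInfo("Europe/Rome")

def local_labels(start_utc, minutes):
    """Render consecutive UTC minutes as local wall-clock labels."""
    return [
        (start_utc + timedelta(minutes=i)).astimezone(ROME).strftime("%H:%M %Z")
        for i in range(minutes)
    ]

# The 2017 switch happened at 01:00 UTC (02:00 CET became 03:00 CEST).
start = datetime(2017, 3, 26, 0, 58, tzinfo=timezone.utc)
print(local_labels(start, 4))
# ['01:58 CET', '01:59 CET', '03:00 CEST', '03:01 CEST']
```

The UTC minutes are evenly spaced; only the local labels jump from 01:59 to 03:00.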

I think some sort of fix should be implemented.

Last but not least, one question: how can I reset the rrd4j DB so that the charting service continues to work? Thanks a lot!

Matteo

sihui · March 26, 2017, 3:07pm

opus · March 26, 2017, 3:08pm

Hi @mattegary, this issue was already reported in the community (see link.)
Just restart your OH and rrd4j will continue to work as before; only the period from the time shift until now will not have any values.

Hansohm · March 26, 2017, 10:16pm

@opus
Hmmm, you’re right, the issue was reported elsewhere. But isn’t this section (Add-ons / Persistence Services) the right place for this rrd4j persistence bug?

But more important:
Is there a real solution for this bug? Restarting OH2 is just a workaround. It would be great if a developer could fix the issue before the next daylight saving switch. As far as I know, this happens every year…

opus · March 27, 2017, 3:41am

Yes, this section would fit the topic better.
However, the solution will be triggered by filing an issue on GitHub. I will file one as soon as I get feedback on which part of GitHub is the correct one (Eclipse SmartHome or openHAB Core). The bug showed up for you and me in relation to rrd4j; however, it seems to be triggered by the scheduler.

mattegary · March 27, 2017, 7:08am

Probably the cause is the same (the DST change), but the implementation of a fix depends on the persistence service being used. RRD4J has a problem with DST because it needs at least one input value per minute, so in spring it misses data for an entire hour, while in autumn it gets doubled values for the same hour.
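
The autumn case can be sketched in Python in the same spirit. This only illustrates the wall clock, not rrd4j internals; it assumes Python 3.9+ with `zoneinfo`, and the helper `to_utc` is a made-up name. On 2017-10-29 the local time 02:30 occurs twice in Italy, and the `fold` attribute selects which occurrence is meant.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

ROME = ZoneInfo("Europe/Rome")

def to_utc(local_dt):
    """Convert an aware local datetime to UTC."""
    return local_dt.astimezone(timezone.utc)

# Same wall-clock reading, two different instants.
first = datetime(2017, 10, 29, 2, 30, tzinfo=ROME, fold=0)   # still CEST
second = datetime(2017, 10, 29, 2, 30, tzinfo=ROME, fold=1)  # already CET

print(to_utc(first).isoformat())    # 2017-10-29T00:30:00+00:00
print(to_utc(second).isoformat())   # 2017-10-29T01:30:00+00:00
print(to_utc(second) - to_utc(first))  # 1:00:00
```

Seen as UTC instants, the two readings are an hour apart, so a store keyed on UTC would not see them as duplicates.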

A more general fix could be implemented to handle the loss of data at any moment of the year, for example after a power loss on the openHAB instance or some other problem.

Hansohm · March 28, 2017, 10:11pm

I have had massive problems since the DST issue on Sunday morning. Today (Tuesday) at 03:00h rrd4j stopped storing new values and openHAB 2 wasn’t working any longer. I had to reboot this morning. At 14:00h (which is 02:00 PM) the same thing happened: no more persisting, and openHAB out of order.

opus · March 29, 2017, 3:41am

Are you sure it is related to the switch to DST?
In my case the restart completely overcame this issue and none of the other posters in here reported such a problem.
Could you show your exact rrd4j setup?

Hansohm · March 29, 2017, 7:12am

I’m not completely sure. But 02:00 AM and 02:00 PM seemed to be related to the 02:00 AM of the DST switch…

My rrd4j setup:

    Strategies {
    	everyMinute : "0 * * * * ?"
    	every5Minutes : "/5 * * * * ?"
    }
    Items {
    	gTemperatur, gHelligkeit*, gHelligkeit_AUSS: strategy = everyMinute, restoreOnStartup
    	gTemperatur_AUSS*: strategy = every5Minutes, restoreOnStartup
    	gWind_AUSS*: strategy = every5Minutes, restoreOnStartup
    	gHeizStatus*: strategy = every5Minutes, restoreOnStartup
    	gLichtStatus*: strategy = everyMinute, restoreOnStartup
    }

Maybe the “every5Minutes” strategy is the problem? But it worked well for about 5 months with OH1 and 1 month with OH2.

imhofa · March 29, 2017, 8:48am

rrd4j needs the “everyMinute” strategy because of its compression strategy…

Hansohm · March 30, 2017, 10:06pm

OK, I changed that to be sure. All strategies are “everyMinute” now.

PtrO · March 31, 2019, 3:14pm

FYI: Though a couple of years later, the problem may still persist. Today, at the NL switch from CET to CEST, 31 Mar 2019 02:00 CET became 31 Mar 2019 03:00 CEST and I got a lot of registration failures.

After the change from winter to summer time, the rrd4j databases apparently miss lots of information, and this continues until a restart.
My advice, as long as the omission is not solved: perform a restart after a daylight saving time change. Next time, I am considering putting openHAB on hold (stop/start) during the time-change period (01:00 until 04:00), which gives enough time to let things pass.

opus · March 31, 2019, 5:20pm

Please post the complete problems observed.
FYI my rrd4j databases are running smoothly, without my touching them, after last night’s shift.
My logs show no warnings or errors, although I have not set up any special logging for rrd4j.

noppes123 · March 31, 2019, 6:34pm

Since RRD4J stores everything in UTC time in its database and only converts it when reading or displaying it, that is exactly the behavior to be expected.
If there are problems, an issue should be created on GitHub. Could you please do that, Peter (@PtrO), with logs and everything? Thanks.
After all, in Europe we will probably be stuck with summer/winter time at least until 2021, and other areas of the world will continue to use it.

PtrO · March 31, 2019, 11:00pm

Well, there’s not much to say and report other than that right after summer time became active (clock goes from 02 AM to 03 AM), my RRD4J databases went nuts.

When I query the RRD4J databases using the Java inspector, almost all registrations after 03:00 CEST until my OH restart are simply stored as NaN (not a number).
The failed NaN entries directly after the switch to summer time are identified with a CEST timestamp, whereas registrations before it have “CET” in the timestamp. The timestamp values themselves have NO gaps in their sequence…
So timestamp interval value 1553994000 is indicated by the inspector as CET, and the next one at interval 1553994240 (in the 4-minute interval section) has the erroneous NaN.
The same goes for the other interval tables (14, 60, 720 and 10080 minute intervals).
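
The two timestamps quoted above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; `as_utc` is a hypothetical helper name, and nothing here comes from rrd4j itself.

```python
from datetime import datetime, timezone

def as_utc(epoch_seconds):
    """Interpret a Unix timestamp (seconds) as a UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)

first_ts, second_ts = 1553994000, 1553994240

print(as_utc(first_ts).isoformat())   # 2019-03-31T01:00:00+00:00
print(as_utc(second_ts).isoformat())  # 2019-03-31T01:04:00+00:00
print(second_ts - first_ts)           # 240 seconds, i.e. one 4-minute step
```

The first value falls on 01:00:00 UTC, the instant at which Central European clocks changed that night, and the second follows exactly one 4-minute step later, which matches the observation that the stored sequence has no gap.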

Thinking about it, I guess that the average calculation went wrong because the time interval (based on the wall-clock difference) was longer than the database interval of the one-minute archive, giving a NaN value that was subsequently propagated to the other interval sections.
After the OH restart, new values were calculated again, respecting the clock time.
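
As a rough illustration of that guess only (this is not rrd4j’s or openHAB’s code), the Python sketch below compares a wall-clock difference with the real elapsed time across the switch. It assumes Python 3.9+ with `zoneinfo`; `HEARTBEAT_SECONDS` is a hypothetical limit chosen for the example, and both function names are made up.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")
HEARTBEAT_SECONDS = 120  # hypothetical limit, not taken from rrd4j

def wall_clock_gap(a, b):
    """Difference of the local readings; Python ignores the offset
    change when both datetimes share the same tzinfo object."""
    return (b - a).total_seconds()

def elapsed_gap(a, b):
    """Real elapsed time, computed after converting both to UTC."""
    return (b.astimezone(timezone.utc) - a.astimezone(timezone.utc)).total_seconds()

before = datetime(2019, 3, 31, 1, 59, tzinfo=AMSTERDAM)  # 01:59 CET
after = datetime(2019, 3, 31, 3, 0, tzinfo=AMSTERDAM)    # 03:00 CEST

for name, gap in (("wall clock", wall_clock_gap(before, after)),
                  ("elapsed", elapsed_gap(before, after))):
    verdict = "unknown" if gap > HEARTBEAT_SECONDS else "ok"
    print(name, gap, verdict)
# wall clock 3660.0 unknown
# elapsed 60.0 ok
```

If an interval check were based on the local readings, a single real minute would look like 61 minutes and exceed any per-minute limit; whether the affected openHAB version actually did this is not confirmed here.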

However, some item values in the databases that I explicitly “post-update” from my own rules were mostly not affected.
Furthermore, as said, there were no messages and no failures other than empty values in the databases that are filled by items from bindings.

This lasted until I restarted openHAB this afternoon (16:27 CEST). Since then, values are being registered normally in the databases again.

FWIW: I run openHAB version 2.2.0 on my QNAP TS509 and use “org.openhab.persistence.rrd4j-1.14.0-SNAPSHOT.jar” via /addons.

My rrd4j.persist settings are:

    Strategies {
    	everyMinute : "0 * * * * ?"
    	default = everyChange
    }
    Items {
    	* : strategy = everyMinute, everyChange, restoreOnStartup
    }

opus · April 1, 2019, 6:06am

Have a look at this old thread. Christoph Weitkamp reported no issues while using OH2.3 over the DST shift in March ’18; I observed the issue back then only because I was on OH2.2 (stating OH1.2 in the post was an error).
You are still on OH2.2, guess what!

PtrO · April 1, 2019, 1:55pm

Thanks very much!!! Wonderful how you pointed me in the right direction.

So to conclude, last year this behavior was already observed in issue #3185, which explains the “time jump” that disrupted the time-related persistence strategies.
The issue was solved by fix #5299. As far as I can tell, “they” fixed this, in summary, by adding a TimeZone parameter to the internal cron-expression scheduler (in the program “CronExpression.java”). This whole construction was replaced by/merged into a newly designed scheduler version 3 months ago, created earlier (as far as I can see) by @Hilbrand.

Anyway, it’s clear to me what caused this, and now I need to look for an upgrade of my QNAP implementation.