RRD4J, historicState(), and NaN gaps in data

rossko57 · August 6, 2020, 9:36pm

I’ve got an openHAB development system on an old laptop; this does not run 24/7.
I’m generating simulated numeric meter readings and use RRD4J persistence, with the usual everyMinute settings and default consolidation strategy.

Because the system is not 24/7, I have gaps in the persisted data, naturally enough. Here’s a sample from rrdInspector, 2nd archive where it’s been consolidated to every 4 minutes.

No surprises so far.

If I now try to recover values using historicState() in rules, I get into trouble.
Yes, I realise the older data may have accuracy compromises because it has been averaged over time, not the problem here.

I don’t really know how historicState() decides which archives to look in, but tested here with a target time that is not in the first every-minute archive, but is in the second 4-minute archive.

Requesting historicState() for 11:59 gives me the expected object with state 3806.75 and timestamp 11:56.
I say expected, because what I would expect is for searching for 11:59 exactly to “fail”, but then look for preceding data instead.

Likewise, seeking 12:02 returns data stamped 12:00.
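
Both returned timestamps are consistent with the requested time simply being rounded down to the start of its 4-minute cell. A minimal Python sketch of that arithmetic, assuming cells are aligned to multiples of the step counted from midnight (the helper name `cell_start` and the date are made up for illustration; this is not openHAB or rrd4j code):

```python
from datetime import datetime, timedelta

def cell_start(instant: datetime, step_minutes: int) -> datetime:
    """Round an instant down to the start of its archive cell (hypothetical helper)."""
    minutes_since_midnight = instant.hour * 60 + instant.minute
    aligned = (minutes_since_midnight // step_minutes) * step_minutes
    midnight = instant.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(minutes=aligned)

# Arbitrary date; only the time of day matters here.
print(cell_start(datetime(2020, 8, 6, 11, 59), 4).strftime("%H:%M"))  # 11:56
print(cell_start(datetime(2020, 8, 6, 12, 2), 4).strftime("%H:%M"))   # 12:00
```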

Requesting historicState() for 12:10 returns null.
That’s not what I expect. As I understand it, if there is no record at the specified time instant, historicState() should grope back in time until it finds a record, making the assumption that data persisted until the given time.
In short, return the next-oldest data.

It’s arguable that the NaN records do represent “the next-oldest data”.
But those NaN records have been created, synthesized, by the rrd4j aggregation process. No data was actually recorded for those times at all.
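
To make the difference concrete, here is a small standalone Python model of the two behaviours. It is only a sketch: the archive is a plain list, times are minutes since midnight, the function names are invented, and every value except 3806.75 is made up. It does not call or reproduce any openHAB or rrd4j code.

```python
import math

# Hypothetical 4-minute archive as (minutes since midnight, value).
# NaN marks cells synthesized by consolidation where nothing was recorded.
archive = [
    (716, 3806.75),   # 11:56 (value from the post above)
    (720, 3807.5),    # 12:00 (made-up value)
    (724, math.nan),  # 12:04 gap
    (728, math.nan),  # 12:08 gap
]

def lookup_observed(cells, minute):
    """Observed behaviour: take the cell covering `minute`; NaN becomes None."""
    covering = [(t, v) for t, v in cells if t <= minute]
    if not covering:
        return None
    t, v = covering[-1]
    return None if math.isnan(v) else (t, v)

def lookup_expected(cells, minute):
    """Expected behaviour: walk back past NaN cells to the newest real record."""
    for t, v in reversed(cells):
        if t <= minute and not math.isnan(v):
            return (t, v)
    return None

print(lookup_observed(archive, 719))  # (716, 3806.75)
print(lookup_observed(archive, 730))  # None
print(lookup_expected(archive, 730))  # (720, 3807.5)
```

For 11:59 (minute 719) both lookups agree; they only diverge for 12:10 (minute 730), which lands in a NaN cell.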

Note that whatever it is that the BasicUI simple Chart widget does to access the same gappy dataset, it does not choke on it and draws a chart using the “ignore NaN and assume last value remains valid” strategy.

I think this is a bug, but welcome discussion. @opus perhaps?

opus · August 7, 2020, 6:02am

Up front I have to say that I am just an interested user of rrd4j, and your statement "I don’t really know how historicState() decides which archives to look in" is true for me as well.
The samples that returned a result both fall within a period where values have been stored; the one that returned null does not have a directly preceding value in its timestep. To me it LOOKS like historicState() looks into the first archive that fits and only returns what is stored as a value for the requested timestep. In other words, it does not go backwards until it finds a value, but takes exactly the value that is stored for the “touched” timestep. I never dared to look that deep into the code.

rossko57 · August 7, 2020, 10:22am

Thanks for taking the time to look.

Yep, that’s the thing. Is it supposed to reach back for a valid value? (as it would in any other database).
I think I’ll lodge a formal bug later.

It may be that this is something that is not easy to overcome, because of the way rrd4j works, in uhh “fabricating” NaN data to span gaps.

Most users would never see it unless they target a timeslot corresponding to a reboot or such.

EDIT -

github.com/openhab/openhab1-addons

[rrd4j] historicState() does not work as expected with 'gaps' in database

opened 11:43PM - 11 Aug 20 UTC

Rossko57

Version OH2.5, rrd4j running with default config.

rrd4j compacts data over time. Generally Item states are persisted at 1 minute intervals into an archive that covers only 8 hours. The next archive is auto created with averaged data at 4 minute intervals covering 24hrs, next archive 14 minute intervals covering a week, etc. These archives may be viewed as boxes of defined timeslots or cells. If no data was recorded for a given timespan (there is a gap in the data) any archive may have one or more cells populated with **NaN** representing 'empty'.

Example data with gaps/NaN in this discussion thread https://community.openhab.org/t/rrd4j-historicstate-and-nan-gaps-in-data/103342

While my non-fulltime development system has more gaps than most, gaps of several minutes can arise for any user simply during upgrade or reboot. This is all working as expected.

Persistence offers method **myItem.historicState(someInstant)** for use in rules, to recover the recorded data at a past instant in time. The returned object includes stored state, and timestamp of the record. In most cases, there will never be a record for the _exact_ instant requested - the persistence service instead fetches the immediately preceding record in time, which may be some minutes or days earlier, but is presumed to be the state still in force at 'instant'.

For rrd4j this works as expected only where the next-previous record is a valid record. But if the target instant falls in a "gap", the next-previous record may be NaN. In this case persistence service looks no further and returns null to the user rule, as though no data is available.

The expected behaviour here would be as for other persistence services - historicState() should reach back in time as far as is necessary to find the next valid data, passing over NaN cells. Alternatively, returning an historicState object with state UNDEF but the correct timestamp of the NaN cell would at least allow users to make their own management arrangements.
I do not know if this issue belongs with rrd4j add-on or more generally in the persistence framework.

rossko57 · August 7, 2020, 9:48pm

Just for info, I made a kludgy workaround for my purpose.

I’m actually recording meter readings. Yes, influxdb would be “better” for that, but I already use rrd4j for temperature charting etc. I’m aware of the limitations due to data consolidation, but it’s good enough for my purpose.

So, my task is just to get meter reading for yesterday 00:00 and today 00:00, and calculate yesterday’s daily consumption. This goes wrong if the historicState() function I’d usually use to retrieve these data points hits a NaN record, as described earlier.

My cheat is to use minimumSince() instead. This skips over NaN, and because my meter is only incrementing, the first valid record it finds will be the lowest. It’s NOT the same, but good enough for my needs for the time being.
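
The idea behind the cheat can be modelled in a few lines of standalone Python. This is a sketch under stated assumptions, not openHAB code: `minimum_since` here is a made-up stand-in that mimics "lowest non-NaN value from a given time onwards", times are hours counted from yesterday 00:00, and all readings are invented.

```python
import math

# Hypothetical meter readings as (hours since yesterday 00:00, reading).
# The system was off around yesterday's midnight, so those cells are NaN.
records = [
    (0, math.nan),
    (1, math.nan),
    (2, 100.0),
    (12, 105.0),
    (24, 110.0),  # today 00:00
    (30, 112.5),
]

def minimum_since(cells, since):
    """Lowest non-NaN record at or after `since` (stand-in, not the real API)."""
    valid = [(t, v) for t, v in cells if t >= since and not math.isnan(v)]
    return min(valid, key=lambda cell: cell[1]) if valid else None

# For a meter that only counts up, the minimum is the first valid reading.
start = minimum_since(records, 0)
end = minimum_since(records, 24)
consumption = end[1] - start[1]
print(start, end, consumption)  # (2, 100.0) (24, 110.0) 10.0
```

Note that the start reading comes from 02:00, after the gap, rather than from the last record before it, so anything consumed during the gap is not counted in that day. That is the sense in which this is not the same as a working historicState().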

rossko57 · January 2, 2021, 10:49am

I do not yet have a viable OH3 system to verify if this problem, err, persists.

Since my github issue report was against 1.x persistence extension as used in OH2, I am concerned it will now get lost but that the issue may still exist in OH3. If anyone can verify or encounters this in OH3, we should make a new issue report.

I am a bit concerned since for most users, this will appear like a transient error and not be understood. They will have few NaN gaps in the rrd4j database, and are likely using now.minusHours type queries that will likely work when run the next day etc.

opus · January 2, 2021, 4:20pm

I have a .rrd file which was built by the OH3 default persistence and which has a short timeframe with NaN.
Doing some requests for historicState() around the time when only a single value is persisted gave the following:

2021-01-02 17:15:37.077 [INFO ] [.core.model.script.TestHistoricState] - Value(Day Before): null
2021-01-02 17:15:37.099 [INFO ] [.core.model.script.TestHistoricState] - Value: 16.12.20, 22:45: CPU_Load -> 0.1
2021-01-02 17:15:37.121 [INFO ] [.core.model.script.TestHistoricState] - Value: 16.12.20, 22:45: CPU_Load -> 0.1
2021-01-02 17:15:37.140 [INFO ] [.core.model.script.TestHistoricState] - Value: null
2021-01-02 17:15:37.157 [INFO ] [.core.model.script.TestHistoricState] - Value: null
2021-01-02 17:15:37.178 [INFO ] [.core.model.script.TestHistoricState] - Value: null
2021-01-02 17:15:37.198 [INFO ] [.core.model.script.TestHistoricState] - Value: null
2021-01-02 17:15:37.217 [INFO ] [.core.model.script.TestHistoricState] - Value: null
2021-01-02 17:15:37.238 [INFO ] [.core.model.script.TestHistoricState] - Value(Day After): 17.12.20, 21:00: CPU_Load -> 0.1

The first and last calls were for a day before and a day after; all others are at 15-minute steps (same as the steps in the archive). During this time only the value for 22:45 is in the database, all others are NaN.

In other words: No change!