deltaSince or historicState sometimes returns null on openHAB 3.2

rrd4j is used for the historical data, and the data is saved every minute. But deltaSince or historicState sometimes returns null. Any ideas why that is happening?

test.rules:

rule "Energie"
when
    Time cron "0 * * * * ?"
then
    var Number delta1 = Smartmeter_Energie.deltaSince(now.minusHours(1)) as Number
    var Number past = Smartmeter_Energie.historicState(now.minusHours(1)).state as Number
    var Number current = Smartmeter_Energie.state as Number
    var Number delta2 = current - past
    logWarn("test.rules", "Energy last hour: " + delta1 + " kWh (deltaSince) " + delta2 + " kWh (historicState)")
end

rrd4j:

Items {
	Smartmeter_Energie					: strategy = everyChange, everyMinute
}

Smartmeter_Engerie.json (60.5 KB)
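For context, a minimal sketch of what the full rrd4j.persist could look like; only the Items block was posted, so the Strategies block here is an assumption:

Strategies {
	// cron strategy required by rrd4j so every archive slot gets a value
	everyMinute : "0 * * * * ?"
	default = everyChange
}

Items {
	// persist only the items that are actually queried in rules
	Smartmeter_Energie : strategy = everyChange, everyMinute
}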

The timestamp of your failing log entry is 8 seconds out of alignment with the rule trigger. That’s a clue that one of the retrievals has taken a long time, or more likely the deltaSince has just timed out.

Question: has a reboot happened in the previous hour or so? rrd4j will place “NaN” into time-series pigeonholes where no per-minute persisting could have taken place, which can cause glitches later.

There has been no reboot for a day, and the system was untouched during the night when this happens. I also checked the data in the rrd file and plotted the graph; I could not find a “NaN” entry.

But while checking the log I found that I had only filtered for warnings. There were also a few errors due to null values from historicState.

All warnings have a delay of a few minutes, but the historicState errors seem to have no delay.

Do you know how long it waits until a timeout occurs? Can it be increased?

So that’s not a timeout then, and there’s no point trying to change timeout periods.

Next step is probably to put DEBUG logging on the persistence service.
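One way to do that, assuming the standard Karaf console and the usual logger name of the rrd4j add-on, would be:

openhab-cli console
log:set DEBUG org.openhab.persistence.rrd4j

(and log:set INFO org.openhab.persistence.rrd4j to revert afterwards).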

I have set the log level to DEBUG. But the value seems to be stored 50 ms earlier. What looks strange to me is that every entry has a second log entry with “(again)”.

Remember rrd4j has an archive structure, with multiple tables for each Item, all representing the same times but more compressed. I’ve no idea if that is normal debug log, but the different messages (“again”) suggest it is expected.

Anyway, debug of that module didn’t tell us anything about read activity. I don’t know what to suggest to get ‘persistence extensions’ logging.

I would double-check which items are really persisted: the individual item, or a group containing the item?

@rossko57 Thanks for all that information.
@opus The group item is not persisted. Under openHAB 3.1 I had never seen these errors.

I tried to isolate the problem: deactivated nearly every binding, transformation, UI, … and also rebuilt the DSL rule in Blockly and in the new JS Scripting. All the same. But then I changed the timing to trigger 30 seconds later.

  triggers: [triggers.GenericCronTrigger("30 * * * * ? *")],

Over the last days of observing it I have never seen a null value again.
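For the Rules DSL rule from the first post, the equivalent change would presumably just be a cron expression that fires at second 30, e.g.:

rule "Energie"
when
    // fire at second 30, away from the per-minute rrd4j write
    Time cron "30 * * * * ?"
then
    var Number delta1 = Smartmeter_Energie.deltaSince(now.minusHours(1)) as Number
    logWarn("test.rules", "Energy last hour: " + delta1 + " kWh (deltaSince)")
end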

So it looks to me like, when the rrd4j file is written at the same time the rule is querying the data, it cannot get a value and returns null.

Seems a reasonable deduction. rrd4j has a fairly big job every minute.
Are you running with default “everything you can” settings? Might improve by selective persistence.

Of course what we really want is for your ad-hoc query to go in the work queue and give a result. Maybe it’s just timing out silently.

Before, over 600 items were stored in rrd4j, but currently only 50. But the interval at which the null value happens was nearly the same with both configurations.

During my earlier test I ran three rules in different languages at the same time, triggering every full minute. Sometimes one failed but the other two returned values, and the difference in the log timestamps was only around 100 milliseconds. I can also make it work with a second try when the first attempt returns null. So I am not sure why it should time out on one and not on the others.
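As a stop-gap, that second try could also be built into the rule itself. A rough Rules DSL sketch, assuming an arbitrary 200 ms pause is enough for rrd4j to finish its write:

rule "Energie with retry"
when
    Time cron "0 * * * * ?"
then
    // first attempt
    var delta = Smartmeter_Energie.deltaSince(now.minusHours(1))
    if (delta === null) {
        // rrd4j may be busy writing its archives; wait briefly and try once more
        Thread::sleep(200)
        delta = Smartmeter_Energie.deltaSince(now.minusHours(1))
    }
    if (delta !== null) {
        logWarn("test.rules", "Energy last hour: " + delta + " kWh (deltaSince)")
    } else {
        logError("test.rules", "deltaSince returned null twice")
    }
end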

Have you tidied away the unwanted rrd files? I don’t know if it will still carefully update/compress the archives as time marches on; it might keep populating the per-minute pigeonholes even if only with NaN, meaning it still processes 600 archives every minute.

The random nature of failure does not surprise me.
There was a recent post about similar jdbc workload-related errors. It would be nice if the persistence service logged WARNs for timeouts, if that is what is happening.

Yes, I had deleted all files in the directory var/lib/openhab/persistence/rrd4j and also cleaned the cache with openhab-cli clean-cache.