deltaSince Bugs in persistence

  • Platform information:
    • Hardware: armv7l GNU/Linux
    • OS: Raspbian GNU/Linux 9 (stretch)
    • Java Runtime Environment: zulu-embedded-8.25.0.76
    • openHAB version: 2.5.1
  • Issue of the topic: deltaSince returns a result that is off by 1 (problem A)
  • Issue of the topic: deltaSince queries influxdb with time < boundary instead of time >= boundary (problem B)

I’m using influxdb persistence and have an item of type Number named counterGamingHours, which is incremented every hour I’m gaming :wink:

You can observe the current values in influxdb:

> select * from counterGamingHours order by time desc limit 3;
name: counterGamingHours
time                value
----                -----
1583590429531000000 153
1583586753133000000 152
1583582573161000000 151
>

So the NEWEST one is from yesterday (March 7).

BUG: if you call deltaSince(now.withTimeAtStartOfDay, "influxdb"), openHAB sends the following query to influxdb (I observed it in the debug logs of the influxdb layer):

select value from "autogen"."counterGamingHours" where  time < 1583622000s  ORDER BY time DESC limit 1

which actually creates two problems:

Problem A: deltaSince in the example above returns 1. WTF? :slight_smile: There is NO other data in influxdb, so it should be 0.

Problem B: if I ask openHAB for a really long period, for example:

counterGamingHours.deltaSince(now.withTimeAtStartOfDay.withDayOfMonth(1),"influxdb")

I get… NULL. That’s bug number two. It happens because influxdb only holds values starting from March 2 (nothing earlier), and because openHAB uses a “<” query (quoted below) to find the boundary value, so the query returns nothing. The proper behaviour would be to use >= in this query.

The actual problematic query in case B is:

select value from "autogen"."counterGamingHours" where  time < 1583017200s  ORDER BY time DESC limit 1

I’m quite baffled that these bugs were not found/reported earlier, as literally every deltaSince call returns wrong data today.

At least that is what seems to happen on my system, so could you please shed some light on this or point out the mistakes in my thinking?

I don’t think that’s a bug.
deltaSince should reach back to the point in time that you specify, and then tell you the difference between then and now.
If there’s no data for that exact instant, it would use the next earlier data (on the assumption that it didn’t change until the instant you specified).

You told us that you have no data for the 1st (and none before that either).
Important - “no data” is not zero.
Having data from after the 1st, 00:00 is irrelevant here.
deltaSince the 1st is undefined and cannot be calculated, since there is no data.
The closest answer to undefined it can give is null.

Hmmm, I kind of disagree :slight_smile:
Let’s start with separating both bugs (A and B) and discuss B, which I understand you’re discussing.

Facts:

  • yes, there is no data before March 1st
  • there is lots of data afterwards
  • I ask for the DELTA since then
  • no data before that point means NO impact on the delta calculation, because a delta by definition requires data (for a mathematical delta, “no data is not zero” is the wrong argument: no data = no impact on the delta).

The current approach raises a number of questions:

  • why not exactly equal to ‘timepoint’? (=)
  • why not < ‘timepoint - 1 microsecond’?
  • why use the value of a previous timepoint, which might be 4 DAYS earlier, if there could be another one 1 second PAST the timepoint? (consider that I have one value on Feb 25th)

The answer to all of them is ‘because that’s how deltaSince is hardcoded’.

The correct approach (>=) has, in my opinion, these advantages:

  • the ability to ask for deltaSince while NOT KNOWING the starting point. The current behaviour does not allow for this.
  • the ability to get a proper value for a numeric time series (which is what ‘delta’ is all about)
  • the ability to properly solve the off-by-one case (problem A).
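
To make the difference between the two boundary choices concrete, here is a minimal Python sketch (not openHAB code). It assumes the table holds only the three rows shown in the first post, with timestamps truncated to seconds, and it treats the newest stored value as “now”. All function names are made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Row = Tuple[int, float]  # (epoch seconds, value)

# The three newest rows from the first post, oldest first (assumed to be
# the whole table for the purpose of this sketch).
ROWS: List[Row] = [
    (1583582573, 151.0),
    (1583586753, 152.0),
    (1583590429, 153.0),
]


def baseline_before(rows: List[Row], since: int) -> Optional[float]:
    """Newest value strictly before `since` (models the observed `<` query)."""
    earlier = [value for ts, value in rows if ts < since]
    return earlier[-1] if earlier else None


def baseline_at_or_after(rows: List[Row], since: int) -> Optional[float]:
    """Oldest value at or after `since` (models the proposed `>=` query)."""
    later = [value for ts, value in rows if ts >= since]
    return later[0] if later else None


def delta(rows: List[Row], since: int,
          pick: Callable[[List[Row], int], Optional[float]]) -> Optional[float]:
    """Difference between the newest stored value and the chosen baseline."""
    base = pick(rows, since)
    if base is None:
        return None  # no baseline found, so no delta can be computed
    return rows[-1][1] - base


MARCH_1 = 1583017200       # boundary from the case B query
START_OF_DAY = 1583622000  # boundary from the first query

print(delta(ROWS, MARCH_1, baseline_before))
print(delta(ROWS, MARCH_1, baseline_at_or_after))
print(delta(ROWS, START_OF_DAY, baseline_before))
print(delta(ROWS, START_OF_DAY, baseline_at_or_after))
```

In this toy model the “<” lookup finds no baseline for March 1 and yields None, while the “>=” lookup falls back to the first stored row. The “>=” lookup in turn finds nothing when no row exists at or after the boundary, as with the start-of-day boundary here.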

deltaSince(an instant in time) does exactly what it says.
If it doesn’t have data for that instant, whether it was recorded a second or a month before that instant, then there is no answer to be provided.

Example data
01:00 3.5
02:00 6.0
03:00 3.5
queries run at 03:30
deltaSince(00:30) = null, cannot be calculated, no idea what value was
deltaSince(01:30) = 0.0 past is assumed 3.5, now is assumed 3.5
deltaSince(02:30) = -2.5 past is assumed 6.0, now is assumed 3.5
deltaSince(03:15) = 0.0 past and now are 3.5

You can accept that or not, but it is a perfectly logical and consistent way to handle the query. Perhaps you misunderstand “delta since”: it does not consider any data in between, only the past instant and now.
It’s not supposed to tell about any excursions in between, that would be a different function like maxSince.
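
The example above can be reproduced with a small Python model. This is only a sketch of the behaviour described in this post, not the openHAB implementation: times are minutes since midnight, the baseline is the newest record strictly before the requested instant (mirroring the “<” query quoted earlier in the thread), and delta_since is a hypothetical helper name.

```python
from typing import List, Optional, Tuple

# (minutes since midnight, value): 01:00 3.5, 02:00 6.0, 03:00 3.5
RECORDS: List[Tuple[int, float]] = [(60, 3.5), (120, 6.0), (180, 3.5)]
NOW_VALUE = 3.5  # state assumed at 03:30, unchanged since the 03:00 record


def delta_since(records: List[Tuple[int, float]], since: int,
                now_value: float) -> Optional[float]:
    """Return now_value minus the newest value recorded before `since`.

    Records must be sorted by time. Records between `since` and now are
    ignored on purpose: only the past instant and now are compared.
    """
    past = None
    for minute, value in records:
        if minute >= since:
            break  # this record and all later ones are after the instant
        past = value
    if past is None:
        return None  # nothing recorded before `since`: delta is undefined
    return now_value - past


for label, since in [("00:30", 30), ("01:30", 90), ("02:30", 150), ("03:15", 195)]:
    print(f"deltaSince({label}) =", delta_since(RECORDS, since, NOW_VALUE))
```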

I don’t know, but …

timepoint + 1 microsecond would be better. Remember this persistence service works with many databases, they don’t all have the same granularity in timestamping.
In practice any data you previously wrote to the database would lag a microsecond or a few thousand behind the measurement or calculation. We’re really not worried about that kind of accuracy.

That’s your problem entirely, for not recording the data - if you don’t trust that it remains valid between records. It’s completely a choice for you to record data every millisecond, or only when it changes, or …
You get what you ask for here.

Uh, no, the word mathematical should not even be in the same sentence there.

Please don’t discuss in an unconstructive way. It’s better for the community to discuss the substance in a civil way.

As for the matter.

  1. It’s a mistake to assume that there is only one type of value stored in persistence (and operated on by deltaSince). This is not true. There are, for example, incrementing counters and more. For those, the expected deltaSince answer differs from that of a simple ‘current state’ value.

  2. looking at the data in my original post, let me name the parts:
    A -> the period earlier than the requested deltaSince(TIMESTAMP), here < March 1st
    B -> the TIMESTAMP requested
    C -> everything between TIMESTAMP and now
    deltaSince is about the data in B and C. Not A. Not the presence or absence of data in A. Simply B and C.
    Hence making the result for all types of data values depend on the lack of data in A is a mistake, especially in cases where you can give an answer without looking at A.

  3. the current behaviour does not allow using deltaSince if you don’t know the starting moment. It’s quite a big shortcoming.

  4. AFAIK (more tests needed) this results in the bug I described as problem A.

  5. The current behaviour will give a wrong answer if there is a datapoint exactly at B, because of the use of ‘<’.

I really think you don’t understand what deltaSince is designed to return. I don’t know how to put that more constructively. I have tried to illustrate what it does, and why.

Maybe it’s worth another example, using only historicState() to fetch the data from a single moment.
data stored
– no data at all stored before this time
05:00 32674
05:30 99
08:00 457
results
historicState(last year) = null
historicState(01:00) = null
historicState(04:59) = null
historicState(05:31) = 99
historicState(07:00) = 99
historicState(07:59) = 99
Can you describe what results you would expect here?

All that deltaSince does is exactly the same as above, and compares that single result with “now”.
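
As a rough illustration of that lookup, here is a Python sketch using a binary search over the stored timestamps. It is not the openHAB implementation: times are minutes since midnight, historic_state is a hypothetical helper, and treating the boundary as strictly “before” is an assumption (none of the example queries above hits a stored timestamp exactly, so they do not distinguish the two choices).

```python
from bisect import bisect_left
from typing import List, Optional

# Stored data from the example: 05:00 32674, 05:30 99, 08:00 457
TIMES: List[int] = [300, 330, 480]   # minutes since midnight, sorted
VALUES: List[int] = [32674, 99, 457]


def historic_state(minute: int) -> Optional[int]:
    """Return the newest value stored strictly before `minute`, else None."""
    # bisect_left counts the entries that are strictly smaller than `minute`.
    count_before = bisect_left(TIMES, minute)
    if count_before == 0:
        return None  # nothing stored before this moment
    return VALUES[count_before - 1]


for label, minute in [("01:00", 60), ("04:59", 299), ("05:31", 331),
                      ("07:00", 420), ("07:59", 479)]:
    print(f"historicState({label}) =", historic_state(minute))
```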

I haven’t really understood your “problem A”, so yes some practical examples of that one would be good.

It returns null if there is no data to base an answer on.
You can easily detect that there is no data by detecting the null return.
What would you like it to do instead?

EDIT - I had a think about this one, and I don’t think there is an easy way to determine from a rule when you started using a database.

You may well be right. It’s not going to matter much the way most people use persistence, but it might be significant if you are persisting say everyHour datapoints. Always room for improvement.
EDIT - it would be interesting if you looked to see whether historicState() uses the same query.