InfluxDB integral() function

I’m trying to sum up the on time for a switch. As ON=1 and OFF=0, the total on time should be the integral (the area under the graph).
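In other words, writing the switch state as v(t) ∈ {0, 1} (my notation, not from the original post), the on time over the query window is

\[ T_\text{on} = \int_{\text{now}-1\text{d}}^{\text{now}} v(t)\,dt \]

so integral(value, 1h) should return that area in hours.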

This is my data for the last 24 hours, using this query:

SELECT value FROM "openhab_db"."autogen"."Oculus" WHERE time >= now()-1d

If I now try to integrate it with this query:

SELECT integral(value,1h) FROM "openhab_db"."autogen"."Oculus" WHERE time >= now()-1d

I get 10.60 hours.

However, if I calculate it manually, it totals 6.14 hours. I have confirmed this by downloading the data as elapsed time between points and summing up all of the “on” times.
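Incidentally, that manual check can be reproduced with InfluxQL’s elapsed() function, which returns the gap between consecutive points, attached to the later point. A sketch only (untested against this data): sum the elapsed values at points whose previous value was 1, with the filtering done outside the database, e.g. in a spreadsheet:

SELECT elapsed(value, 1m) FROM "openhab_db"."autogen"."Oculus" WHERE time >= now()-1d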

I can’t see why the integral function isn’t giving me the total on time.

Going forward, I’d like to use a similar process to sum up energy use by integrating the area under an instantaneous usage graph, possibly with interpolation in that case, but I don’t want to attempt that until I’ve got the simple on/off version working the way I expect.
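For that later energy case, integral()’s linear interpolation is arguably the behaviour you want: integrating an instantaneous power series stored in kW with a 1h unit gives kWh directly. A sketch only, using a hypothetical measurement name "Power" that isn’t from this thread:

SELECT integral(value, 1h) FROM "openhab_db"."autogen"."Power" WHERE time >= now()-1d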

Hi,
that’s interesting… I have been thinking about your problem for a while, but here are some questions for you:
When are your item states persisted? everyChange, I suppose? What lies before 15:39? Could it be that the time before your first timestamp is counted into the integral (e.g. now()-1d until 15:39)?

Thanks for the response. Yes, every change:

 Oculus: strategy = everyChange

The previous entry is a value of 0 at 10:13:32.
Prior to that is another 0 at 08:41:19 on the previous day. Usually the values alternate between 0 and 1, but there are a couple of repeated 0 entries due to reboots.

I assume that the integral function evaluates the value at the start of the query interval so it can properly calculate the area, rather than just starting from the first data point within the range. The query response is timestamped at the start time of the query period (even if the end time isn’t now).

I have another example where the integrated time is lower than I’d expect: on periods totalling roughly 4 h 25 min integrate to 2.73 hours. 2.73 hours isn’t the sum of the on times, it isn’t either of the on times, and the difference between the actual and expected values doesn’t relate to the time between the query start and the first entry, nor between the last entry and the query end.

I think that InfluxDB connects the points with straight lines before calculating the integral, so the contribution between any two consecutive points with different values will always be (time between points) * 0.5.

In your example the time difference between the first point and the last is 21.2 h, and 21.2 * 0.5 = 10.6.
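To spell out the arithmetic (my derivation, not part of the original reply): integral() applies the trapezoidal rule between consecutive points, and in a strictly alternating 0/1 series every neighbouring pair satisfies v_i + v_{i+1} = 1, so

\[ \int \approx \sum_i \frac{v_i + v_{i+1}}{2}\,(t_{i+1} - t_i) = \frac{1}{2}\sum_i (t_{i+1} - t_i) = \frac{21.2\ \text{h}}{2} = 10.6\ \text{h}. \]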

Your answer makes sense in that it works for both of my examples, but it means integral() as a function is useless for this purpose.

I could understand if it interpolated between the points in the database, and in fact there is an option to enable this (it’s off by default), but why you would ever want to ignore real points in favour of a linear interpolation is beyond me.

If I look at the example, it works the way I expect, so I’m still not quite sure what is going on.

It does work for most real-world applications, where the more general formula would be mean(p1, p2) * time_diff(p1, p2); linear interpolation is in most cases the best choice for graphs. Using the value from the previous point all the way up to the next wouldn’t make sense in most cases (think e.g. temperatures, water-level measurements, etc.).

If you just want to get the sum for the last 24 h, use:

SELECT sum(value) / 60 FROM "openhab_db"."autogen"."Oculus" WHERE time >= now()-1d GROUP BY time(1m), fill(previous)

Edit: Perhaps it won’t work; it might be that it returns a point for each minute instead. I did have a similar function previously (it calculated power consumption in kWh from a set of kW values), but I have since migrated to InfluxDB 2 and Flux queries.

I see what you mean. I was thinking it was doing mean(p1, p12) * time_diff(p1, p12), in other words only using the first and last points. In fact it is arithmetically the same as interpolating between each pair of points, because the values are either 1 or 0.

It would still be useful if it extended the integration to the edges of the query range, because the result is unpredictable for an irregular time series.

Thanks for trying. It doesn’t seem to be working. Firstly there’s an error message, “only time() calls allowed in dimensions”, which disappears when I remove the fill(). The result then is that only the minutes containing a change have a result (either 1 or 0); the rest are NULL.
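For what it’s worth, the parse error is most likely the comma: in InfluxQL, fill() belongs to the GROUP BY clause itself and is not comma-separated from time(). With that fixed, the per-minute points can be collapsed to a single number with a subquery. A sketch only (untested; assumes InfluxDB 1.2+ for subquery support, and note that fill(previous) still cannot fill minutes before the first point inside the window):

SELECT sum("state") / 60 FROM (SELECT last(value) AS "state" FROM "openhab_db"."autogen"."Oculus" WHERE time >= now()-1d GROUP BY time(1m) fill(previous)) WHERE time >= now()-1d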

I could change the persistence strategy to store a value every minute and just sum the values, but that feels like a waste of storage. Alternatively, I could write a rule to process the table of results from the straight (non-aggregating) query, but that feels a bit like giving up on something that should be possible.
