How valid is the Riemann sum when using rrd4j for persistence? I mean, as data gets removed over time / states get merged, won't the Riemann sum become more and more unreliable? Any hint on how to configure rrd4j to work well for at least a month's worth of data?
As with anything, it depends. If the data has wide swings in value over relatively short periods of time, the Riemann sum will differ more from the sum over the original, more precise data, mainly because those spikes get smoothed out as the data ages.
However, if the value is constant over that time period, the sum will always match the original, because the average of a constant value is that same value.
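A minimal Python sketch (not rrd4j's actual code) of both effects: bucket-averaging keeps the total exact for constant data, but when a query window cuts through a bucket, smoothed-out spikes get mis-weighted. All numbers here are illustrative.

```python
# Sketch: compare a trailing-window Riemann sum over raw 30 s samples
# against the same sum over decimated (bucket-averaged) data.

def decimate(samples, factor):
    """Average consecutive groups of `factor` samples, roughly as
    archive consolidation does when data ages into a coarser bucket."""
    return [sum(samples[i:i + factor]) / factor
            for i in range(0, len(samples), factor)]

def riemann_sum_window(samples, step, window):
    """Left Riemann sum (value-seconds) over the trailing `window`
    seconds; a partially covered sample interval is clipped."""
    total, remaining = 0.0, window
    for v in reversed(samples):
        dt = min(step, remaining)
        total += v * dt
        remaining -= dt
        if remaining <= 0:
            break
    return total

spiky = [0, 8000, 0, 8000, 0, 8000]   # W, swinging like PV under clouds
flat = [5000] * 6                     # W, constant value
step, window = 30, 90                 # 30 s samples, query last 90 s

for raw in (spiky, flat):
    coarse = decimate(raw, 2)         # merged into 60 s buckets
    print(riemann_sum_window(raw, step, window),
          riemann_sum_window(coarse, step * 2, window))
# spiky: 480000.0 vs 360000.0 Ws — the sum drifts after decimation
# flat:  450000.0 vs 450000.0 Ws — exact, as described above
```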
It also depends on how often the value updates. If it only updates once every 15 minutes or less often, there should be no problem with rrd4j, as the default strategy keeps one value every 15 minutes for up to a year.
From the docs:
granularity of 10s for the last hour
granularity of 1m for the last week
granularity of 15m for the last year
granularity of 1h for the last 5 years
granularity of 1d for the last 10 years
You can create your own custom configuration, but it's too complicated for a tl;dr on the forum. Please see the rrd4j docs for details on how to create one.
Or consider using an external database, or SQLite, for those Items where you need precise data older than a week.
Thanks for explaining it so well. In my particular case I'm persisting PV power data (kW) at ~30 s intervals, depending on Modbus timings. The range can vary from 0 kW to >8 kW, and since it's PV it can change rapidly with clouds. The Riemann sum gives me Ws (watt-seconds), which I have to divide by 3600 * 1000 to get kWh. So the swings are pretty wide, I think.
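For reference, the unit conversion works out like this (the 5 kW figure is just an illustrative number):

```python
# Watt-seconds to kWh: 3600 s per hour, 1000 W per kW.
ws = 5_000 * 3600           # e.g. 5 kW held for one hour = 18,000,000 Ws
kwh = ws / (3600 * 1000)    # divide by 3,600,000
print(kwh)                  # 5.0 kWh
```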
Speaking of rrd4j, is there a strategy on a monthly basis? I guess what I want is a granularity of 30 s for the last month. Yes, I'm willing to spend more disk space on this, and even adjust the granularity for the last week/day/hour. I'll read up on it in the docs.
The buckets are not predefined. The one-week bucket is just how OH configured it. You can define them however you want.
It's not super straightforward, but everything you need is in the docs to define a custom set of buckets and to control how the data is decimated as it ages from one bucket to the next.
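As a rough sketch of what such a custom profile could look like in `services/rrd4j.cfg` (the datasource name `pv_month` and the Item name `PV_Power` are made up, and the row counts are my own arithmetic; verify the exact syntax against the rrd4j add-on docs before using this):

```
# services/rrd4j.cfg — hypothetical profile for ~30 s PV updates
# def = type,heartbeat,min,max,step  (step matches the ~30 s update rate)
pv_month.def=GAUGE,90,0,U,30
# archives = function,xff,steps,rows
#   30 s  granularity for 31 days : 31*86400/30  =  89280 rows
#   5 min granularity for  1 year : 365*86400/300 = 105120 rows
#   1 h   granularity for  5 years: 5*365*24      =  43800 rows
pv_month.archives=AVERAGE,0.5,1,89280:AVERAGE,0.5,10,105120:AVERAGE,0.5,120,43800
pv_month.items=PV_Power
```

The trade-off is exactly the one you expected: the first archive alone is ~90k rows per Item, so disk usage grows accordingly.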