Semantics of sumSince: strange results

Hi,

I am a bit puzzled by the results I get from sumSince. I have a rain gauge and want to store aggregated values for various time periods in items.
Below is an excerpt of the rule that does the aggregation:
```
rule "RainFall GA WeatherStation Update"
when
    Item RainFall_GA_WeatherStation received update
then
    RainFall_Last24h_GA_WeatherStation.postUpdate(RainFall_GA_WeatherStation.sumSince(now.minusHours(24)))
    RainFall_Last12h_GA_WeatherStation.postUpdate(RainFall_GA_WeatherStation.sumSince(now.minusHours(12)))
    RainFall_Last3h_GA_WeatherStation.postUpdate(RainFall_GA_WeatherStation.sumSince(now.minusHours(3)))
    RainFall_Last1h_GA_WeatherStation.postUpdate(RainFall_GA_WeatherStation.sumSince(now.minusHours(1)))
    RainFall_Last1w_GA_WeatherStation.postUpdate(RainFall_GA_WeatherStation.sumSince(now.minusWeeks(1)))
    RainFall_Last1m_GA_WeatherStation.postUpdate(RainFall_GA_WeatherStation.sumSince(now.minusMonths(1)))
end
```
The item RainFall_GA_WeatherStation is persisted using rrd4j.

I would expect the values to increase monotonically with the length of the period, so that the monthly sum is the largest. However, this is not the case: the monthly value shows 7.6 while the 24-hour value shows 814.9. It is probably worth mentioning that I have only about 24 hours of data so far.

I think I have misunderstood something fundamental or made some other silly mistake.

The way rrd4j works is that it saves every value for a limited time. As those values get older, it “compresses” the data by averaging a group of values and replacing them with that average. So as your data gets older it becomes more and more sparse. For example, it may store a value every minute for a week, then one value every 15 minutes for a month, then one value every hour for several months back, and eventually one value a day for a year back.

Given this behavior, I’m guessing that it does some sort of averaging and math in the background when you query for older data, and because you only have 24 hours of data, that averaging is bringing your sum down.

I would expect that this weirdness will clear itself up as your rrd4j database fills up over time but perhaps db4o would be a better choice in this instance.

Thanks Rich.

After reading this I realized that rrd4j averages the values by default, which cannot work for a rain gauge anyway, because I need to total up the rain volume.
I saw that openHAB now supports defining additional rrd4j datasources, where I can set the consolidation function to TOTAL. I will give that a try first.
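
To illustrate why averaging loses the rain total while totalling keeps it, here is a minimal Python sketch. It is a toy model only, not rrd4j's actual consolidation code; the function names and the sample readings are made up for illustration.

```python
# Toy model of consolidation. This is NOT rrd4j's implementation;
# it only shows the difference between averaging and totalling.

def consolidate(samples, group_size, func):
    """Replace each group of `group_size` samples with a single value."""
    groups = [samples[i:i + group_size]
              for i in range(0, len(samples), group_size)]
    return [func(group) for group in groups]

def average(values):
    return sum(values) / len(values)

# Hypothetical rain readings in mm, one per reporting interval.
readings = [0.0, 2.0, 4.0, 0.0, 6.0, 0.0]

averaged = consolidate(readings, 3, average)  # [2.0, 2.0]
totalled = consolidate(readings, 3, sum)      # [6.0, 6.0]

print(sum(readings))  # 12.0, the true amount of rain
print(sum(averaged))  # 4.0, summing averaged slots undercounts
print(sum(totalled))  # 12.0, summing totalled slots preserves the amount
```

Summing over averaged slots gives a smaller number than the real rainfall, which is the kind of effect to expect when a sum is computed over averaged archive data.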

If you get that working please post your config here. Lots of people face challenges with using rrd4j and seeing some more examples about how people change the configs would be helpful.

Sure, will do. But first I need to run it for a while to see whether it works.

Ok. So here is the config which seems to work for me. Disclaimer: I am not an rrd4j expert, so my calculations below might be wrong. However, the setup has worked for me so far.

To give the reader the full picture:

I installed a rain gauge which reports how often it was tripped in a given time unit, which equates to the amount of rainfall.

So whenever I receive an update from the rain gauge, it only reflects the amount of rain for roughly the last 2 minutes.
What I want to show, however, is the amount of rain that has fallen in the last hour, last three hours, last day, last week and last month. I am using rrd4j in my configuration.
By default openHAB creates a datasource that uses AVERAGE as its consolidation function, so the rrd4j archives store averages, which is obviously not going to work for my use case.
As of openHAB 1.7 it is possible to define additional datasources and archives, which is what I did.
Below is the configuration I put into openhab.cfg. The key part is the definition of the archives.

rrd4j:RainFall.def=GAUGE,3600,U,U,300
rrd4j:RainFall.archives=TOTAL,0.5,3,192:TOTAL,0.5,288,365:TOTAL,0.5,105120,30
rrd4j:RainFall.items=RainFall_GA_WeatherStation
The first line defines the datasource, which is named RainFall (this name is only used in this config file).
For the raw data I use GAUGE. I extended the heartbeat period to 3600 seconds (60 minutes), since my sensor might not send data at short intervals. Usually, I will have at least one value within 60 minutes.
The min and max values are set to U (unknown); for the minimum, 0 would probably also be ok. 300 is the step size for the datasource, which equates to 5 minutes.
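
As a quick cross-check of the field order described above (type, heartbeat, min, max, step), here is a small Python sketch that splits the def value and converts the two time values to minutes. The `parse_def` helper is hypothetical and written only for this illustration; it is not part of openHAB or rrd4j.

```python
def parse_def(value):
    """Split a datasource definition: type,heartbeat,min,max,step.

    Hypothetical helper for illustration; 'U' is treated as unknown.
    """
    ds_type, heartbeat, minimum, maximum, step = value.split(",")
    return {
        "type": ds_type,
        "heartbeat_s": int(heartbeat),
        "min": None if minimum == "U" else float(minimum),
        "max": None if maximum == "U" else float(maximum),
        "step_s": int(step),
    }

ds = parse_def("GAUGE,3600,U,U,300")
print(ds["type"])              # GAUGE
print(ds["heartbeat_s"] / 60)  # 60.0 minutes
print(ds["step_s"] / 60)       # 5.0 minutes
```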
I have defined three archives, which need to be placed on one line separated by ':'.
Important: all archives use TOTAL as the consolidation function, which means data points are summed up. That is exactly what I want for my rain gauge.

The first archive holds values for 48 hours:
Each slot covers 3 steps of 300 seconds = 900 seconds (15 minutes). With 192 slots that is 900 * 192 = 172800 seconds, i.e. 48 hours.

Similarly, the second archive consolidates 288 steps per slot and has 365 slots, which equates to
300 * 288 * 365 = 31536000 seconds, i.e. 365 days or 1 year of data. Each slot holds the aggregated value of 24 hours.

The third archive consolidates 105120 steps per slot and has 30 slots. Each slot aggregates data for 105120 * 300 = 31536000 seconds, which amounts to 1 year. With 30 slots, this archive can store yearly aggregates for 30 years.
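
The arithmetic for the three archives can be double-checked with a few lines of Python. This is only a sketch: it assumes each archive entry has the form function,xff,steps,rows, as in the config line above, and the `describe` helper is made up for this check.

```python
STEP_S = 300  # datasource step size in seconds, from the def line
ARCHIVES = "TOTAL,0.5,3,192:TOTAL,0.5,288,365:TOTAL,0.5,105120,30"

def describe(archives, step_s):
    """Return (function, seconds per slot, total seconds covered) per archive."""
    result = []
    for entry in archives.split(":"):
        func, _xff, steps, rows = entry.split(",")
        slot_s = int(steps) * step_s
        result.append((func, slot_s, slot_s * int(rows)))
    return result

summary = describe(ARCHIVES, STEP_S)
for func, slot_s, span_s in summary:
    print(func, slot_s, span_s / 86400)
# TOTAL 900 2.0           (15-minute slots, 2 days)
# TOTAL 86400 365.0       (1-day slots, 365 days)
# TOTAL 31536000 10950.0  (365-day slots, 30 * 365 days)
```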
