[SOLVED] RRD4J and Granularity / Number of Datapoints

Hi all!

I am a bit lost in terms of how exactly rrd4j works and what the default configuration is. Maybe someone can help me out. I’m on a very recent (this week) snapshot build of OH2 and have set up rrd4j persistence. I installed the persistence add-on via Paper UI and configured rrd4j.persist in the following way:

Strategies {
    everyMinute : "0 * * * * ?"
}

Items {
    PersistEveryMinute* : strategy = everyMinute
}

I did not touch rrd4j.cfg which looks like this:

# configure specific rrd properties for given items in this file.
# please refer to the documentation available at
# https://www.openhab.org/addons/persistence/rrd4j/
#
# default_numeric and default_other are internally defined defnames and are used as
# defaults when no other defname applies

#<defname>.def=[ABSOLUTE|COUNTER|DERIVE|GAUGE],<heartbeat>,[<min>|U],[<max>|U],<step>
#<defname>.archives=[AVERAGE|MIN|MAX|LAST|FIRST|TOTAL],<xff>,<steps>,<rows>
#<defname>.items=<comma separated list of items for this defname> 

My items file looks like this:

Group PersistEveryMinute "Persist Every Minute"
...
Number WashingMachinePower "Leistung Waschmaschine" (PersistEveryMinute) {channel="avmfritz:FRITZ_DECT_200:<...some_id...>:power"}

This basically works. I can see a WashingMachinePower.rrd file in /userdata/persistence/rrd4j/. I can also query the persistence object in rules, e.g.:

val Number avgPower5min = WashingMachinePower.averageSince(now.minusMinutes(5), "rrd4j");

This gives me numbers that seem sensible (so not 0 and not always the same value and about what would be expected).

However, I noticed that when I query the history of my sample item using the REST API, I get unexpected results.

http://<oh2>:<port>/rest/persistence/items/WashingMachinePower?serviceId=rrd4j

Results:

{
  "name": "WashingMachinePower",
  "datapoints": "360",
  "data": [
    {
      "time": 1540466640000,
      "state": "0.0"
    },
    {
      "time": 1540466880000,
      "state": "0.0"
    },
    // ... many more "0.0" values ...
    {
      "time": 1540539360000,
      "state": "0.0"
    },
    // followed by about 50 values that make sense
    // ...
    {
      "time": 1540552320000,
      "state": "57.43000000000001"
    },
    {
      "time": 1540552560000,
      "state": "72.345"
    },
    {
      "time": 1540552800000,
      "state": "110.07000000000001"
    }
  ]
}

I noticed that a newly created .rrd file contains 0 (zero) datapoints and then grows gradually (1, 2, 3, etc.). However, it does not grow by one datapoint per minute (as configured) but at a lower rate. How can this be? Does this depend on the update frequency of the item? That is: no update for 5 minutes, no new entry persisted for 5 minutes?

I suspect WashingMachinePower.rrd grew to 360 datapoints and then stopped, because it is no longer growing. Why? 360 cannot possibly be the correct total number of datapoints as documented here. Is that documentation outdated with respect to the default datapoint strategy, and if so, what is the default in OH2?

From what I understand by reading the docs, there should be one value every minute for 4 to 8 (depending on what doc I read ;-)) hours. But looking at the API output (360 datapoints), this cannot be.

Furthermore, the persisted values are not what I would expect. According to the logs (ItemStateChangedEvent), the item WashingMachinePower changes about every two minutes. I would have expected to find the logged values in the rrd4j data file, because (as I understand it) a snapshot of the current value of WashingMachinePower is taken every minute and persisted. Not every value would be persisted, of course, but those that are persisted on a minute-by-minute basis should show up in the logs. (At least that’s what I assumed.) But the persisted values differ slightly from what I can see in the logs. Does rrd4j somehow record averages even for the most recent values recorded on a minute-by-minute basis?

It would be great if someone could shed some light on how exactly rrd4j in OH2 is supposed to behave (by default). Is there a file that contains the default configuration, or is this hardcoded?

Thanks a lot in advance!

Jens

The calculation of how many entries you will get for a rrd4j database is difficult and heavily dependent on your setup!
Since you are using the default settings for numeric items, the archives should be set to:

"AVERAGE,0.5,1,480:AVERAGE,0.5,4,360:AVERAGE,0.5,14,644:AVERAGE,0.5,60,720:AVERAGE,0.5,720,730:AVERAGE,0.5,10080,520");
(Copied out of the code, not from the documentation!)

Which means you will have 480 entries covering the last 480 minutes (archive 1), followed by 360 entries covering the last 4 * 360 minutes, i.e. 24 hours (archive 2), followed by 644 entries covering the last 14 * 644 minutes, i.e. 9016 minutes (archive 3), and so on.
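Spelled out for all six archives (rows × step, with the step counted in minutes since the base step is one minute), the coverage works out roughly to:

archive 1:  480 rows ×     1 min =     480 min  (8 hours)
archive 2:  360 rows ×     4 min =    1440 min  (24 hours)
archive 3:  644 rows ×    14 min =    9016 min  (about 6.3 days)
archive 4:  720 rows ×    60 min =   43200 min  (30 days)
archive 5:  730 rows ×   720 min =  525600 min  (365 days)
archive 6:  520 rows × 10080 min = 5241600 min  (about 10 years)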

Your REST API request had no start- or endtime set, so it defaults to a request for the last 24 hours. From an rrd4j database you get all entries of the first archive that covers the requested timeframe, in your case 24 hours, so your result comes from archive 2. This archive does not hold an entry for each minute; its step size is 4, so you have one value per 4 minutes (calculated as the average of the four minute-wise entries). That matches the time difference between consecutive readings (240000 ms = 4 minutes) and the number of entries in that archive (360). Since you just started this database, only those entries with actual readings hold data other than zero (the youngest datapoints are at the END!).
If you want to get the minute-wise entries, you need a REST API request that covers a timeframe of at most 480 minutes back from now.
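For example, something along these lines (the timestamps are only placeholders; starttime and endtime take ISO-8601 values):

http://<oh2>:<port>/rest/persistence/items/WashingMachinePower?serviceId=rrd4j&starttime=2018-10-26T08:00:00&endtime=2018-10-26T12:00:00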


Thank you very much, @opus! This explains many things. The AVERAGE flag for the 480 minute archive explains the values I’m seeing. And I did not know about the behavior of the REST API in concert with rrd4j queries and timeframes (or rather no timeframe…).

Again, thanks! :+1:

Jens

That is more specific to rrd4j than to the REST API. With other database types it would be different, although the timeframe selection is the same (e.g. the default is the last 24 hours from now).

I am a fan of rrd4j! It doesn’t take many resources and you can customize the size of the archives easily. I’m only interested in the data of the last year, with less accuracy for older data. So I’m persisting values every minute for a whole day; the next, less accurate step covers a week, followed by the month and the year. Runs smoothly on my Raspi 3B along with OH!
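In rrd4j.cfg such a setup would look roughly like this (a sketch, not my actual file; the defname is arbitrary and the heartbeat/step values assume the usual one-minute GAUGE sampling):

myCustom.def=GAUGE,60,U,U,60
myCustom.archives=AVERAGE,0.5,1,1440:AVERAGE,0.5,10,1008:AVERAGE,0.5,60,744:AVERAGE,0.5,1440,365
myCustom.items=<comma separated list of items>

That gives minute resolution for a day (1440 rows), 10-minute resolution for a week (1008 rows), hourly resolution for a month (744 rows) and daily resolution for a year (365 rows).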

I only started using rrd4j this week. It’s one of those projects that gets postponed for months or years until the amount of work to implement it becomes less than the work and pain caused by the inadequate workarounds. :slight_smile:

openHAB is making sensational progress, and getting persistence up and running was much easier than expected. I totally agree on the usefulness of rrd4j. It is very similar to data backup strategies, where datapoints are spaced further apart the farther back in time they are. My post wasn’t meant to criticise rrd4j; I just got confused by some of the implicit stuff happening behind the scenes.

I now also use MariaDB to log power and gas meter data to an SQL database once a day.
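The persist file for that is just a daily cron strategy, something like this (the item names are only examples, and the file name depends on which persistence service you use):

Strategies {
    everyDay : "0 0 0 * * ?"
}

Items {
    PowerMeterTotal, GasMeterTotal : strategy = everyDay
}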

And I use mapdb persistence to restore item values after restarts, which quite frankly I now realise is a necessity to avoid utter chaos after a reboot. :grinning: I don’t know how I managed without that before. :slight_smile:
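For reference, the mapdb.persist for that is essentially the standard restoreOnStartup pattern, along these lines:

Strategies {
    default = everyChange
}

Items {
    * : strategy = everyChange, restoreOnStartup
}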

So that’s now three different persistence startegies. One for every specialist purpose. Nice going, openHAB. :slight_smile: