RRD4J database ignoring data older than 24h

  • Platform information:
    • Hardware: RPi4/4GB/32GB
    • OS: Debian Linux
    • Java Runtime Environment: OpenJDK Temurin 11
    • openHAB version: 3.3.0
  • Issue of the topic: RRD4J database ignoring data older than 24h

For some time now I’ve been trying to figure out an issue with my persistence configuration. I’m logging temperature and humidity data and have so far used an SQLite DB. Since I don’t need all data points, RRD4J seemed like a good alternative. However, I can’t seem to make the service log more than 24h of data. Every chart I add to my sitemap, and also inspecting the .rrd files themselves, shows the oldest data being exactly 24h old. Running the SQLite version in parallel I get continuous readings and have data reaching back over a year now.
Please have a look at my configuration in the hope of finding some kind of error. The following is an excerpt from the log, set to debug for the RRD4J logger. Attached further down are my current configuration files.

20:36:53.517 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'Geckocam_t' as value '26.6' in rrd4j database
20:36:53.522 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae8068569_t' as value '26.097' in rrd4j database
20:36:53.527 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae806d8ab_h' as value '78.72' in rrd4j database
20:36:53.532 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae8068569_h' as value '50.695' in rrd4j database
20:36:53.537 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae807011c_t' as value '24.868' in rrd4j database
20:36:53.543 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae807011c_battery_warn' as value '0.0' in rrd4j database
20:36:53.547 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae8068569_battery_warn' as value '0.0' in rrd4j database
20:36:53.552 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae80687dd_t' as value '25.83' in rrd4j database
20:36:53.558 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae8068634_h' as value '53.52' in rrd4j database
20:36:53.562 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'Geckocam_h' as value '68.0' in rrd4j database
20:36:53.566 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae807011c_h' as value '52.16' in rrd4j database
20:36:53.570 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae8068634_t' as value '25.49' in rrd4j database
20:36:53.575 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae806d8ab_t' as value '25.72' in rrd4j database
20:36:53.581 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae806d8ab_battery_warn' as value '0.0' in rrd4j database
20:36:53.585 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae80687dd_battery_warn' as value '0.0' in rrd4j database
20:36:53.590 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae8068634_battery_warn' as value '0.0' in rrd4j database
20:36:53.594 [DEBUG] [rd4j.internal.RRD4jPersistenceService] - Stored 'S2c3ae80687dd_h' as value '53.904' in rrd4j database

rrd4j.cfg

#one value every 600s
ambient_sensors_t.def=GAUGE,660,U,U,600
#store duration@intervals: 144x600s=24h@10min, 24h@1h, 7d@3h, 30d@24h, 12M@~30d
ambient_sensors_t.archives=LAST,0.9,1,144:AVERAGE,0.9,6,24:AVERAGE,0.9,18,56:AVERAGE,0.9,144,30:AVERAGE,0.9,4380,12
ambient_sensors_t.items=S2c3ae806d8ab_t,S2c3ae80687dd_t,S2c3ae8068634_t,S2c3ae8068569_t,Geckocam_t,S2c3ae807011c_t

ambient_sensors_h.def=GAUGE,660,U,U,600
ambient_sensors_h.archives=LAST,0.9,1,144:AVERAGE,0.9,6,24:AVERAGE,0.9,18,56:AVERAGE,0.9,144,30:AVERAGE,0.9,4380,12
ambient_sensors_h.items=S2c3ae806d8ab_h,S2c3ae80687dd_h,S2c3ae8068634_h,S2c3ae8068569_h,Geckocam_h,S2c3ae807011c_h

rrd4j.persist

Strategies {
	//cron expression "sec min h dom m dow"
	everyNinthMinute : "* 0/9 * * * ?"
}

Items {
	gAmbientSensorPersist* : strategy = everyNinthMinute, restoreOnStartup
}

sensors.items

Group    gAmbientSensor                 "Umgebungssensoren"
Group    gAmbientSensorPersist
Group    gAmbientSensorTemp
Group    gAmbientSensorHumid
Group    gAmbientSensorProxies
Group    gAmbientSensorBatteryWarn
Group    gAmbientSensorBatteryUpdate

Group    gThermostatBath       "Thermostat Bad"        {ga="Thermostat", alexa="Thermostat"}
Group    gThermostatKitchen    "Thermostat Kueche"     {ga="Thermostat", alexa="Thermostat"}
Group    gThermostatSleep      "Thermostat SchlaZi"    {ga="Thermostat", alexa="Thermostat"}
Group    gThermostatLiving     "Thermostat WoZi"       {ga="Thermostat", alexa="Thermostat"}
Group    gThermostatChild      "Thermostat KiZi"       {ga="Thermostat", alexa="Thermostat"}

Number      S2c3ae806d8ab_t               "Bad T [%.1f]"                                     (gAmbientSensor, gAmbientSensorTemp, gAmbientSensorPersist, gThermostatBath)     {ga="thermostatTemperatureAmbient", alexa="CurrentTemperature"}
Number      S2c3ae806d8ab_t_proxy         "Bad T Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                          {channel="mqtt:topic:S2c3ae806d8ab:temperature"}
Number      S2c3ae806d8ab_h               "Bad H [%.1f]"                                     (gAmbientSensor, gAmbientSensorHumid, gAmbientSensorPersist, gThermostatBath)    {ga="thermostatHumidityAmbient", alexa="CurrentHumidity"}
Number      S2c3ae806d8ab_h_proxy         "Bad H Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                          {channel="mqtt:topic:S2c3ae806d8ab:humidity"}
DateTime    S2c3ae806d8ab_last_update     "Bad letztes Update [%1$tA %1$td.%1$tm. %1$tT]"    (gAmbientSensor, gAmbientSensorBatteryUpdate)
Switch      S2c3ae806d8ab_battery_warn    "Bad Batteriewarnung"                              (gAmbientSensor, gAmbientSensorPersist, gAmbientSensorBatteryWarn)

Number      S2c3ae8068634_t               "Küche T [%.1f]"                                     (gAmbientSensor, gAmbientSensorTemp, gAmbientSensorPersist, gThermostatKitchen)     {ga="thermostatTemperatureAmbient", alexa="CurrentTemperature"}
Number      S2c3ae8068634_t_proxy         "Küche T Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                             {channel="mqtt:topic:S2c3ae8068634:temperature"}
Number      S2c3ae8068634_h               "Küche H [%.1f]"                                     (gAmbientSensor, gAmbientSensorHumid, gAmbientSensorPersist, gThermostatKitchen)    {ga="thermostatHumidityAmbient", alexa="CurrentHumidity"}
Number      S2c3ae8068634_h_proxy         "Küche H Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                             {channel="mqtt:topic:S2c3ae8068634:humidity"}
DateTime    S2c3ae8068634_last_update     "Küche letztes Update [%1$tA %1$td.%1$tm. %1$tT]"    (gAmbientSensor, gAmbientSensorBatteryUpdate)
Switch      S2c3ae8068634_battery_warn    "Küche Batteriewarnung"                              (gAmbientSensor, gAmbientSensorPersist, gAmbientSensorBatteryWarn)

Number      S2c3ae8068569_t               "SchlaZi T [%.1f]"                                     (gAmbientSensor, gAmbientSensorTemp, gAmbientSensorPersist, gThermostatSleep)     {ga="thermostatTemperatureAmbient", alexa="CurrentTemperature"}
Number      S2c3ae8068569_t_proxy         "SchlaZi T Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                           {channel="mqtt:topic:S2c3ae8068569:temperature"}
Number      S2c3ae8068569_h               "SchlaZi H [%.1f]"                                     (gAmbientSensor, gAmbientSensorHumid, gAmbientSensorPersist, gThermostatSleep)    {ga="thermostatHumidityAmbient", alexa="CurrentHumidity"}
Number      S2c3ae8068569_h_proxy         "SchlaZi H Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                           {channel="mqtt:topic:S2c3ae8068569:humidity"}
DateTime    S2c3ae8068569_last_update     "SchlaZi letztes Update [%1$tA %1$td.%1$tm. %1$tT]"    (gAmbientSensor, gAmbientSensorBatteryUpdate)
Switch      S2c3ae8068569_battery_warn    "SchlaZi Batteriewarnung"                              (gAmbientSensor, gAmbientSensorPersist, gAmbientSensorBatteryWarn)

Number      S2c3ae807011c_t               "WoZi T [%.1f]"                                     (gAmbientSensor, gAmbientSensorTemp, gAmbientSensorPersist, gThermostatLiving)     {ga="thermostatTemperatureAmbient", alexa="CurrentTemperature"}
Number      S2c3ae807011c_t_proxy         "WoZi T Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                            {channel="mqtt:topic:S2c3ae807011c:temperature"}
Number      S2c3ae807011c_h               "WoZi H [%.1f]"                                     (gAmbientSensor, gAmbientSensorHumid, gAmbientSensorPersist, gThermostatLiving)    {ga="thermostatHumidityAmbient", alexa="CurrentHumidity"}
Number      S2c3ae807011c_h_proxy         "WoZi H Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                            {channel="mqtt:topic:S2c3ae807011c:humidity"}
DateTime    S2c3ae807011c_last_update     "WoZi letztes Update [%1$tA %1$td.%1$tm. %1$tT]"    (gAmbientSensor, gAmbientSensorBatteryUpdate)
Switch      S2c3ae807011c_battery_warn    "WoZi Batteriewarnung"                              (gAmbientSensor, gAmbientSensorPersist, gAmbientSensorBatteryWarn)

Number      S2c3ae80687dd_t               "KiZi T [%.1f]"                                     (gAmbientSensor, gAmbientSensorTemp, gAmbientSensorPersist, gThermostatChild)     {ga="thermostatTemperatureAmbient", alexa="CurrentTemperature"}
Number      S2c3ae80687dd_t_proxy         "KiZi T Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                           {channel="mqtt:topic:S2c3ae80687dd:temperature"}
Number      S2c3ae80687dd_h               "KiZi H [%.1f]"                                     (gAmbientSensor, gAmbientSensorHumid, gAmbientSensorPersist, gThermostatChild)    {ga="thermostatHumidityAmbient", alexa="CurrentHumidity"}
Number      S2c3ae80687dd_h_proxy         "KiZi H Proxy [%.1f]"                               (gAmbientSensor, gAmbientSensorProxies)                                           {channel="mqtt:topic:S2c3ae80687dd:humidity"}
DateTime    S2c3ae80687dd_last_update     "KiZi letztes Update [%1$tA %1$td.%1$tm. %1$tT]"    (gAmbientSensor, gAmbientSensorBatteryUpdate)
Switch      S2c3ae80687dd_battery_warn    "KiZi Batteriewarnung"                              (gAmbientSensor, gAmbientSensorPersist, gAmbientSensorBatteryWarn)

my.sitemap

sitemap my label="OpenHAB" {
	Frame label="RRD4J" icon=line
	{
		Chart item=gAmbientSensorTemp label="1h" refresh=30000 period=h legend=true service="rrd4j"
		Chart item=gAmbientSensorTemp label="4h" refresh=30000 period=4h legend=true service="rrd4j"
		Chart item=gAmbientSensorTemp label="1D" refresh=30000 period=D legend=true service="rrd4j"
		Chart item=gAmbientSensorTemp label="2D" refresh=30000 period=2D legend=true service="rrd4j"
		Chart item=gAmbientSensorTemp label="3D" refresh=30000 period=3D legend=true service="rrd4j"
		Chart item=gAmbientSensorTemp label="1M" refresh=30000 period=M legend=true service="rrd4j"
	}
}

Your configured sample interval is 600 seconds (10 minutes). That sets rrd4j to take a new value from openHAB every 10 minutes.
Your archives are set up like this:
Archive 1: create a stored value from 1 sample using the last value. 144 boxes are created (that covers 24 hours, 144*10 minutes).
Archive 2: create a stored value from 6 samples using the average of those samples. 24 boxes are created (24 hours are covered, 24*6*10 minutes).
Archive 3: create a stored value from 18 samples using the average of those samples. 56 boxes are created (168 hours are covered, 18*56*10 minutes).
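The coverage arithmetic above can be checked with a short script. This is only a sketch: `archive_coverage` is a hypothetical helper, not part of rrd4j or openHAB, and it assumes each archive is written as `function,xff,steps,rows` separated by colons, as in the posted rrd4j.cfg.

```python
def archive_coverage(step_seconds, archives):
    """Return (consolidation function, seconds per box, hours covered) per archive."""
    result = []
    for spec in archives.split(":"):
        # Each archive is "function,xff,steps,rows"; xff is not needed here.
        consol_fun, _xff, steps, rows = spec.split(",")
        box_seconds = step_seconds * int(steps)
        hours = box_seconds * int(rows) / 3600
        result.append((consol_fun, box_seconds, hours))
    return result


# Archive definition copied from the rrd4j.cfg in the first post.
ARCHIVES = ("LAST,0.9,1,144:AVERAGE,0.9,6,24:AVERAGE,0.9,18,56:"
            "AVERAGE,0.9,144,30:AVERAGE,0.9,4380,12")

for consol_fun, box_seconds, hours in archive_coverage(600, ARCHIVES):
    print(f"{consol_fun:8s} {box_seconds:>8d} s/box  {hours:>8.1f} h")
```

With a 600 second step, the first two archives both come out at 24 hours, the third at 168 hours, the fourth at 720 hours and the fifth at 8760 hours.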

In other words, your archive setup looks rather odd. Did you really want to use a sample interval of 10 minutes? Your second archive covers the same timeframe as the first, which is useless for rrd4j!
As for the missing data (archive 3 and above should hold data for longer than 24 h): how long did you let the persistence run? The first value in archive 3 is stored after 3 hours. Did you check which values are in the different archives with the API Explorer?

The strategy you use (9 minutes) lets openHAB provide a new value to be persisted every 9 minutes. The samples described above are taken from this provided value according to the sample interval setting. Using a 60 second interval with your strategy would store the same provided value until the next new value is provided. Using a smaller sample interval with your strategy would keep a better granularity.


The interval is based on the sensors’ measurement interval of 10 minutes. I know it’s recommended to set it to something under a minute, but I didn’t want it to bother with duplicate values so much. Does it work the same in the case of RRD4J, in that it ignores unchanged values just like usual items?

The archives were set up with the idea of having all measurements for a day and summarised ones for the week, month and year. Right now they also contain some leftovers from experimenting.

I did check with the RRD4J inspector and found the archives to contain the last 24h at most as well as many NaN values. This is the reason for the high XFF setting.


You are using the RRDInspector, very good!
Do you really see no non-NaN values in the higher archives?

As for the relation between sample interval and strategy:
openHAB provides a value to be persisted as set by the strategy (everyXMinutes, onChange etc.). In other words, a change in state will only cause a new value to be provided if an onChange strategy is selected…
The sample interval sets the timestep at which the (changed or unchanged) provided value is used by rrd4j.
I’d use a lower sample interval, use one sample per box in archive one for something like an hour, and create the following archives to span your desired timeframes and granularities. Make sure that no two archives span the same timeframe, because only one of them will be used (my guess: the first one). Using the RRDInspector would be the only way to see stored values in such a (useless) archive.
This is my suggestion for an archive setup; it would NOT solve the missing data. Actually I have no clue about the reason!
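As a sketch of that suggestion, the archive string can be derived from the timeframes and granularities you want instead of being written by hand. Everything here is an assumption for illustration: `build_archives` is a hypothetical helper, and the 60 second step, the xff of 0.5 and the chosen spans are example values, not recommendations taken from the rrd4j documentation.

```python
def build_archives(step_seconds, plan, consol_fun="AVERAGE", xff=0.5):
    """Build an rrd4j-style archives string.

    plan is a list of (resolution_seconds, span_seconds) tuples, finest first.
    """
    parts = []
    for resolution, span in plan:
        if resolution % step_seconds or span % resolution:
            raise ValueError("resolution must be a multiple of the step, "
                             "and span a multiple of the resolution")
        steps = resolution // step_seconds   # samples consolidated per box
        rows = span // resolution            # number of boxes in the archive
        parts.append(f"{consol_fun},{xff},{steps},{rows}")
    return ":".join(parts)


HOUR, DAY = 3600, 86400
# Example plan: 1 h @ 1 min, 1 d @ 10 min, 7 d @ 3 h, 365 d @ 1 d.
plan = [(60, HOUR), (600, DAY), (3 * HOUR, 7 * DAY), (DAY, 365 * DAY)]
print(build_archives(60, plan))
# AVERAGE,0.5,1,60:AVERAGE,0.5,10,144:AVERAGE,0.5,180,56:AVERAGE,0.5,1440,365
```

Each archive in this plan spans a different timeframe, so no two archives compete for the same period.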

I changed the strategy and archive configuration as advised, unfortunately to no avail. It doesn’t seem to mind overlapping archives, but of course the point about wasted storage is valid.
What I did notice is the following:

  • Items which don’t get updated (mostly due to my sensors running out of battery power) keep their (constant, horizontal) chart line as long as no new data arrives. This might be unrelated to RRD4J but still.
  • I used to have an item in the archives configuration which was created by the web interface. I used it while building the last sensor and deleted it when transitioning its configuration to config files. The values that item provided were the only ones so far which were not affected by my 24h limit. The setup even worked after deleting the item; I guess its usage prevented it from getting properly deleted.

Maybe it is worth mentioning that I’m running OH as a Docker container. The machine and container are constantly online, though, and there is no 24h thing happening that I know of.

updated rrd4j.persist

Strategies {
	//cron expression "sec min h dom m dow"
	everyFithMinute : "* */5 * * * ?"
}

Items {
	gAmbientSensorPersist* : strategy = everyChange, everyFithMinute, restoreOnStartup
}

updated rrd4j.cfg

#one value every 600s
ambient_sensors_t.def=GAUGE,660,U,U,600
#store duration@intervals: 144x600s=24h@10min, 7d@3h, 30d@24h, 12M@~30d
ambient_sensors_t.archives=LAST,0.9,1,144:AVERAGE,0.9,18,56:AVERAGE,0.9,144,30:AVERAGE,0.9,4380,12
ambient_sensors_t.items=S2c3ae806d8ab_t,S2c3ae80687dd_t,S2c3ae8068634_t,S2c3ae8068569_t,Geckocam_t,S2c3ae807011c_t

ambient_sensors_h.def=GAUGE,660,U,U,600
ambient_sensors_h.archives=LAST,0.9,1,144:AVERAGE,0.9,18,56:AVERAGE,0.9,144,30:AVERAGE,0.9,4380,12
ambient_sensors_h.items=S2c3ae806d8ab_h,S2c3ae80687dd_h,S2c3ae8068634_h,S2c3ae8068569_h,Geckocam_h,S2c3ae807011c_h

As said before, I only have suggestions for the overall setup; I have no clue about the 24h limit problem!
Could you post a .rrd file that is affected by the problem?

That’s ordinary enough. What would you have it say instead? “Zero” would be a lie, when there is no actual new information.

In general, openHAB persistence assumes the last known good value remains valid until the next good value. NULL or UNDEF states are never recorded in persistence.

People who don’t like this arrangement for cosmetic or other reasons generally arrange an ‘expire’ mechanism of one kind or another to set the Item state to whatever false value they prefer (often 0.0).

EDIT - to clarify, while openHAB never persists NULL/UNDEF, because of rrd4j’s pre-defined pigeonhole structure, rrd4j itself may populate any pigeonhole with NaN if there is no source data to calculate a value. All historic pigeonholes would be NaN at first creation, for example.

These are the rrd files of a single sensor. The inspector shows the 24h interval quite nicely although I’m a bit puzzled how it determines the accumulated values to be NaN.
File.io upload due to missing user privileges

These files are using the archive setup from your initial post (with the two archives covering 24 hours).
The files do show data beyond the last 24 hours!

I tried to display older data with those files, but failed (even though data is in the archives). My present guess is that rrd4j “stumbles” over the two archives covering 24 hours. But that is a guess only!!
In order to create .rrd files using your updated .cfg you need to delete the old .rrd file. That is because the .cfg file is read only when creating the .rrd file. Presently I am not sure if a restart of openHAB or the persistence bundle is needed as well.

The reason why NaN values are persisted can’t be determined afterwards from the .rrd file alone. The openhab.log and events.log might give a clue (no values persisted during those timeframes? openHAB not running? …).

Did some more research:

IMHO it is openHAB that stumbles while working with a .rrd file that has fewer distinct archive timeframes than archives!

The number of different timeframes is detected (in your case D, W, M, Y) as well as the number of archives (in your case 5). When fetching data for a week, openHAB is looking for the second timeframe and therefore using the second archive (which is wrong, because the second archive has a timeframe of 24h).
You can verify that by selecting a year as timeframe: the corresponding (5th) archive has no data, however data is shown (the data from the 4th archive).


Thank you for investigating this further. In the meantime I have deleted the rrd files with openHAB shut down. It is now restarted and already gathering data, although results will naturally only show after 24h.

Here it is after deleting the old files and changing the archive count. No change I can see though, any ideas?

I have no idea what you are missing!
The uploaded file shows 4 archives (144x600s=24h@10min, 7d@3h, 30d@24h, 12M@~30d).
The first archive covers 24 hours and does have all the expected data points stored.
The second archive has data points starting at Aug 8th 23:00 until Aug 9th 20:00.
The third archive has a single data point at Aug 9th 02:00.
The last archive has no data point (yet).
That is all as expected (for data which was collected for a bit less than 24 hours) and will show on the respective charts (D, W, M, Y), so what is the problem?
As said before, I was able to show data in the higher archives (i.e. older than 24 hours) with your rrd file using a standard sitemap chart.

Additionally, regarding your question:

Since an .rrd file is created at its final size, the file location for each and every data point exists from creation time. Reading the data from those points before a value has been stored there reveals no numerical value.
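A toy model may make this easier to picture. The class below is purely illustrative and hypothetical (it is not how rrd4j is implemented internally): it only mimics the idea of a fixed-size archive whose slots all exist from creation and hold NaN until a value is written.

```python
import math


class RoundRobinArchive:
    """Toy fixed-size archive: every slot exists from the start and holds NaN."""

    def __init__(self, rows):
        self.values = [math.nan] * rows  # pre-allocated, nothing stored yet
        self.pointer = 0

    def store(self, value):
        # Overwrite the oldest slot and advance, wrapping around at the end.
        self.values[self.pointer] = value
        self.pointer = (self.pointer + 1) % len(self.values)


archive = RoundRobinArchive(rows=5)
archive.store(26.6)
archive.store(26.1)

unknown = sum(math.isnan(v) for v in archive.values)
print(f"{unknown} of {len(archive.values)} slots are still NaN")
```

After two stored values in a five-slot archive, three slots still read as NaN; they only get real numbers once enough time has passed to fill them.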

To bring this to a conclusion: I finally realized my error in understanding the logic of the RRD4J database. Your description is on point and I can see it in the accumulated data, now about 2 weeks’ worth of it.
What I still can’t seem to manage is a sitemap chart that actually shows the data. Mine never gets beyond showing the past 24h, which was the basis of my initial problem. It seems as if it just doesn’t take the other archives into consideration, just the first.

 Frame label="RRD4J" icon=line
        {
//              Chart item=gAmbientSensorTemp label="24h" refresh=30000 period=D legend=true service="rrd4j"
//              Chart item=gAmbientSensorHumid label="24h" refresh=30000 period=D legend=true service="rrd4j"

                Chart item=gAmbientSensorTemp label="2D" refresh=30000 period=2D legend=false service="rrd4j"
                Chart item=gAmbientSensorTemp label="3D" refresh=30000 period=3D legend=false service="rrd4j"
                Chart item=gAmbientSensorTemp label="W" refresh=30000 period=W legend=false service="rrd4j"
                Chart item=gAmbientSensorTemp label="2W" refresh=30000 period=2W legend=false service="rrd4j"
        }

The question is: how did you see the 2 weeks of data?
If you saw these 2 weeks of data using the RRD Inspector, in which archive did you see the data and which archives are in the rrd file?
Using the API Explorer from MainUI you could have a look as well. openHAB would be using the same calls as the API Explorer, so I would expect that you see the same number of stored values for all the different periods you showed above. Note that the API Explorer uses the last 24 hours as the default period unless you specify something else.

What kind of chart do you get when calling for a month?

Did some more testing using your last posted rrd4j.cfg.

Surprisingly, I get the same result as you. Although the RRDInspector does show data in all archives, openHAB fetches data only from the first archive. That happens when using a chart to display the data as well as when using the API Explorer.

I must have missed the point that only archive 1 was used when stating that:

So I proved myself wrong on that (have been fooled by rrd4j again!).

I’ll have to dig deeper in order to understand the reason for that! Using an archive with a step size of 60 seconds or less doesn’t show that behaviour.

Another update:

The problem is/was the use of two different consolidation functions.
With the same consolidation function for all archives the problem is gone.
Using a different one only for archive 1, which uses 1 sample per box, doesn’t make a difference anyway (each consolidation function would return the same value).
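A quick way to check an archives line for this is sketched below. The helper names are hypothetical, and picking AVERAGE as the common function is only an assumption for the example; any single function used consistently matches the finding above.

```python
def consolidation_functions(archives):
    """Return the consolidation function of each archive, in order."""
    return [spec.split(",")[0] for spec in archives.split(":")]


def uses_single_function(archives):
    return len(set(consolidation_functions(archives))) == 1


def with_single_function(archives, consol_fun="AVERAGE"):
    """Rewrite every archive to use the same consolidation function."""
    rewritten = []
    for spec in archives.split(":"):
        _old_fun, rest = spec.split(",", 1)  # keep xff, steps and rows as-is
        rewritten.append(f"{consol_fun},{rest}")
    return ":".join(rewritten)


# Archives line from the "updated rrd4j.cfg" posted earlier in the thread.
POSTED = ("LAST,0.9,1,144:AVERAGE,0.9,18,56:"
          "AVERAGE,0.9,144,30:AVERAGE,0.9,4380,12")

print(uses_single_function(POSTED))   # False: LAST is mixed with AVERAGE
fixed = with_single_function(POSTED)
print(fixed)
print(uses_single_function(fixed))    # True
```

As noted earlier in the thread, the .cfg is only read when the .rrd file is created, so the existing .rrd files have to be deleted for a changed archives line to take effect.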

It does work! The consolidation function needs to be the same for all archives. Maybe this should be mentioned in the documentation but at least I’m finally enjoying a working setup.
Thank you very much for all your help!
