All persistence stopped saving data suddenly (mapDB not active)

Hi there,
I’m running openHAB 4.0.2 (now 4.0.3) on a Synology NAS DS716 in a Docker container.

For some weeks now I have had the problem that the RRD4J persistence suddenly stops saving data. After I restart openHAB, everything works fine for some hours, and then it happens again.

Here I have marked the time when the values go missing. Then I restarted, and after a while values were missing again.

This happens for all kinds of items.
I have chosen this item for debugging.

I can’t find an error in openhab.log, so I added some info logs to my rules, which are always called when the item changes:

2023-10-15 15:33:04.118 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 15:33:04.119 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:9.6 LuftfeuchteRelativ_Outdoor:72
2023-10-15 15:33:04.121 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:9.6, LuftfeuchteRelativ:72, Taupunkt:4.8, LuftfeuchteAbsolut 6.6
2023-10-15 15:33:04.167 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 15:33:04.169 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:9.6 LuftfeuchteRelativ_Outdoor:72
2023-10-15 15:33:04.172 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:9.6, LuftfeuchteRelativ:72, Taupunkt:4.8, LuftfeuchteAbsolut 6.6
2023-10-15 16:33:04.626 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 16:33:04.627 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:9.1 LuftfeuchteRelativ_Outdoor:72
2023-10-15 16:33:04.630 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:9.1, LuftfeuchteRelativ:72, Taupunkt:4.3, LuftfeuchteAbsolut 6.4
2023-10-15 17:33:05.051 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 17:33:05.053 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:8.55 LuftfeuchteRelativ_Outdoor:76
2023-10-15 17:33:05.056 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:8.55, LuftfeuchteRelativ:76, Taupunkt:4.6, LuftfeuchteAbsolut 6.5
2023-10-15 17:33:05.206 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 17:33:05.207 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:8.55 LuftfeuchteRelativ_Outdoor:76
2023-10-15 17:33:05.209 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:8.55, LuftfeuchteRelativ:76, Taupunkt:4.6, LuftfeuchteAbsolut 6.5
2023-10-15 18:33:05.491 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 18:33:05.492 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:7.19 LuftfeuchteRelativ_Outdoor:76
2023-10-15 18:33:05.494 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:7.19, LuftfeuchteRelativ:76, Taupunkt:3.2, LuftfeuchteAbsolut 6
2023-10-15 18:33:05.641 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 18:33:05.642 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:7.19 LuftfeuchteRelativ_Outdoor:82
2023-10-15 18:33:05.644 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:7.19, LuftfeuchteRelativ:82, Taupunkt:4.3, LuftfeuchteAbsolut 6.4
2023-10-15 19:33:06.016 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 19:33:06.018 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:6.78 LuftfeuchteRelativ_Outdoor:83
2023-10-15 19:33:06.020 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:6.78, LuftfeuchteRelativ:83, Taupunkt:4.1, LuftfeuchteAbsolut 6.3
2023-10-15 19:33:06.090 [INFO ] [nhab.automation.script.file.klima.js] - Automatisierung TaupunktBerechnung Outdorr gestartet.
2023-10-15 19:33:06.091 [INFO ] [nhab.automation.script.file.klima.js] - TemperaturIst_Outdoor:6.78 LuftfeuchteRelativ_Outdoor:83
2023-10-15 19:33:06.094 [INFO ] [nhab.automation.script.file.klima.js] - Outdoor TemperaturIst:6.78, LuftfeuchteRelativ:83, Taupunkt:4.1, LuftfeuchteAbsolut 6.3

Here and in events.log I can see that the items keep changing as they should. It’s just that rrd4j doesn’t store them?
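
One quick way to check this on the host is to look at the modification times of the .rrd files (a sketch; the userdata path is the one from my setup, shown further below):

# List rrd4j files modified within the last 10 minutes; if the list is empty
# while events.log still shows item changes, rrd4j has stopped writing.
find /volume1/SmartHome/docker/openHAB/userdata/persistence/rrd4j \
  -name '*.rrd' -mmin -10 -exec ls -l {} +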

I found this topic with the exact same problem:

But the solution is not working for me because I don’t have mapDB installed.


Here are some tests which I don’t completely understand. Are the tests all fine?
[screenshot of the test results]



Does anyone have an idea why my persistence is not working anymore?

cc: “InfluxDB Persistence stops permanently when Influx-DB-Server has outage”, which is likely a similar issue.

I’m running openHAB 4.0.3 now; the problem is still there.

I have now changed all the ownership manually, like this:
root@openhab-4:/openhab# chown openhab:openhab -R *
The problem is still there.

I have a few of these in my log:

2023-10-21 11:01:12.709 [WARN ] [sistence.internal.PersistenceManager] - Querying persistence service 'rrd4j' to restore 'VS_Einfahrt_Briefkasten_AusrichtungX' takes more than 5000ms.
2023-10-21 11:01:25.073 [WARN ] [sistence.internal.PersistenceManager] - Querying persistence service 'rrd4j' to restore 'Licht_Diele_Decke_ZuletztGesehen' takes more than 5000ms.
2023-10-21 11:01:33.389 [WARN ] [sistence.internal.PersistenceManager] - Querying persistence service 'rrd4j' to restore 'Fenster_Werkstatt_Nord_Batterieladung' takes more than 5000ms.
2023-10-21 11:01:49.656 [WARN ] [sistence.internal.PersistenceManager] - Querying persistence service 'rrd4j' to restore 'TemperaturIst_Schlafzimmer' takes more than 5000ms.
2023-10-21 11:01:54.666 [WARN ] [sistence.internal.PersistenceManager] - Querying persistence service 'rrd4j' to restore 'Alarmsteuerung_Outdoor_Hauseingang' takes more than 5000ms.

Does this help anyone solve my problem?

Added 2 days later:
Now I have more than 100 warnings like this on a restart, and they keep increasing every day. What can this mean?
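
To check whether they are really increasing, the warnings can be counted per day directly from the logs (a sketch; the log path assumes my Docker volume layout from above):

# Count the "takes more than 5000ms" restore warnings per day:
grep -h "takes more than 5000ms" \
  /volume1/SmartHome/docker/openHAB/userdata/logs/openhab.log* \
  | cut -c1-10 | sort | uniq -c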


It ran for only 1h20min after the last restart; then the persistence stopped recording again.
events.log shows that the values were still changing.

Last night I deleted all persistence files and restarted. Four hours later, persistence stopped again.

Before, they were all owned by user openhab and group openhab.
After openHAB recreated all the rrd4j files, it looked like this:

adminDSM@DS716:/volume1/SmartHome/docker/openHAB/userdata/persistence/rrd4j$ ls -l
total 491288
...
-rwxrwxrwx+ 1 dockerlimited users 542416 Oct 22 12:17 Steckdose_S01_Switch.rrd
-rwxrwxrwx+ 1 dockerlimited users 542416 Oct 22 12:15 Steckdose_S02_Switch.rrd
...
adminDSM@DS716:/volume1/SmartHome/docker/openHAB/userdata/jsondb$ ls -l
total 3140
...
-rwxrwxrwx+ 1 dockerlimited users   46247 Oct 12 13:13 org.openhab.core.items.Item.json
-rwxrwxrwx+ 1 dockerlimited users    9206 Oct 14 17:41 org.openhab.core.items.Metadata.json
-rwxrwxrwx+ 1 dockerlimited users   75707 Aug  5 23:18 org.openhab.core.thing.link.ItemCha

My Docker variables are:

I checked this in bash:

adminDSM@DS716:/volume1/SmartHome/docker/openHAB/userdata/jsondb$ id openhab
uid=1033(openhab) gid=100(users) groups=100(users),65537(dialout),65536(openhab)
adminDSM@DS716:/volume1/SmartHome/docker/openHAB/userdata/jsondb$ id dockerlimited
uid=1028(dockerlimited) gid=100(users) groups=100(users)

I haven’t changed this in ages. Do I have a problem with rights and ownership?
Shouldn’t all files be owned by user openhab and group openhab?
I’m very confused now.

Ah okay.
If I run a bash inside the Docker container, user and group are still openhab. So this is all right then.
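
For anyone who wants to double-check the same thing, this is roughly what I ran (the container name openhab is an assumption; adjust it to your setup). Numeric IDs (ls -ln) make the host/container UID mapping visible:

# Open a shell inside the container and compare the numeric IDs:
docker exec -it openhab bash -c \
  'id openhab && ls -ln /openhab/userdata/persistence/rrd4j | head -n 5'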

Still searching for the cause of my problem.

I have exactly the same problem in my installation. Maybe it has something to do with the number of persisted items. I had openHAB 4.0.3 running for some time without issues. Now, with some additional items, rrd4j stops persisting values after random amounts of time. After restarting openHAB it works again for a while.
I am curious whether this has already been fixed:

Hi mhett,
that could be the reason, because I also used it for a long time without problems and had just added many more items. But I’m not sure whether this is the cause. You might be right that the fix from the older bug report is still not included in 4.0.3; that would be a good explanation.

I added a bug report on GitHub yesterday, because I couldn’t find a solution here or on Discord.
It would be great if you could add some debugging information from your case there, so that the maintainers have an easier time finding the bug.

Since I upgraded from openHAB 3.4 to openHAB 4.0, I have noticed that I have the same issue…

(see the following screenshot taken at 8h20)

The current version of openHAB is 4.0.3. It is hosted in a Docker container on my NAS (Synology). I have to stop and restart the container to fix the problem. It seems to be an issue with the persistence service (RRD4J, the only one I have installed), since data are not recorded. When I restart the openHAB container, data are stored again, but there is missing data (see the following screenshot).

I had a look at the log file and did not see a warning/error when openHAB stopped storing the data (around 4h42 this morning), except during the startup phase, where I had such warnings for some of my items (7 out of 576 items in total):

2023-11-07 08:21:40.125 [WARN ] [sistence.internal.PersistenceManager] - Querying persistence service 'rrd4j' to restore 'ZWaveNode028FGR222RollerShutter2_Electricmeterwatts' takes more than 5000ms.

Any advice on how to fix the problem? Or should I upgrade to 4.1 to fix this issue?

There is no fix yet. But I did open an issue (the link I posted before). You can watch it to see when a bugfix becomes available. There you will also find my workaround, which reduces the load on the persistence.

Thanks for this workaround. Does it mean that I should identify the items (among the 576) that actually need to be persisted, to avoid storing everything? My current config in mapdb.persist:

Strategies {
  default = everyUpdate
}

Items {
  gTimepicker* : strategy = everyChange, restoreOnStartup
  Chauffage_auto : strategy = everyChange, restoreOnStartup
}

I changed this config by replacing everyUpdate with everyChange as the default strategy. This should reduce the load on the persistence service. I restarted the container and will see whether this change solves the problem. Otherwise, I will identify which items need to be persisted, to reduce the load further.

I reduced the number of items using the everyMinute strategy until this bug is solved.
I only picked the items whose graphs I actually read.

I also use mapdb now for the restoreOnStartup.

mapdb.persist

Strategies {
        default = everyChange
}
Items {
  * : strategy = everyChange, restoreOnStartup
}

rrd4j.persist

Strategies {
        everyMinute     : "0 * * * * ?"
        every5Minutes   : "0 */5 * * * ?"
        every15Minutes  : "0 */15 * * * ?"        
        everyHour       : "0 0 * * * ?"
        everyDay        : "0 0 0 * * ?"
        default = everyChange
}

Items {
  * : strategy = everyChange

  //Für Klimacharts & Heizung:
  LuftfeuchteAbsolut, LuftfeuchteAbsolut_Outdoor, TemperaturenIst, TemperaturIst_Outdoor, LuftfeuchteRelativ_Outdoor : strategy = every5Minutes, everyChange
  TemperaturenIst*, TemperaturSoll*, LuftfeuchteRelativ*, Ventile* : strategy = everyMinute, everyChange

  //Anwesenheit:
  AnwesenheitJemand, AnwesenheitPerson1_Delayed, AnwesenheitPerson2_Delayed : strategy = every5Minutes, everyChange

  //Für "Öffnungen in den letzten x Stunden" jede Minute den Zustand speichern.
  Oeffnungen* : strategy = everyMinute, everyChange
}

Thanks for pointing out that I didn’t have any configuration for the rrd4j persistence service… :wink:
I will create one right away!
Thanks again

I now have the following config for the rrd4j persistence service:

Strategies {
        everyMinute     : "0 * * * * ?"
        every5Minutes   : "0 */5 * * * ?"
        every15Minutes  : "0 */15 * * * ?"        
        everyHour       : "0 0 * * * ?"
        everyDay        : "0 0 0 * * ?"
        default = everyChange
}

Items {
  * : strategy = everyChange
}

I will later apply the strategies to some specific items. For now, by default, data are stored when they change. This should work, but… apparently it doesn’t…

The chart page stopped showing the data when I restarted openHAB at 12:00:

This chart visualizes data from 3 items:

ST30Temperaturesalon_Temperature
ZWaveNode035ZMNHJDFlushPilot_Dimmer
RdC_temperature_requise

I then had a look in userdata/persistence/rrd4j to see whether the corresponding persistence files had been updated recently:

-rw-r--r-- 1 9001 9001 755692 Nov  8 13:30 ST30Temperaturesalon_Temperature.rrd
-rw-r--r-- 1 9001 9001 542416 Nov  8 13:27 ZWaveNode035ZMNHJDFlushPilot_Dimmer.rrd
-rw-r--r-- 1 9001 9001 755692 Nov  8 13:30 RdC_temperature_requise.rrd

But does that mean that new data have been stored in them? Maybe not, because half an hour later the timestamps had changed but not the size… My concern is that I would like the chart to display continuous values even when they do not change (like before, when the default strategy was everyUpdate).
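
One way to keep an eye on this from that folder on the NAS (a sketch; watch and GNU ls options are assumed to be available):

# Re-list the three chart items every minute; the mtime should keep advancing.
# Note: .rrd files are round-robin archives with a fixed, preallocated size,
# so the file size never changes, no matter how much data is written.
watch -n 60 'ls -l --full-time ST30Temperaturesalon_Temperature.rrd ZWaveNode035ZMNHJDFlushPilot_Dimmer.rrd RdC_temperature_requise.rrd'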

I also checked another item related to my solar panel. The chart representing its evolution (power) works fine:

The last value is from 14:27 (the time I wrote this message).

In conclusion, I will monitor this over the following days.

I am making little progress on this issue. Some items work fine in terms of persistence; some do not. In my previous post, the following item

ST30Temperaturesalon_Temperature

has a persistence issue. After 12:01, nothing is displayed in the chart, although the timestamp of the corresponding rrd4j file shows that the file is being updated. So I decided to have a look into it using the RRD4J inspector, and I found the reason why it is not displayed:

Is it normal to have NaN in the database instead of a correct value? If not, where does it come from?

UPDATE: I switched to an everyMinute persistence strategy for all the items, and my openHAB system then went back to normal behavior. (This fits with rrd4j expecting a value at least once per minute; longer gaps end up stored as NaN.)

I have the same Synology NAS 716+II with 8 GB RAM and the same issue. I’m running OH 4.1.0. Is your workaround still working? Is there a fix?

@HenMus: I’m still using the workaround and have not had any trouble with it since. Good luck; I hope it works for you too.

As I am also fighting issues like the ones described here, and knowing that you more or less fixed them with your workarounds by modifying/limiting the strategies, let me also share my findings.

Some background first:
Like @helipus, I am also running openHAB (4.1.2 meanwhile) in a Docker container on a Synology (DS918+).
At some point (I think shortly after updating from 3.4.4 to 4.0.2 or so) I realized that the graphs in openHAB stopped showing the statistics a couple of hours after the last restart.
So I added InfluxDB as a second persistence service (leaving RRD4j as the default) and was happy to see that there all values stay up to date, even after days and weeks of runtime. Since I already ran a Docker container with InfluxDB (and another one with Grafana), this had been on my to-do list anyway.

Maybe I could simply have removed the RRD4j persistence and/or switched the default to InfluxDB without ever missing it, but as I had enough disk space and (so I assumed) enough performance to run both, I kept it like that. And to be honest, I didn’t want to break something I might not yet have understood in all of its aspects either. As I only started with openHAB sometime last year, I first concentrated on adding all my things and items to the model in my spare time, rather than on how to optimize persistence (or even pick the best persistence service for my needs).
So that is how it has run so far, and with every update I was curious whether the issues with the RRD4j data might have been fixed.
As they had not, and as I had some time over the Easter days, I searched through the community and GitHub for others experiencing the same thing, and that’s how I found this topic.

I also had those “restore … takes more than 5000ms” messages in my logs after a restart of openHAB, or after removing and re-adding a thing.
I also had the impression that the rrd4j persistence stops saving data some hours after a restart, and that another restart was needed to fix it.

But now that I have enabled trace logging for the RRD4j persistence, it looks slightly different.
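
For reference, this is roughly how I enabled it (a sketch, assuming the official Docker image, whose Karaf console client lives at runtime/bin/client, and that the add-on’s logger name is org.openhab.persistence.rrd4j):

# Raise the rrd4j persistence logger to TRACE via the Karaf console client
# (it will prompt for the console password, default: habopen):
docker exec -it openhab /openhab/runtime/bin/client "log:set TRACE org.openhab.persistence.rrd4j"
# And back to the default once done:
docker exec -it openhab /openhab/runtime/bin/client "log:set DEFAULT org.openhab.persistence.rrd4j"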
Filtered on just one of the items, to better show what happens:


So for me it looks like it has a serious backlog that is getting worse.
See the first entry, logged at 2024-04-03 18:22:48:
it is storing a value sent at 2024-04-01 14:40:14 (timestamp 1711975214),
and the last one, logged at 2024-04-03 18:29:15:
it is storing a value sent at 2024-04-01 14:42:00 (timestamp 1711975320).
So in ~6.5 minutes it was only able to store ~1.75 minutes’ worth of values (from, by now, almost two days ago!). And indeed, the openHAB graphs show the values up to exactly that timestamp.
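
For anyone who wants to verify the arithmetic behind that estimate (GNU date assumed; the epoch timestamps are the ones from the trace log):

date -d @1711975214   # -> 2024-04-01 14:40:14 local time (CEST): first value being stored
date -d @1711975320   # -> 2024-04-01 14:42:00 local time (CEST): last value being stored
# Wall-clock time spent:     18:29:15 - 18:22:48     = 387 s
# Data time actually stored: 1711975320 - 1711975214 = 106 s
# -> roughly 106/387 ≈ 0.27 seconds of backlog cleared per second of runtime,
#    so the backlog keeps growing instead of shrinking.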

Meanwhile, those being logged with a current timestamp of 1712161xxx (the white ones) were all discarded due to “new value arrived with same timestamp”, but strangely that one is not logged here at all. Maybe it will be in 2-3 days, when the processing finally reaches that timestamp. Who knows.

It looks like I can observe the above for every item. A small difference compared to the energy meter values above, though, can be seen for values that are not reported as often:


There (with the same delay in the timestamps) it looks like each one is processed twice, which also seems strange to me but could be as expected.

Sorry for the long text, but I wanted to paint a bigger picture than a sentence like “maybe it doesn’t stop saving completely, although it appears to, but instead builds a massive backlog” could have done.


This might help the developers a lot. It would be best if you also added it to the GitHub issue where they will be looking: [rrd4j] Persistence stops saving data suddenly (mapDB not active) · Issue #15794 · openhab/openhab-addons · GitHub

Sure.
I thought this topic was already linked there to provide the details.
But I have now commented there as well with the main findings, and linked back here.
Let’s see if it helps. It looks like it hasn’t gotten much visibility so far.
