RRD4j stops writing data and gets stuck on restart

FranzS · January 3, 2022, 7:24am

Hi all,

I’m using RRD4J and MapDB.
For the last two days I have faced a similar problem: from around 22:00 on, no RRD4j files are written any more, and no MapDB files either.
Trying to restart the bundles leads to:

openhab> bundle:list | grep -i per
102 x Active   x  80 x 2.16.0.v20190528-0725 x EMF XML/XMI Persistence
195 x Waiting  x  80 x 3.2.0                 x openHAB Core :: Bundles :: Model Persistence
196 x Active   x  80 x 3.2.0                 x openHAB Core :: Bundles :: Model Persistence IDE
197 x Active   x  80 x 3.2.0                 x openHAB Core :: Bundles :: Model Persistence Runtime
210 x Stopping x  80 x 3.2.0                 x openHAB Core :: Bundles :: Persistence
308 x Active   x  80 x 3.2.0                 x openHAB Add-ons :: Bundles :: Persistence Service :: MapDB
309 x Stopping x  80 x 3.2.0                 x openHAB Add-ons :: Bundles :: Persistence Service :: RRD4j

In addition, the following error messages are written, but only when trying to restart the bundles; otherwise there are no errors in the log files.

2022-01-03 08:12:55.646 [ERROR] [ence.internal.PersistenceManagerImpl] - bundle org.openhab.core.persistence:3.2.0 (210)[org.openhab.core.persistence.internal.PersistenceManagerImpl(217)] : waitForTracked timed out: 7 ceiling: 6 missing: [],  Expect further errors

2022-01-03 08:16:17.292 [ERROR] [ence.internal.PersistenceManagerImpl] - bundle org.openhab.core.persistence:3.2.0 (210)[org.openhab.core.persistence.internal.PersistenceManagerImpl(217)] : DependencyManager : invokeUnbindMethod : timeout on close latch PersistenceService

This is what the persistence files look like:

root@raspi:/openhab/userdata/persistence# ls -ltr rrd4j | tail -10
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 SonosOneBad_Mute.rrd
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 SonosBeamFernsehzimmer_Loudness.rrd
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 KlimaBad_Power.rrd
-rw-r--r-- 1 openhab openhab 755692 Jan  2 22:00 ZigbeeHueWirtschaftsraumMotionSensor_Temperature.rrd
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 ZigbeeHueBadDeckenleuchte_LevelControl.rrd
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 ZigbeeHueBadDeckenleuchte_ColorTemperature.rrd
-rw-r--r-- 1 openhab openhab 755692 Jan  2 22:00 VacuumOG_ErrorID.rrd
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 LightTemperature.rrd
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 VacuumOG_DoNotDisturb.rrd
-rw-r--r-- 1 openhab openhab 542416 Jan  2 22:00 VacuumUG_DoNotDisturb.rrd
root@raspi:/openhab/userdata/persistence# ls -ltr mapdb | tail -10
total 3584
-rw-r--r-- 1 openhab openhab 3683483 Jan  2 22:00 storage.mapdb.p
-rw-r--r-- 1 openhab openhab   33088 Jan  2 22:00 storage.mapdb
-rw-r--r-- 1 openhab openhab      16 Jan  3 08:16 storage.mapdb.t
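
The manual check above (looking at the newest modification times) could also be scripted. This is only a minimal sketch in Python, not part of openHAB: the function names `newest_write` and `is_stalled` and the 5-minute threshold are assumptions for illustration, and the folder path is simply the one from the listing above.

```python
import time
from pathlib import Path


def newest_write(directory, pattern="*.rrd"):
    """Return (path, mtime) of the most recently modified matching file, or None."""
    files = [p for p in Path(directory).glob(pattern) if p.is_file()]
    if not files:
        return None
    newest = max(files, key=lambda p: p.stat().st_mtime)
    return newest, newest.stat().st_mtime


def is_stalled(directory, max_age_seconds=300, pattern="*.rrd", now=None):
    """True if no matching file was modified within max_age_seconds.

    The threshold is an assumption; pick one that fits your persistence strategy.
    """
    now = time.time() if now is None else now
    result = newest_write(directory, pattern)
    if result is None:
        return True  # nothing found at all is treated as stalled
    _, mtime = result
    return (now - mtime) > max_age_seconds


if __name__ == "__main__":
    # Path taken from the listing above; adjust to your installation.
    folder = "/openhab/userdata/persistence/rrd4j"
    print("stalled" if is_stalled(folder) else "still writing")
```

Run periodically (for example from cron), something like this would show when writing stops without having to wait for flat charts.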

BR
/Franz

martindk · January 3, 2022, 7:38am

I am not sure if this is the same issue, as you have investigated it in more depth than I have, but since upgrading from OH 3.1 to OH 3.2 through openHABian I also have issues with RRD4J and charting. After a restart of the system, everything works normally for 12 hours or so, and then the items stop being charted. My charts then look like this:
[image: chart]
It seems that RRD4J stops working altogether.
Restarting the OH service, or the whole server, re-establishes charting and persistence, but no data from the period after charting stopped is stored.

I guess it is RRD4J and MapDB persistence that stops working completely, just like @FranzS wrote?
In my case, the persistence files were last modified at 23:20 (not 22:00).

Any ideas on how to investigate further or resolve this?

martindk · January 5, 2022, 9:11am

Any update / new input on this issue ?

FranzS · January 6, 2022, 1:56pm

Hi Martin,

Not from my side.
After I deleted the persistence files it worked again.
I just wonder if there might be an issue with the year changeover in either rrd4j or MapDB.

BR
/Franz

opus · January 6, 2022, 3:02pm

I’m still on a 3.2 Milestone version, however rrd4j and mapDB are working properly (before and after the change to 2022).

martindk · January 14, 2022, 10:10am

I still have this issue. To try to solve it, I set up a completely new Linux server based on Ubuntu 20.04, then installed openHABian and the OH 3.2 release. All looked well (raw install).

I then started moving over my config (all of it is in files), and after the system had been running for 13 hours, persistence stopped. The system as such works fine, but nothing is persisted. All persistence files have the same timestamp (which is now 2 days old). As persistence services, I have RRD4J and MapDB.

I just opened the openHAB console and replicated exactly what @FranzS wrote in the first message: when listing the bundles, everything has status “Active”, including the two persistence bundles. When doing a bundle:restart, the command “hangs” and the status stays at “Stopping”.
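
To see at a glance which bundles are left outside the “Active” state after such a restart attempt, the `bundle:list` output can be filtered. The following is only a rough sketch in Python, assuming the console output has been copied into a string; the function name `non_active_bundles` is made up for illustration, and the column separator is assumed to be `x` as in the first post (or `│` / `|` on other terminals).

```python
import re

# Column separator: 'x' as in the listing in the first post; '│' and '|' are
# assumed alternatives, depending on the terminal encoding.
SEP = re.compile(r"\s+[x│|]\s+")


def non_active_bundles(listing):
    """Return (id, state, name) for every bundle whose state is not 'Active'."""
    result = []
    for line in listing.splitlines():
        # At most 4 splits, so a separator-like sequence in the name is kept.
        parts = SEP.split(line.strip(), maxsplit=4)
        if len(parts) != 5 or not parts[0].isdigit():
            continue  # prompt, header or unrelated line
        bundle_id, state, _level, _version, name = parts
        if state != "Active":
            result.append((int(bundle_id), state, name.strip()))
    return result
```

Applied to the listing from the first post, this would pick out the bundles in “Waiting” and “Stopping”.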

I have spent a lot of time on this now and feel I keep going in circles. To me, my OH install is, if not useless, then far less usable if I cannot use charts and rely on data being persisted.

The strange thing is that I have used this config/setup for years now and have not had this issue before. I also have a parallel system (in another house) where nothing is broken and persistence keeps working.

Nothing is in the logs at all. Can anyone please suggest how to move forward with solving this issue?

opus · January 14, 2022, 4:20pm

Could you post your rrd4j and mapdb persist and config files?
Do you have any suspicious log-entries around the time when the persistence stops?

martindk · January 16, 2022, 8:05pm

No suspicious entries in the log at all - the system just continues working although the persistence services stop.

Here is my mapdb.persist:

Strategies {
        default = everyChange
}

Items {
        // persist all items on every change and restore them from the db at startup
        * : strategy = everyChange, restoreOnStartup
}

And here my rrd4j.persist:

Strategies {
    // for rrd charts, we need a cron strategy
    everyMinute : "0 * * * * ?"
}

Items {
    // persist items on every change and every minute
    * : strategy = everyChange, everyMinute
}

My runtime.cfg has:

################ PERSISTENCE ####################

#  The persistence service to use if no other is specified.
#
org.openhab.persistence:default=rrd4j

Any suggestions ?

opus · January 17, 2022, 7:01am

Can’t say if it helps, but:

From the docs: "NOTE: rrd4j is for storing numerical data only."

If it were me, I’d limit rrd4j to persist numeric data only.
Is that the reason? I don’t think so. In other words, I have no clue.

mstormi · January 17, 2022, 8:25am

Please provide the minimum amount of information it takes to help you upfront in your initial post. It is very annoying if we have to ask for all the details.
How to ask a good question / Help Us Help You - Tutorials & Examples - openHAB Community

Do you use openHABian with ZRAM ? If so what is your /etc/ztab and the output of zramctl ?

martindk · January 17, 2022, 8:48am

Sorry for being annoying. I didn’t provide the persistence configs because I had already double-checked that they had not changed one bit since the OH installation was confirmed working flawlessly.

No, I am not using ZRAM and have no /etc/ztab to show.

I somehow feel that this is related to the OH 3.2 upgrade. I cannot say why, other than that it worked without issues before and now doesn’t.

As stated, I have tried to set up a completely new server and migrated my installation (config, persisted data etc) to it - and have the issue there as well.

If there are no better suggestions, I will try to downgrade to 3.1 and see if that works…

martindk · January 18, 2022, 12:48pm

Update: I downgraded to 3.1.0 and that gives the same problem… I am really at a loss here.

Wolfgang_S · January 18, 2022, 6:06pm

In case any change was made to the RRD4J data organization between 3.1 and 3.2, I doubt that a downgrade would revert it.

Did you try to enable DEBUG logging for RRD4J ?
log:set DEBUG org.openhab.persistence.rrd4j

martindk · January 19, 2022, 9:10pm

I set DEBUG as suggested at 17:18 today. This evening, at 21:46, the last entries from the RRD4J binding in openhab.log are:

2022-01-19 21:46:37.958 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'Sauna_temp_rate' as value '-0.00166667' in rrd4j database
2022-01-19 21:46:39.194 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxUptime' as value '1680141.0' in rrd4j database
2022-01-19 21:46:39.299 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanUptime' as value '1680038.0' in rrd4j database (again)
2022-01-19 21:46:39.302 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanUptime' as value '1680098.0' in rrd4j database
2022-01-19 21:46:39.359 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanTotalBytesSent' as value '1.093535282E9' in rrd4j database
2022-01-19 21:46:39.469 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanTotalBytesReceived' as value '4.213340012E9' in rrd4j database

Nothing unusual with those entries - and nothing unusual elsewhere in the log. The binding simply stops…?

The only thing strange this time is, that the system only ran normally for 4½ hours - and not the 12-13 hours I have seen the other times.
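
With DEBUG enabled, the moment persistence stops can be read off the last “Stored” line, as done by hand above. A minimal sketch in Python for doing the same on a large log file follows; it assumes the line format shown in the excerpt above, and the function name `last_stored` is made up for illustration.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2022-01-19 21:46:39.469 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'item' as value '1.0' in rrd4j database
STORED = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[DEBUG\] "
    r"\[[^\]]*RRD4jPersistenceService\] - Stored '([^']+)'"
)


def last_stored(lines):
    """Return (timestamp, item_name) of the last 'Stored' entry, or None."""
    last = None
    for line in lines:
        m = STORED.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
            last = (ts, m.group(2))
    return last
```

It could be called as `last_stored(open(path, encoding="utf-8"))`, where `path` points to your openhab.log; the location depends on the installation.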

rossko57 · January 20, 2022, 1:33am

Well, there is a bit more work to do with DEBUG active, especially log file I/O

This feels like a symptom of something else, a general resource constraint with threads or memory or such.
There are reports of 3.2 grinding to a halt when repeatedly editing rules for example.

martindk · February 2, 2022, 10:08am

I have tried increasing the memory of the virtual Linux instance I am running this on (running Debian 11.2 on my QNAP NAS using Virtualization Station). I have assigned 2 cores and 3 GB RAM to the virtual PC, which should be more than enough. openHAB 3.2 was installed using openHABian, and it is a fresh install to which I then transferred my config and persistence files.
Everything works, except for the halting persistence (RRD4J and MapDB).

I am really at a loss here. How should I proceed with the debugging? Or does anyone have a real idea what could be causing this?

martindk · February 3, 2022, 5:05pm

After assigning more RAM the persistence still halts - which also causes graphs to “flatten”… The rest of the OH3.2 installation keeps running.

Still no ideas how to proceed ?

martindk · February 7, 2022, 8:02am

OK - I gave up and created a completely new Linux installation as a virtual PC under QNAP Virtualization Station, this time based on the ubuntu-20.04.3-live-server-amd64.iso image.
On that, I installed a new openHABian using the newest version and migrated my backup of items, things etc. using openHABian’s Backup/Restore. As far as I can see, everything works fine now.
All graphs work and seem to keep on working.

The issues I described must have been linked to a broken Debian Linux. Sometimes it is better to start over from scratch.

opus · February 7, 2022, 8:15am

That is a suggestion more people should follow!

martindk · February 7, 2022, 9:38am

And - just as I was happy - my persistence broke down again.

When running “sudo systemctl status openhab” I get this:

● openhab.service - openHAB instance, reachable at http://openhabian:8080
     Loaded: loaded (/lib/systemd/system/openhab.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/openhab.service.d
             └─override.conf
     Active: active (running) since Sun 2022-02-06 22:39:54 CET; 11h ago
       Docs: https://www.openhab.org/docs/
             https://community.openhab.org
   Main PID: 2467 (java)
      Tasks: 278 (limit: 2274)
     Memory: 877.0M
     CGroup: /system.slice/openhab.service
             └─2467 /usr/bin/java -XX:-UsePerfData -Dopenhab.home=/usr/share/openhab -Dopenhab.conf=/etc/openhab -Dopenhab.runtime=/usr/share/openhab/runtime -Dopenhab.userdata=/var/lib/openhab -Dopenhab.logdir=/var/log/openhab -Dfelix.>

Feb 07 09:24:51 openhabian karaf[2467]:         at org.openhab.core.persistence.internal.PersistenceManagerImpl.handleStateEvent(PersistenceManagerImpl.java:152)
Feb 07 09:24:51 openhabian karaf[2467]:         at org.openhab.core.persistence.internal.PersistenceManagerImpl.stateChanged(PersistenceManagerImpl.java:473)
Feb 07 09:24:51 openhabian karaf[2467]:         at org.openhab.core.items.GenericItem.lambda$1(GenericItem.java:259)
Feb 07 09:24:51 openhabian karaf[2467]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Feb 07 09:24:51 openhabian karaf[2467]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Feb 07 09:24:51 openhabian karaf[2467]:         at java.base/java.lang.Thread.run(Thread.java:829)
Feb 07 09:24:51 openhabian karaf[2467]: Caused by: java.io.EOFException
Feb 07 09:24:51 openhabian karaf[2467]:         at org.mapdb.Volume$FileChannelVol.readFully(Volume.java:947)
Feb 07 09:24:51 openhabian karaf[2467]:         at org.mapdb.Volume$FileChannelVol.getByte(Volume.java:997)
Feb 07 09:24:51 openhabian karaf[2467]:         ... 17 more

So - openHAB is clearly running, but persistence has stopped, and the log shows Java exceptions…
I previously had DEBUG enabled for persistence, but that just showed that everything worked normally until it stopped.

So - I am at it again; does anyone have suggestions on how to investigate further?