RRD4j stops writing data and gets stuck at restart

Any update / new input on this issue?

Hi Martin,

not from my side.
After I deleted the persistence files it worked again.
I just wonder if there might be an issue with the year changeover in either rrd4j or MapDB.

BR
/Franz

I’m still on a 3.2 Milestone version; however, rrd4j and mapDB are working properly (before and after the change to 2022).

I still have this issue - and to try and solve it, I have set up a completely new Linux server, based on Ubuntu 20.04 - and installed OpenHabian and then the OH3.2 release. All looked well (raw install).

I then started moving over my config (all is in files) - and then, after the system has been running for 13 hours, persistence stops. The system as such works fine - but nothing is persisted. All persistence files have the same timestamp (which is now 2 days old). As persistence services, I have RRD4J and MAPDB.

I just called up the OpenHab console and replicated exactly what @FranzS wrote in the first message; when listing the bundles, everything has status “Active” - also the two persistence bundles. When doing a bundle:restart, the command “hangs” and the status keeps being “Stopping”.
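
For reference, roughly the console commands involved (a sketch - the bundle id is just a placeholder and differs per installation):

openhab-cli console

openhab> bundle:list | grep -i persistence
openhab> bundle:restart <bundle-id>
(at this point the command hangs and the bundle stays in “Stopping”)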

I have spent a lot of time on this now - and feel I keep moving in circles. To me, my OH install is, if not useless, then far less usable if I cannot use charts and rely on data being persisted.

The strange thing is that I have used this config/setup for years now - and not had this issue before. I also have a parallel system (in another house) where nothing is broken and persistence keeps working.

Nothing is in the logs at all. Can anyone please suggest how to move forward with solving this issue?

Could you post your rrd4j and mapdb .persist and config files?
Do you have any suspicious log entries around the time when the persistence stops?

No suspicious entries in the log at all - the system just continues working although the persistence services stop.

Here is my mapdb.persist:

Strategies {
        default = everyChange
}

Items {
        // persist all items on every change and restore them from the db at startup
        * : strategy = everyChange, restoreOnStartup
}

And here my rrd4j.persist:

Strategies {
    // for rrd charts, we need a cron strategy
    everyMinute : "0 * * * * ?"
}

Items {
    // persist items on every change and every minute
    * : strategy = everyChange, everyMinute
}

My runtime.cfg has:

################ PERSISTENCE ####################

#  The persistence service to use if no other is specified.
#
org.openhab.persistence:default=rrd4j

Any suggestions?

Can’t say if it helps, but:

From the docs: "NOTE: rrd4j is for storing numerical data only. "

If it were me, I’d limit rrd4j to persist numeric data only.
Is that the reason? I don’t think so; in other words, I have no clue.
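
If you want to try that, a minimal sketch of a more restrictive rrd4j.persist, assuming you collect your numeric items in a group (gNumeric is just a made-up name):

Strategies {
    // for rrd charts, we need a cron strategy
    everyMinute : "0 * * * * ?"
}

Items {
    // only persist members of a group containing numeric items
    gNumeric* : strategy = everyChange, everyMinute
}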

Please provide the minimum amount of information it takes to help you upfront in your initial post; it is very annoying if we have to ask for all the details:
How to ask a good question / Help Us Help You - Tutorials & Examples - openHAB Community

Do you use openHABian with ZRAM? If so, what is your /etc/ztab and the output of zramctl?
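
For reference, you can gather that with (standard locations on openHABian):

cat /etc/ztab
zramctl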

Sorry for being annoying - I didn’t provide the persistence configs as I had already double-checked that these had not changed one bit since the OH installation was confirmed working flawlessly.

No, I am not using ZRAM and have no /etc/ztab to show.

I somehow feel that this is related to the OH3.2 upgrade. I cannot say why, other than that it worked without issues before and now doesn’t.

As stated, I have tried to set up a completely new server and migrated my installation (config, persisted data etc) to it - and have the issue there as well.

If there are no better suggestions, I will try to downgrade to 3.1 and see if that works…

Update: I downgraded to 3.1.0 and that gives the same problem… I am really at a loss here.

In case there was any update to the RRD4J data organization from 3.1 to 3.2, I doubt that a downgrade would revert those changes.

Did you try to enable DEBUG logging for RRD4J ?
log:set DEBUG org.openhab.persistence.rrd4j

Set DEBUG as suggested at 17:18 today, and this evening at 21:46 the last entries from the RRD4J binding in openhab.log are:

2022-01-19 21:46:37.958 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'Sauna_temp_rate' as value '-0.00166667' in rrd4j database
2022-01-19 21:46:39.194 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxUptime' as value '1680141.0' in rrd4j database
2022-01-19 21:46:39.299 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanUptime' as value '1680038.0' in rrd4j database
2022-01-19 21:46:39.302 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanUptime' as value '1680098.0' in rrd4j database
2022-01-19 21:46:39.359 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanTotalBytesSent' as value '1.093535282E9' in rrd4j database
2022-01-19 21:46:39.469 [DEBUG] [d4j.internal.RRD4jPersistenceService] - Stored 'fboxWanTotalBytesReceived' as value '4.213340012E9' in rrd4j database

Nothing unusual with those entries - and nothing unusual elsewhere in the log. The binding simply stops…?

The only thing strange this time is that the system only ran normally for 4½ hours - and not the 12-13 hours I have seen the other times.

Well, there is a bit more work to do with DEBUG active, especially log file I/O

This feels like a symptom of something else - a general resource constraint with threads or memory or such.
There are reports of 3.2 grinding to a halt when repeatedly editing rules, for example.
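
One way to check for that, as a sketch - capture a thread dump and the memory situation while persistence is hung (assumes a full JDK with jstack and the standard “openhab” systemd service):

# find the openHAB Java process and dump all thread stacks
PID=$(systemctl show -p MainPID --value openhab)
sudo -u openhab jstack "$PID" > /tmp/openhab-threads.txt

# quick look at overall memory pressure
free -h

Blocked or waiting threads from the persistence thread pool should show up in that dump.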

I have tried to increase the memory of the virtual Linux instance I am running this on (running Debian 11.2 on my QNAP NAS using Virtualization Station). I have assigned 2 cores and 3 GB RAM to the virtual PC - which should be more than enough. OpenHab 3.2 was installed using OpenHabian - and it is a fresh install, to which I have then transferred my config and persistence files.
Everything works - except for the halting persistence (RRD4J and MAPDB).

I am really at a loss here - how should I proceed with the debugging? Or does anyone have a real idea what could be causing this?

After assigning more RAM the persistence still halts - which also causes graphs to “flatten”… The rest of the OH3.2 installation keeps running.

Still no ideas on how to proceed?

OK - I gave up and created a completely new Linux installation as a virtual PC under QNAP Virtualization Station, this time based on the ubuntu-20.04.3-live-server-amd64.iso image.
On that, I installed a new OpenHabian using the newest version - and migrated my backup of items, things etc. using OpenHabian’s Backup/Restore - and as far as I can see, everything works fine now.
All graphs work - and seem to keep on working.

My issues described must have been linked to a broken Debian Linux installation. Sometimes it is better to start over from scratch :wink:

That is a suggestion more people should follow!

And - just as I was happy - my persistence broke down again :frowning:

When doing a “sudo systemctl status openhab” I get this:

● openhab.service - openHAB instance, reachable at http://openhabian:8080
     Loaded: loaded (/lib/systemd/system/openhab.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/openhab.service.d
             └─override.conf
     Active: active (running) since Sun 2022-02-06 22:39:54 CET; 11h ago
       Docs: https://www.openhab.org/docs/
             https://community.openhab.org
   Main PID: 2467 (java)
      Tasks: 278 (limit: 2274)
     Memory: 877.0M
     CGroup: /system.slice/openhab.service
             └─2467 /usr/bin/java -XX:-UsePerfData -Dopenhab.home=/usr/share/openhab -Dopenhab.conf=/etc/openhab -Dopenhab.runtime=/usr/share/openhab/runtime -Dopenhab.userdata=/var/lib/openhab -Dopenhab.logdir=/var/log/openhab -Dfelix.>

Feb 07 09:24:51 openhabian karaf[2467]:         at org.openhab.core.persistence.internal.PersistenceManagerImpl.handleStateEvent(PersistenceManagerImpl.java:152)
Feb 07 09:24:51 openhabian karaf[2467]:         at org.openhab.core.persistence.internal.PersistenceManagerImpl.stateChanged(PersistenceManagerImpl.java:473)
Feb 07 09:24:51 openhabian karaf[2467]:         at org.openhab.core.items.GenericItem.lambda$1(GenericItem.java:259)
Feb 07 09:24:51 openhabian karaf[2467]:         at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
Feb 07 09:24:51 openhabian karaf[2467]:         at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Feb 07 09:24:51 openhabian karaf[2467]:         at java.base/java.lang.Thread.run(Thread.java:829)
Feb 07 09:24:51 openhabian karaf[2467]: Caused by: java.io.EOFException
Feb 07 09:24:51 openhabian karaf[2467]:         at org.mapdb.Volume$FileChannelVol.readFully(Volume.java:947)
Feb 07 09:24:51 openhabian karaf[2467]:         at org.mapdb.Volume$FileChannelVol.getByte(Volume.java:997)
Feb 07 09:24:51 openhabian karaf[2467]:         ... 17 more

So - OpenHab is clearly running, but persistence stopped - and the log shows Java exceptions…
I have previously had DEBUG running on persistence - but that just showed that everything worked normally - until it stopped.

So - I am at it again; does anyone have suggestions on how to investigate further?

OK. I have really been annoyed about this and have gone through multiple new installs with restores of my configs etc. - just to end up with the same issue over and over. I have tried different Linux variants (virtualized on my QNAP NAS) to the same effect.

In the end, I decided to delete my persistence data - all data in the persistence/mapdb and persistence/rrd4j folders. Then a restart of OH3.2 and cleaning up the initialization of items which were now all null, where my rules might expect them to have a value (I know, I should deal with this in the rules, but I am lazy).
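
Roughly the steps, as a sketch (paths per the standard /var/lib/openhab userdata location visible in the systemd output above; moving the folders aside is the safer variant of deleting them):

sudo systemctl stop openhab

# move the stores aside instead of deleting them outright
sudo mv /var/lib/openhab/persistence/mapdb /var/lib/openhab/persistence/mapdb.bak
sudo mv /var/lib/openhab/persistence/rrd4j /var/lib/openhab/persistence/rrd4j.bak

sudo systemctl start openhab
# the persistence services should recreate empty mapdb/rrd4j folders on the next write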

Now, my installation has been running for >1 week without any issues and my persistence and charts work flawlessly.

This leads me to suspect that, as I have played around with my OH installation quite a lot over the last few years (!), many references to now non-existing items and data existed in the persistence data files/db - which might be the cause of my problems. I saw that my rrd4j persistence data (.rrd files) contained a lot of very old items (several hundred) which I have not used for a long time and which are no longer part of my .items files.
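
One way to spot such leftovers, as a sketch (path as above; the 30-day cutoff is arbitrary): list the .rrd files that have not been written to for a while.

find /var/lib/openhab/persistence/rrd4j -name '*.rrd' -mtime +30 -printf '%TY-%Tm-%Td  %p\n' | sort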

I now consider the issue closed from my side - and just wanted to capture my experience and suspicion here, as someone else might benefit from this…
