What persistence for most stable setup?

Hi
Without going into details, I have on several occasions and versions of openHAB (on RPI 3, 4, and 5) tried to achieve a stable and self-sustaining database and history. However, I have not been able to get it to work properly. I have tried rrd4j, mapdb, and influx.

rrd4j works reliably but does not provide a detailed history and does not support all devices (if I understand correctly).

Influx works well and gives detailed history but has repeatedly (even after new installations, faster RPIs, larger memory cards, etc.) stopped working and lost data to varying extents (from a day to several weeks). This has mostly happened on its own or in connection with a restart.

I have tried reading up and testing various tips and advice to get influxDB to work stably but have not succeeded.

Are there any tips or advice on how I can best and most easily enable a stable and comprehensive database if I try a new installation of OH5? (or adjust my existing setup).

Thanks in advance,
Joakim

This isn’t entirely correct as stated. rrd4j provides a perfectly fine history. However, in order to remain at a fixed size, it employs decimation. To start, it saves a value at least once per minute, even if nothing changes. After a week (IIRC) it replaces every ten entries with the average of those entries. After a month it does it again, and so on. So as the data gets older, instead of having a precise record every minute, you have fewer records that are averages. But I think the impact of decimation is somewhat overblown, because most of the records being decimated will be the same value anyway, so it’s no big deal until the data becomes months old. Does it really matter if the value at 09:23 90 days ago is off by 0.1% of the actual value? Usually not, particularly in a home automation scenario.
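To get a feel for how little averaging usually costs, here is a rough Python sketch of the idea. It is not rrd4j's actual code; the `decimate` helper, the group size of ten, and the temperature readings are all made up for illustration.

```python
from statistics import mean

def decimate(samples, factor=10):
    """Replace each full group of `factor` samples with the group's average."""
    return [
        mean(samples[i:i + factor])
        for i in range(0, len(samples) - factor + 1, factor)
    ]

# Hypothetical per-minute temperature readings: flat at 21.0, then one step to 21.5.
readings = [21.0] * 25 + [21.5] * 15

coarse = decimate(readings)

# Largest difference between an original reading and the average that replaced it.
worst = max(abs(readings[i] - coarse[i // 10]) for i in range(len(coarse) * 10))

print(coarse)  # [21.0, 21.0, 21.25, 21.5]
print(worst)   # 0.25
```

Only the one group that contains the step differs from the original readings; the groups where nothing changed come back exactly as they went in.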

But because of how it works, rrd4j can only work with numerical values so no Strings, for example. But it’s not often you need to know a non-numeric state from months ago either.

Any external database is going to have the limitation that if it’s not running, OH can’t talk to it. That is always going to be less “stable” than an embedded database.

You also have to deal with the fact that the database grows forever, meaning you need to manually clean the data up or (in the case of InfluxDB) set up a retention policy to basically do what rrd4j does, and decimate the data as it ages.

I can’t help with your specific problems with InfluxDB except to say I know it can be run reliably. But it is always going to be less stable than an embedded database.

Generally, the best setup is to use the database that best supports what you want. There’s no rule that says you can use only one. MapDB is by far the best to use for restoreOnStartup. It’s fixed size and embedded and supports all Item types.

For casual charting and tracking of recent data rrd4j is the best choice. It’s embedded yet never grows.

For precise analysis of data that is months or years old, you’ll have to use an external database. But that comes with its own problems.

So use more than one. Most or all of your Items can be saved to MapDB with restoreOnStartup. Most or all of your numerical Items can be saved to rrd4j for simple charting and analysis of recent data (data from within the last few months should be precise). For those very few Items that need very precise analysis of data that is months or years old, consider SQLite, which is embedded but not very convenient to work with from external tools, or any of the external databases that support whatever tools you are using to analyze the data. By precise I’m talking about values that might be off in rrd4j by .01 to .5, depending on how often the value changes and by how much it changes at a time.

Thank you very much for your response.
What you describe sounds good. With rrd4j, I experience that statistics like lux, wind speed, rainfall, and similar data cannot be stored (I suspect they are strings even though they are essentially numerical values?). Also, for some on/off devices, the data a week or month back does not show the correct on and off times, but instead lumps it together as on or off for a longer period. Temperature and similar data sometimes have a resolution of several hours between stored values when I view the last 24 hours, even though it is set to register once per minute and on every change, as you described.

Is there anything I can adjust to make rrd4j work more accurately?

SQLite might be a good alternative for me—I don’t need to analyze the data externally, but I do prefer it to be “complete” and more detailed. (I find rrd4j to be much more coarse than what you describe.)

MapDB I already use for restoreOnStartup (but I don’t really know if it works well or badly :slight_smile: )

That’s up to you. If you are using a Number Item or Number:X type Item then rrd4j will save it.

It’s all based on the Item type.

Depending on the age of the data and what you need it for rrd4j may not be the best choice for those Items.

That’s not caused by rrd4j compression. I only use rrd4j for charting. Here’s the past week for one of my sensors:

[image: chart of one sensor over the past week]

Over the last 24 hours I have one record per minute. Note that the data compression doesn’t even happen until the data is at least one hour old (see below), and it doesn’t drop to one record per hour until the data is a year old.

Something else is going wrong with your configuration or your Item doesn’t change as frequently as you think it does or the configuration isn’t what you think it is.

If it’s doing what you describe, something is probably configured wrong. I can’t say what without more information, though :person_shrugging: I’d need:

  • events.log to show roughly how often the Item changes
  • the persistence configuration
  • the Item and Link configuration (to ensure a profile isn’t preventing the Item from changing)
  • the rrd4j.conf file if you’ve tried to override the default rrd4j configuration. See the rrd4j readme for details.
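For the first bullet, a small script can turn events.log into a list of gaps between changes for one Item. This is only a sketch: the regular expression assumes the usual "Item '...' changed from ... to ..." line shape, and `Hypothetical_Temp` and the sample lines are invented, so adjust both to match your own log.

```python
import re
from datetime import datetime

# Assumed shape of an events.log "changed" line; adjust if your log differs.
CHANGE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Item '(?P<item>[^']+)' changed from "
)

def change_intervals(lines, item):
    """Return the seconds between consecutive 'changed' events for one Item."""
    stamps = []
    for line in lines:
        m = CHANGE.match(line)
        if m and m.group("item") == item:
            stamps.append(datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]

# Invented sample lines; in practice read them from your own events.log.
sample = [
    "2025-01-01 09:00:00.000 [INFO ] [openhab.event.ItemStateChangedEvent] - Item 'Hypothetical_Temp' changed from 20.5 to 20.6",
    "2025-01-01 09:30:00.000 [INFO ] [openhab.event.ItemStateChangedEvent] - Item 'Other_Item' changed from OFF to ON",
    "2025-01-01 10:00:00.000 [INFO ] [openhab.event.ItemStateChangedEvent] - Item 'Hypothetical_Temp' changed from 20.6 to 20.4",
    "2025-01-01 12:00:00.500 [INFO ] [openhab.event.ItemStateChangedEvent] - Item 'Hypothetical_Temp' changed from 20.4 to 20.9",
]

intervals = change_intervals(sample, "Hypothetical_Temp")
print(intervals)  # [3600.0, 7200.5]
```

If the gaps come out hours apart, the Item itself is changing rarely, and rrd4j is not what is thinning the data.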

That’s because something is wrong. You shouldn’t see any noticeable degradation of the data that soon. The default compression for numeric and quantity types is as follows (from the docs).

It defines 5 archives:

  1. granularity of 10s for the last hour
  2. granularity of 1m for the last week
  3. granularity of 15m for the last year
  4. granularity of 1h for the last 5 years
  5. granularity of 1d for the last 10 years

default_numeric

That means you have up to one entry every ten seconds for the past hour. You should have at least one record per minute between an hour and a week. Your data should be at least a year old before you only see one record every hour.
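The archive list can be turned into a quick lookup to see what resolution to expect at a given data age, and why the file stays a fixed size. This is a sketch built only from the five archives listed above; the helper names are mine, a year is assumed to be 365 days, and the real row counts inside rrd4j may differ.

```python
HOUR = 3600
DAY = 24 * HOUR
YEAR = 365 * DAY  # assumption: a year is treated as 365 days

# (step in seconds, how far back the archive reaches in seconds)
DEFAULT_NUMERIC = [
    (10, HOUR),
    (60, 7 * DAY),
    (15 * 60, YEAR),
    (HOUR, 5 * YEAR),
    (DAY, 10 * YEAR),
]

def step_for_age(age_seconds, archives=DEFAULT_NUMERIC):
    """Return the step of the finest archive that still covers this data age."""
    for step, span in archives:
        if age_seconds <= span:
            return step
    return None  # older than every archive, so nothing is kept

def rows(archives=DEFAULT_NUMERIC):
    """Rows needed per archive; the total is constant, so the file never grows."""
    return [span // step for step, span in archives]

print(step_for_age(DAY))       # 60   (one-minute resolution for yesterday's data)
print(step_for_age(90 * DAY))  # 900  (15-minute resolution at 90 days)
print(rows())                  # [360, 10080, 35040, 43800, 3650]
```

So a 24-hour chart should be drawing from the one-minute archive; gaps of several hours at that age do not come from these defaults.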

Pay attention to this note:

This datasource is used for plain Number items. It does not build averages over values, so that it is ensured that discrete values are kept when being read (e.g. an Item which has only states 0 and 1 will not be set to 0.5).

The compression algorithm used here is LAST instead of AVERAGE.

Numbers with units use AVERAGE for compression.
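The difference between LAST and AVERAGE is easiest to see on an Item that only ever holds 0 or 1. The following is a rough sketch of the two consolidation ideas, not rrd4j's implementation; `consolidate` and the sample values are invented for the example.

```python
def consolidate(samples, factor, how):
    """Collapse each group of `factor` samples into a single value."""
    groups = [samples[i:i + factor] for i in range(0, len(samples), factor)]
    if how == "LAST":
        return [g[-1] for g in groups]
    if how == "AVERAGE":
        return [sum(g) / len(g) for g in groups]
    raise ValueError(f"unknown consolidation: {how}")

# Hypothetical 0/1 states stored in a plain Number Item.
states = [0, 0, 1, 1, 1, 1, 0, 0]

print(consolidate(states, 4, "LAST"))     # [1, 0]
print(consolidate(states, 4, "AVERAGE"))  # [0.5, 0.5]
```

LAST keeps every stored value a real state, but the exact moment of each switch within a group is lost, so older on/off periods get rounded to the archive's step.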

Items like Dimmers and similar numeric Items that are not Number have an even finer grained set of archives.

  1. granularity of 5s for the last hour
  2. granularity of 1m for the last week
  3. granularity of 15m for the last year
  4. granularity of 4h for the last 10 years

default_other

The behavior you describe is not the default configuration for rrd4j as used by OH.

But also notice that it is possible to set a different set of archives on a per Item basis if the default strategies are not sufficient. But unless you’ve done so and the configuration is messed up, the defaults would not compress the data as you describe.

If your Items get restored then it’s working right. You should however disable restoreOnStartup for any other persistence you are using.