Strategy for historic persisted data

At present my openHAB2 has been online in production for a bit over a year, and persistence has been actively in use for about 10 months now. The persistence strategy is to store onEveryChange for nearly all my items in a MySQL database.
The overall size of the MySQL database is about 650 MB. I know MySQL databases can be much bigger, and I haven’t noticed any performance problems (since the DB server is on my NAS and not on my Pi running OH2).

But nevertheless: is there anybody who has already thought about “compressing”, or better, “aggregating” the data within the database? I think I can safely delete the historic data of my old light switches and similar items, but I’d like to keep some of the information from my sensors (like temperature, heating, …) for reference. But perhaps there’s some clever way to aggregate the values on an hourly/half-daily/… basis?

That is essentially what rrd4j does, and it is how rrd4j maintains its constant file sizes. As data ages, groups of values get replaced with their average. That is a relatively elegant way to reduce the amount of storage required without completely throwing away the old data.

Personally, I would write a cron job to dump all values older than a certain age for those Items whose old historic data I don’t care about, and keep all the data for the Items I do care about. Or I would switch to rrd4j for those Items.
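
For the cron-job route, here is a minimal sketch of the SQL such a cleanup could run, assuming the table layout the openHAB MySQL persistence service normally creates (one table per Item with a Time and a Value column, plus an Items index table mapping Item names to table names). The table name Item42, the one-year cutoff, and the credentials in the cron line are placeholders, not anything from your setup:

    -- Hypothetical cleanup for one Item table; look the real table name up
    -- in the "Items" index table first, and adjust the cutoff to taste.
    DELETE FROM Item42
    WHERE Time < NOW() - INTERVAL 365 DAY;

    -- Run it regularly from cron, e.g. (placeholder credentials):
    -- 30 3 * * *  mysql -u openhab -pXXXX openhab_db < /etc/openhab-cleanup.sql

Repeating the DELETE per Item (or generating the statements from the Items table) keeps the tables you no longer care about small while leaving the sensor history untouched.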

Well, thanks, that’s the solution I was also thinking of: just deleting all those useless items and keeping the ones I’d like to have for comparison. At a later stage, I could then also aggregate those if needed.
… my experience with rrd wasn’t that great, so I’ll stay with MySQL…
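
If you do get to the aggregation stage later, here is a rough sketch of what hourly “compression” could look like in plain SQL, again assuming a numeric per-Item table with Time and Value columns; the table name Item42 and the 90-day threshold are made up for the example:

    -- Collapse raw rows older than 90 days into one averaged row per hour.
    CREATE TEMPORARY TABLE Item42_hourly AS
    SELECT DATE_FORMAT(Time, '%Y-%m-%d %H:00:00') AS Time,
           AVG(Value) AS Value
    FROM Item42
    WHERE Time < NOW() - INTERVAL 90 DAY
    GROUP BY DATE_FORMAT(Time, '%Y-%m-%d %H:00:00');

    -- Replace the raw rows with the hourly averages.
    DELETE FROM Item42
    WHERE Time < NOW() - INTERVAL 90 DAY;

    INSERT INTO Item42 (Time, Value)
    SELECT Time, Value FROM Item42_hourly;

    DROP TEMPORARY TABLE Item42_hourly;

Using one fixed cutoff timestamp instead of NOW() in all three statements would avoid losing the few rows that cross the 90-day boundary between statements.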

As much as I prefer rrd4j, it is not the solution for users who want to persist all data: rrd4j can only persist numerical data!

True, and it no longer supports Switches and Contacts. But from a practical perspective, how often will one want to store non-numerical data for a long time? You can’t chart it, you can’t do math on it, and therefore there is no way to aggregate it the way the OP was requesting.

And this is part of the problem as well. It is best to persist data using the appropriate engine for the data being persisted.

  • If you want to keep all the raw data forever, use InfluxDB, MariaDB, et al., and either let the DB grow without bound or write your own cleanup scripts.

  • If you want to keep data forever but are OK with the data becoming “compressed” as it gets older, use rrd4j or InfluxDB with a custom retention policy (BTW, retention policies in InfluxDB are really powerful for this sort of thing; see the sketch after this list).

  • If you only need the most recent update (for Rules or restoreOnStartup), use MapDB.
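
To make the InfluxDB point above concrete, here is a hedged sketch in InfluxQL (InfluxDB 1.x): a default retention policy drops raw points after a year, and a continuous query downsamples them into hourly means that are kept indefinitely. The database name openhab_db and the measurement name Temperature are placeholders:

    -- Raw points are dropped automatically after 52 weeks.
    CREATE RETENTION POLICY "one_year" ON "openhab_db" DURATION 52w REPLICATION 1 DEFAULT

    -- A second retention policy keeps downsampled data forever.
    CREATE RETENTION POLICY "forever" ON "openhab_db" DURATION INF REPLICATION 1

    -- Continuously aggregate the raw points into hourly means under "forever".
    CREATE CONTINUOUS QUERY "cq_temperature_1h" ON "openhab_db"
    BEGIN
      SELECT mean("value") AS "value"
      INTO "openhab_db"."forever"."Temperature_1h"
      FROM "Temperature"
      GROUP BY time(1h)
    END

As far as I know, the openHAB InfluxDB persistence add-on writes one measurement per Item with a single field called value, so you would need one such continuous query per long-term Item (or a variant that matches measurements by regular expression).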

I find it unlikely that one would want to use just one of the above policies for ALL Items in their setup. Obviously one can use just one, but doing so runs the risk of a DB that is far larger than necessary, incomplete support for the Item types in use, or limitations on what can be stored.

If you are interested in garbage collection, see
