Guaranteed Persistence

openHAB receives some valuable data for me through its bindings. The items change every few seconds. I persist the items into InfluxDB and can work with the time series there. All good.

Problems arise when the InfluxDB instance is offline for any reason: I then lose all changes until the persistence target comes back online. I would love to configure the persistence layer to cache item states for a configurable duration while the target is unavailable, and to have the cache written to the persistence target once it becomes available again. Is this possible today?

My current workaround avoids the persistence add-ons and writes the data to MQTT, where it is consumed by Telegraf and written to InfluxDB. The MQTT broker is my caching instance in this case (using QoS 2). This is complicated, creates other problems, and I would really like to get rid of it.
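The thread does not show the actual setup, but the publishing side of such a bridge could look roughly like the Eclipse Paho sketch below. The broker URL, client ID and topic are made up for illustration, and in a real openHAB installation the publishing would more likely go through the MQTT binding and a rule rather than hand-written Java.

```java
import org.eclipse.paho.client.mqttv3.MqttClient;
import org.eclipse.paho.client.mqttv3.MqttConnectOptions;
import org.eclipse.paho.client.mqttv3.MqttException;
import org.eclipse.paho.client.mqttv3.MqttMessage;
import org.eclipse.paho.client.mqttv3.persist.MemoryPersistence;

public class QoS2Publisher {
    public static void main(String[] args) throws MqttException {
        // Broker URL, client ID and topic are placeholders for this example.
        MqttClient client = new MqttClient("tcp://broker.local:1883", "oh-telegraf-bridge",
                new MemoryPersistence());

        MqttConnectOptions options = new MqttConnectOptions();
        options.setCleanSession(false); // persistent session: the broker keeps messages for a subscriber that is down
        client.connect(options);

        // Publish one item state; QoS 2 gives exactly-once delivery semantics.
        MqttMessage message = new MqttMessage("21.5".getBytes());
        message.setQos(2);
        client.publish("openhab/items/LivingRoomTemperature/state", message);

        client.disconnect();
    }
}
```

The persistent session combined with QoS 2 is what turns the broker into the caching layer: messages published while the consumer (presumably Telegraf's mqtt_consumer input) is down are held by the broker and delivered once it reconnects.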

It strikes me that you should be looking at why that happens. It's only a database; it should be reliable. If it's important, you would protect it with a UPS and the like.

It is rather reliable, but there are things like planned maintenance for patching and upgrades, or rebooting the virtualisation server it sits on. I have also experienced some 'out of memory' problems with some InfluxDB versions in the past. Bad things happen, especially in distributed systems. It would be great if openHAB could handle those situations.

It’s a home automation system, not a critical data logging system.
I think your current workaround is quite ingenious, and I doubt there is much that can be improved there.

A quick Google search gives me the impression that the enterprise version of InfluxDB offers clustering and HA. Sure, that is not free, but if the data is that important to you…


I think the short answer is no, this is not possible. I agree with rossko57 though, your approach is quite ingenious. You might consider filing an issue to request such a feature, but this is a home automation system, not an industrial or scientific system. The effort required might be deemed not worth it, unless you are willing to put forth the effort to code it.

From a fault tolerance perspective (and this is a fault tolerance problem), adding this to OH doesn't really solve the problem, it just moves it. What happens to your data when OH itself is offline? Surely preserving the data is just as important then as when InfluxDB is offline, right? And what happens when Telegraf is offline?

Fault tolerance is hard.

Thanks for your remarks. All your comments are spot on. Yes, I completely agree that OH is not an 'enterprise' piece of software with high availability as one of its traits. And the sensor data I receive through OH is not valuable in terms of money for me; I just want a continuous recording of status for my purposes. Adding an MQTT broker and Telegraf to the equation solves the problem in some way, but introduces more points of failure into the system (and they have failed already, like Telegraf silently going down and causing a day of lost sensor data). In general I try to reduce the 'moving parts' in my smart home setup. This is why I love OH: it lets me get rid of many other workarounds.

Adding a little bit of caching to the persistence code could help OH in two ways: it could protect you from data loss for a day or two when your persistence target goes offline, and it could serve lookups from the cache even while the persistence target is offline. A rough sketch of the idea follows below. I will see if I can set up a development environment with an OH 3.0 snapshot to build that. Let's see how this works out.
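Nothing like this exists in the core persistence service today, so this is only a sketch with hypothetical class and method names: a bounded in-memory cache that records state changes, can answer "latest state" lookups while the target is down, and replays its contents once the target is reachable again.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.function.Consumer;

/** Buffers (timestamp, item, state) entries while the persistence target is unreachable. */
public class StateCache {
    /** One buffered persistence entry. */
    public record Entry(long timestamp, String itemName, String state) {}

    private final Deque<Entry> buffer = new ArrayDeque<>();
    private final int maxEntries;

    public StateCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    /** Remember a state change; drop the oldest entry once the cache is full. */
    public synchronized void add(Entry entry) {
        if (buffer.size() >= maxEntries) {
            buffer.pollFirst();
        }
        buffer.addLast(entry);
    }

    /** Serve a lookup from the cache, e.g. the latest state of an item, while the target is down. */
    public synchronized Entry latest(String itemName) {
        Iterator<Entry> it = buffer.descendingIterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.itemName().equals(itemName)) {
                return e;
            }
        }
        return null;
    }

    /** Replay everything to the persistence target once it is reachable again. */
    public synchronized void flush(Consumer<Entry> writer) {
        while (!buffer.isEmpty()) {
            writer.accept(buffer.pollFirst());
        }
    }
}
```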

Don't let us stop you, but make it optional. Remember that the majority of openHAB deployments are on an all-in-one box with limited resources, i.e. the user's "openHAB" is an overloaded Pi which is also running the only database and related services, all on one stressed SD card.

In addition to what rossko57 said about making it optional, be cognizant of the different ways the persistence add-ons/databases actually work. I doubt this caching mechanism can be one-size-fits-all. For example, MapDB and rrd4j are embedded: if they are down, OH itself is down, so it doesn't make sense to cache there (not to mention that MapDB only saves the most recent value in the first place). Other databases have their own quirks as well.

Based on your remarks, I will first investigate whether it is possible to add a 'queue writes while InfluxDB is offline' concept to the rewritten InfluxDB persistence add-on created by @lujop for OH 3.0. Queueing could happen in memory or on disk, and it needs to be configurable (maximum number of entries to queue, with 0 meaning no queueing at all, which should be the default). A sketch of the in-memory variant is below.
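To make the intent concrete, here is a minimal sketch of the in-memory variant, assuming a hypothetical wrapper around the add-on's actual write call (the names are made up and this is not code from @lujop's add-on): failed writes go into a bounded queue and are replayed before the next successful write, and maxQueueSize = 0 keeps today's behaviour.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Sketch of the proposed behaviour: try to write each point directly; if the
 * write fails, queue it up to a configurable limit and replay the queue on the
 * next store attempt. maxQueueSize = 0 disables queueing (today's behaviour).
 */
public class QueueingInfluxWriter {
    /** The actual write to InfluxDB; throws if the database is unreachable. */
    public interface InfluxWriter {
        void write(String point) throws Exception;
    }

    private final InfluxWriter delegate;
    private final int maxQueueSize;            // 0 = no queueing, the proposed default
    private final Deque<String> queue = new ArrayDeque<>();

    public QueueingInfluxWriter(InfluxWriter delegate, int maxQueueSize) {
        this.delegate = delegate;
        this.maxQueueSize = maxQueueSize;
    }

    public synchronized void store(String point) {
        try {
            drainQueue();              // replay anything queued during earlier outages first
            delegate.write(point);
        } catch (Exception offline) {
            if (maxQueueSize > 0) {
                if (queue.size() >= maxQueueSize) {
                    queue.pollFirst(); // drop the oldest entry once the limit is reached
                }
                queue.addLast(point);
            }
            // with maxQueueSize = 0 the point is simply lost, as today
        }
    }

    private void drainQueue() throws Exception {
        while (!queue.isEmpty()) {
            delegate.write(queue.peekFirst());
            queue.pollFirst();         // remove only after a successful write
        }
    }
}
```

An on-disk variant would swap the ArrayDeque for a file-backed queue, at the cost of the extra SD card wear rossko57 warned about, which is another reason to keep the whole thing optional and off by default.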