What database to use instead of InfluxDB?

damihe · September 24, 2023, 10:07pm

I wanted to migrate my RRD-based persistence to a more sophisticated storage, so some calculations are easier (e.g. if accessed by other tools like Grafana where custom queries can be done, other persistence services are more powerful). InfluxDB was the first thing that came into my mind. But there is an issue.

InfluxData is killing the open source edition of InfluxDB effectively and forcing to either use a freeware or commercial edition (which is unlikely to be affordable for home use) to be still useful. I’m not sure, whether that’s too much vendor lock-in for me. As even the quite new Flux-Language is deprecated and the previous InfluxQL (was depecrated by Flux) is relived again with version 3 the situation quite disappointing. The remaining open source product will only suitable for short term archives as any internal possibility to compact data automatically will be removed.

What are the alternatives? Searching for Timebased databases, i came to this page showing a list of time series based DBMS’. Looking into Top 10, i see following candidates:

TimescaleDB
- Extension to PostgreSQL
- I would assume the standard JDBC-connector using a postgres-URL should work
- Maybe some handling is needed to convert from normal tables to timeseries
- Compaction is stated as “aggregation” in the docs
- Page itself looks also very commercial
Apache Druid
- Regular database, performance should be very good
- Also SQL-based, but another JDBC-connector is required
- Maybe too large for home environments
- Docs explicitly state information about compaction
- Many mathematical functions for analyzing data
TDengine
- Regular database, did not find anything about feature
- Also SQL-based, but another JDBC-connector is required
- Nothing about compaction found in my quick research
QuestDB
- Regular database, did not find anything about
- Also SQL-based, uses PostgreSQL-wire protocol, JDBC postgres URL shall work
- Compaction does not seem to be integrated in automatic manner but is described in a blog entry as sampling together with retention, just has to be done externally
- Seems to aim smaller environments as it’s basic installation is small

This list was only my quick research - I’m happy with additional input.
However currently I’m a little bit lost, do you have maybe some experience there?

jimtng · September 25, 2023, 4:22am

I’m not offering an informed opinion here, and merely asking to learn. What’s wrong with mysql/mariadb/postgresql? I’ve been wanting to migrate my influxdb data into one of them, because I hate influxql (don’t understand it). My idea was that if it’s in mysql/postgresql I’ll be more familiar with it and can use/manipulate the data more easily, even through my own script directly.

damihe · September 25, 2023, 5:49am

Nothing is wrong with classic relational DBMS. They‘re just not optimised for large sets of time based data.

InfluxDB handles that in terms of storage and data handling (e.g. you can integrate or deviate over the values in a time range just by querying for it, which is really cool) well.

You can set some indices and also partition tables, but that is much more manual work than using a database optimised for the use case.

InfluxQL is near to SQL and has only some caveats if doing irregular queries (joins, unions etc). Flux is completely different and from my sight hard to understand and now completely not worth learning it. I’m very familiar with MySQL/MariaDB and PostgreSQL. They are good for typical dataset queries and fast if the table is optimised for the queries. But data analysis beyond classical min/max/count (=simple aggregations) is hard, especially when not grouping by value (but by interval instead).

Maybe scalability is not an issue in my home setup. But i guess it could be because of openHABs integration of DynamoDB which is surely used by others.

Udo_Hartmann · September 25, 2023, 8:10am

InfluxDB1.8 is still downloadable, the documentation is still downloadable, but it won’t be maintained any longer.
The same is true for InfluxDB2.7.1 (just a guess)
So as long as there is no severe security issue, the simplest solution would be to use the preferred flavor and maybe download the packets to be able to manually install them. Maybe consider to save them as docker containers (no OS dependencies).

At the moment, the only option is Influx1 or Influx2. If it comes to unavailability of this versions, someone hopefully will spend much time and effort to create another persistence service for another database more suitable - which should include full support in Grafana, as Grafana is the reason for InfluxDB support in openHAB in the first place.

rlkoshak · September 25, 2023, 4:55pm

Let what’s supported by the tools you use like Grafana be your guide.

Keep in mind though that the “large sets” that most of these databases are designed to handle will be many orders of magnitude larger than even the largest openHAB database.

It’s at least used by one person, the one who developed the add-on. But lots of stuff get added to OH because some wants to, and not because of a specific need. I wouldn’t treat the mere presence of DynamoDB or Mongo to imply there are lots of OH instances out there with huge databases.

I rather suspect DynamoDB was added to get offsite storage of persistence than because they needed scale.

juelicher · September 25, 2023, 5:37pm

I am using PostgreSQL for many years as main database for openhab and mainly Grafana for charts. My database ist in the range of several GB, which is not big at all for such a DBMS. I have never purged any data.

This works without any problems

wars · September 25, 2023, 6:31pm

Does this also supports aggregations on the fly when generating graphs in OH?

juelicher · September 25, 2023, 7:32pm

I am not sure, what you mean with „aggregations on the fly“?

Udo_Hartmann · September 25, 2023, 8:16pm

Something like “get data from one year, build 1500 mean values and send the 1500 values” to display them as a chart.

damihe · September 25, 2023, 11:56pm

You are right, InfluxDB is even still installable in version 1.8 and currently 2.x too. That’s so far good as there is currently no reason to hurry.

However that should not be the future. Installing now an old unsupported version leads to an unupgradable setup. People still use openHAB 2.5 and have issues upgrading to current versions as the step is too large. Also it would require to archive a whole system. Docker is not feature stable enough to be really sure (there were also backward incompatible changes i remember).

My attempt was to find something thats similar powerful and usable in future, maybe not even requiring much effort from openHAB side.

Udo_Hartmann · September 26, 2023, 7:32am

Well, at least someone has to create a persistence binding,

I did not read the entire article regarding InfluxDB Community, but I had the impression, that InfluxDB Community will (more or less) be the same as it was true for InfluxDB 1x OpenSource vs. Enterprise and also InfluxDB 2.x OpenSource vs. Enterprise. So maybe it will be sufficient to update/upgrade the existing persistence service to talk to InfluxDB 3.x, and maybe (as they’re now using InfluxQL) this will be not much effort at all (implemented since InfluxDB 0.x - yes, there was an Influx 0.x persistence binding)

Just for reference, the Open Source Repo:

We continue to support both the 1.x and 2.x versions of InfluxDB for our customers, but our new development efforts are now focused on 3.x.

rlkoshak · September 26, 2023, 3:22pm

Are you referring to the Docker service itself, the InfluxDB image, or the openHAB image?

damihe · September 26, 2023, 4:00pm

I‘m referring the docker daemon itself - I don‘t remember all details, but it had to do with cgroups v1 and xfs which required some images to be rebuilt. Maybe neither openHAB nor InfluxDB has been affected, that was when docker was quite new.

Also just the step Docker 20.x to newer required conversion of containers and images as aufs support has been removed. And there is not a convenient way to do so. The easiest way was to move all persistent volumes out of dockers workdir and re-downloading all the images, which may be a problem when the vendor or Docker Hub removed the image.

rlkoshak · September 26, 2023, 4:17pm

Docker isn’t the only game in town when it comes to containers. Perhaps one of these others are more stable or suitable.

In practice, I’ve been running everything I can run as a service in Docker containers and have done so for a number of years. I’ve had no problems or complaints nor have I had to do anything special to keep them running. Maybe I’ve just been lucky? “When Docker was quite new” was a decade ago now which is why I was surprised by the “not stable” comment.

Currently running:

openHAB
Mosquitto
wyze-bridge
Zabbix
gitea
photoprism
nextcloud (custom image)
elasticsearch (custom image
redis
calibre (custom image)
postgresql
plex
minecraft-bedrock-server (I have a ten-year-old)
semaphore
geimdall
vaultwarden

In the past I’ve also run

GitLab
Gogs
Guacamole (custom image to run on RPi 4)
Grafana
InfluxDB
Librephotos
MongoDB
Nightscout (custom image as an official image wasn’t available)
Organizr
Pi-Hole
Portainer
Shinobi
ZoneMinder

I don’t remember doing anything special when I moved to Docker 20 though I vaguely remember needing to do something during one of the Docker updates. It was easy enough to script and I was done in about 15 minutes. Maybe that was it?

jonnydev13 · September 27, 2023, 12:56am

I skimmed the blog post fairly closely. It looks to me like the community edition of influxdb will still be a great option.

Edge is sort of the stripped down minimal open source core
Community edition is the open source version with extra features that make it a time series db that auto compacts and deletes and things like that similar to earlier versions.
Enterprise edition will have the features like sso and rbac and things that are important to companies with more users, which would tend to be overkill for small home users.

I don’t want to discourage your research into other options, but if you like influxdb it sounds like there is still a pretty safe path forward with it.

Side note: it sounds like what they are trying to do with influx is make it work similar to Grafana Mimir, which is a backend for storing metrics in object stores like S3 or minio. A metric database like that could store historical persistence data and works with Grafana, although I’ve only worked with it in microservice mode which is way overkill for home users.

On a related note (to grafana mimir), opentelemetry collector is a database-independent pipeline builder that could be used to pass telemetry data (like metrics) to any of the many supported backends. If it was set up as a (write only) persistence service, it would be really easy to try a bunch of different options without touching openhab config again. Not exactly a perfect fit, but I’m curious if anybody has tried playing with it.

Udo_Hartmann · September 27, 2023, 10:51am

Sounds great, though a write only persistence is not very useful from openHAB perspective.
Please keep in mind that, although InfluxDB was integrated to support Grafana in a indirect way (back then Grafana did not support [My-]SQL), but also has the benefit to provide historical data to openHAB itself. A “one way persistence” would be sort of automatic “export function” maybe the classic naming would not fit.

jonnydev13 · September 27, 2023, 1:05pm

Agreed. I realized as I was writing that last post that if openhab can’t access the data that it sends, it really limits the usefulness unless you really just want to use an external viewer such as grafana.

Also, openhab is pretty easy to configure things in. Configuring a new persistence service in openhab is basically the same difficulty as doing it in opentelemetry collector. So you’re not really gaining much.

I guess the only gain you’d get is support for services not directly supported by openhab. But again you’d have to use only external tools to access that data because openhab can’t pull it back from the collector. Overall not super useful.

HamberGo · November 23, 2023, 2:42pm

wonder how these components best composed for a home lab ?

rlkoshak · November 23, 2023, 6:03pm

I’m not sure I understand the question. Isn’t the whole point of a home lab to experiment with such things? There is no “best”. There is what works best for you.

Personally I use Ansible to set everything up. I have an aging desktop format Lenovo server running Proxmox with two VMs. One runs most of the media related services and PostgreSQL. The other runs most of the home and server automation stuff and Zabbix. I’m low on RAM so I’ve moved Heimdal, Vaultwarden, and Semaphore to my RPi 4.

Aliaksandr_Valialkin · June 27, 2024, 11:17pm

Take a look also at VictoriaMetrics as an alternative for InfluxDB. It is already used in HomeAssistant as long-term storage for collected metrics.