For those who use InfluxDB (or want to) but are afraid the database might grow uncontrollably: I found a way to make InfluxDB behave like a round-robin database (RRD) by grouping and downsampling data.
By default, InfluxDB stores everything at full resolution forever, but you can define your own retention policies and continuous queries to downsample older data.
I cannot tell how well this will work in the long run (I have only been using InfluxDB for a few days), but I thought I'd share how I did it:
By the way, the scripts on that site are not entirely correct: a semicolon has to be added at the end of every command, and InfluxQL requires the SELECT of a continuous query to be wrapped in BEGIN … END. For your convenience, I'll post my scripts below; feel free to change the retention times to your liking.
Also, when deleting the autogen retention policy, all data associated with that policy is deleted. So if your database already contains data, you'll need to dump the data and re-insert it.
CREATE DATABASE openhab;
CREATE RETENTION POLICY "1m_for_24h" ON openhab DURATION 24h REPLICATION 1 DEFAULT;
CREATE RETENTION POLICY "10m_for_7d" ON openhab DURATION 7d REPLICATION 1;
CREATE RETENTION POLICY "30m_for_30d" ON openhab DURATION 30d REPLICATION 1;
CREATE RETENTION POLICY "1h_for_365d" ON openhab DURATION 365d REPLICATION 1;
CREATE RETENTION POLICY "1d_for_3650d" ON openhab DURATION 3650d REPLICATION 1;
CREATE CONTINUOUS QUERY "cq_1m_for_24h" ON "openhab" BEGIN
  SELECT mean(*) INTO "openhab"."1m_for_24h".:MEASUREMENT FROM /.*/ GROUP BY time(1m),*
END;
CREATE CONTINUOUS QUERY "cq_10m_for_7d" ON "openhab" BEGIN
  SELECT mean(*) INTO "openhab"."10m_for_7d".:MEASUREMENT FROM "openhab"."1m_for_24h"./.*/ GROUP BY time(10m),*
END;
CREATE CONTINUOUS QUERY "cq_30m_for_30d" ON "openhab" BEGIN
  SELECT mean(*) INTO "openhab"."30m_for_30d".:MEASUREMENT FROM "openhab"."10m_for_7d"./.*/ GROUP BY time(30m),*
END;
CREATE CONTINUOUS QUERY "cq_1h_for_365d" ON "openhab" BEGIN
  SELECT mean(*) INTO "openhab"."1h_for_365d".:MEASUREMENT FROM "openhab"."30m_for_30d"./.*/ GROUP BY time(1h),*
END;
CREATE CONTINUOUS QUERY "cq_1d_for_3650d" ON "openhab" BEGIN
  SELECT mean(*) INTO "openhab"."1d_for_3650d".:MEASUREMENT FROM "openhab"."1h_for_365d"./.*/ GROUP BY time(1d),*
END;
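After running these statements, you can sanity-check the setup with the following queries (assuming the openhab database from above):

```sql
-- Lists all retention policies with their duration, replication and default flag
SHOW RETENTION POLICIES ON "openhab";

-- Lists the continuous queries per database, including their full definitions
SHOW CONTINUOUS QUERIES;
```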
Do I need to delete the autogen retention policy? I can’t see a command that would do so in your example, so is it safe to run this without losing all the data?
Blast from the past
You don’t have to. You just need to make sure that the default retention policy (the one new data is written to) has the correct duration and replication set.
In hindsight, I’d advise against calling your default retention policy “1m_for_24h”. The name is fine for InfluxDB and Grafana, but openHAB seems to have a problem with the underscores: I can still write data into the retention policy, but the openHAB scripts for retrieving historical data (“20 minutes ago” or something like that) stopped working as of OH 3.0. Since few people have renamed their default retention policy, the problem doesn’t get addressed. That obviously applies to any retention policy you want openHAB to retrieve data from.
On the other hand, InfluxDB has had the request for the ability to rename retention policies on its books for years and hasn’t done anything about it, either.
@Tron So with a default InfluxDB installation I could just leave the autogen policy in place and create the others, but in camelCase or something, is that correct?
So does the DEFAULT at the end automatically override it, or do I have to edit autogen?
What is CamelCase?
I do not know how the standard autogen retention policy is defined, but AFAIK its data never expires, so you have to modify it:
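A sketch of what that could look like with ALTER RETENTION POLICY (the 24h duration mirrors the first policy of the scheme above; if you keep autogen as the default, the first continuous query's `FROM /.*/` will read from autogen instead of "1m_for_24h"):

```sql
-- Check what autogen currently looks like
SHOW RETENTION POLICIES ON "openhab";

-- Give autogen a finite duration and keep it as the default write target
ALTER RETENTION POLICY "autogen" ON "openhab" DURATION 24h REPLICATION 1 DEFAULT;
```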
How it works is basically this: every retention policy keeps its data for its DURATION and expires it after that. Before that happens, the data needs to be condensed and moved into the next policy. That's what the continuous queries do: they group the data into mean values matching the next retention policy's resolution and write it into that policy. DEFAULT just specifies which retention policy new data goes into, which here is the first one.
Thanks for sharing this.
How do I then access the data in Grafana? Do I need to define my dashboard so that it selects the 30m_for_30d data explicitly,
SELECT mean("value") FROM "openhab"."30m_for_30d"."path.to.mydata"
or can I use something like this:
SELECT mean("value") FROM "openhab.*.path.to.mydata"
And it would use the finer data for the more recent timepoints and the coarser data for the more historic data?
The former. What I do is have separate dashboards in Grafana for daily (most important), monthly, yearly etc. data; that's what the time range in Grafana is set to. The InfluxDB queries then go against the retention policy appropriate for that time range:
SELECT last("mean_value") FROM "10m_for_7d"."item" WHERE $timeFilter GROUP BY time($__interval)
Note that the continuous queries iterate over the mean values from the previous retention policy, so the field name changes with each step: for the 24h data it's "value", for the weekly data it's "mean_value", monthly is "mean_mean_value", and so on.
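To illustrate the field-name chaining with the scheme above (the measurement name "myItem" is just a placeholder; substitute your own item):

```sql
-- Raw data in the default policy: the field is still "value"
SELECT mean("value") FROM "1m_for_24h"."myItem" WHERE time > now() - 24h GROUP BY time(1m);

-- First downsampling step prefixes mean_: the field becomes "mean_value"
SELECT last("mean_value") FROM "10m_for_7d"."myItem" WHERE time > now() - 7d GROUP BY time(10m);

-- Each further step adds another prefix: "mean_mean_value"
SELECT last("mean_mean_value") FROM "30m_for_30d"."myItem" WHERE time > now() - 30d GROUP BY time(30m);
```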
I mentioned earlier that openHAB doesn't like underscores in retention policy names, so I'd avoid them. Too late for me, though, since one can't rename a retention policy in InfluxDB.