Delete all InfluxDB measurements except for the most recent for an openHAB item

Hi there!

This morning I was looking for a way to remove unneccesary data records (measurements) from my InfluxDB time series database.

I have a lot of Items for which I only need the last state stored so that those items get their last known value restored when restarting openHAB. That is, for those items I don’t need more than a single value for each item stored. It turns out that I have thousands of measurements that serves no purpose.

All of my Items belongs to an openHAB group that I’ve set up to make the persistence working, e.g:

Group G_PersistOnChange // Persist on change and system start up
Group G_PersistOnUpdate // Persist on change and system start up. For temp, humidity sensors etc.

First I tried out the restoreOnStartup predefined strategy which was supposed to load and initialize the last persisted state of the Item on openHAB startup. After some trial and error I realized that restoreOnStartup doesn’t store anything in the db itself. If I used it alone, nothing was saved to the db hence nothing was restored on startup. The strategy seems to work only if I combine it with another strategy like everyChange or everyUpdate.

I also learned that I can not have different InfluxDB retension policies for different openHAB items.

I thought it would be nice to be able to define on item level for how long time I want to keep the data stored. I made a small python script that can take care of deleting all measurements but the latest for an openHAB item.

purge_influx_data.py

import json
import sys

import requests
from dateutil.parser import parse # Install this by issuing "pip install python-dateutil"
import argparse
 
# create parser
parser = argparse.ArgumentParser()
 
# add arguments to the parser
parser.add_argument('-itemname', type=str, required=True, help="The openHAB item name for which all measurements except the most recent one shall be deleted")
parser.add_argument('-db', type=str, default='openhab_db', help="The InfluxDB database name")
parser.add_argument('-hostname', type=str, default='localhost', help="The InfluxDB server host name")
parser.add_argument('-port', type=int, default=8086, help="The InfluxDB server port name")
 
# parse the arguments
args = parser.parse_args()

def is_date(string, fuzzy=False):
    """
    Return whether the string can be interpreted as a date.

    :param string: str, string to check for date
    :param fuzzy: bool, ignore unknown tokens in string if True
    """
    try:
        parse(string, fuzzy=fuzzy)
        return True

    except ValueError:
        return False

url = 'http://{}:{}/query'.format(args.hostname, args.port)

params = dict(
    db=args.db,
    q='SELECT LAST(value) FROM "{}" ORDER BY time DESC LIMIT 1'.format(args.itemname)
)
response = requests.get(url=url, params=params)
response_dict = json.loads(response.text)

try:
    last_date = response_dict['results'][0]['series'][0]['values'][0][0]
except KeyError:
    print('The InfluxDB data for item: [{}] was not found'.format(args.itemname))
    sys.exit()

if not is_date(last_date):
    print('Last date: [{}] is not recognized as a date'.format(last_date))
    sys.exit()

query= 'DELETE FROM "{}" WHERE TIME < \'{}\''.format(args.itemname, last_date)
print('Running query: [{}]'.format(query))
params = dict(
    db=args.db,
    q=query
)
response = requests.get(url=url, params=params)
response_dict = json.loads(response.text)

db_result_code = response_dict['results'][0]['statement_id']

if db_result_code == 0:
    print('Query ran successfully. All InfluxDB measurements except the most recent are now deleted for the openHAB item [{}]'.format(args.itemname))
else:
    print('Query returned result code [{}]'.format(db_result_code))

I can now assign a group to all Items for which I want all InfluxDB measurements except the most recent one to be purged. After doing that I just make a simple rule that runs at regular intervals.
I use a jython rule, you might do it in any flavour

import os
from logging import DEBUG, ERROR, INFO, WARNING

from core.rules import rule
from core.triggers import when

@rule("Purge Influx Data", description="Deletes InfluxDB measurements except the most recent for openHAB items belonging to the group gPersist_Last_Value.", tags=[])
@when("Time cron 3 11 10 ? * MON *") # Run at 10:11:03 every Monday
#@when("System started")
def purge_influx_data(event):
    purge_influx_data.log.setLevel(DEBUG)
    '''
    Deletes InfluxDB measurements except the most recent for openHAB items belonging to the group gPersist_Last_Value.
    '''
    purge_influx_data.log.debug('Checking the group gPersist_Last_Value for items to delete measurements except for the most recent one')
    for item in ir.getItem("gPersist_Last_Value").members:
        purge_influx_data.log.debug('Deleting unneccesary measurements for item [{}]'.format(item.name))
        cmd = '/usr/bin/python /openhab/conf/scripts/purge_influx_data.py -db {} -itemname {} -hostname {} -port {} 1>/dev/null 2>&1'.format('openhab_db', item.name, 'MYHOST', 8086)
        #purge_influx_data.log.debug('Running script: {}'.format(cmd))
        os.system(cmd)

It does what I expect from it. It works here but if you decide to try it out you do it at your own risk and you’ll need to take a backup of your database first of course.

I’m thinking that it might be a nice idea to alter the script so that I choose to preserve data for an arbitrary number of days… thinking about it.

Please feel free to comment on this.

Cheers! :grinning:

Look here, maybe this is what you need?

1 Like

That’s true, but in many cases you can use the continuous query to move from one retention policy to another after it is collected. For example, I gather temperature data as it changes, and it goes into the 1 month retention policy. Then I use a continuous query to get the min/max daily and store those forever.

Also, you don’t have to use the persistence to store the data to InfluxDB. It is possible to POST directly from a rule. In that case you have control over the retention policy.

There’s some work on a new InfluxDB persistence binding here that mentions retention policy specification.

1 Like

Thanks! That’s exactly what I need for most of my openHAB items. I’ve added mapdb as an additional persistence service and I’ll update my items files to use mapdb persistence where appropriate.
:grinning:

Thanks. I’ll have a deeper look into that too as it might be useful for some items.