Migrate rrd4j data to InfluxDB

I suspect this is not possible, but after running OH for more than 2 years I’m about to start tinkering with InfluxDB as a persistence service.

Does anyone know if it’s possible to import RRD4J data into an InfluxDB format?

Think about it: the rrd4j database holds per-minute values only in its first archive. All other data, especially for periods up to a year, is consolidated (average, max, etc.).
How would you put that into a database that handles this differently? InfluxDB has compression as well, but it is handled differently.
I’d say it doesn’t work.

Theoretically, it could be done but it will be so much work as to only be worthwhile in extreme circumstances.

The problem isn’t the loss of precision caused by rrd4j compression, as it is all just numbers with a timestamp. The problem is that there are no tools available for working with rrd4j on the command line, so you would have to write a Java program to export the data and a script to import it.


It would still be great if anybody could write such java code.

Grafana (with influxdb) is such a great addition to openhab.
But it’s sad you lose years of data, just because there is no conversion tool. :disappointed:

Actually, I am onto something… :slight_smile:
I created a bash script that reads all available values via REST from openhab.

I am just having trouble importing the values into InfluxDB, because I am new to it. :slight_smile:
Hopefully I can post a working script tomorrow. fingers crossed
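For anyone attempting the same route, here is a minimal Python sketch of the conversion step. It assumes the JSON shape returned by openHAB’s `GET /rest/persistence/items/<Item>` endpoint and the InfluxDB 1.x line protocol; the exact field names and the `value=` layout are assumptions, so check them against your versions before relying on this:

```python
import json

# Sample of what GET /rest/persistence/items/<Item> returns (shape assumed
# from the openHAB REST API; verify against your installation).
sample = json.loads("""
{"name": "Temperature",
 "data": [
   {"time": 1514764800000, "state": "21.5"},
   {"time": 1514764860000, "state": "21.6"}
 ]}
""")

def to_line_protocol(item):
    """Convert one item's history to InfluxDB 1.x line protocol lines.

    InfluxDB expects: <measurement> value=<float> <timestamp-in-ns>
    openHAB's REST timestamps are milliseconds, so multiply by 1e6.
    """
    lines = []
    for point in item["data"]:
        ts_ns = point["time"] * 1_000_000   # ms -> ns
        lines.append(f'{item["name"]} value={float(point["state"])} {ts_ns}')
    return lines

for line in to_line_protocol(sample):
    print(line)
```

The resulting lines could then be POSTed to an InfluxDB 1.x write endpoint such as `http://localhost:8086/write?db=openhab&precision=ns` (host and database name are placeholders).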


Nice work! Even just getting them out to CSV or something would be great and then we can work on the next step…

Maybe this information can help you

You might find something useful in @ThomDietrich’s Python script that inserts a sine wave into InfluxDB, which he posted as part of the InfluxDB+Grafana tutorial.

Here is the first version of my script.

The script is not pretty, but you only need it once to migrate all your data. :wink:

Possible problems:

  • I am not sure if the defined timestamps are correct to read as much data as possible.
  • I am running InfluxDB as a Docker container on my Raspberry Pi.
    After importing four items with ~1000 values each, the container uses ~80% of my Pi’s memory. :frowning:
    After restarting InfluxDB, the memory usage returns to normal.
    If you run the database on a small system like the Pi, I recommend stopping other services for the duration. My system became almost unresponsive last evening.

Very nice. And this could work with any OH database, not just rrd4j.

One thing: this does rely on your source persistence service being the default one in your system. If you wanted to make this a little more generic, you could add the serviceId parameter to your config and REST calls to avoid this potential gotcha.
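As a sketch of what that could look like, assuming the `serviceId` query parameter of openHAB’s persistence REST endpoint (host, item name, and dates below are placeholders):

```python
from urllib.parse import urlencode

# Placeholder values -- adjust host, item name, and service id for your setup.
base = "http://localhost:8080/rest/persistence/items/Temperature"
params = {
    "serviceId": "rrd4j",   # read from rrd4j explicitly,
                            # not whichever service happens to be default
    "starttime": "2017-01-01T00:00:00.000Z",
    "endtime": "2018-01-01T00:00:00.000Z",
}
url = f"{base}?{urlencode(params)}"
print(url)
```

Swapping `rrd4j` for another service id (e.g. `mysql`) is then all it takes to read from a different source database.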

I also notice that you could theoretically put the data back into the new service through the REST API as well, instead of only supporting InfluxDB. That would make this completely generic, which I think would be pretty useful to the community overall.

In any case, even if you make no more modifications to it, I highly recommend posting this as its own topic in the Examples and Tutorials section (I think that is a good category) and perhaps adding an entry to the FAQ that points to that new thread. That will give the script maximum exposure.

I never seem to think to look at the REST API to see if this sort of thing is possible. Great work!

This would add some complexity to your script, but one way to avoid this could be to chunk the data so you are not forcing InfluxDB to process ten years’ worth of every-minute data all at once.

That is a bit of a misunderstanding; the script assumes the default setup of rrd4j. The last archive has a single value per week (10080 minutes) and holds 520 values.
Additionally, the script does not take all data from the rrd4j files. For example, the last archive holds data from now back to 10 years ago, but the script only takes the values from one year ago to ten years ago!
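A quick sanity check of those numbers (520 weekly values at 10080 minutes each does indeed span roughly ten years):

```python
# Default rrd4j setup as described above: the last archive stores one
# consolidated value per week and keeps 520 of them.
minutes_per_value = 10080          # one week in minutes
values_in_archive = 520

total_minutes = minutes_per_value * values_in_archive
years_covered = total_minutes / (60 * 24 * 365.25)
print(f"{years_covered:.1f} years")
```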

I know that, but a minor change to specify the serviceId would make this script work for any persistence engine. So one could use it to go from sqlite to InfluxDB, MySQL to InfluxDB, etc.

My point was that the script basically imports the entire rrd4j database into InfluxDB in one transaction. This is likely why he sees the RAM usage jump so high and his machine become unresponsive sometimes, as InfluxDB struggles to parse and store all that information. If the data is chunked into smaller sets and put into InfluxDB one chunk at a time, the RAM usage should be much lower and the machine shouldn’t become unresponsive.

For example, instead of loading up all 10,000 records (not an unreasonable amount I think) in one transaction, load 1000 records at a time.
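The batching idea above can be sketched in a few lines of Python (the record values and batch size are just placeholders; each batch would become one InfluxDB write call):

```python
def chunks(records, size):
    """Yield successive batches so InfluxDB never has to ingest
    the whole history in one transaction."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# Toy example: 10,000 records in batches of 1,000 -> 10 write calls.
records = list(range(10_000))
batches = list(chunks(records, 1_000))
print(len(batches))
```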

I added the serviceId. Good idea!

And I will try to split the import into smaller parts like @rlkoshak suggested.
Let’s see if that will solve the memory issue.

But I don’t think I will spend time supporting other target databases.

Splitting into parts of 100 lines per file is done and seems to prevent the memory issue! :sunglasses:

Yes, your comments regarding a generic approach are absolutely correct. I hope I didn’t sound impolite.
However, when dealing with rrd4j, one has to deal with the possibility that the db was used for its specialty:
rrd4j holds several archives, which all start at the timestamp now and reach back over different time spans. The code shown puts the default archives into one stream of data, although InfluxDB is able to handle the data in the same manner as rrd4j.
Users with a custom setup of rrd4j will have to adapt the script to their desired data structure.

I would assume that anyone wanting to move from rrd4j to InfluxDB understands they are moving to a different type of database and that the custom compression approach rrd4j uses does not exist in InfluxDB.

I suspect 99% of such users are using the data for charting anyway, in which case the difference is not that big of a deal. The charts will look the same with both.


:+1: I’m probably overcomplicating it.

I created a new post in the tutorials section.