Importing data into rrd4j

rlkoshak · December 17, 2020, 6:06pm

I’ve a question about rrd4j which I think I know the answer to but want to ask all the rrd4j experts out there (@opus). Given that rrd4j is the default for OH 3 now I suspect there will be people who want to do the opposite of Migrate rrd4j data to InfluxDB.

Let’s assume that someone writes a version of @christoph_wempe’s script that is smart enough to strip out the Items that rrd4j doesn’t support and it’s smart enough to duplicate readings so we get one per minute (big ifs I know but seems doable).

This will result in inserting a bunch of stuff into the past in rrd4j. When will the data be compressed? Will the data be compressed?

I suspect there will be people who want to migrate back to rrd4j in OH 3 since the charting has become good enough to replace Grafana but some users won’t want to abandon all their old data. It’d be nice to have an answer for those users.

Many thanks!

NOTE: I tried to read the rrd docs but I couldn’t find a search string that would get me to docs that actually answer this question.

opus · December 17, 2020, 6:33pm

Good question!

I’m sorry, but that is beyond my current knowledge of rrd4j.
All I have done so far is use the rrd-inspector to edit a .rrd file and manually manipulate EXISTING datapoints in each archive separately (pain in the …). So writing into a pre-created .rrd file is possible; the data consolidation, however, has to be done manually (at least I think so).
How to fill a .rrd file from an existing influxDB automatically, I don’t know (yet?). Especially since influxDB could also have custom data consolidation; in other words, neither the source database nor the destination database may have a standard setup!

johannesbonn · December 17, 2020, 6:45pm

Hello @rlkoshak,

That’s exactly my vision for my data future in oh3, but I don’t have enough programming skills and database knowledge to realize this by myself. Currently I am trying to use influxdb and rrd4j, but I am having problems getting influxdb started with oh3 (see Using my old oh2 influxdb database for oh3?).
Important for me is the data history.

Thank you for triggering this!

opus · December 17, 2020, 9:59pm

Found a “Demo” in the rrd4j repo that creates a complete .rrd file with different archives using selectable timestamps. The data consolidation is done automatically while writing sequential data. That sounds promising!
I have to dig a bit deeper.

opus · December 18, 2020, 2:22pm

@rlkoshak

Some further questions:

The code I found is from https://repo1.maven.org/maven2/org/rrd4j/rrd4j/3.3.1/ and not from a GitHub repository. I don’t see the version we need on GitHub (the current version doesn’t read our .rrd files!).
I am not sure whether I should/could post the copied code in my own GitHub repository. Could I get your opinion on that?
Currently the configuration of the .rrd file to be written (i.e. its datasource and archive settings) is hardcoded in the source file. IMHO a user who wants to move from influxDB to rrd4j should have enough knowledge to change that. WDYT?
My knowledge of databases other than rrd4j is very limited. How would we get the data (timestamp and value) out of an (influxDB) database in sequential order? REST API? I would probably need some help on this part.
The data consolidation is done while writing into the .rrd file, which does sound promising. What bugs me ATM is that we will not have all the data needed to fill the .rrd (for example, we do not have the data for every minute from 2 months ago). I’ll have to set up a testing environment with such data in order to see how a newly created .rrd file would look with data missing like that. In the worst case, missing data points would have to be copied from the consolidated values (could work for AVERAGE, MIN, MAX, LAST but not for SUM). I’ll look into that.
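To illustrate the last point, here is a small plain-Java sketch (it does not use rrd4j, and the class and method names are made up for this example). It repeats one consolidated hourly value into 60 one-minute samples and consolidates them again. Average, min, max and last give the original value back, while the sum comes out 60 times too large, which is why copying consolidated values would not be safe for SUM.

```java
import java.util.Arrays;

/** Hypothetical helper: checks what happens when a consolidated value is copied into finer slots. */
final class ConsolidationCheck {

    /** Repeats one consolidated value n times to stand in for missing fine-grained samples. */
    static double[] replicate(double consolidated, int n) {
        double[] out = new double[n];
        Arrays.fill(out, consolidated);
        return out;
    }

    static double average(double[] xs) { return Arrays.stream(xs).average().orElse(Double.NaN); }
    static double min(double[] xs)     { return Arrays.stream(xs).min().orElse(Double.NaN); }
    static double max(double[] xs)     { return Arrays.stream(xs).max().orElse(Double.NaN); }
    static double last(double[] xs)    { return xs[xs.length - 1]; }
    static double sum(double[] xs)     { return Arrays.stream(xs).sum(); }

    public static void main(String[] args) {
        // One hourly value copied into 60 one-minute slots.
        double[] minutes = replicate(21.5, 60);
        System.out.println("average = " + average(minutes));
        System.out.println("min     = " + min(minutes));
        System.out.println("max     = " + max(minutes));
        System.out.println("last    = " + last(minutes));
        // The sum is 60 times the copied value, not the value itself.
        System.out.println("sum     = " + sum(minutes));
    }
}
```

For a SUM-style archive, the copied value would presumably have to be divided by the number of slots first, but that is an assumption I have not checked against rrd4j itself.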

rlkoshak · December 18, 2020, 3:21pm

I don’t know. It’s not clear who owns the files at that location. Oftentimes people will do the right thing and put copyright and licence type information into the manifest file of the jar, but I looked at a couple of the jar files there and they just say they were built by Maven and stuff like that.

We know that rrd4j is open source with a compatible license so it should be OK to post it here. But I’m not a lawyer and can’t say for sure.

My hope was that at some point down the line someone (perhaps me) could write a script that makes some reasonable assumptions (e.g. default rrd4j config in OH) to do like Christoph’s script does, using the REST API. The only hardish part of that would be that the script would need to fill in between readings so that there is at least one per minute.

In that case, most users wouldn’t need to be all that knowledgeable about either database. They’d just need to run it and cross their fingers. And for those users who have changed the rrd4j config from the default settings, well they should know enough to be able to modify the script so it works.

Yep, to make it generic I’d use the REST API. Christoph’s script does exactly that if I recall correctly. The biggest issue is that the other databases don’t have any restriction like needing to be saved every minute, so the script will have to fill in for missing entries. If it were not for that I think the existing script would work out-of-the-box for this.
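A minimal sketch of that fill-in step, assuming the readings arrive as ascending epoch-second timestamps with one value each. The class and method names are hypothetical and the "repeat the last known value" strategy is just one reasonable choice; interpolating between readings would be another.

```java
import java.util.Arrays;

/** Hypothetical helper: expands sparse readings to one sample per step by repeating the last known value. */
final class MinuteFill {

    /**
     * @param times  reading timestamps in epoch seconds, strictly ascending
     * @param values reading values, same length as times
     * @param step   target spacing in seconds (60 for one sample per minute)
     * @return one value per step, from times[0] to the last step not after the final reading
     */
    static double[] forwardFill(long[] times, double[] values, long step) {
        if (times.length == 0) {
            return new double[0];
        }
        long start = times[0];
        long end = times[times.length - 1];
        int slots = (int) ((end - start) / step) + 1;
        double[] filled = new double[slots];
        int src = 0;
        for (int i = 0; i < slots; i++) {
            long t = start + i * step;
            // Advance to the newest reading that is not later than this slot.
            while (src + 1 < times.length && times[src + 1] <= t) {
                src++;
            }
            filled[i] = values[src];
        }
        return filled;
    }

    public static void main(String[] args) {
        long[] times = {0, 180, 240};          // readings at 0, 3 and 4 minutes
        double[] values = {20.0, 21.5, 21.0};
        // prints [20.0, 20.0, 20.0, 21.5, 21.0]
        System.out.println(Arrays.toString(forwardFill(times, values, 60)));
    }
}
```

Each filled value would then be written to rrd4j in time order, so the consolidation into the coarser archives can happen as the data goes in.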

Awesome!

opus · December 18, 2020, 4:27pm

I wasn’t clear enough on that point. As is, the code creates a setup that could be changed to use any of the current defaults (either the one used for OH2 or one of the two defaults in the OH3 version). I would put them all into the code as commented-out lines to make it easy to select one (or ask for a user selection via an input?).

rlkoshak · December 18, 2020, 4:30pm

Either should be workable. We just need to make it understandable which option is correct for a default OH 3 instance. Most will not know which one they are using.

opus · December 18, 2020, 4:34pm

Yes, and they probably don’t know how influxDB is persisting either. I hope the default is without a retention policy (if I remember the term correctly).

rlkoshak · December 18, 2020, 4:45pm

Well, even if it does have a retention policy, the data that is there is the only data that is there and it will be returned by the REST API the same whether there is a retention policy or not.

The retention policy tends to delete data as it gets older or decimate the data like rrd4j does. But the user would have had to set that up so they should know about how it was deleting/aggregating data as it aged. And really, if someone went through the trouble of configuring a retention policy, they probably had a compelling reason to and that reason will probably make them not want to move to rrd4j anyway.

opus · December 18, 2020, 5:38pm

I totally agree, and not only because that assumption takes away a problem!

I did reread about influxDB: it does consolidate data, but in contrast to rrd4j it keeps a single value per timestamp (whereas rrd4j can have as many values for a specific time as there are archives). In other words, a single REST API call will get all required datapoints!
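A rough sketch of what that single call could look like. It assumes the openHAB persistence endpoint `/rest/persistence/items/{itemname}` with the query parameters `serviceId`, `starttime` and `endtime`, timestamps formatted as `yyyy-MM-dd'T'HH:mm:ss.SSSZ`, and the default port 8080; please check these against the REST API docs of your own instance. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: builds and fetches one persistence query for one item. */
final class PersistenceUrl {

    // UTC timestamps with milliseconds, e.g. 2019-01-01T00:00:00.000Z
    private static final DateTimeFormatter ISO_MILLIS =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'").withZone(ZoneOffset.UTC);

    /** Builds the query URL; port 8080 and the parameter names are assumptions. */
    static String build(String server, String item, String serviceId,
                        long startEpochSec, long endEpochSec) {
        return "http://" + server + ":8080/rest/persistence/items/" + enc(item)
                + "?serviceId=" + enc(serviceId)
                + "&starttime=" + enc(ISO_MILLIS.format(Instant.ofEpochSecond(startEpochSec)))
                + "&endtime=" + enc(ISO_MILLIS.format(Instant.ofEpochSecond(endEpochSec)));
    }

    /** Fetches the response body as a string (Java 11+ HttpClient). */
    static String fetch(String url) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .build();
        return HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString())
                .body();
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Only prints the URL; call fetch(url) against a real server to get the data.
        System.out.println(build("openHAB_Server", "item_name", "influxdb", 1546300800L, 1606780800L));
    }
}
```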

opus · December 19, 2020, 6:01pm

Interim Report:

Recreated a .rrd from the result of a REST API call. Actually several calls, since I only have rrd4j running. Therefore I “hand”-manipulated the output in order to have a single value for each time and to get a “readable” syntax (making all of that work automatically for a single REST API call is work for tomorrow).
Now I need to increase the WAF again!

opus · December 20, 2020, 5:53pm

I got the code working from within the IDE (VSC); generating a .jar, however, fails (Fatal Error: Unable to find package java.lang in classpath or bootclasspath). For the moment I’m lost!
Uploading the whole rrd4j-3.3.1 package with the new Java file to GitHub isn’t possible due to its size!

In the present setup the user has to set these static vars:

    static final String FILE = "item_name"; // required! Item name
    static final long START = Util.getTimestamp(2019, 0, 1, 0, 0); // required! Format: Year, Month (ZERO based!), Day, Hour, Minute (example reads from 2019 Jan 1st, 00:00Z)
    static final long END = Util.getTimestamp(2020, 11, 1, 0, 0); // required! Format: Year, Month (ZERO based!), Day, Hour, Minute (example reads to 2020 Dec 1st, 00:00Z)
    static final String OPENHAB_SERVER = "openHAB_Server"; // required! Name or IP of the openHAB server
    static final String PERSISTENCE_SERVICE = "Persistence Service"; // required! Name of the persistence service the data is fetched from
    static final int ARCHIVE_SETUP = 1; // required! Select 1 (OH2 default), 2 (OH3 default_numeric) or 3 (OH3 default_quantifiable)

Could any Java expert give me a hand with generating the .jar?

rlkoshak · December 21, 2020, 3:44pm

If there’s no response here maybe a new thread under development will get the right attention. I’ve no idea how to do this either.

opus · December 22, 2020, 7:37am

Solved the big problem. The last missing pieces are the correct loading of an external bundle (org.json) and creating the documentation to be uploaded…

opus · December 22, 2020, 8:33pm

I shouldn’t have said “the last”!
I can’t get the d* org.json import to load when creating a .jar! I’ve read countless solutions and tried them all with the same outcome: Exception in thread "main" java.lang.NoClassDefFoundError: org/json/JSONObject
The code runs in the IDE, and it even starts running from the .jar, but only up to the point where the JSON should be read!

opus · December 22, 2020, 10:18pm

Reworked the code to get along without JSON objects.
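For anyone curious how this can work without org.json, here is a minimal sketch. It is not the code from the tool. It assumes the REST response contains entries of the form `{"time":<epoch millis>,"state":"<value>"}` and simply pulls them out with a regular expression. The class names are made up for this example, and states that are not plain numbers (for example `ON` or `NULL`) are skipped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts time/state pairs from a persistence response without a JSON library. */
final class JsonFreeParse {

    /** One reading: timestamp in epoch milliseconds plus its numeric value. */
    static final class Point {
        final long time;
        final double value;

        Point(long time, double value) {
            this.time = time;
            this.value = value;
        }
    }

    // Matches "time":<digits>,"state":"<text>" and tolerates whitespace around ':' and ','.
    private static final Pattern ENTRY = Pattern.compile(
            "\"time\"\\s*:\\s*(\\d+)\\s*,\\s*\"state\"\\s*:\\s*\"([^\"]*)\"");

    /** Returns all numeric readings in the order they appear in the response. */
    static List<Point> parse(String json) {
        List<Point> points = new ArrayList<>();
        Matcher m = ENTRY.matcher(json);
        while (m.find()) {
            try {
                points.add(new Point(Long.parseLong(m.group(1)), Double.parseDouble(m.group(2))));
            } catch (NumberFormatException e) {
                // Non-numeric state: skipped in this sketch.
            }
        }
        return points;
    }

    public static void main(String[] args) {
        String json = "{\"name\":\"item_name\",\"data\":["
                + "{\"time\":1608400000000,\"state\":\"21.5\"},"
                + "{\"time\":1608400060000,\"state\":\"NULL\"},"
                + "{\"time\":1608400120000,\"state\":\"21.7\"}]}";
        for (Point p : parse(json)) {
            System.out.println(p.time + " -> " + p.value);
        }
    }
}
```

A regex like this is brittle compared to a real JSON parser, but it avoids the external dependency that caused the NoClassDefFoundError when running from the .jar.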

rlkoshak · December 22, 2020, 11:14pm

Fantastic! Thanks for the hard work!