Best Persistence for Graphing and Impulsed Data?

Matej_Kotnik · January 25, 2020, 1:02am

I had trouble with rrd4j when displaying short impulses or digital values on graphs. It didn’t seem accurate at all and always produced awful graphs. Was it misconfigured?

Awful Graph with rrd4j

And yes, I did use the everySecond trigger, but the graph looked like saw teeth rather than a digital signal.
I did successfully graph e.g. temperature, though. Also, rrd4j can’t store strings.

Then I tried InfluxDB as the persistence database. It generated awesome, accurate graphs, but while openHAB is storing to the database, the InfluxDB process uses up to 50% of the CPU… That doesn’t seem acceptable.

Nice Graph with InfluxDB

The graphs don’t show exactly the same thing, but both should look roughly like square waves.

The discussion about optimizing InfluxDB is here: Why is Influx DB using 50% CPU…it is too much!

Suggestions for a suitable persistence service can be discussed here.
I’d appreciate any tested ideas or experiences with DBs.

Thanks in advance, Matej

rossko57 · January 25, 2020, 1:25am

rrd4j by design compresses older data. It’s the whole point of it. If you do not want averaged data for past periods, then don’t use it.

Matej_Kotnik · January 25, 2020, 1:59am

Yes, but it behaves as if it samples, e.g., the first second of every minute and averages in between, so if an impulse is narrow and falls in the middle of the minute, it is skipped. I can’t trust it for anything other than continuous measurements with slow changes, like temperature.

And please don’t suggest what not to use; suggest anything that works for short pulses instead, OK?

Thanks, Matej

rossko57 · January 25, 2020, 2:25am

What not to use: openHAB. It’s not a real-time system, and it is never going to be optimal at handling per-second data changes. Pipe the data to a specialist data logger service. Use specialist charting, like Grafana.

Features like the built-in charting and rrd4j persistence are meant for people plotting things like greenhouse temperatures on an under-resourced Pi. They are deliberately minimalist and low impact.

opus · January 25, 2020, 7:02am

I would conclude differently:
If you don’t want the compression, use a custom setup that doesn’t compress, like only a single archive!
I can’t say how rrd4j works with every10seconds, since the docs state that everyMinute is required.
@Matej_Kotnik
You are comparing graphs from rrd4j and InfluxDB. Assuming you are using the default setup, the rrd4j graph uses compressed data; try a timeframe of up to 8 hours.

Matej_Kotnik · January 25, 2020, 1:37pm

So which specialist data logger service would work nicely: MySQL, Prometheus…?
I guess that InfluxDB would load the CPU the same as it does now, unless I am able to optimize it? I haven’t found the InfluxDB logs yet, but the storing frequency seems right.

Yes, I have about 100 items, but some of them are stored every 10 s or every minute. Still the same CPU load?

Grafana is nice and I knew it existed, but only now did I find out that it can be embedded in sitemaps. Nice.

opus · January 25, 2020, 2:44pm

I guess you won’t get an answer that suits you as long as you don’t get your requirements straight!
Complaining about a persistence service because it creates an “awful graph”, without knowing what is causing the disliked graph, doesn’t help. Don’t you like the standard graphs that come with OH? No problem: install Grafana, study it and make your own graphs, but don’t complain about rising CPU usage when installing more and more applications.
If the problem is the persisted data, study what the persistence service has done: a REST call for the same timeframe will show all the numerical data that is plotted on the graph, and you can decide for yourself what went wrong.
Reading your last post, it is not clear what you are ultimately looking for: a database or a visualisation application?
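
For reference, a minimal Python sketch of such a REST call. It assumes the openHAB persistence endpoint `/rest/persistence/items/{itemname}` with the `serviceId`, `starttime` and `endtime` query parameters; the host name, the item name `Pump_Switch` and the shape of the JSON response are assumptions for illustration, so check them against the REST API documentation of your own installation.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def persistence_url(base_url, item, service_id, start, end):
    """Build the URL that asks one persistence service for an item's history."""
    query = urlencode({"serviceId": service_id, "starttime": start, "endtime": end})
    return f"{base_url.rstrip('/')}/rest/persistence/items/{quote(item)}?{query}"


def fetch_datapoints(url):
    """Fetch the stored values; assumes a JSON body with a "data" list of
    {"time": ..., "state": ...} entries (not verified against every OH version)."""
    with urlopen(url) as response:
        payload = json.load(response)
    return [(point["time"], point["state"]) for point in payload.get("data", [])]


# Hypothetical host and item name; use the same timeframe as the chart.
url = persistence_url(
    "http://openhab.local:8080",
    "Pump_Switch",
    "rrd4j",
    "2020-01-24T12:00:00.000Z",
    "2020-01-24T13:00:00.000Z",
)
print(url)
# datapoints = fetch_datapoints(url)  # needs a reachable openHAB instance
```

Comparing the returned values with the chart shows whether the stored data or the rendering is responsible for the odd-looking graph.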

rlkoshak · January 25, 2020, 5:56pm

Because most users use everyMinute plus everyChange I always interpreted it to mean “at least” everyMinute but more frequently was fine.

@Matej_Kotnik, I’m sure you are aware of the settings that control how OH writes to the database, which are pretty minimal. So your options seem pretty clear to me.

1. Go to an InfluxDB forum to see what options you have to fine-tune the server and get better performance out of it.
2. Experiment with other databases like MySQL, PostgreSQL, MongoDB, etc. They are all supported by OH.
3. Move your database to run on some other machine.
4. If, after doing 1, they recommend changes to how OH interacts with the database, file an issue, and submit a PR if you are able to make OH write to InfluxDB more efficiently. Alternatively, move the writing of this data to some other external script that you can write in a way that lessens the load on InfluxDB, using what you learned from 1.
5. Move the whole setup to a more powerful machine.
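
As an illustration of the external-script option in the list above, here is a hedged Python sketch that collects points and sends them in batches instead of one request per value. It formats points in InfluxDB line protocol; the `BatchWriter` class, the `send` callback and the item name are hypothetical, and whether batching actually lowers the CPU load in this setup is an assumption to verify by measurement.

```python
import time


def to_line(measurement, value, timestamp_s):
    """Format one point as InfluxDB line protocol: <measurement> <field>=<value> <timestamp>."""
    return f"{measurement} value={value} {timestamp_s}"


class BatchWriter:
    """Buffer points and hand them to `send` as one newline-separated payload."""

    def __init__(self, send, max_points=500):
        self.send = send  # hypothetical callable, e.g. one HTTP POST to the write endpoint
        self.max_points = max_points
        self.buffer = []

    def add(self, measurement, value, timestamp_s=None):
        if timestamp_s is None:
            timestamp_s = int(time.time())
        self.buffer.append(to_line(measurement, value, timestamp_s))
        if len(self.buffer) >= self.max_points:
            self.flush()

    def flush(self):
        if self.buffer:
            self.send("\n".join(self.buffer))
            self.buffer = []


# Demo with a stand-in sender that only records the payloads.
sent = []
writer = BatchWriter(sent.append, max_points=3)
for second in range(7):
    writer.add("Pump_Switch", second % 2, timestamp_s=1579900000 + second)
writer.flush()  # send whatever is left over
print(len(sent), "requests for 7 points")
```

The timestamps here are in seconds, so a real sender would have to tell the database to use second precision.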

Ultimately, what rossko57 stated (I think on the other thread) is correct. OH is not designed for, and is unsuitable for, real-time, industrial, or any other use where it needs to process lots of data with guarantees on transactions, timing, and order. I think it can handle your use case as described just fine on a more powerful machine, but I wouldn’t want to guarantee it.

If you want to do the work to make it work with InfluxDB, you need to do the math on the number of transactions per second and go to the InfluxDB forums to figure out how to optimize for that. Or you need to experiment with other DB services to see how they perform. No one else in the world is running OH + DB on your hardware with your persistence requirements. We can’t tell you what to do. You have to experiment and find the best option for yourself.
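
A rough Python sketch of that math. The split of the roughly 100 items into groups is a made-up example, not the real configuration; plug in the actual counts and persistence intervals.

```python
def writes_per_second(groups):
    """Sum the write rate over (item_count, interval_in_seconds) groups."""
    return sum(count / interval_s for count, interval_s in groups)


# Hypothetical split of about 100 items by persistence strategy.
groups = [
    (10, 1),   # 10 items stored every second
    (30, 10),  # 30 items stored every 10 seconds
    (60, 60),  # 60 items stored every minute
]

rate = writes_per_second(groups)
print(f"{rate:.1f} writes/s, about {rate * 86400:,.0f} points/day")
```

With these example numbers the every-second items account for most of the writes, so they are the first place to look when trimming the load.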

Matej_Kotnik · January 25, 2020, 6:03pm

@opus
OK, to clarify: for the past 14 days I have had only InfluxDB running, with no other persistence services.
I am mainly looking for a database, or a way to optimize InfluxDB so it uses less CPU.
Database requirements:
- The ability to store digital output state history, measurements like temperature, and short strings.
- A persistence service with accurate data output (so either without data compression, or with compression that keeps accurate values).
- One with more configurable data compression and averaging than rrd4j, so it shows the digital signal state accurately (0 stays 0, and 1 doesn’t become 0.3 over time, for example).

I would be satisfied with InfluxDB and openHAB graphs if InfluxDB did not take up to 50% of the CPU.
I am still trying to find the reason for the CPU load, given that I am not seeing a lot of disk activity (0 - 800 KB/s in some peaks).

You say that using a REST call for the same timeframe will show all the numerical data that is plotted on the graph. What is the REST API call?

Matej_Kotnik · January 25, 2020, 6:09pm

@rlkoshak
Oh, thanks for your in-depth information; it means a lot.

I will go through the process of testing / optimizing with your guidelines and report back in a few days how things went.

Matej

opus · January 25, 2020, 6:19pm

That was about the usage of rrd4j. The default setup of rrd4j uses 6 archives, where the last 5 keep the AVERAGE of the values. If you do not want averages, there are other consolidation functions available, like MAX, MIN and LAST. You might even set up rrd4j for no consolidation by using a single archive.
But all that is just talking about rrd4j …
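
To illustrate why the consolidation function matters for a digital signal, here is a small Python sketch. It is a simplified model of consolidation (several raw samples collapsed into one archive row), not the rrd4j API; the sample values and the 6-samples-per-row step are made up for the example.

```python
def consolidate(samples, steps_per_row, func):
    """Collapse each run of `steps_per_row` samples into one value using `func`."""
    return [
        func(samples[i:i + steps_per_row])
        for i in range(0, len(samples), steps_per_row)
    ]


def average(values):
    return sum(values) / len(values)


def last(values):
    return values[-1]


# One sample every 10 s for two minutes; the switch is ON (1) for 20 s in the first minute.
signal = [0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

CONSOLIDATORS = {"AVERAGE": average, "MAX": max, "MIN": min, "LAST": last}

# Consolidate to one row per minute (6 samples per row).
results = {name: consolidate(signal, 6, func) for name, func in CONSOLIDATORS.items()}
for name, values in results.items():
    print(name, values)
```

In this model AVERAGE turns the pulse into a fractional value of about 0.33, MAX keeps it as 1, and MIN and LAST lose it entirely, which is why the choice of function (or a single unconsolidated archive) matters for short pulses.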

Matej_Kotnik · January 25, 2020, 6:38pm

OK, rrd4j might be useful on the RPi 3 at the weekend house, if configured accordingly. I may also try it out on the test rig.

m4rk · January 25, 2020, 7:02pm

Maybe of interest to the OP: here is my plot of valves opening (1) and closing (0) for my heating system, using the Grafana Heatmap chart type.

Matej_Kotnik · January 25, 2020, 7:09pm

Nice, what database are you using?

Matej

m4rk · January 25, 2020, 8:15pm

Influx

rossko57 · January 26, 2020, 2:11am

Note that radiator valves do not open and close several times a minute with duration of a second or two.

Matej_Kotnik · February 3, 2020, 9:39pm

Actually, InfluxDB is my go-to choice.
For the trouble I had with it and the solution, check out this thread: Why is InfluxDB using 50% CPU.

Thanks a lot for the advice.

Matej