Pushing data from multiple sensors into OH3 via JSON

I have a few sensors which emit a huge number of datapoints.
I could easily write the updates so that each datapoint gets its own REST API call, ending up with a few dozen calls in a row (at some point that led to the REST API not getting all item updates, and I had to make sure there are a few milliseconds between the calls).
On the other hand, I can use the HTTP binding to fetch a JSON document and then easily update item after item - but if the data is time-critical, polling is a no-go…
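
For illustration, a minimal sketch of that per-item push via the REST API (host, item names and values are placeholders, not my real setup); the sleep is the workaround for the missed updates:

```python
# Minimal sketch: one REST call per datapoint, with a short pause in between.
# Host and item names are placeholders; depending on your API security
# settings an API token may also be required.
import time
import requests

OH_URL = "http://openhab.local:8080"
readings = {
    "Heating_BoilerTemp": "48.3",
    "Heating_ValveState": "OPEN",
}

for item, state in readings.items():
    requests.put(
        f"{OH_URL}/rest/items/{item}/state",   # update the item's state
        data=state,
        headers={"Content-Type": "text/plain"},
        timeout=5,
    )
    time.sleep(0.05)  # the "few msecs between the calls" workaround
```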

Both seem a bit inefficient. Is there a better way to import masses of datapoints into OH3? Splitting the JSON in a rule feels like double effort, because I already concatenated the values in my scripts in the first place! :wink:

How huge is huge? What’s the frequency?

If it’s JSON you’re spitting out, you could publish to MQTT? As with the HTTP Binding you would create a Thing and Channels to extract the data using the JSONPATH Transformation Service, but unlike the HTTP Binding you would receive the data as soon as it is published, rather than when the binding polls…
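
For example (just a sketch, with an assumed broker address and topic, using the paho-mqtt library), the script that already builds the JSON could publish it straight to the broker, and the MQTT Thing's channels would then extract the individual values with JSONPATH:

```python
# Sketch: publish the already-assembled JSON to MQTT instead of waiting to
# be polled. Broker address, topic and payload keys are assumptions.
import json
import paho.mqtt.publish as publish

payload = json.dumps({
    "boilerTemp": 48.3,
    "valveState": "OPEN",
})

# An MQTT Thing channel can then pull single values out of this payload
# with a JSONPATH transformation such as $.boilerTemp
publish.single(
    "home/heating/state",
    payload=payload,
    hostname="broker.local",
    qos=1,
)
```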


You might start by reviewing this part. Why a huge amount? There's no point in sampling room temperatures at one-second intervals, for example. What is it that you are looking at, and what control do you have over it?


OK, to give you a feeling:

  • realtime: Nuki Smartlock with 8 datapoints
  • realtime: Ekey fingerprint sensor with 11 datapoints
  • every 30 secs: my heating controller with 84 datapoints, such as various temperatures of boilers, stratified storage tanks, valve states, my solar heating, and so on
  • every 30 secs: NIU e-scooter API with 120 datapoints
  • every 5 mins: my weather station with 16 datapoints
  • every 30 secs: various OneWire readings (water consumption, gas consumption, temperatures, wind)

The realtime datapoints I need for seamless integration of hardware that is not (or only poorly: ekey and Nuki) supported via bindings; the other datapoints I need for non-vital reactions and historical analysis.

All integrations are more or less self-coded, and I'd like to reorganize the way I use this information within openHAB3. Currently it's a mix of old scripts (PHP, Python) that use the REST API or MQTT for each item (with a 250 ms pause in between!), and some new scripts (Python) where I tried to put together a JSON document; either openHAB polls the JSON and imports the values via the HTTP binding, or the scripts push the JSON into a single openHAB item, which then updates the items via a rule.
The latter for the realtime items, for obvious reasons! :wink:

So I've got the feeling there must be some more elegant way to do all of this in a much more straightforward - and faster(?) - way… Some rules take like 2-3 seconds until I see a reaction, which is not quite how I'd like to, e.g., open my door.

Sounds like you’ll be running at something like 1,000 updates a minute. That’s not ridiculous, but openHAB is a home automation system, not optimized as a datalogger, so you’ll need to apply care.

Hidden overheads - OH3 defaults to persisting everything. The overhead there is the write I/O to the database, more costly than any processing. You'd probably want to review what you persist, when, and to which db.
Logging has I/O costs too, and can be optimized.

Try to avoid polling, it’s just more work. If a device has something to tell openHAB, tell it. Otherwise, don’t.

That's bonkers. Consider how you might do some pre-processing to whittle it down to reporting only the things that change.
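
Something along these lines (a rough sketch; `forward()` stands in for whatever transport you use and is not a real function): remember the last reported value per datapoint and only pass on what actually changed.

```python
# Rough sketch of report-by-exception pre-processing: only forward
# datapoints whose value changed since the last report.
last_reported = {}

def report_changes(readings, deadband=0.25):
    for name, value in readings.items():
        previous = last_reported.get(name)
        if isinstance(value, (int, float)) and isinstance(previous, (int, float)):
            # ignore analog noise, e.g. a sensor bouncing a quarter unit
            if abs(value - previous) < deadband:
                continue
        elif value == previous:
            continue
        last_reported[name] = value
        forward(name, value)  # hypothetical send function (REST, MQTT, ...)
```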

Agreed, unacceptable. There is a multitude of possible causes though. You haven't given even a hint yet of what hardware you expect to do this job.


Pretty good ideas in here, but just a few thoughts:

I need most of that information in openHAB, because multiple datapoints tell me how the battery of the e-scooter is doing and whether I should charge it, stop charging, etc.
It would be nice to send only changed data, but then again: why would I need openHAB as a central means of managing my data then? I would have to introduce a separate database for my scripts so that they only send changed updates… That seems like double or triple the effort, just to save a few persistence costs? :wink:

I use a MariaDB on my Synology, so there's no cost for openHAB except sending the values to the DB.

Yes, logging of many of the regular updates could be avoided, for example, but first I need to get my newly migrated OH3 system stable enough.

That's a Raspberry Pi 4 with 4 GB RAM running off an SSD. It should be enough; I don't see any particular lagging. In that specific use case I suspect the Nuki binding, which is very unreliable, is causing these delays.

I don't know, why do you? As I said earlier, openHAB is a home automation system and it is not optimized as a bulk data logger. Obviously it is general-purpose and can do many things along those lines, but it is not the best tool for managing bulk data.

One could just as easily ask: why send data to openHAB that openHAB doesn't need? I agree with rossko57: if there isn't actually anything that openHAB can do with the data, it doesn't need to know it. openHAB isn't a data logger and, as you can see, making it work as one is going to take more work than pushing the data into a database and using analysis tools directly on that. Send the data that OH needs to make home automation decisions, that's all.

If persistence is the reason why it can’t keep up with the data rate…

Which can be significant, especially if it means transmitting those values across a network.

This gives me pause, and maybe you are using the term generically instead of as a term of art. I say this because openHAB is not and never will be a real-time system. If you just mean "really really fast" then OH can do the job. But if you really mean "real time" you need to look elsewhere. As a term of art, a "realtime" system is one built from the ground up such that if certain operations are not successfully completed within a given time frame, that's a major error and fault. For example, you don't want your braking system to fail to engage because someone is streaming music on the entertainment system. openHAB can't do that.

Anyway, if you want "really fast" then you need to push. So your scripts must publish the messages. You shouldn't need delays between messages when publishing to MQTT, especially if you use QOS 1 (at least once) or QOS 2 (exactly once). I could see messages getting lost with QOS 0 (at most once), though, so if you are using that, you can probably fix the problem just by adjusting the QOS.

Were I to design a system to handle this I would:

  • use MQTT with QOS 1, which will be a little faster than QOS 2, though at the risk of receiving a few messages more than once
  • publish each piece of data to its own topic and only publish when there is a change worth reporting; if possible, filter out noise at the device (e.g. the way a thermometer will bounce back and forth a quarter of a degree on every reading)
  • use retained messages where appropriate, especially if it means you can publish a value less frequently; for example, a temperature setpoint changes only rarely, so publish it as retained and openHAB will still get the message when it connects, no matter whether it was connected when the setpoint last changed (see the sketch after this list)
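
A rough sketch of those three points with paho-mqtt (topic names, payloads and broker address are just placeholders):

```python
# Sketch: one topic per datapoint, QOS 1, retained where it makes sense.
import paho.mqtt.publish as publish

messages = [
    # frequently changing value: its own topic, QOS 1 ("at least once")
    {"topic": "home/heating/boilerTemp", "payload": "48.3", "qos": 1, "retain": False},
    # rarely changing value: published retained, so openHAB still receives
    # the last value when it (re)connects, even if it was offline at publish time
    {"topic": "home/heating/setpoint", "payload": "21.5", "qos": 1, "retain": True},
]

publish.multiple(messages, hostname="broker.local")
```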

If you've got a bunch of rules being triggered by this mountain of data then I'm not surprised by the delay, and reducing the data to only what OH needs, when OH needs it, will help with that too. In OH 2.5 I would expect this to occur because you've used up your 5 rule threads and the backlog of events has to be worked off. In OH 3 I would expect it to occur if you have a single rule that is being triggered really fast, as in OH 3 each trigger has to wait for the rule to finish processing the previous trigger before it gets handled. With this much data coming in this fast I can easily see it falling behind.

My favorite acronym from Heinlein's The Moon Is a Harsh Mistress is TANSTAAFL: There Ain't No Such Thing As A Free Lunch. Because openHAB is neither a data-logging system nor a real-time system, "why do I need openHAB as a central means of managing my data then" and "that seems double or triple effort, just to save a few persistence costs" amount to asking for a free lunch. You can't use OH in a way it was not designed to be used and expect reasonable performance and results without doing double or triple the effort.

This may be backwards. You can’t get an OH3 system stable enough until you avoid those regular updates that could be avoided.

I’ll try to wrap my head around this tomorrow.

But yes, it's "really fast", and also close to realtime in my experience with OH2. I had like 1200 items and a whole bunch of DSL rules, and everything worked fine and smooth - even on the RPi3, and even more so since I moved to the RPi4/4GB. So I don't suspect the hardware or the network (gigabit, with the Synology and the OH box standing right next to each other).

Currently I'm at 760 items in OH3, with the semantic model giving me more groups than I ever had, and I'm only at the basic dozen or so rules. The 3 seconds are definitely due to the Nuki binding; I moved to direct API calls and now only the buggy/lazy API in the Nuki bridge is slowing things down.

But I want to

  • display the data in my visualisation
  • use the data to do stuff

That's why I need the data in OH… Perhaps I can reduce the load with an ESB-like layer in between, which handles all the polling and comparing and only updates changed items… I'll have to sleep on this!

Thanks for your input, both of you!

But you don't have to send data that hasn't changed in order to display it or use it, except in rare circumstances. I think that's the main point.

Also, pay attention to the way rules work in OH 3, as that might come into play here. In OH 2.5, when a rule is triggered, a new thread is pulled from the thread pool to run it, even if it's the same rule. In OH 3, each rule gets its own thread, which means only one copy of the rule can run at a time, so triggers will queue up if it's a long-running rule or it's processing a whole lot of triggers that came in really fast. That can lead one to need a different approach. For example, OH 2.5 might be able to process big JSON messages coming in very fast because it can process them in parallel with the same rule. But in OH 3 it might be better to break the data up a lot more and process it in separate rules (or with no rules at all).

I just installed Node-RED today and played around a bit with it. From the looks of it, I think I will change all my internal sensors without a binding to push to Node-RED, or let Node-RED pull where the sensors have no push functionality. Node-RED has a "report-by-exception" node and thus only sends values (via MQTT, which I think is the best way!) when they have changed. So I just have to rearrange my items AGAIN after switching to OH3, but hey - what else can we do in lockdown…

OK, so as I have never played with Node-RED before, this is pretty impressive. I not only managed to set up Node-RED within literally minutes on a Pi3, I also managed to rewrite my polling scripts for three of my sensor "families" (one REST API and one RS-232 serial byte connection) and link them to Node-RED, where all my sensor data without a dedicated binding will land.
Node-RED then sends only the changed values to openHAB via MQTT, where I have dedicated MQTT Things, which then work the magic!

I'm starting to think it's not necessary from a performance perspective (my main openHAB runs on a Pi4/4GB), but it's

  • fun to do
  • a great way to spend a night in lockdown
  • and it keeps my openhab.log cleaner