Remove / smooth out unrealistic item values which randomly appear

My heating system can be accessed via a RS232 serial connection and has a known bug, which spits out some unrealistic item values like this:

As I’ve got some averageSince rules on those items, I’m looking for a solution to:

  1. remove unrealistic values that already made it in, or
  2. ignore incoming unrealistic values

Does anyone have an idea how to achieve that?
And what’s the best place for it? I could do this either in openHAB or in the Python script reading those values.
The company selling the hardware says the bug won’t be fixed.

I assume this is the Serial binding? I know the MQTT binding and perhaps others have a way to create a min/max range outside of which the message will be ignored. That doesn’t appear to be supported in the Serial binding.

This would be a good candidate for a Profile but, assuming the docs are up to date, there isn’t one for this yet. So we are left with two options depending on what you want to do with these values.

If you want to replace the out-of-range value with some sort of constant, you can use a JS transformation (or the transform Profile) that tests whether the value is out of range and, if so, replaces it with that constant.
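The check itself is just a bounds test. Here is the idea sketched in Python for illustration (in openHAB it would be a few lines of JavaScript instead; the bounds and the fallback constant below are illustrative placeholders, not values from this thread):

```python
def replace_out_of_range(value, low=0.0, high=110.0, fallback=-1.0):
    """Pass plausible readings through unchanged; map anything outside
    the permitted window to a fixed fallback constant.
    low, high and fallback are illustrative placeholders."""
    return value if low <= value <= high else fallback
```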

If you want to replace the out-of-range value with the previous value the Item held, you’ll need to use a rule and a proxy Item.

If you want to omit that value from persistence without a separate proxy Item, remove the Item from the persistence config so it isn’t persisted automatically. Then, in a rule triggered when the Item changes state, test whether the value is reasonable and call persist only if it is, so the erroneous value never reaches the database.

Indeed it’s MQTT-based: the Python script reads the serial communication directly (which uses hexadecimal byte-coded numbers) and then sends the results to openHAB via MQTT.

Out-of-range doesn’t quite fit, because the bug sometimes sends random numbers which unfortunately fit within the sensor’s range (as you can see, the blue line drops from 60-ish to 18, which is still a “reachable” value for that sensor). I already drop “really” unrealistic values in the Python script before they get sent in the first place (e.g. >110°C for (compressed) water temperatures) and the like. But the thing is, these values are “reachable”, yet a drop from 62°C to 18°C and back to 63°C is “unrealistic”…

That would be a way to go. Complicated, but doable.

Yet I’m still lacking an idea how to detect those huge drops (or spikes; they also come in spikes :grinning:).

If you have control over the original script (i.e. you understand it enough to make changes to it) that’s the best place to fix it.

And what makes the value unreasonable is the delta from the previous reading. So subtract the previous reading from the current reading and take the absolute value of that. If the result is above a certain threshold, refrain from publishing that reading.
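As a minimal sketch in Python, assuming a numeric threshold you’d tune to your system:

```python
def plausible(previous, current, threshold=15.0):
    """A reading counts as plausible when it doesn't jump more than
    `threshold` away from the reading before it."""
    return abs(current - previous) <= threshold
```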

Yeah, I wrote that script… and about that! :wink: I kind of tried this already, but sometimes the delta of a genuine change is just as large. For example, the readout interval is 2 minutes, and if one of my three heat sources starts (solar thermal, wood furnace or gas heater), there can be a real spike in temperature within those two minutes → see here:

And if I ignore the first “spike” from 39.8°C to 51.9°C (within 2 minutes), all the readings afterwards get discarded too, because my script still thinks 39.8 is the baseline…

That’s where I’m out of ideas and thought perhaps I can do something with the persisted data here?

There is no baseline. Not publishing a reading because it’s too big of a jump doesn’t mean you don’t keep it: the suppressed reading still becomes the new “previous” value for the next comparison.

Let’s say we set the delta threshold so that any jump of more than 15 gets suppressed.

Reasonable Spike:

| Previous | Current | What the script does |
| --- | --- | --- |
| 30 | 30 | Publishes |
| 30 | 56 | Doesn’t publish |
| 56 | 58 | Publishes |

We miss that one “real” reading.

Bug:

| Previous | Current | What the script does |
| --- | --- | --- |
| 60 | 60 | Publishes |
| 60 | 15 | Doesn’t publish |
| 15 | 61 | Doesn’t publish |
| 61 | 62 | Publishes |

The bug reading is suppressed but you lose one “real” reading too.
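The crucial detail is that the suppressed reading still becomes the next “previous” value, so nothing gets stuck. A small Python sketch of exactly the behaviour in the tables (names are hypothetical):

```python
class DeltaFilter:
    """Suppress a reading that jumps more than `threshold` away from
    the previous RAW reading. The raw reading is remembered either way,
    so there is no sticky baseline."""

    def __init__(self, threshold=15.0):
        self.threshold = threshold
        self.previous = None

    def should_publish(self, current):
        ok = self.previous is None or abs(current - self.previous) <= self.threshold
        self.previous = current  # keep the reading even when not publishing
        return ok

f = DeltaFilter()
for reading in (60, 15, 61, 62):  # the "Bug" sequence from the table
    print(reading, f.should_publish(reading))
# 60 True, 15 False, 61 False, 62 True
```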

If you don’t care about how timely the values are reported, you could look at the previous reading and the next reading, though that will delay publishing of each reading by at least four minutes. In this case you’d take the abs of the delta between the current reading and the previous reading, as well as the abs of the delta between the current reading and the next reading. Only if both deltas are too large do you suppress publishing the reading.
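A sketch of that delayed variant (hypothetical names again): each reading is held back one cycle so it can be compared against both neighbours, and it is dropped only when it jumps away from both:

```python
class TwoSidedFilter:
    """Hold each reading back one cycle; suppress it only when it is
    more than `threshold` away from BOTH the reading before it and the
    reading after it."""

    def __init__(self, threshold=15.0):
        self.threshold = threshold
        self.before = None    # reading before the pending one
        self.pending = None   # reading waiting for its successor

    def push(self, value):
        """Feed a new reading; returns the now-decidable older reading,
        or None if it was suppressed (or nothing is decidable yet)."""
        decided = None
        if self.pending is not None:
            far_from_before = (self.before is not None
                               and abs(self.pending - self.before) > self.threshold)
            far_from_after = abs(self.pending - value) > self.threshold
            if not (far_from_before and far_from_after):
                decided = self.pending
        self.before, self.pending = self.pending, value
        return decided
```

Run against the “Bug” sequence 60, 15, 61, 62 this publishes 60, drops 15 and keeps 61; run against the genuine spike 30, 56, 58 it keeps every reading.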

But if you need the reading to be published immediately I can think of no way to detect the error readings without losing a valid reading too.

But you’d still need to provide some sort of heuristic to use to “trim” the database which takes you right back to the original problem. Since it’s the same problem either way, may as well solve it at the source.


I’ll have to think about it. My Python script doesn’t save/persist the previous value anywhere, and the script is restarted from scratch every 2 minutes. So perhaps I’ll make it run continuously so it has access to the previous values for that comparison.
Or, as I have Node-RED handy anyway, perhaps I can find something there.

Thanks, as usual a very competent and well-thought-out analysis from you! :wink:

I’ll keep you updated!

I have a similar issue with data I receive over CAN/Modbus, which can sometimes clash and result in odd values. In principle I defined two strategies: a counter and a range. A counter should never go down, and its increase should not exceed a certain level. For power values I simply filter based on a permitted range.

Before applying these filters I saw all sorts of trouble, i.e. inverter output getting into several MW :wink: or the counter spiking by MWh. As I do not run openHAB at a nuclear plant, I knew these were odd. For inverter data I simply have a limit of 0…20000. For the counter I permit a maximum increase of N, or 10% over the most recent value stored in openHAB. Depending on the need I chose a static value or a percentage, as the percentage makes auto-scaling possible (the accepted range grows automatically with the value).
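For illustration, the two strategies could look roughly like this in Python (the 0…20000 limit and the 10% step are the numbers from this post; everything else is a made-up sketch):

```python
def in_range(value, low=0.0, high=20000.0):
    """Range strategy: only values inside the permitted window pass."""
    return low <= value <= high


class CounterFilter:
    """Counter strategy: the value must never go down, and it may only
    grow by a bounded step: either a static N, or a percentage of the
    last accepted value (the percentage auto-scales with the counter)."""

    def __init__(self, max_step=None, max_pct=0.10):
        self.max_step = max_step   # static limit N, if given
        self.max_pct = max_pct     # otherwise: fraction of last value
        self.last = None

    def accept(self, value):
        if self.last is not None:
            limit = (self.max_step if self.max_step is not None
                     else self.last * self.max_pct)
            if value < self.last or value - self.last > limit:
                return False       # went backwards or jumped too far
        self.last = value
        return True
```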
I wrapped both strategies into a set of profiles. Their implementation sits between Thing and Item, so each update reported by a binding goes through the filtering. This keeps the item list compact (no extra rules necessary) and requires hardly any per-item adjustments, since the links are almost always there anyway.

Not sure if openHAB has the same mechanism out of the box, but you could have a look at scripted profiles, which were discussed some time ago. With them you could implement the same in a few lines of code.


OK, so as I use Node-RED anyway for all kinds of stuff, I connected my read-outs to Node-RED.
Within Node-RED there’s an out-of-the-box node you can use to block values depending on their predecessors. It’s called the “RBE filter” (Report by Exception).

With some 2.x version of Node-RED there were improvements to this node, so you can now choose whether it compares against the “last input value” or the “last valid output value”. Combining rlkoshak’s approach with Node-RED, I have now implemented this one:
(screenshot of the Node-RED filter configuration)
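In (hypothetical) Python terms, the difference between the two comparison modes looks roughly like this; with “last input value” the baseline moves on every message, so after a genuine step change only one reading is lost, exactly as in Rich’s tables:

```python
def rbe_filter(readings, threshold, compare="last input value"):
    """Rough sketch of the RBE node in 'block if value changes by more
    than' mode: yield only values close enough to the baseline."""
    baseline = None
    for value in readings:
        ok = baseline is None or abs(value - baseline) <= threshold
        if ok:
            yield value
        if ok or compare == "last input value":
            baseline = value  # 'last valid output value' only moves on a pass
```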

So with that I basically get Rich’s approach without messing with openHAB rules and logic. The bug usually shows up at least once every few hours, but since I implemented this I’ve had some really smooth graphs:

So, I guess, this works now! :wink: Thanks for your help!