Persistence strategy respecting a threshold

TL;DR: What is the openHAB way to persist an item value only when it changed significantly?

  • Platform information:

    • Hardware: Raspberry Pi 3
    • OS: Raspbian Jessie
    • Java Runtime Environment: oracle-java-jdk 1.8.0_65
    • openHAB version: 2.1.0
  • My general background: I'm a beginner with openHAB, currently evaluating version 2.1 with no experience of version 1. I've been an active user of FHEM for a few years. My hardware setup contains heterogeneous components including thermo-/hygrometers (TFA), power outlets (Intertechno) and heating control (MAX!, eBUS). I'm a software developer with a little experience in Perl and a heavy background in Java, so openHAB looks promising to me.

  • My first openHAB project: I'm trying to integrate thermo-/hygrometers. I have 16 sensors that transmit their state (temperature, humidity, battery and RSSI) about every 10 seconds. The signal is received by a Linux daemon which creates a JSON message and publishes it via MQTT. FHEM and openHAB are subscribers to the respective MQTT topics.

  • My problem: the item values often change within a very small range (0.1 °C / 2%). This is no problem for UI display, but I don't want to persist all those small changes. I chose and prefer the persistence strategy everyChange, and I don't want to switch to a fixed time-based strategy because that would require a lot of disk space. I'd like to set a persistence strategy that accepts every change > 0.2 °C or every change > 3%, i.e. a combination of everyChange with a threshold. FHEM allows such a setting.

  • My efforts: I've searched for solutions; all I found is a thread where the suggested solution is the combination of a dummy item and a rule. I'd rather avoid this solution because in my case that would mean 16 dummy items and maybe 16 rules (I hope not; I haven't dived into rules yet). I saw a method deltaSince that can be used in rules, but didn't find a way to make persistence depend on a rule.

What is the openHAB way to solve my problem? Is there something like a persistence filter or pre-processing phase? Or is it possible for an average user to add a persistence strategy?

The quick answer: no, I am not aware that OH2 allows persistence strategies with a trigger threshold. Here are some other thoughts:

  • If you are worried about disk space you could persist using rrd4j, which always stays at a fixed size (by making historical data less granular). Note: rrd4j works only for numbers.
  • Even if you are persisting every hour it is unlikely that you would run into disk space problems any time soon: 16 data points once an hour makes approx. 140,000 data points per year... not really anything big (even every 30 minutes does not seem excessive).
  • You can write rules that iterate over group members, so it is likely that you could write just one rule for all your items. This thread will likely give you all you need (once you have familiarized yourself with rules in general): Design Pattern: Working with Groups in Rules

In addition to rrd4j, there is also InfluxDB with a retention policy to automatically delete or decimate old data.

But frankly, we are not talking about a lot of disk space. With 16 temperatures, each stored as, let's say, a 32-bit number with a 64-bit timestamp and saved every second, we are looking at around 380 MB per Item for a year's worth of data, or about 6 GB for all 16. Since one second is on the extreme end of the range, let's say once a minute, which gives us around 6 MB per Item per year, or roughly 100 MB for all 16 Items.
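
The estimate above can be reproduced with a few lines of arithmetic. This is a rough sketch that counts only the 12 raw bytes per sample (value plus timestamp) and ignores any database overhead, which is an assumption; the function name is made up for illustration. On that basis one Item comes to roughly 378 MB per year at one sample per second and about 6.3 MB at one sample per minute, and 16 Items come to 16 times that.

```python
# Back-of-the-envelope size of the raw samples only (no database overhead).
BYTES_PER_SAMPLE = 4 + 8               # 32-bit value + 64-bit timestamp
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def bytes_per_year(items: int, interval_seconds: int) -> int:
    """Raw payload bytes per year for `items` Items sampled every `interval_seconds`."""
    samples_per_item = SECONDS_PER_YEAR // interval_seconds
    return items * samples_per_item * BYTES_PER_SAMPLE


if __name__ == "__main__":
    for label, interval in (("every second", 1), ("every minute", 60)):
        one_item_mb = bytes_per_year(1, interval) / 1e6
        all_items_mb = bytes_per_year(16, interval) / 1e6
        print(f"{label}: {one_item_mb:.1f} MB per Item, {all_items_mb:.1f} MB for 16 Items")
```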

Personally, storage is so cheap these days that I would not find it worth my time to try to reduce this storage. It would be decades before the DB grew large enough that I'd be worried about space, and I could deal with that with a retention policy or rrd4j if I were.

So the OH way would probably be to save every change and use rrd4j or InfluxDB with a retention policy, or just ignore it as the amount of space is insignificant.

But, as lipp_markus said, if you find it worth your time to tackle this, your only choice is to use Rules.

There is a method on the Item class called persist. So you can have a rule something like:

rule "Temperature changed"
when
    Item Temperatures received update
then
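    // Persist only members whose state differs from the last persisted state by more than 0.2
    Temperatures.members.filter[temp |
        val last = temp.previousState  // HistoricItem from persistence; null if nothing is stored yet
        temp.state instanceof Number &&
            (last == null || Math::abs((last.state as Number).doubleValue - (temp.state as Number).doubleValue) > 0.2)
    ].forEach[temp | temp.persist]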
end

I'm pretty sure that you can call persist on an Item that is not listed in a .persist file with a strategy.

The above triggers any time any member of the Temperatures Group receives an update, filters down to just those Items whose previousState is different from the current state and the difference is more than 0.2 and calls .persist on each of those Items.

If that doesn't work, you will need to create Proxy Items for each and, instead of calling .persist, postUpdate to the Proxy Item and configure your Proxy Item to save on every change to persistence. You will also change how you do the filter. The below uses https://community.openhab.org/t/design-pattern-associated-items/15790:

rule "Temperatures changed"
when
    Item Temperatures received update
then

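    Temperatures.members.filter[temp | temp.state instanceof Number].forEach[temp |
        // Associated Items pattern: each Item has a proxy named <name>_Proxy
        val proxy = TempProxies.members.findFirst[p | p.name == temp.name + "_Proxy"]
        // Update the proxy if it has no numeric state yet or differs by more than 0.2
        if(proxy != null && (!(proxy.state instanceof Number) ||
                Math::abs((temp.state as Number).doubleValue - (proxy.state as Number).doubleValue) > 0.2))
            proxy.postUpdate(temp.state as Number)
    ]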
end
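
The threshold idea behind these rules can also be tried outside openHAB. The following is a minimal standalone sketch in Python, not openHAB API: the function name `deadband` and the sample readings are made up for illustration. It keeps a reading only when it differs from the last kept reading by more than the threshold.

```python
from typing import Iterable, List, Optional


def deadband(readings: Iterable[float], threshold: float) -> List[float]:
    """Keep a reading only if it differs from the last *kept* reading by more than `threshold`."""
    kept: List[float] = []
    last: Optional[float] = None
    for value in readings:
        # The first reading is always kept so there is a reference point.
        if last is None or abs(value - last) > threshold:
            kept.append(value)
            last = value
    return kept


if __name__ == "__main__":
    jittery = [20.0, 20.1, 20.0, 20.1, 20.4, 20.5, 20.3, 20.8]
    print(deadband(jittery, 0.2))
```

Comparing against the last kept value rather than the immediately preceding reading means that a slow drift is still recorded once it has accumulated past the threshold.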

As usual, Rich hit the nail on the head! :wink:
One thing to keep in mind if you want to persist only above a threshold is how to avoid not persisting anything at all. Let's just assume that your temperature is inert and each change stays under the threshold, but over the day these changes could then add up to a significant change, so you have to add that behaviour to your rule as well. Since you don't want to use persistence, you can't use the cool "averageSince" with an item, so you must find another way to check whether an item has changed over more than one interval.
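
One way to address this concern, sketched here in standalone Python rather than openHAB rules, is to combine the threshold with a maximum age: a sample is stored if it differs enough from the last stored value or if too much time has passed since the last stored sample. The function name, the `max_age` parameter and the sample data are assumptions for illustration only.

```python
from typing import Iterable, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, value)


def deadband_with_heartbeat(samples: Iterable[Sample],
                            threshold: float,
                            max_age: float) -> List[Sample]:
    """Keep a sample if it moved more than `threshold` away from the last kept
    value, or if at least `max_age` seconds passed since the last kept sample."""
    kept: List[Sample] = []
    for ts, value in samples:
        if not kept:
            kept.append((ts, value))  # always keep the first sample
            continue
        last_ts, last_value = kept[-1]
        changed_enough = abs(value - last_value) > threshold
        too_old = ts - last_ts >= max_age
        if changed_enough or too_old:
            kept.append((ts, value))
    return kept


if __name__ == "__main__":
    data = [(0, 20.0), (600, 20.1), (1200, 20.1), (3600, 20.1), (4200, 20.5), (4800, 20.6)]
    print(deadband_with_heartbeat(data, threshold=0.2, max_age=3600))
```

With these inputs the sample at 3600 seconds is stored only because of the age limit, and the one at 4200 seconds because of the threshold.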

I would go with a "compacting" function within the persistence, or just delete old values if you won't need them. I myself also persist all my sensors (>150, if I count them correctly), in my case within MySQL. As my MySQL database also grew big, I just added a stored procedure as described here: "Garbage Collection" on MySQL persistence: deleting old Item states. If I had some more time (and ran out of space with my DB), I'd like to come up with either a migration to InfluxDB or some stored procedure which would compact my data into chunks of hour-long intervals or something, I don't know... But for the time being, I have around 500 items in my MySQL database, running for approx. a year now, and the size is 216.12 MB right now.


Thank you very much for your quick and extensive comments. I understand that the problem is common and usually solved on the database layer.

Let me correct one thing: my motivation is not only disk space, I should have pointed this out more clearly. I admit that space is not the biggest problem nowadays. I have a few more things in mind:

  • Chart appearance: it looks quite ugly when the line jumps up and down (see attached example graph).
  • Data quality: from a logical perspective I consider these small changes measurement errors (caused by hardware or sampling resolution) that I'd like to see corrected.
  • Lifetime of flash-based storage memory (think of an SD card in a Raspberry Pi)
  • Efficiency: since I'm already running a home automation setup, I'm able to compare the actual use of disk space. For example, all climate values of Nov 2017 require 3.5 MB in FHEM (uncompressed text file) whereas a single day currently requires 2.5 MB in openHAB (sqlite demo instance). This is because temperature and humidity values "jump" a lot.

I don't want to change (i.e. delete or average) historic data because this approach does not cover the use case of long-term analysis (e.g. comparison of the course of a day over years). That's why I stay away from rrd4j or InfluxDB with a retention policy.

Rich's first example rule looks interesting and could be a way to go for me. I'll think in this direction.

Once again thanks for your efforts!

Regarding the SD card issue: either let the Pi use a USB SSD or, better yet, use an external database. As my NAS is already idling through the day, I use my Synology for MySQL (MariaDB) and as MQTT broker (mosquitto).
For data quality: either be sure to use high-quality sensors or just live with it; that said, my 1€ DS18B20s are quite stable. The sensors within my appliances (e.g. heating, ventilation, RTRs, ...) detect information which directly leads the appliance to react (I collect it for on-top automation purposes). The only really "fast jumping" sensor I see is my Rehau air quality USB sensor, and I'm not quite sure if it's the sensor or the location I placed it... So my guess is, you're judging the data from a very "perfectionistic" point of view... :stuck_out_tongue_winking_eye:

Long story short: try external persistence to spare your SD card frequent activity and don't be so hard on your sensors...


@openhab-user: did you already find a solution? I am also new to openHAB and think it is really important to store sufficient values but to keep only those that carry information, even with cheap, big storage.

I would even like to have some kind of hysteresis, such that for slowly increasing values every one is stored, but jitter up to a defined value is filtered out.

Rules do not seem to be the right approach to tackle this, because they will blow up the rule set unnecessarily. Even worse if we need proxy items.

Does anyone see a chance to do more intelligent filtering on numbers natively?

No, I didn't, sorry.