Design Patterns: Generic Is Alive

rlkoshak · October 5, 2016, 10:35pm

Edit: Rewritten for OH 4.

Please see Design Pattern: What is a Design Pattern and How Do I Use Them for a description of what Design Patterns are and how to use them.

Problem Statement

Many sensors report status to openHAB periodically (e.g. temperature sensors) or unpredictably (e.g. motion sensors), and a subset of these sensors do not have a built-in way for openHAB to query whether or not they are alive. The only way for openHAB to detect that these devices are still alive is to assume they are dead if they have not communicated for a certain period of time.

Concept

Create Timers that get reset every time the sensor Item receives an update. If a certain amount of time passes without an update (I recommend at least 2x the device’s normal reporting period), perform some action to respond to the sensor that is no longer reporting.
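As a rough illustration of the concept, here is a minimal sketch in plain JavaScript. It does not use the openHAB API; `AliveWatchdog`, `reportAlive` and the injectable `scheduler` are hypothetical names chosen for this example only.

```javascript
// Hypothetical helper, not part of openHAB: keeps one watchdog timer per sensor.
class AliveWatchdog {
  // timeoutMs: how long a sensor may stay silent (e.g. 2x its reporting period)
  // onDead:    callback invoked with the sensor name when the timeout elapses
  // scheduler: injectable so the logic can be exercised without real waiting
  constructor(timeoutMs, onDead, scheduler) {
    this.timeoutMs = timeoutMs;
    this.onDead = onDead;
    this.scheduler = scheduler || {
      set: (fn, ms) => setTimeout(fn, ms),
      clear: (handle) => clearTimeout(handle)
    };
    this.timers = new Map(); // sensor name -> timer handle
  }

  // Call on every update from the sensor: cancels the old timer and starts a new one.
  reportAlive(name) {
    if (this.timers.has(name)) {
      this.scheduler.clear(this.timers.get(name));
    }
    const handle = this.scheduler.set(() => {
      this.timers.delete(name);
      this.onDead(name);
    }, this.timeoutMs);
    this.timers.set(name, handle);
  }
}
```

Because the same `timeoutMs` is used for the initial schedule and for every reschedule, the two can never drift apart.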

Rule Template

There is a rule template on the marketplace that implements this design pattern.

Install Template and Instantiate and Configure the Rule

This rule template implements this design pattern. Configure it as follows:

1. Create a Group and add all the Items that represent a sensor reading to that Group. We’ll call it “AllSensors”.
2. Add expire metadata to all the Items added to the AllSensors Group so that they get set to UNDEF if they do not receive an update for too long.
3. Create a rule or a script that will be called when the sensor is determined to be offline. This rule can do anything you want, including sending an alert, changing the state of a Switch Item (to track the online status of the sensor), taking remedial action like restarting a binding or a service, etc. See below for an example that sends an alert when a sensor stops reporting and starts reporting again.
4. Install the Threshold Alert rule template from MainUI → Add-on Store → Automation → Rule Templates.
5. Create a new rule and choose “Threshold Alert” as the “Create from Template”. This will provide a form to enter the rule template parameters, which customize the behavior of this instance of the template.
6. Configure the rule as follows (if a parameter is not mentioned, you can leave it as the default or set it as desired):

Property	Value	Purpose
Rule UID	Something meaningful	the unique identifier for the rule that is about to be created
Name	Something meaningful	Name under which the rule will appear under Settings → Rules
Description	Something meaningful	A sentence or two explaining what the rule does
Triggering Group	`AllSensors`	The group created in 1 above, select from your Items. Changes to this Group’s members trigger the rule.
Threshold State	`UNDEF`	The state to look for and alert on
Alert Delay	`PT15M`	An ISO8601 duration indicating how long the Item should be in the `UNDEF` state before calling the alerting rule. In this case we use 15 minutes.
Reminder Period	`PT24H`	An ISO8601 duration indicating how long to wait before calling the alert rule again if the Item remains `UNDEF` after the initial alert. In this case we get a reminder every day if the Item remains `UNDEF`.
Alert Rule	UID of the Rule created in 3 above	This is the rule that gets called when an Item remains UNDEF for 15 minutes.

Click “Save” and you are done.
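The Alert Delay and Reminder Period values in the table above are ISO 8601 durations. To make the notation concrete, here is a small sketch in plain JavaScript that converts such a string into milliseconds. `durationToMs` is a hypothetical helper written for this example and is not part of the rule template; it only handles days, hours, minutes and seconds.

```javascript
// Hypothetical helper: converts a simple ISO 8601 duration to milliseconds.
// Only days, hours, minutes and seconds are handled (no years, months or weeks).
function durationToMs(iso) {
  const m = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/.exec(iso);
  // Reject strings that do not match, and degenerate forms such as "P" or "PT".
  if (!m || iso === 'P' || iso.endsWith('T')) {
    throw new Error('Unsupported duration: ' + iso);
  }
  const [, d = 0, h = 0, min = 0, s = 0] = m;
  return ((Number(d) * 24 + Number(h)) * 60 + Number(min)) * 60000 + Number(s) * 1000;
}
```

With this sketch, `durationToMs('PT15M')` gives 900000 (15 minutes) and `durationToMs('PT24H')` gives 86400000 (one day).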

If you want an alert when a sensor comes back after it has been reported as offline, create a rule to send that alert and select it for the “End Alert Rule” property. It can be the same rule as the “Alert Rule”.

The rules that get called have values passed into them, including the name (alertItem) and state (alertState) of the Item that is UNDEF and whether the Item is alerting (isAlerting). See the rule template docs for details.

Processing Rule

The rule you created in step 3 above will be called by the rule instantiated from the rule template.

Blockly

As previously mentioned, the called script can do anything desired. I’ve set mine up to keep track of whether we’ve already alerted on this piece of Equipment (based on the Semantic Model and with the help of a Switch Item) so I only get one alert for the whole Equipment, even if more than one sensor went offline for that Equipment.
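The one-alert-per-Equipment idea can be sketched in plain JavaScript as follows. This is only an illustration under assumptions: it keeps the state in memory rather than in a Switch Item, it does not look anything up in the Semantic Model, and `EquipmentAlertTracker` is a hypothetical name.

```javascript
// Hypothetical helper: remembers which sensors of each Equipment are offline
// so that only a change in the Equipment's overall status produces a message.
class EquipmentAlertTracker {
  constructor() {
    this.offline = new Map(); // equipment name -> Set of offline sensor names
  }

  // Returns a message when the Equipment's overall status changes, otherwise null.
  update(equipment, sensor, isAlerting) {
    const sensors = this.offline.get(equipment) || new Set();
    const wasOffline = sensors.size > 0;
    if (isAlerting) {
      sensors.add(sensor);
    } else {
      sensors.delete(sensor);
    }
    this.offline.set(equipment, sensors);
    const isOffline = sensors.size > 0;
    if (isOffline === wasOffline) return null; // no change, so no new alert
    return equipment + (isOffline ? ' is offline!' : ' is reporting again!');
  }
}
```

In this sketch the Equipment counts as back online only once all of its sensors are reporting again, which is one possible design choice among several.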

JS Scripting

var msg = (isAlerting) ? ' is offline!' : ' is reporting again!';
actions.NotificationAction.sendNotification('test@example.org', items[alertItem].label + msg);

Rules DSL

Unfortunately Rules DSL does not support accessing the variables passed in from the calling rule.

Timestamps

If all you need is a timestamp of the last time the sensor reported, instead of generating an alert or taking some action, create a DateTime Item to represent the last report time of a given sensor. Link that Item to the Channel(s) that represent the sensor and apply the timestamp Profile. Any time any of the linked Channels receives an update, the Item will be updated to now.

neil_renaud · October 7, 2016, 9:16am

Some feedback…

“now.plusHours(timeoutMinutes )”

That looks like a bug. Also, if the schedule and the reschedule calls use the same units, you should use the variable rather than hard-coding the value…

timers.get(sw.name).reschedule(now.plusHours(2)) // Make sure this matches above, use an appropriate time

rlkoshak · October 7, 2016, 3:40pm

Yes, totally a bug. I discovered this last night when I was trying to figure out why I wasn’t getting an alert for something I knew was down.

That is also a bug. These things happen when you retype your code to be generic as opposed to just pasting in your working code. Grrr.

rlkoshak · December 6, 2016, 10:11pm

Added a new version that simplifies the code using the new Expire binding.

PeterBoehm · March 2, 2017, 9:23pm

Hi Rich,

I tried your “expire” version of “generic is alive”.

But I get two errors:

no viable alternative at input ‘Functions$Function3’

and

missing EOF at ‘if’

at

if(!lock.isLocked) { // skip this event if there is already a lock on this Switch

Maybe a typo I couldn’t find?

Thanks for your help

rlkoshak · March 7, 2017, 7:33pm

There is an unmatched {, (, or [ somewhere in your file I’m willing to bet.

PeterBoehm · March 9, 2017, 6:26pm

I did a copy and paste from your code without changing anything. I’ll check again, maybe I’ll find something…

rlkoshak · March 9, 2017, 6:57pm

Load the rule into Designer. If there is a typo of this sort or a syntax error it will highlight it.

PeterBoehm · March 9, 2017, 9:20pm

I get these errors (according to the attached screenshot) in designer after copy and paste of your code block.

Maybe the first warning is a hint?:

The import ‘org.eclipse.xtext.xbase.lib.Functions’ is never used.

rlkoshak · March 9, 2017, 10:09pm

There were two typos and one error that I found in the code above. Here is a corrected version:

import org.eclipse.xtext.xbase.lib.Functions
import java.util.Map
import java.util.concurrent.locks.ReentrantLock

// Globals
val Map<String, Boolean> notified = newHashMap // Flag to avoid duplicate alerts
val Map<String, ReentrantLock> locks = newHashMap // locks to avoid InvalidState exceptions when processing multiple updates at the same time

val Functions$Function3<SwitchItem, Map<String, Boolean>, 
                        Map<String, ReentrantLock>, Boolean> processOn = 
[ sw, notified, locks |

    // Generate a lock if there isn't one for this Switch
    if(locks.get(sw.name) == null) locks.put(sw.name, new ReentrantLock)

    val lock = locks.get(sw.name)
    if(!lock.isLocked) { // skip this event if there is already a lock on this Switch
        try {
            lock.lock

            sw.sendCommand(ON) // this will start the Expire timer

            // Alert if we previously alerted that the device was down
            if(notified.getOrDefault(sw.name, false)){
                // alert code goes here
            }
            notified.put(sw.name, false)
        }
        catch(Throwable t) {
            logError("isAlive", "Error in locked part of processOn: " + t.toString)
        }
        finally{
            lock.unlock
        }
        true // return value
    }
]

// We don't need the lock here because the rules that call it only get triggered once 
// unlike above which gets triggered multiple times per event
val Functions$Function2<SwitchItem, Map<String, Boolean>, Boolean> processOff = 
[sw, notified |
    if(!notified.getOrDefault(sw.name, false)){
        // alert code goes here
        notified.put(sw.name, true)
    }
]

// Start timers for all devices
rule "System started, kick off initial timers"
when
    System started
then
    gDevicesStatus.members.forEach[SwitchItem sw |
        sw.sendCommand(ON)
        Thread::sleep(500)
    ]
end

// Device 1 is alive!
rule "gDevice1 received update"
when
    Item gDevice1 received update
then
    processOn.apply(Device1Status, notified, locks)
end

// Device 1 is dead!
rule "Device1Status is dead"
when
    Item Device1Status changed to OFF
then
    processOff.apply(Device1Status, notified)
end

For the curious they were:

I included the variable name “sw” inside the < > part of the Functions$Function3 definition
I failed to close the < > in the Functions$Function2 definition
It didn’t like my true in the finally clause so I moved that to be the last line in the lambda (this is the error)

PeterBoehm · March 9, 2017, 10:44pm

Cool, errors are gone

Many thanks Rich!!!

semperor · February 12, 2019, 8:27pm

Could you briefly explain why? I didn’t get it…

rlkoshak · February 12, 2019, 10:24pm

Because a Group never gets assigned a state if you don’t give it a type. If the Group never gets a state, it never gets an update. If it never gets an update, there is no event that can be used to trigger the Rule.

isoparme · February 11, 2021, 12:35pm

I am in the process of setting up a rule to check “alive” status, in particular for Z-Wave door contacts. Some of these doors or windows are rarely, if ever, opened, yet I must be able to know whether the sensor is “alive”. I read this post and it helps me a lot; however, I have some questions:
When a Z-Wave module wakes up (sometimes only once a day), does it send an update of its status, including the battery level? If so, can I use updatedSince instead of changedSince for windows that are rarely opened? And if I use InfluxDB, can I combine everyUpdate, everyChange and everyMinute in the strategy, or would everyMinute distort the result of updatedSince?

Here is my rule:

// msg4 and VarNumberFailedSensor3 are assumed to be global variables declared at the top of the rules file
rule "Battery anomaly detection if no change in 24h"
when
    Time cron "0 03 19 1/1 * ? *" // runs daily at 19:03
then
    // Sensors that have not been updated in the last 24 hours
    val ContactBatDevices = It_Group_ContactBat.members.filter[sensor | sensor.updatedSince(now.minusHours(24), "influxdb") == false]

    ContactBatDevices.forEach [ sensor |
        msg4 = msg4 + (transform("MAP", "ContactBatSensorAnomalie.map", sensor.name) + ': ' + '\n')
        logInfo("Sensor Status", "possible anomaly on " + sensor.name + ": " + sensor.changedSince(now.minusHours(24), "influxdb").toString)
        VarNumberFailedSensor3 = VarNumberFailedSensor3 + 1
    ]

    if (msg4 != "") {
        sendBroadcastNotification(msg4 + " shows a possible malfunction")
        logInfo("Sensor status", msg4 + " shows a possible malfunction")
    }
    msg4 = ""
end

This actually works, but I can’t see the changes and updates “inside” to be sure it works. For Z-Wave and its wake-up behavior I will ask in a separate post. Thank you in advance.

rlkoshak · February 11, 2021, 3:38pm

Sometimes. You can find out by triggering a rule on received update on those Items and watching for that rule to trigger. It’s probably a good idea to do that anyway so you can get an idea of how often they actually do wake up and report. Especially if it’s reporting a battery status, I would expect the Item to be updated even if the received state is the same as the Item’s current state.

It may be less than once a day.

The database doesn’t know the difference between a value that was saved because of an update, a change, a command, or periodically because of a cron based strategy (e.g. everyMinute). There is no way it can implement a true updatedSince.

It can’t tell the difference between an entry caused by everyMinute and one caused by everyUpdate.

You could configure those Items to only save on updates and at no other times. Then you know that the only entries in the database were caused by updates. Then you can get lastUpdate’s time and compare that to now.minusX to see if the Item has updated too long ago.

But it’ll probably be easier to implement the Expire Binding or a Timer based approach. Reschedule the timer every time the Item receives an update. If it takes too long, the timer goes off and sends the alert that the device is offline.
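For the persistence-based check described above (save only on updates, then compare the last update time to now minus some period), the comparison itself can be sketched in plain JavaScript. `isStale` is a hypothetical helper for illustration; how the last update time is obtained from persistence is left out and depends on your setup.

```javascript
// Hypothetical helper: true when the last recorded update is older than maxAgeMs,
// or when there is no recorded update at all.
function isStale(lastUpdate, now, maxAgeMs) {
  if (lastUpdate === null || lastUpdate === undefined) return true;
  return now.getTime() - lastUpdate.getTime() > maxAgeMs;
}
```

For example, with a 24 hour limit, a sensor whose last entry is 25 hours old would be treated as stale, while one that reported an hour ago would not.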

isoparme · February 11, 2021, 3:42pm

Yes, that is probably the best way.
I will try setting the strategy to “strategy = everyUpdate” and
I will keep you informed.

Thanks again very much for all your explanations and time!

isoparme · February 11, 2021, 7:45pm

With the strategy set to only “strategy = everyUpdate” it works. I can see the “hit” points on the Grafana graph built from the InfluxDB source.