Design Patterns: Generic Is Alive

Edit: Rewritten for OH 4.

Please see Design Pattern: What is a Design Pattern and How Do I Use Them for a description of what Design Patterns are and how to use them.

Problem Statement

Many sensors report status to openHAB periodically (e.g. temperature sensors) or unpredictably (e.g. motion sensors), and a subset of these sensors do not have a built-in way for openHAB to query whether or not they are alive. The only way for openHAB to detect that these devices are still alive is to assume they are dead if they have not communicated within a certain period of time.


Concept

Create Timers that get reset every time the sensor Item receives an update. If a certain amount of time passes without an update (I recommend at least 2x the device's normal reporting period), perform some action to respond to the sensor that is no longer reporting.
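The timer idea above can be sketched outside openHAB in plain JavaScript (Node.js), with setTimeout standing in for an openHAB Timer. This is a minimal sketch, not the rule template's implementation: the AliveWatchdog class and its onOffline/onOnline callbacks are hypothetical names, and timeoutMs is whatever you pick for the device (e.g. twice its reporting period).

```javascript
// Hypothetical sketch of "Generic Is Alive" using plain Node.js timers.
// onOffline/onOnline stand in for "perform some action"; they are not openHAB APIs.
class AliveWatchdog {
  constructor(timeoutMs, onOffline, onOnline) {
    this.timeoutMs = timeoutMs;
    this.onOffline = onOffline;
    this.onOnline = onOnline;
    this.timers = new Map();  // sensor name -> pending timeout handle
    this.offline = new Set(); // sensors currently considered dead
  }

  // Call this every time the sensor Item receives an update.
  update(name) {
    clearTimeout(this.timers.get(name)); // no-op if there is no pending timer
    if (this.offline.delete(name)) this.onOnline(name); // it was dead, now it reports again
    const handle = setTimeout(() => {
      this.timers.delete(name);
      this.offline.add(name);
      this.onOffline(name); // no update for timeoutMs: treat as dead
    }, this.timeoutMs);
    this.timers.set(name, handle);
  }

  // Cancel all pending timers (e.g. when shutting down).
  stop() {
    for (const handle of this.timers.values()) clearTimeout(handle);
    this.timers.clear();
  }
}
```

Each call to update() is the "reschedule" step: if the sensor stays silent for longer than timeoutMs, onOffline runs once, and the next update after that triggers onOnline.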

Rule Template

There is a rule template on the marketplace that implements this design pattern.

Install Template and Instantiate and Configure the Rule

Set it up as follows:

  1. Create a Group and add all the Items that represent a sensor reading to that Group. We'll call it "AllSensors".

  2. Add expire metadata to all the Items added to the AllSensors Group so that they get set to UNDEF if they do not receive an update for too long.

  3. Create a rule or a script that will be called when the sensor is determined to be offline. This rule can do anything you want, including sending an alert, changing the state of a Switch Item that tracks the online status of the sensor, or taking remedial action like restarting a binding or a service. See below for an example that sends an alert when a sensor stops reporting and when it starts reporting again.

  4. Install the Threshold Alert rule template from MainUI → Add-on Store → Automation → Rule Templates.

  5. Create a new rule and choose "Threshold Alert" under "Create from Template". This will provide a form to enter the rule template parameters, which customize the behavior of this instance of the template.

  6. Configure the rule as follows (if a parameter is not mentioned, you can leave it as the default or set it as desired):

| Property | Value | Purpose |
|---|---|---|
| Rule UID | Something meaningful | The unique identifier for the rule that is about to be created |
| Name | Something meaningful | Name under which the rule will appear under Settings → Rules |
| Description | Something meaningful | A sentence or two explaining what the rule does |
| Triggering Group | AllSensors | The Group created in step 1 above; select it from your Items. Changes to this Group's members trigger the rule. |
| Threshold State | UNDEF | The state to look for and alert on |
| Alert Delay | PT15M | An ISO 8601 duration indicating how long the Item must remain UNDEF before the alert rule is called. In this case we use 15 minutes. |
| Reminder Period | PT24H | An ISO 8601 duration indicating how long to wait before calling the alert rule again if the Item is still UNDEF after the initial alert. In this case we get a reminder every day while the Item remains UNDEF. |
| Alert Rule | UID of the rule created in step 3 above | The rule that gets called when an Item remains UNDEF for 15 minutes |
  7. Click "Save" and you are done.
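To see what the Alert Delay and Reminder Period values mean in concrete numbers, here is a hypothetical JavaScript helper (not part of the rule template) that converts the day/time subset of ISO 8601 durations to milliseconds. It deliberately rejects year, month and week designators, which this sketch does not handle.

```javascript
// Hypothetical helper, NOT part of the Threshold Alert template.
// Converts ISO 8601 durations of the form PnDTnHnMnS (e.g. "PT15M", "PT24H") to ms.
function durationToMs(iso) {
  const m = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/.exec(iso);
  // "P" alone, or a trailing "T" with no time fields, is not a valid duration.
  if (!m || iso === "P" || iso.endsWith("T")) {
    throw new Error("Unsupported duration: " + iso);
  }
  const [, d = 0, h = 0, min = 0, s = 0] = m; // missing fields default to 0
  return ((Number(d) * 24 + Number(h)) * 60 + Number(min)) * 60000 + Number(s) * 1000;
}
```

With the settings in the table, the first alert would come 900000 ms (15 minutes) after the Item goes UNDEF, and reminders would follow every 86400000 ms (24 hours) while it stays UNDEF.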

If you want an alert when the sensor comes back after it has been reported offline, create a rule for that alert and set it as the "End Alert Rule" property. It can be the same rule as the "Alert Rule".

The rules that get called will have values passed into them, including the name (alertItem) and state (alertState) of the Item that is UNDEF, and whether the Item is alerting (isAlerting). See the rule template docs for details.

Processing Rule

The rule you created in step 3 above will be called by the rule instantiated from the rule template.

Blockly


As previously mentioned, the called script can do anything desired. I've set mine up to keep track of whether we've already alerted on this piece of Equipment (based on the Semantic Model and with the help of a Switch Item), so I only get one alert for the whole Equipment even if more than one of its sensors went offline.
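The "one alert per Equipment" bookkeeping described above can be sketched in plain JavaScript. This is not the Blockly script from the screenshot: sensorToEquipment is a hypothetical lookup standing in for the Semantic Model, a Set stands in for the per-Equipment Switch Item, and sending the "reporting again" message only once every sensor of that Equipment is back is an assumption of this sketch.

```javascript
// Hypothetical sketch: deduplicate alerts per Equipment.
// sensorToEquipment maps a sensor Item name to its Equipment name (assumed lookup).
function makeEquipmentAlerter(sensorToEquipment) {
  const alerted = new Set(); // Equipment already reported as offline
  const down = new Map();    // Equipment -> Set of its sensors that are offline

  // Mirrors the template's alertItem / isAlerting values.
  // Returns a message to send, or null when no new alert is needed.
  return function handle(alertItem, isAlerting) {
    const equip = sensorToEquipment[alertItem] ?? alertItem; // fall back to the Item itself
    if (!down.has(equip)) down.set(equip, new Set());
    const sensors = down.get(equip);
    if (isAlerting) {
      sensors.add(alertItem);
      if (alerted.has(equip)) return null; // already alerted for this Equipment
      alerted.add(equip);
      return equip + " is offline!";
    }
    sensors.delete(alertItem);
    if (sensors.size > 0 || !alerted.has(equip)) return null; // other sensors still down
    alerted.delete(equip);
    return equip + " is reporting again!";
  };
}
```

If two sensors of the same Equipment go offline, only the first call returns a message; the "reporting again" message is returned when the last of them comes back.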

JS Scripting

var msg = (isAlerting) ? ' is offline!' : ' is reporting again!';
actions.NotificationAction.sendNotification('test@example.org', items[alertItem].label + msg);

Rules DSL

Unfortunately Rules DSL does not support accessing the variables passed in from the calling rule.

Timestamps

If all you need is a timestamp of when the last time the sensor reported instead of generating an alert or taking some action, you should create a DateTime Item to represent the last report time of a given sensor. Link that Item to the Channel(s) that represent the sensor and apply the timestamp Profile. Any time any of the linked Channels receives an update, the Item will be updated to now.


Some feedback…

"now.plusHours(timeoutMinutes)"

That looks like a bug. Also, the reschedule should use the same units as the schedule, and you should use the variable rather than hard-coding the value…

timers.get(sw.name).reschedule(now.plusHours(2)) // Make sure this matches above, use an appropriate time

Yes, totally a bug. I discovered this last night when I was trying to figure out why I wasn't getting an alert for something I knew was down.

That is also a bug. These things happen when you retype your code to be generic instead of just pasting in your working code. Grrr.

Added a new version that simplifies the code using the new Expire binding.


Hi Rich,

I tried your "expire" version of "generic is alive".

But I get two errors:

no viable alternative at input 'Functions$Function3'

and

missing EOF at 'if'

at

if(!lock.isLocked) { // skip this event if there is already a lock on this Switch

Maybe a typo I couldnā€™t find?

Thanks for your help

There is an unmatched {, (, or [ somewhere in your file, I'm willing to bet.

I copied and pasted your code without changing anything. I'll check again, maybe I'll find something…

Load the rule into Designer. If there is a typo of this sort or a syntax error, it will highlight it.

I get these errors (see the attached screenshot) in Designer after copying and pasting your code block.

Maybe the first warning is a hint?

The import 'org.eclipse.xtext.xbase.lib.Functions' is never used.

There were two typos and one error that I found in the code above. Here is a corrected version:

import org.eclipse.xtext.xbase.lib.Functions
import java.util.Map
import java.util.concurrent.locks.ReentrantLock

// Globals
val Map<String, Boolean> notified = newHashMap // Flag to avoid duplicate alerts
val Map<String, ReentrantLock> locks = newHashMap // locks to avoid InvalidState exceptions when processing multiple updates at the same time

val Functions$Function3<SwitchItem, Map<String, Boolean>, 
                        Map<String, ReentrantLock>, Boolean> processOn = 
[ sw, notified, locks |

    // Generate a lock if there isn't one for this Switch
    if(locks.get(sw.name) == null) locks.put(sw.name, new ReentrantLock)

    val lock = locks.get(sw.name)
    if(!lock.isLocked) { // skip this event if there is already a lock on this Switch
        try {
            lock.lock

            sw.sendCommand(ON) // this will start the Expire timer

            // Alert if we were previously alerted that the device was down
            if(notified.getOrDefault(sw.name, false)){
                // alert code goes here
            }
            notified.put(sw.name, false)
        }
        catch(Throwable t) {
            logError("isAlive", "Error in locked part of processOn: " + t.toString)
        }
        finally{
            lock.unlock
        }
        true // return value
    }
]

// We don't need the lock here because the rules that call it only get triggered once 
// unlike above which gets triggered multiple times per event
val Functions$Function2<SwitchItem, Map<String, Boolean>, Boolean> processOff = 
[sw, notified |
    if(!notified.getOrDefault(sw.name, false)){
        // alert code goes here
        notified.put(sw.name, true)
    }
]

// Start timers for all devices
rule "System started, kick off initial timers"
when
    System started
then
    gDevicesStatus.members.forEach[SwitchItem sw |
        sw.sendCommand(ON)
        Thread::sleep(500)
    ]
end

// Device 1 is alive!
rule "gDevice1 received update"
when
    Item gDevice1 received update
then
    processOn.apply(Device1Status, notified, locks)
end

// Device 1 is dead!
rule "Device1Status is dead"
when
    Item Device1Status changed to OFF
then
    processOff.apply(Device1Status, notified)
end

For the curious they were:

  • I included the variable name "sw" inside the < > part of the Functions$Function3 definition
  • I failed to close the < > in the Functions$Function2 definition
  • It didn't like my true in the finally clause, so I moved it to be the last line in the lambda (this was the error)

Cool, errors are gone :)

Many thanks Rich!!!

Could you briefly explain why? I didn't get it…

Because a Group never gets assigned a state if you don't give it a type. If the Group never gets a state, it never gets an update. If it never gets an update, there is no event that can be used to trigger the Rule.


I am in the process of setting up a rule to check "alive" status, in particular for Z-Wave door contacts. Some of these doors or windows are rarely or never opened, yet I must be able to know whether the sensor is "alive". I read this post and it helps me a lot, but I have some questions:
When a Z-Wave module wakes up (sometimes only once a day), does it send an "update" of its status, including the battery level? If so, can I use updatedSince instead of changedSince for windows that are rarely opened? And if I use InfluxDB, can I use the strategies everyUpdate, everyChange and everyMinute together, or will everyMinute distort the updatedSince result?

Here are my rules:

rule "Battery anomaly detection if no change for 24h"
when
    Time cron "0 03 19 1/1 * ? *" // runs every day at 19:03
then
    // msg4 and VarNumberFailedSensor3 are assumed to be global vars declared at the top of the file
    val ContactBatDevices = It_Group_ContactBat.members.filter[ sensor | sensor.updatedSince(now.minusHours(24), "influxdb") == false ]

    ContactBatDevices.forEach[ sensor |
        msg4 = msg4 + (transform("MAP", "ContactBatSensorAnomalie.map", sensor.name) + ': ' + '\n')
        logInfo("Sensor status", "possible anomaly on " + sensor.name + ": " + sensor.changedSince(now.minusHours(24), "influxdb").toString)
        VarNumberFailedSensor3 = VarNumberFailedSensor3 + 1
    ]

    if (msg4 != "") {
        sendBroadcastNotification(msg4 + " shows a possible malfunction")
        logInfo("Sensor status", msg4 + " shows a possible malfunction")
    }
    msg4 = ""
end

This actually works, but I can't see "inside" the changes and updates to be sure it works. As for Z-Wave and its wake-ups, I will ask in a separate post. Thank you in advance.

Sometimes. You can find out by triggering a rule on received update on those Items and watching for that rule to trigger. It's probably a good idea to do that anyway so you can get an idea of how often they actually wake up and report. Especially if it's reporting a battery status, I would expect the Item to be updated even if the received state is the same as the Item's current state.

It may be less than once a day.
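Once such a logging rule has collected a few update times, a helper like the following (a hypothetical JavaScript sketch, not an openHAB API) finds the longest gap between reports and applies the "at least 2x the reporting period" rule of thumb from the Concept section. Using the longest observed gap rather than the average is a deliberately conservative choice made for this sketch.

```javascript
// Hypothetical helper: updateTimesMs are epoch milliseconds at which the Item
// received updates (e.g. copied from the log of a "received update" rule).
function suggestTimeoutMs(updateTimesMs, factor = 2) {
  if (updateTimesMs.length < 2) throw new Error("Need at least two updates");
  const sorted = [...updateTimesMs].sort((a, b) => a - b); // do not mutate the input
  let maxGap = 0;
  for (let i = 1; i < sorted.length; i++) {
    maxGap = Math.max(maxGap, sorted[i] - sorted[i - 1]);
  }
  // Timeout = factor x the longest silence observed so far.
  return { maxGapMs: maxGap, timeoutMs: factor * maxGap };
}
```

For a sensor seen at 0 h, 6 h, 18 h and 30 h, the longest gap is 12 hours, so the suggested timeout is 24 hours.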

The database doesn't know the difference between a value that was saved because of an update, a change, a command, or periodically because of a cron-based strategy (e.g. everyMinute). There is no way it can implement an updatedSince.

It can't tell the difference between an entry caused by everyMinute and one caused by an update.

You could configure those Items to only be saved on updates and at no other times. Then you know that the only entries in the database were caused by updates, and you can get lastUpdate's time and compare that to now.minusX to see if the Item was last updated too long ago.
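Assuming the Items are persisted on everyUpdate only, the comparison with now.minusX boils down to the check below, sketched in plain JavaScript. lastUpdates is a hypothetical map of Item name to the Date of its last persisted update (null if there is none); fetching those values from persistence is left out.

```javascript
// Hypothetical sketch: return the Items whose last update is older than maxAgeMs.
// lastUpdates: { itemName: Date | null }, assumed to come from persistence.
function staleItems(lastUpdates, now, maxAgeMs) {
  const cutoff = now.getTime() - maxAgeMs; // plays the role of now.minusHours(24)
  return Object.entries(lastUpdates)
    .filter(([, last]) => last === null || last.getTime() < cutoff) // never updated, or too old
    .map(([name]) => name);
}
```

This is the same filter as the updatedSince(...) == false check in the rule above, but it only gives trustworthy answers when every database entry really is an update.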

But it'll probably be easier to use the Expire binding or a Timer-based approach. Reschedule the timer every time the Item receives an update. If it takes too long, the timer goes off and sends the alert that the device is offline.

Yes, that's probably the best way.
I'll try setting the strategy to "strategy = everyUpdate" and I'll keep you informed.

Many thanks again for all your explanations and your time!

With the strategy set only to "strategy = everyUpdate" it works: I can see the "hit" points on the Grafana graph made from the InfluxDB source.
