Design Patterns: Generic Is Alive

Tags: rules, designpattern

(Rich Koshak) #1

Please see Design Pattern: What is a Design Pattern and How Do I Use Them for a description of what Design Patterns are and how to use them.

Problem Statement

Many sensors report status to openHAB periodically (e.g. temperature sensor) or unpredictably (e.g. motion sensors) and a subset of these sensors do not have a built in way for openHAB to query whether or not they are alive. The only way for openHAB to detect that these devices are still alive is to assume they are dead if they have not communicated after a certain period of time.

Concept

Create Timers that get reset every time the sensor Item receives an update. If a certain amount of time passes without an update (I recommend at least 2x the device’s normal reporting period), set the Item to UNDEF to indicate that it is no longer reporting. I recommend using the Expire binding for this.

Alternatively you can create a separate Switch Item that gets set to ON when the sensor updates and a Timer to set it to OFF.

If one is using MQTT, one can bind the previously mentioned Switch to the Last-Will-and-Testament topic for the device that communicates over MQTT to turn off.
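As a minimal sketch of the MQTT approach, the Item below binds a status Switch to a device's Last-Will-and-Testament topic. It assumes the MQTT 1.x binding syntax and the broker name mosquitto that appear in the complex example later in this post. The topic tele/device1/LWT and the file lwt.map are hypothetical; the map file would need to translate whatever payloads your device publishes on its LWT topic into ON and OFF.

```
// Hypothetical topic and map file; adjust both to match your device
Switch Device1_Status "Device 1 Status [%s]" <network> { mqtt="<[mosquitto:tele/device1/LWT:state:MAP(lwt.map)]" }
```

With this in place the broker publishes the device's will message when the device drops off, and the Switch changes without any Timer on the openHAB side.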

Expire Binding Example

Items

Group:Switch DeviceStatuses // we must give the Group a type
Number Item1Sensor1 (DeviceStatuses) { expire="10m" }
Contact Item1Sensor2 (DeviceStatuses) { expire="10m" }
Switch Item1Sensor3 (DeviceStatuses) { expire="10m" }

Number Item2Sensor1 (DeviceStatuses) { expire="6m" }
Contact Item2Sensor2 (DeviceStatuses) { expire="6m" }
Switch Item2Sensor3 (DeviceStatuses) { expire="6m" }

The Group is optional and only required if you want to generate an Alert when one of these Items goes to UNDEF.

Rules

None. If you want to generate an Alert when a device goes to UNDEF use the following Rule.

rule "A sensor stopped reporting"
when
    Member of DeviceStatuses changed to UNDEF
then
    // report alert on triggeringItem
end

Theory of Operation

Every time the Item gets updated the Expire binding sets/resets a Timer. When the configured amount of time passes without an update, by default the Expire binding will set the Item to UNDEF. In your Rules, be sure to check for UNDEF before trying to use the Item.

The Rule above gets triggered any time a member of DeviceStatuses changes to UNDEF where you can generate an Alert or otherwise take remedial actions.

On the Sitemap or HABPanel, Items that are UNDEF will appear as -.
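Here is a minimal sketch of the UNDEF check mentioned above, using Item1Sensor1 from the Items in this example. The rule name and the "isAlive" logger name are just placeholders, and what you do with the reading is up to you.

```
rule "Use Item1Sensor1 only when it is reporting"
when
    Item Item1Sensor1 changed
then
    // NULL means the Item was never initialized, UNDEF means Expire timed it out
    if(Item1Sensor1.state == NULL || Item1Sensor1.state == UNDEF) {
        logWarn("isAlive", "Item1Sensor1 has no usable reading")
        return;
    }

    // Only cast the state once we know it holds a real value
    val reading = (Item1Sensor1.state as Number).floatValue
    logInfo("isAlive", "Item1Sensor1 reports " + reading)
end
```

Without the check, the cast to Number fails whenever the sensor has gone silent.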

Switch Status Item Example

Items

Group:Switch:AND(ON,OFF) DeviceStatuses "Device Status [%s]"
    <network>

Group:Switch DeviceUpdates

Switch Device1_Status "Device 1 Status [%s]" <network> (DeviceStatuses)
Number Device1_Sensor1 (DeviceUpdates)
Contact Device1_Sensor2 (DeviceUpdates)
Switch Device1_Sensor3 (DeviceUpdates)

Switch Device2_Status "Device 2 Status [%s]" <network> (DeviceStatuses)
Number Device2_Sensor1 (DeviceUpdates)
Contact Device2_Sensor2 (DeviceUpdates)
Switch Device2_Sensor3 (DeviceUpdates)

Each device has more than one sensor. The online status of the device is rolled up into a status Switch. The Expire binding can and should be used here as well but I’ll show a Timer based solution for completeness. To use the Expire binding based approach use the following on the Status Switch Items:

{ expire="10m,command=OFF" }
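With the Expire binding on the Status Switch, something still has to command the Switch ON whenever one of the device's sensors reports, since that is what restarts the Expire timer. A minimal sketch of such a Rule, assuming the Item naming convention used above (Device1_Sensor1 belongs to Device1_Status):

```
rule "Sensor update marks its device as online"
when
    Member of DeviceUpdates received update
then
    // Associated Items: Device1_Sensor1 -> Device1_Status
    val statusName = triggeringItem.name.split("_").get(0) + "_Status"

    // Each ON command restarts the Expire timer on the Status Switch
    sendCommand(statusName, "ON")
end
```

If no sensor of a device updates for 10 minutes, Expire sends the OFF command and any Rule triggered by that command can raise the alert.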

Rules

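import java.util.Map

// One Timer per device, keyed by the name of the device's Status Item
val Map<String, Timer> timers = newHashMap

rule "Process sensor update"
when
    Member of DeviceUpdates received update
then
    // Associated Items: Device1_Sensor1 -> Device1_Status
    val statusName = triggeringItem.name.split("_").get(0) + "_Status"
    sendCommand(statusName, "ON") // the device just reported, so it is alive

    if(timers.get(statusName) === null) {
        // save the Timer in the Map so later updates can reschedule it
        timers.put(statusName, createTimer(now.plusMinutes(10), [ |
            sendCommand(statusName, "OFF")
            timers.put(statusName, null)
        ]))
    }
    else timers.get(statusName).reschedule(now.plusMinutes(10)) // must match the timeout above
end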

rule "A device stopped reporting"
when
    Member of DeviceStatuses received command OFF
then
    // alert
end

Complex Working Example

This is based on old code that was written before the Expire binding was created. I plan on rewriting at some point to use Expire binding and UNDEF as shown in the first example.

Items

Group:Switch:AND(ON, OFF) gSensorStatus "Sensor's Status [MAP(admin.map):%s]"  <network>

Group:Switch gOfflineAlerted

// Sonoffs
Switch vSonoff_3157_Online "Powercord 3157 [MAP(admin.map):%s]" <network> (gResetExpire) { mqtt="<[mosquitto:tele/sonoff-3157/LWT:state:MAP(sonoff.map)]", expire="24h,state=OFF" }
Switch vSonoff_3157_Online_Alerted (gOfflineAlerted)
...

// Nest
Switch vNest_Online "Nest Status [MAP(hvac.map):%s]" <network> (gSensorStatus) { nest="<[thermostats(Entryway).is_online]" }
Switch vNest_Online_Alerted (gOfflineAlerted)

// Network
Switch vNetwork_Cerberos "Cerberos Network [MAP(admin.map):%s]" <network> (gSensorStatus, gResetExpire) { channel="network:servicedevice:cerberos:online", expire="2m" }
Switch vNetwork_Cerberos_Alerted (gOfflineAlerted)
...

// Services
Switch vCerberos_SensorReporter_Online "Cerberos sensorReporter [MAP(admin.map):%s]" <network> (gSensorStatus, gResetExpire) { mqtt="<[mosquitto:status/sensor-reporters:command:OFF:.*cerberos sensorReporter is dead.*],<[mosquitto:status/cerberos/heartbeat/string:command:ON]", expire="11m,command=OFF" }
Switch vCerberos_SensorReporter_Online_Alerted (gOfflineAlerted)
...

// Zwave devices
Switch vMainFloorSmokeCOAlarm_Heartbeat "Main Floor Smoke/CO Alarm is [MAP(admin.map):%s]"    <network> (gAlarmStatus, gSensorStatus, gResetExpire) { channel="zwave:device:dongle:node5:alarm_general", expire="24h,command=OFF" }
Switch vMainFloorSmokeCOAlarm_Heartbeat_Alerted (gOfflineAlerted)
...

Examples of a variety of sensor types are shown above. Each has an associated Offline Alerted Item to prevent multiple alerts about the device over a given period of time.

Rules

import org.eclipse.smarthome.model.script.ScriptServiceUtil
import java.util.Map

val Map<String, Timer> timers = newHashMap

rule "A sensor changed its online state2"
when
    Member of gSensorStatus changed
then
    if(previousState == NULL) return;

    val alerted = ScriptServiceUtil.getItemRegistry.getItem(triggeringItem.name+"_Alerted") as SwitchItem
    if(alerted === null) {
        logError("admin", "Cannot find Item " + triggeringItem.name+"_Alerted")
        aInfo.sendCommand(triggeringItem.name + " doesn't have an alerted flag, it is now " + transform("MAP", "admin.map", triggeringItem.state.toString) + "!")
        return;
    }

    var n = transform("MAP", "admin.map", triggeringItem.name)
    val name = if(n == "") triggeringItem.name else n

    // If we are flapping, reschedule the timer and exit
    if(timers.get(triggeringItem.name) !== null) {
        timers.get(triggeringItem.name).reschedule(now.plusMinutes(1))
        logWarn("admin", name + " is flapping!")
        return;
    }

    if(alerted.state == triggeringItem.state) {
        val currState = triggeringItem.state
        // wait one minute before alerting to make sure it isn't flapping
        timers.put(triggeringItem.name, createTimer(now.plusMinutes(1), [ |
            // If the current state of the Item matches the saved state after 1 minute send the alert
            if(triggeringItem.state == currState) {
                aInfo.sendCommand(name + " is now " + transform("MAP", "admin.map", triggeringItem.state.toString) + "!")
                alerted.postUpdate(if(currState == ON) OFF else ON)
            }
            timers.put(triggeringItem.name, null)
        ]))
    }
end

rule "Reminder at 08:00 and system start"
when
          Time cron "0 0 8 * * ? *" or
          System started
then
    val numNull = gSensorStatus.members.filter[ sensor | sensor.state == NULL ].size
    if( numNull > 0) logWarn("admin", "There are " + numNull + " sensors in an unknown state")

    val offline = gSensorStatus.members.filter[ sensor | sensor.state == OFF ]
    if(offline.size == 0) return;

    val message = new StringBuilder
    message.append("The following sensors are known to be offline: ")
    offline.forEach[ sensor |
        var name = transform("MAP", "admin.map", sensor.name)
        if(name == "") name = sensor.name
        message.append(name)
        message.append(", ")
        gOfflineAlerted.members.filter[ a | a.name==sensor.name+"_Alerted" ].head.postUpdate(ON)
    ]
    message.delete(message.length-2, message.length)

    aInfo.sendCommand(message.toString)
end

Theory of Operation

When a member of gSensorStatus changes state, we ignore the event if the previous state was NULL.

Next we get the Alerted Switch to see if we have already alerted on this Item. We use Design Pattern: Human Readable Names in Messages to transform the triggering Item’s name to something more meaningful for logs and alert messages.

If the sensors are flapping, we reschedule the timer and wait a bit for the flapping to stop, logging the fact that it is flapping of course.

If the alerted Switch matches the triggeringItem then that means that the triggeringItem changed state and we need to generate a new alert. Set a Timer and if the Item remains in the same state send an alert using Design Pattern: Separation of Behaviors.

The second Rule produces a digest listing all the offline sensors every morning at 08:00 and when OH restarts.

Advantages

Provides a generic and expandable way to get alerted or execute logic when a periodically reporting device goes silent for a period of time. One need only create some Groups and add a simple rule to add monitoring for a new device. It works with any sort of device. Or if using the Expire binding, one doesn’t even need Rules.

Related Design Patterns

Design Pattern | How It’s Used
Design Pattern: Associated Items | Building up the name of an Item to postUpdate or sendCommand based on the name of triggeringItem. Used in the Switch Status Item Example to update the status switch. Used in the complex working example to update the Alerted Switch.
Design Pattern: Human Readable Names in Messages | Converting Item names to meaningful names for use in logs and messages in the complex example.
Design Pattern: Separation of Behaviors | Centralized alerting.

(neil_renaud) #2

Some feedback…

“now.plusHours(timeoutMinutes )”

That looks like a bug. Also, the reschedule and the initial schedule should use the same units and the same value, so you should use the variable there rather than hard coding it…

timers.get(sw.name).reschedule(now.plusHours(2)) // Make sure this matches above, use an appropriate time


(Rich Koshak) #3

Yes, totally a bug. I discovered this last night when I was trying to figure out why I wasn’t getting an alert for something I knew was down.

That is also a bug. These things happen when you retype your code to be generic as opposed to just pasting in your working code. Grrr.


(Rich Koshak) #4

Added a new version that simplifies the code using the new Expire binding.


(Peter Boehm) #5

Hi Rich,

I tried your “expire” version of “generic is alive”.

But I get two errors:

no viable alternative at input ‘Functions$Function3’

and

missing EOF at ‘if’

at

if(!lock.isLocked) { // skip this event if there is already a lock on this Switch

Maybe a typo I couldn’t find?

Thanks for your help


(Rich Koshak) #6

There is an unmatched {, (, or [ somewhere in your file I’m willing to bet.


(Peter Boehm) #7

I did a copy and paste from your code without changing anything. I’ll check again, maybe I find something…


(Rich Koshak) #8

Load the rule into Designer. If there is a typo of this sort or a syntax error it will highlight it.


(Peter Boehm) #9

I get these errors (according to the attached screenshot) in designer after copy and paste of your code block.

Maybe the first warning is a hint?:

The import ‘org.eclipse.xtext.xbase.lib.Functions’ is never used.


(Rich Koshak) #10

There were two typos and one error that I found in the code above. Here is a corrected version:

import org.eclipse.xtext.xbase.lib.Functions
import java.util.Map
import java.util.concurrent.locks.ReentrantLock

// Globals
val Map<String, Boolean> notified = newHashMap // Flag to avoid duplicate alerts
val Map<String, ReentrantLock> locks = newHashMap // locks to avoid InvalidState exceptions when processing multiple updates at the same time

val Functions$Function3<SwitchItem, Map<String, Boolean>, 
                        Map<String, ReentrantLock>, Boolean> processOn = 
[ sw, notified, locks |

    // Generate a lock if there isn't one for this Switch
    if(locks.get(sw.name) == null) locks.put(sw.name, new ReentrantLock)

    val lock = locks.get(sw.name)
    if(!lock.isLocked) { // skip this event if there is already a lock on this Switch
        try {
            lock.lock

            sw.sendCommand(ON) // this will start the Expire timer

            // Alert if we previously alerted that the device was down
            if(notified.getOrDefault(sw.name, false)){
                // alert code goes here
            }
            notified.put(sw.name, false)
        }
        catch(Throwable t) {
            logError("isAlive", "Error in locked part of processOn: " + t.toString)
        }
        finally{
            lock.unlock
        }
        true // return value
    }
]

// We don't need the lock here because the rules that call it only get triggered once 
// unlike above which gets triggered multiple times per event
val Functions$Function2<SwitchItem, Map<String, Boolean>, Boolean> processOff = 
[sw, notified |
    if(!notified.getOrDefault(sw.name, false)){
        // alert code goes here
        notified.put(sw.name, true)
    }
]

// Start timers for all devices
rule "System started, kick off initial timers"
when
    System started
then
    gDevicesStatus.members.forEach[SwitchItem sw |
        sw.sendCommand(ON)
        Thread::sleep(500)
    ]
end

// Device 1 is alive!
rule "gDevice1 received update"
when
    Item gDevice1 received update
then
    processOn.apply(Device1Status, notified, locks)
end

// Device 1 is dead!
rule "Device1Status is dead"
when
    Item Device1Status changed to OFF
then
    processOff.apply(Device1Status, notified)
end

For the curious they were:

  • I included the variable name “sw” inside the < > part of the Functions$Function3 definition
  • I failed to close the < > in the Functions$Function2 definition
  • It didn’t like my true in the finally clause so I moved that to be the last line in the lambda (this is the error)

(Peter Boehm) #11

Cool, errors are gone :)

Many thanks Rich!!!


(Semperor) #12

Could you briefly explain why? I didn’t get it…


(Rich Koshak) #13

Because a Group never gets assigned a state if you don’t give it a type. If the Group never gets a state it never gets an update. If it never gets an update, there is no event that can be used to trigger the Rule.