There is now a Rule Template on the marketplace to implement this for you. See below for instructions on how to set it up.
Please see Design Pattern: What is a Design Pattern and How Do I Use Them for a description of what Design Patterns are and how to use them.
Problem Statement
Many sensors report status to openHAB periodically (e.g. temperature sensors) or unpredictably (e.g. motion sensors), and a subset of these sensors do not have a built-in way for openHAB to query whether or not they are alive. The only way for openHAB to detect that these devices are still alive is to assume they are dead if they have not communicated after a certain period of time.
Concept
Create Timers that get reset every time the sensor Item receives an update. If a certain amount of time passes (I recommend at least 2x the device's normal reporting period), set the Item to UNDEF to indicate that it is no longer reporting. I recommend using the Expire binding for this.
Alternatively you can create a separate Switch Item that gets set to ON when the sensor updates and a Timer to set it to OFF.
If you are using MQTT, you can bind the previously mentioned Switch to the Last-Will-and-Testament topic of the device so that the Switch turns OFF when the device goes offline.
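The idea can be sketched in plain JavaScript, independent of openHAB. This is only an illustration under stated assumptions: SensorWatchdog and the Item names are hypothetical, and timestamps are passed in by hand instead of using real Timers so the behavior is easy to follow.

```javascript
// Hypothetical helper, not an openHAB API: remembers when each sensor last reported.
class SensorWatchdog {
  constructor(timeoutMs) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // sensor name -> timestamp (ms) of the last update
  }

  // Call whenever a sensor reports; this plays the role of resetting the Timer.
  update(name, nowMs) {
    this.lastSeen.set(name, nowMs);
  }

  // Returns the names of the sensors that have been silent longer than the timeout.
  offline(nowMs) {
    return [...this.lastSeen.entries()]
      .filter(([, seen]) => nowMs - seen > this.timeoutMs)
      .map(([name]) => name);
  }
}

// A sensor that reports every 5 minutes gets a 10 minute timeout (2x the period).
const reportingPeriodMs = 5 * 60 * 1000;
const watchdog = new SensorWatchdog(2 * reportingPeriodMs);
watchdog.update('Kitchen_Temperature', 0);
watchdog.update('Hall_Motion', 0);
watchdog.update('Kitchen_Temperature', 9 * 60 * 1000); // still reporting
const silent = watchdog.offline(11 * 60 * 1000);       // only Hall_Motion is overdue
```

In openHAB itself the Expire binding (or expire metadata) does this bookkeeping for you, which is why it is the recommended approach.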
Open Reminder Rule Template
Install Template and Instantiate and Configure the Rule
This rule template implements this design pattern. Configure it as follows:
1. Create a Group and add all the Items that represent a sensor reading to that Group. We'll call it "AllSensors".
2. Add expire metadata to all the Items added to the AllSensors Group so that they get set to UNDEF if they do not receive an update for too long.
3. Create a rule or a script that will be called when the sensor is determined to be offline. This rule can do anything you want, including sending an alert, changing the state of a Switch Item (to track the online status of the sensor), taking remedial action like restarting a binding or a service, etc. See below for an example.
4. Install the Open Reminder Rule Template from MainUI → Settings → Automation.
5. Create a new rule and choose "Open Reminder" as the template.
6. Configure the rule as follows:
   - Give the rule a meaningful ID, name and description.
   - Choose AllSensors as the Group.
   - Type in "UNDEF" for the Alert State.
   - Toggle Invert to ON, meaning we want to alert when the Item changes to UNDEF, not when it changes away from UNDEF.
   - Provide a default initial timeout, the amount of time before the alert is generated. Use ISO8601 format (e.g. "PT15m" means 15 minutes).
   - You can provide the namespace for Item metadata to use instead of the default initial timeout. This can be useful if you want to have a different timeout for each sensor.
   - If you want to get repeated alerts, set the duration between alerts, again using ISO8601 format. Leaving it blank will alert only the one time.
   - Leave Reschedule toggled off. This option is useful for motion sensors where you might not want to alert until a certain amount of time after the last motion is detected.
   - Select the rule you created in step 3 as the "Alert Rule".
   - Finally, if desired, set a Do Not Disturb period. Alerts that occur during this period will be suppressed until the end of the period, if the Item is still in the alerting state. If the end time is before the start time, the period is assumed to span midnight.
7. Click "Save" and you are done.
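To make the timeout settings above more concrete, here is a rough sketch of how a per-Item timeout might take precedence over the default one. It is not the template's actual code: durationToMs, resolveTimeoutMs, the plain object standing in for Item metadata and the sensorTimeout namespace are all assumptions made for illustration, and only days, hours, minutes and whole seconds are handled.

```javascript
// Converts a simple ISO8601 duration such as "PT15M" or "P1DT2H" to milliseconds.
// Only days, hours, minutes and whole seconds are supported in this sketch.
function durationToMs(iso) {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/i.exec(iso.trim());
  if (!match || match.slice(1).every((part) => part === undefined)) {
    throw new Error('Unsupported duration: ' + iso);
  }
  const [days, hours, minutes, seconds] = match.slice(1).map((part) => Number(part || 0));
  return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000;
}

// Uses the per-Item value stored under `namespace` if there is one, else the default.
// `metadata` is a plain object standing in for the Item's metadata (hypothetical shape).
function resolveTimeoutMs(metadata, namespace, defaultTimeout) {
  const override = metadata && metadata[namespace];
  return durationToMs(override || defaultTimeout);
}

// One sensor overrides the timeout, the other falls back to the default of 15 minutes.
const perItem = resolveTimeoutMs({ sensorTimeout: 'PT1H' }, 'sensorTimeout', 'PT15m');
const fallback = resolveTimeoutMs({}, 'sensorTimeout', 'PT15m');
```

Here perItem works out to one hour and fallback to fifteen minutes, both expressed in milliseconds.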
If you also want alerts when the sensor comes back online, create another rule with the same settings only leave Invert toggled OFF.
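The Do Not Disturb setting described in the configuration steps says that an end time earlier than the start time means the period spans midnight. A minimal sketch of that check follows, assuming "HH:MM" strings and an exclusive end time; both are my assumptions, and the template's own implementation may differ.

```javascript
// Converts "HH:MM" to minutes since midnight.
function toMinutes(hhmm) {
  const [hours, minutes] = hhmm.split(':').map(Number);
  return hours * 60 + minutes;
}

// True when `now` falls inside the do-not-disturb window that starts at `start`
// and ends just before `end`.
function inDoNotDisturb(now, start, end) {
  const n = toMinutes(now);
  const s = toMinutes(start);
  const e = toMinutes(end);
  if (s <= e) {
    return n >= s && n < e; // window within a single day
  }
  return n >= s || n < e;   // end is before start, so the window spans midnight
}

const lateEvening = inDoNotDisturb('23:30', '22:00', '08:00'); // inside the window
const midday = inDoNotDisturb('12:00', '22:00', '08:00');      // outside the window
```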
Processing Rule
The rule you created in step 3 above will be called by the rule instantiated from the rule template. The Item that caused the rule to be called will be stored in the context as alertItem, and the state of the Item that caused the rule to be called will be stored in the context as currState. Access to the context depends on the Rules Language. In JS Scripting, they are just inserted as standalone variables.
As previously mentioned, the called script can do anything desired. I've set mine up to keep track of whether we've already alerted on this piece of Equipment (based on the Semantic Model and with the help of a Switch Item) so I only get one alert for the whole Equipment, even if more than one sensor went offline for that Equipment.
All code is in JS Scripting ECMAScript 11.
The script condition for that is:
var { itemRegistry } = require('@runtime');
var equipment = actions.Semantics.getEquipment(itemRegistry.getItem(alertItem)); // requires the Java Item, not JS Item
var statusItem = items.getItem(equipment.name + '_Status');
statusItem.state == 'ON';
Notice the use of Design Pattern: Associated Items to obtain the helper Switch Item.
The script action is
var { itemRegistry } = require('@runtime');
var {alerting} = require('rlk_personal');
var logger = log('Sensor Offline');
var equipment = actions.Semantics.getEquipment(itemRegistry.getItem(alertItem)); // requires the Java Item, not JS Item
var statusItem = items.getItem(equipment.name + '_Status').postUpdate('OFF');
alerting.sendAlert(equipment.label + ' has stopped reporting and is likely offline');
alerting is from my personal library and it sends broadcast alerts and/or email based on a bunch of conditions.
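To see why the condition and action together produce only one alert per Equipment, here is a stand-alone simulation in plain JavaScript. Everything in it is a stand-in: the equipmentOf lookup replaces the Semantic Model, the status object replaces the helper Switch Items, and the Item names are made up.

```javascript
// Stand-ins for the Semantic Model and the helper Switch Items (all names hypothetical).
const equipmentOf = { Porch_Temperature: 'Porch_Sensor', Porch_Humidity: 'Porch_Sensor' };
const status = { Porch_Sensor_Status: 'ON' };
const sent = [];

// Mirrors the script condition: only proceed if the Equipment is still marked online.
function shouldAlert(alertItem) {
  return status[equipmentOf[alertItem] + '_Status'] === 'ON';
}

// Mirrors the script action: mark the Equipment offline and record a single alert.
function handleOffline(alertItem) {
  if (!shouldAlert(alertItem)) return false;
  const equipment = equipmentOf[alertItem];
  status[equipment + '_Status'] = 'OFF';
  sent.push(equipment + ' has stopped reporting and is likely offline');
  return true;
}

const first = handleOffline('Porch_Temperature'); // alert goes out
const second = handleOffline('Porch_Humidity');   // suppressed, status is already OFF
```

The second sensor of the same Equipment finds the status Switch already OFF, so the condition fails and no duplicate alert is sent.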
The whole rule is as follows:
configuration: {}
triggers: []
conditions:
  - inputs: {}
    id: "2"
    configuration:
      type: application/javascript;version=ECMAScript-2021
      script: >
        var { itemRegistry } = require('@runtime');

        //console.log('Received an alert on ' + alertItem);

        var equipment = actions.Semantics.getEquipment(itemRegistry.getItem(alertItem)); // requires the Java Item, not JS Item

        var statusItem = items.getItem(equipment.name + '_Status');

        //console.log('received an offline alert for equipment ' + equipment.name + ' and status Item ' + statusItem.name);

        // Just went offline

        statusItem.state == 'ON';
    type: script.ScriptCondition
actions:
  - inputs: {}
    id: "1"
    configuration:
      type: application/javascript;version=ECMAScript-2021
      script: >-
        var { itemRegistry } = require('@runtime');

        var {alerting} = require('rlk_personal');

        var logger = log('Sensor Offline');

        var equipment = actions.Semantics.getEquipment(itemRegistry.getItem(alertItem)); // requires the Java Item, not JS Item

        var statusItem = items.getItem(equipment.name + '_Status').postUpdate('OFF');

        alerting.sendAlert(equipment.label + ' has stopped reporting and is likely offline');
    type: script.ScriptAction
Related Design Patterns
Design Pattern | How It's Used |
---|---|
Design Pattern: Associated Items | Building up the name of an Item to postUpdate or sendCommand based on the name of triggeringItem. Used in the Switch Status Item Example to update the status switch. Used in the complex working example to update the Alerted Switch. |
DEPRECATION WARNING
The remaining examples are kept for historical purposes but they should be considered deprecated and no longer supported.
Expire Binding Example
Item
Group:Switch DeviceStatuses // we must give the Group a type
Number Item1Sensor1 (DeviceStatuses) { expire="10m" }
Contact Item1Sensor2 (DeviceStatuses) { expire="10m" }
Switch Item1Sensor3 (DeviceStatuses) { expire="10m" }
Number Item2Sensor1 (DeviceStatuses) { expire="6m" }
Contact Item2Sensor2 (DeviceStatuses) { expire="6m" }
Switch Item2Sensor3 (DeviceStatuses) { expire="6m" }
The Group is optional and only required if you want to generate an Alert when one of these Items goes to UNDEF.
Python
Deprecated
No Rule is required for the Expire binding to work. If you want to generate an Alert when a device goes to UNDEF, use the following Rule.
from core.rules import rule
from core.triggers import when

@rule("A sensor stopped reporting")
@when("Member of DeviceStatuses changed to UNDEF")
def offline_alert(event):
    # report alert on event.itemName
    offline_alert.log.warning("{} stopped reporting".format(event.itemName))
Rules DSL
No Rule is required for the Expire binding to work. If you want to generate an Alert when a device goes to UNDEF, use the following Rule.
rule "A sensor stopped reporting"
when
    Member of DeviceStatuses changed to UNDEF
then
    // report alert on triggeringItem
end
Theory of Operation
Every time the Item gets updated, the Expire binding sets/resets a Timer. When the configured amount of time passes without an update, by default the Expire binding will set the Item to UNDEF. In your Rules, be sure to check for UNDEF before trying to use the Item.
The Rule above gets triggered any time a member of DeviceStatuses changes to UNDEF, where you can generate an Alert or otherwise take remedial actions.
On the Sitemap or HABPanel, Items that are UNDEF will appear as -.
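As a sketch of the "check for UNDEF before trying to use the Item" advice, the helper below works on the state as a plain string. numericState is a hypothetical name of my own, not part of any openHAB library, and it also treats NULL (never initialized) as unusable.

```javascript
// Returns the numeric value of a state string, or `fallback` when the Item is
// uninitialized (NULL), has expired (UNDEF) or does not hold a number.
function numericState(state, fallback) {
  if (state === 'NULL' || state === 'UNDEF') return fallback;
  const value = Number.parseFloat(state);
  return Number.isNaN(value) ? fallback : value;
}

const reading = numericState('21.5', null);  // a usable reading
const expired = numericState('UNDEF', null); // the sensor stopped reporting
```

A rule would then skip its calculation, or fall back to a default, whenever the fallback value comes back.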
Switch Status Item Example
Items
Group:Switch:AND(ON,OFF) DeviceStatuses "Device Status [%s]" <network>
Group:Switch DeviceUpdates
Switch Device1_Status "Device 1 Status [%s]" <network> (DeviceStatuses)
Number Device1_Sensor1 (DeviceUpdates)
Contact Device1_Sensor2 (DeviceUpdates)
Switch Device1_Sensor3 (DeviceUpdates)
Switch Device2_Status "Device 2 Status [%s]" <network> (DeviceStatuses)
Number Device2_Sensor1 (DeviceUpdates)
Contact Device2_Sensor2 (DeviceUpdates)
Switch Device2_Sensor3 (DeviceUpdates)
Each device has more than one sensor. The online status of the device is rolled up into a status Switch. The Expire binding can and should be used here as well, but I'll show a Timer based solution for completeness. To use the Expire binding based approach, use the following on the Status Switch Items:
{ expire="10m,command=OFF" }
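Two ideas are at work in these Items: the status Item's name is derived from the sensor Item's name, and the DeviceStatuses Group is ON only while every member is ON. A small plain JavaScript illustration follows; both function names are hypothetical, and how openHAB treats an empty Group is not modeled here.

```javascript
// Associated Items: "Device1_Sensor2" -> "Device1_Status".
function statusItemName(sensorItemName) {
  return sensorItemName.split('_')[0] + '_Status';
}

// Group:Switch:AND(ON,OFF): ON only when every member is ON, otherwise OFF.
function aggregateAnd(memberStates) {
  return memberStates.every((state) => state === 'ON') ? 'ON' : 'OFF';
}

const statusName = statusItemName('Device1_Sensor2'); // the Item to update
const allOnline = aggregateAnd(['ON', 'ON']);         // every device is online
const oneOffline = aggregateAnd(['ON', 'OFF']);       // one device dropped out
```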
Python
from core.rules import rule
from core.triggers import when
from core.actions import ScriptExecution
from org.joda.time import DateTime

timers = {}

@rule("Process sensor update")
@when("Member of DeviceUpdates received update")
def sensor_update(event):
    if event.itemName not in timers:
        # Derive the status Item's name from the sensor Item's name
        status = "{}_Status".format(event.itemName.split("_")[0])
        # Wrap the command in a lambda so it runs when the Timer fires, and
        # save the Timer so later updates can reschedule it
        timers[event.itemName] = ScriptExecution.createTimer(
            DateTime.now().plusMinutes(10),
            lambda: events.sendCommand(status, "OFF"))
    else:
        timers[event.itemName].reschedule(DateTime.now().plusMinutes(10))

@rule("A device stopped reporting")
@when("Member of DeviceStatuses received command OFF")
def sensor_offline(event):
    # alert
    sensor_offline.log.warning("{} stopped reporting".format(event.itemName))
Rules DSL
import java.util.Map

val Map<String, Timer> timers = newHashMap

rule "Process sensor update"
when
    Member of DeviceUpdates received update
then
    if(timers.get(triggeringItem.name) === null) {
        // Save the Timer so later updates can reschedule it
        timers.put(triggeringItem.name, createTimer(now.plusMinutes(10), [ |
            sendCommand(triggeringItem.name.split("_").get(0)+"_Status", "OFF")
        ]))
    }
    else timers.get(triggeringItem.name).reschedule(now.plusMinutes(10))
end

rule "A device stopped reporting"
when
    Member of DeviceStatuses received command OFF
then
    // alert
end
Complex Working Example
This is based on old code that was written before the Expire binding was created. I plan on rewriting it at some point to use the Expire binding and UNDEF as shown in the first example.
Items
Group:Switch:AND(ON, OFF) gSensorStatus "Sensor's Status [MAP(admin.map):%s]" <network>
Group:Switch gOfflineAlerted
// Sonoffs
Switch vSonoff_3157_Online "Powercord 3157 [MAP(admin.map):%s]" <network> (gResetExpire) { mqtt="<[mosquitto:tele/sonoff-3157/LWT:state:MAP(sonoff.map)]", expire="24h,state=OFF" }
Switch vSonoff_3157_Online_Alerted (gOfflineAlerted)
...
// Nest
Switch vNest_Online "Nest Status [MAP(hvac.map):%s]" <network> (gSensorStatus) { nest="<[thermostats(Entryway).is_online]" }
Switch vNest_Online_Alerted (gOfflineAlerted)
// Network
Switch vNetwork_Cerberos "Cerberos Network [MAP(admin.map):%s]" <network> (gSensorStatus, gResetExpire) { channel="network:servicedevice:cerberos:online", expire="2m" }
Switch vNetwork_Cerberos_Alerted (gOfflineAlerted)
...
// Services
Switch vCerberos_SensorReporter_Online "Cerberos sensorReporter [MAP(admin.map):%s]" <network> (gSensorStatus, gResetExpire) { mqtt="<[mosquitto:status/sensor-reporters:command:OFF:.*cerberos sensorReporter is dead.*],<[mosquitto:status/cerberos/heartbeat/string:command:ON]", expire="11m,command=OFF" }
Switch vCerberos_SensorReporter_Online_Alerted (gOfflineAlerted)
...
// Zwave devices
Switch vMainFloorSmokeCOAlarm_Heartbeat "Main Floor Smoke/CO Alarm is [MAP(admin.map):%s]" <network> (gAlarmStatus, gSensorStatus, gResetExpire) { channel="zwave:device:dongle:node5:alarm_general", expire="24h,command=OFF" }
Switch vMainFloorSmokeCOAlarm_Heartbeat_Alerted (gOfflineAlerted)
...
Examples of a variety of sensor types are shown above. Each has an associated Offline Alerted Item to prevent multiple alerts about the device over a given period of time.
Python
Note: this version of the code differs slightly from the Rules DSL version below. It also uses Timer Manager from the Helper Libraries to create and maintain the timers. The code supports antiflapping timers and uses Design Pattern: Using Item Metadata as an Alternative to Several DPs to keep track of whether we've alerted that a device is offline so we can alert again when it comes back online.
"""Rules to keep track of whether or not a device has gone offline or not and
generate an alert message when it goes offline or return online.

Author: Rich Koshak

Functions:
    - alert_timer_expired: Called when a device changes state and stays that way
    for enough time that it's not flapping.
    - alert_timer_flapping: Called when a device changes state too rapidly
    indicating it's flapping.
    - status_alert: Rule called when a sensor changes state and sends an alert
    if necessary.
    - status_reminder: Rule triggered at 8am every morning to issue a report
    with all the known offline devices.
    - pm_online: Called when the Zwave power meter Thing changes state.
    - heartbeat: Called when a member of SensorEvents receives an update
    indicating the device is online.
"""
from threading import Timer
from core.rules import rule
from core.triggers import when
from core.metadata import get_key_value, set_metadata
from core.actions import Transformation
from core.log import log_traceback
from personal.util import send_info, get_name
from personal.timer_mgr import TimerMgr

timers = TimerMgr()

@log_traceback
def alert_timer_expired(itemName, name, origState, log):
    """Called when we determine that a sensor's online state is not flapping.

    Arguments:
        - itemName: Name of the sensor's Item.
        - name: Human friendly name of the sensor.
        - origState: The state that originally triggered the Timer to check for
        flapping.
        - log: Logger from the triggering Rule.
    """
    on_off_map = { ON: 'online', OFF: 'offline' }
    alerted = get_key_value(itemName, "Alert", "alerted") or "OFF"

    if items[itemName] != origState:
        log.warning("In alert_timer_expired and {}'s current state of {} is "
                    "different from its original state of {}."
                    .format(name, items[itemName], origState))

    # If our alerted flag equals the Item's state we need to generate an alert
    if str(items[itemName]) == alerted:
        send_info("{} is now {}".format(name, on_off_map[items[itemName]]), log)
        set_metadata(itemName,
                     "Alert",
                     { "alerted": 'OFF' if alerted == 'ON' else 'ON' },
                     overwrite=False)
    else:
        log.warning("Alert timer expired for {} but curr state doesn't match "
                    "alert {} != {}".format(name, items[itemName], alerted))

def alert_timer_flapping(itemName, name, log):
    """Called when a sensor's online state appears to be flapping.

    Arguments:
        - itemName: Name of the sensor Item.
        - name: Human friendly name of the sensor.
        - log: Logger from the triggering Rule.
    """
    alerted = get_key_value(itemName, "Alert", "alerted") or "OFF"
    log.warning("{} is flapping! Alerted = {} and current state = {}"
                .format(name, alerted, items[itemName]))

@rule("Device online/offline",
      description="A device we track its online/offline status changed state",
      tags=["admin"])
@when("Member of gSensorStatus changed")
def status_alert(event):
    """Triggered when a member of gSensorStatus changes. We don't care if the
    sensor changed from a UnDefType. Set a Timer to see if the device is
    flapping.
    """
    name = get_name(event.itemName)

    if isinstance(event.oldItemState, UnDefType):
        status_alert.log.warning("{} is in an undef type, canceling any running "
                                 "timers".format(name))
        timers.cancel(event.itemName)
        return

    timers.check(event.itemName,
                 60000,
                 lambda: alert_timer_expired(event.itemName,
                                             name,
                                             event.itemState,
                                             status_alert.log),
                 lambda: alert_timer_flapping(event.itemName,
                                              name,
                                              status_alert.log),
                 reschedule=True)

@rule("System status reminder",
      description=("Send a message with a list of offline sensors at 08:00 and "
                   "System start"),
      tags=["admin"])
@when("Time cron 0 0 8 * * ?")
@when("System started")
def status_reminder(event):
    """Called at system start and at 8 AM and generates a report of the known
    offline sensors
    """
    numNull = len([i for i in ir.getItem("gSensorStatus").members
                   if isinstance(i.state, UnDefType)])
    if numNull > 0:
        status_reminder.log.warning("There are {} sensors in an unknown state!"
                                    .format(numNull))

    offline = [i for i in ir.getItem("gSensorStatus").members if i.state == OFF]
    offline.sort()
    if len(offline) == 0:
        status_reminder.log.info("All sensors are online")
        return

    offline_str = ", ".join(["{}".format(get_name(s.name)) for s in offline ])
    offline_message = ("The following sensors are known to be offline: {}"
                       .format(offline_str))

    for sensor in offline:
        set_metadata(sensor.name, "Alert", { "alerted" : "ON"}, overwrite=False)
    send_info(offline_message, status_reminder.log)
Theory of Operation
When a member of gSensorStatus changes we trigger the online/offline Rule.
If the previous state was NULL or UNDEF we cancel any running Timers and ignore the event.
timers.check() causes the Timer Manager to look to see if there is already a timer scheduled for event.itemName. If there is, it reschedules it for one minute into the future and the "alert_timer_flapping" lambda gets called, allowing us to do something in the case where the sensor is flapping (in this case we just log about it). If there is no timer, it creates one to go off in a minute.
After a minute without flapping, the Timer Manager will then call the alert_timer_expired lambda. This function gets whether or not we've alerted on this sensor's going offline from the Item Metadata. Then it checks to see if the Item's current state differs from the state that caused the Timer to be created in the first place and logs a warning if it does. If the alerted flag matches the Item's current state, we send an alert and flip the flag in the Item metadata; otherwise we just log a warning.
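The antiflapping check described above can be sketched without any openHAB dependencies. FlapGuard below is a hypothetical, simplified stand-in for the Timer Manager: time is passed in explicitly and tick() fires whatever is due, so the sequence is deterministic. Keeping the first onExpire callback on a reschedule is a choice made for this sketch, not a claim about how the real library behaves.

```javascript
// Hypothetical stand-in for a timer manager with antiflapping support.
class FlapGuard {
  constructor() {
    this.pending = new Map(); // key -> { due, onExpire }
  }

  // Called on every state change. If a timer is already pending for the key, the
  // state is flapping: push the deadline out and call onFlap. Otherwise start one.
  check(key, nowMs, timeoutMs, onExpire, onFlap) {
    const existing = this.pending.get(key);
    if (existing) {
      existing.due = nowMs + timeoutMs;
      if (onFlap) onFlap();
    } else {
      this.pending.set(key, { due: nowMs + timeoutMs, onExpire });
    }
  }

  // Drops the pending timer for the key, if any.
  cancel(key) {
    this.pending.delete(key);
  }

  // Fires and removes every timer whose deadline has passed.
  tick(nowMs) {
    for (const [key, timer] of [...this.pending]) {
      if (nowMs >= timer.due) {
        this.pending.delete(key);
        timer.onExpire();
      }
    }
  }
}

const events = [];
const guard = new FlapGuard();
const onExpire = () => events.push('alert');
const onFlap = () => events.push('flap');
guard.check('vNest_Online', 0, 60000, onExpire, onFlap);     // first change starts the timer
guard.check('vNest_Online', 30000, 60000, onExpire, onFlap); // second change counts as flapping
guard.tick(60000); // nothing fires, the deadline moved to 90000
guard.tick(90000); // the state held for a full minute, so the alert fires
```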
At system startup and once a day the "System status reminder" rule runs to generate a report listing all the devices that are currently offline. The friendly name for each sensor is pulled from the Item Metadata. send_info and get_name are both simple functions in my personal library and imported.
from core.actions import NotificationAction
from core.jsr223.scope import actions
from configuration import admin_email # automation/lib/python/configuration.py
from core.metadata import get_value

def send_info(message, logger):
    """Sends an info level message by sending an email and logging the message
    at the info level.

    Arguments:
        - message: The String to deliver and log out at the info level.
        - logger: The logger used to log out the info level alert.
    """
    out = str(message)
    logger.info("[INFO ALERT] {}".format(message))
    NotificationAction.sendNotification(admin_email, out)
    (actions.get("mail", "mail:smtp:gmail")
        .sendMail(admin_email, "openHAB Info", out))

def get_name(itemName):
    """Returns the 'name' metadata value or the itemName if there isn't one.

    Arguments:
        itemName: The name of the Item
    Returns:
        None if the item doesn't exist TODO: verify.
    """
    return get_value(itemName, "name") or itemName
Rules DSL
import org.eclipse.smarthome.model.script.ScriptServiceUtil
import java.util.Map

val Map<String, Timer> timers = newHashMap

rule "A sensor changed its online state2"
when
    Member of gSensorStatus changed
then
    if(previousState == NULL) return;

    val alerted = ScriptServiceUtil.getItemRegistry.getItem(triggeringItem.name+"_Alerted") as SwitchItem
    if(alerted === null) {
        logError("admin", "Cannot find Item " + triggeringItem.name+"_Alerted")
        aInfo.sendCommand(triggeringItem.name + " doesn't have an alerted flag, it is now " + transform("MAP", "admin.map", triggeringItem.state.toString) + "!")
        return;
    }

    var n = transform("MAP", "admin.map", triggeringItem.name)
    val name = if(n == "") triggeringItem.name else n

    // If we are flapping, reschedule the timer and exit
    if(timers.get(triggeringItem.name) !== null) {
        timers.get(triggeringItem.name).reschedule(now.plusMinutes(1))
        logWarn("admin", name + " is flapping!")
        return;
    }

    if(alerted.state == triggeringItem.state) {
        val currState = triggeringItem.state
        // wait one minute before alerting to make sure it isn't flapping
        timers.put(triggeringItem.name, createTimer(now.plusMinutes(1), [ |
            // If the current state of the Item matches the saved state after 1 minute send the alert
            if(triggeringItem.state == currState) {
                aInfo.sendCommand(name + " is now " + transform("MAP", "admin.map", triggeringItem.state.toString) + "!")
                alerted.postUpdate(if(currState == ON) OFF else ON)
            }
            timers.put(triggeringItem.name, null)
        ]))
    }
end

rule "Reminder at 08:00 and system start"
when
    Time cron "0 0 8 * * ? *" or
    System started
then
    val numNull = gSensorStatus.members.filter[ sensor | sensor.state == NULL ].size
    if( numNull > 0) logWarn("admin", "There are " + numNull + " sensors in an unknown state")

    val offline = gSensorStatus.members.filter[ sensor | sensor.state == OFF ]
    if(offline.size == 0) return;

    val message = new StringBuilder
    message.append("The following sensors are known to be offline: ")
    offline.forEach[ sensor |
        var name = transform("MAP", "admin.map", sensor.name)
        if(name == "") name = sensor.name
        message.append(name)
        message.append(", ")
        gOfflineAlerted.members.filter[ a | a.name==sensor.name+"_Alerted" ].head.postUpdate(ON)
    ]
    message.delete(message.length-2, message.length)
    aInfo.sendCommand(message.toString)
end
Theory of Operation
When a member of gSensorStatus changes state if the previous state was NULL we ignore it.
Next we get the Alerted Switch to see if we have already alerted on this Item. We use Design Pattern: Human Readable Names in Messages to transform the Alerted Item's name to something more meaningful for logs and alert messages.
If the sensors are flapping, we reschedule the timer and wait a bit for the flapping to stop, logging the fact that it is flapping of course.
If the alerted Switch matches the triggeringItem then that means that the triggeringItem changed state and we need to generate a new alert. Set a Timer and if the Item remains in the same state send an alert using Design Pattern: Separation of Behaviors.
The second Rule produces a digest listing all the offline sensors every morning at 08:00 and when OH restarts.
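The digest logic of the second Rule boils down to a filter, a name lookup and a join. Here it is as a plain JavaScript sketch; offlineDigest, the sensor objects and the friendlyNames map (standing in for admin.map) are illustrative assumptions and not openHAB APIs.

```javascript
// Builds the digest text, or returns null when nothing is offline.
// `sensors` is a list of { name, state }; `friendlyNames` maps Item names to labels.
function offlineDigest(sensors, friendlyNames) {
  const offline = sensors
    .filter((sensor) => sensor.state === 'OFF')
    .map((sensor) => friendlyNames[sensor.name] || sensor.name);
  if (offline.length === 0) return null;
  return 'The following sensors are known to be offline: ' + offline.join(', ');
}

const digest = offlineDigest(
  [
    { name: 'vNest_Online', state: 'OFF' },
    { name: 'vNetwork_Cerberos', state: 'ON' },
    { name: 'vSonoff_3157_Online', state: 'OFF' },
  ],
  { vNest_Online: 'Nest' } // the Sonoff has no friendly name, so its Item name is used
);
```

Using join avoids the trailing ", " that the StringBuilder version has to delete at the end.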
Advantages
Provides a generic and expandable way to get alerted or execute logic when a periodically reporting device goes silent for a period of time. One need only create some Groups and add a simple rule to add monitoring for a new device. It works with any sort of device. Or, if using the Expire binding, one doesn't even need Rules.