Hi,
In this thread I want to share my solution for sensor ‘alive’ or sensor ‘heartbeat’ monitoring.
Preface
The initial trigger for this was a failure of my Z-Wave stick, which stopped working for some unknown reason. Because of that, OH missed some room temperature values, and my room automation did not work properly for several days until I realized that something was wrong.
After that I looked for a way to monitor my Zigbee and Z-Wave thermostats, which report the room temperature values, because I want to be informed about such problems promptly.
I started to write some rule code and looked around a bit in the forums. I found several ways to meet my requirements, but I wanted an easy and simple solution (at least it seems so to me).
Requirement
- Monitoring of sensor values (=items) to recognize that an item has not been updated within a maximum period of time. E.g. “Item_XY has not been updated for 1 hour”. I call this a “heartbeat timeout”.
- Notification as a Telegram message
- “Easy” to implement
Thoughts
My first idea was to create a rule for every item, save a timestamp, check it periodically, and so on. But that seemed much too inefficient to me.
Then I wanted to use the persistence service and check for each item whether its value has not changed for a certain period of time. While trying that, I ran into a known issue: the .changedSince() method does not work with influxdb, which I am using. This approach was not really satisfying anyway.
Alternatively, I wanted to use the .lastUpdate() method. The problem is that my sensor values are written to the influxdb persistence service every minute, so lastUpdate returns a timestamp at most one minute old, no matter whether an item update has really been received.
But I also use the mapdb persistence service, so far only to store some items on every change so that they can be restored after a reset.
Decisive idea
I have now created an extra group and set the persistence strategy to everyUpdate in the mapdb.persist file for all items in this group. This way I can use the .lastUpdate() method on these items to check when they were last updated.
Another advantage of this group: I can loop through all of its items in one rule and don’t have to process each item individually.
Final solution
Now my final solution. It consists mainly of three parts:
- An item group
- A persistence service
- A rule for the group
In groups.items I have a group defined:
//Items to be monitored for heartbeat. See system.rules
Group G_MonitorHeartbeat_A "Heartbeat monit. category A"
Note: The name contains “category A” because there are also groups “B” and “C”. I use different heartbeat timeouts (different time periods) for different sensors.
My Items which should be monitored become members of this group. Example:
Number I_FlOG_HVAC_SensorTemperature "Current temperature [%.1f °]" (G_Hvac, G_IfxEvMin, G_MonitorHeartbeat_A) {channel="zwave:device:e2101bdd:node5:sensor_temperature"}
Note the G_MonitorHeartbeat_A group of the item.
In mapdb.persist I define persistence for all items of this group G_MonitorHeartbeat_A:
Items {
    //All items of this group are collected on every change and restored on startup
    G_Restore* : strategy = everyChange, restoreOnStartup
    //All items of this group are collected on every update and restored on startup
    //This group is for heartbeat monitoring to detect last update of an item
    G_MonitorHeartbeat_A* : strategy = everyUpdate, restoreOnStartup
    //All items of this group are collected on every update and restored on startup
    //This group is for heartbeat monitoring to detect last update of an item
    G_MonitorHeartbeat_B* : strategy = everyUpdate, restoreOnStartup
}
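Warning: This only works if the affected items are not also saved to the mapdb persistence service periodically via cron. They must be saved only on update! Otherwise lastUpdate would return the time of the last periodic write instead of the last real update.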
Now the rule:
The rule trigger is a cron statement:
Time cron "0 0/1 * * * ?" // every 1 minute
When the rule triggers, we loop through all items of our group:
G_MonitorHeartbeat_A.allMembers.forEach[item |
    //Job to do here
]
The core of this loop is these lines:
var DateTime LastUpd = item.lastUpdate("mapdb").toDateTime
if (LastUpd.plusHours(3).isBeforeNow) {
    //Heartbeat timeout exceeded
}
With these lines we check the timestamp of the last update of this item in the mapdb persistence service. If this timestamp plus 3 hours is before now, then we know that the item has not been refreshed for at least 3 hours.
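To make the comparison concrete, here is a minimal sketch of the same check in plain Python, outside of openHAB. The function name `heartbeat_timed_out` and the sample timestamps are my own assumptions for illustration. Treating a missing timestamp as "not timed out" is also an assumption, not something the rule above does.

```python
from datetime import datetime, timedelta
from typing import Optional


def heartbeat_timed_out(last_update: Optional[datetime],
                        now: datetime,
                        timeout: timedelta) -> bool:
    """Return True when last_update + timeout lies strictly before now."""
    if last_update is None:
        # Assumption: no stored update yet is not reported as a timeout
        return False
    return last_update + timeout < now


# Hypothetical sample values: the check runs at 12:00 with a 3 hour timeout
now = datetime(2020, 1, 1, 12, 0)
three_hours = timedelta(hours=3)

print(heartbeat_timed_out(datetime(2020, 1, 1, 8, 59), now, three_hours))  # True
print(heartbeat_timed_out(datetime(2020, 1, 1, 9, 30), now, three_hours))  # False
```

An update at 08:59 plus 3 hours gives 11:59, which is before 12:00, so the timeout is exceeded. An update at 09:30 is still within the allowed period.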
Retriggering
There is still one challenge to overcome:
When the rule detects a heartbeat timeout, I want to be informed via Telegram message. But I do not want to get the same message every minute for as long as the timeout stays exceeded.
Conclusion: I need a way to remember the state of my heartbeat check for every item, so that the rule can tell whether it is a new or an already known timeout violation.
This is done with a list of the names of the items that have triggered a heartbeat timeout:
import java.util.List
var List<String> Heartbeat_A_TriggerList = newArrayList()
Note: The list has to be defined outside the rule!
Now we can add an item name to the list with Heartbeat_A_TriggerList.add(item.name), remove it with Heartbeat_A_TriggerList.remove(item.name) and check whether it is already in the list with if (Heartbeat_A_TriggerList.contains(item.name)).
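The resulting behaviour can be modelled in a few lines of plain Python. This is only a sketch of the idea, not openHAB code: the function `evaluate`, the action names and the item name `Sensor_X` are hypothetical, and I use a set instead of a list because only membership matters.

```python
from typing import Set


def evaluate(item_name: str, timed_out: bool, triggered: Set[str]) -> str:
    """Return the action one rule run would take for one item."""
    if timed_out:
        if item_name in triggered:
            return "still_alarm"    # known violation: log only, no message
        triggered.add(item_name)
        return "new_alarm"          # first detection: send one message
    if item_name in triggered:
        triggered.remove(item_name)
        return "alarm_cleared"      # item reports again: send one message
    return "ok"


# Five consecutive rule runs for one hypothetical item
triggered: Set[str] = set()
timeline = [False, True, True, False, False]
actions = [evaluate("Sensor_X", state, triggered) for state in timeline]
print(actions)
# ['ok', 'new_alarm', 'still_alarm', 'alarm_cleared', 'ok']
```

Only the transitions into and out of the alarm state produce a message. The runs in between are logged but stay silent.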
Finally my complete rule:
rule "System_HeartbeatMonitoring_A"
when
    Time cron "0 0/1 * * * ?" // every 1 minute
then
    logInfo("System_HeartbeatMonitoring3h", "Checking all items of group " + G_MonitorHeartbeat_A.label + " for heartbeat timeout...")
    G_MonitorHeartbeat_A.allMembers.forEach[item |
        //lastUpdate returns null if mapdb has no entry for this item yet
        var DateTime LastUpd = item.lastUpdate("mapdb")?.toDateTime
        if (LastUpd === null) {
            //Nothing persisted yet, so no timestamp to compare against
            logWarn("System_HeartbeatMonitoring3h", "No mapdb entry yet for " + item.label + " (" + item.name + "), skipping heartbeat check")
        }
        else if (LastUpd.plusHours(3).isBeforeNow) {
            //Heartbeat timeout exceeded
            if (Heartbeat_A_TriggerList.contains(item.name)) {
                //Already triggered this alarm
                logInfo("System_HeartbeatMonitoring3h", "Still heartbeat alarm for " + item.label + " (" + item.name + "). Last update: " + LastUpd.toString("dd.MM.yyyy HH:mm:ss (Z)"))
            }
            else {
                //New alarm
                logInfo("System_HeartbeatMonitoring3h", "Heartbeat alarm for " + item.label + " (" + item.name + ")! Last update: " + LastUpd.toString("dd.MM.yyyy HH:mm:ss (Z)"))
                //Add to trigger list
                Heartbeat_A_TriggerList.add(item.name)
                sendTelegram("JanOnly", "\u203c \ud83d\ude32 Heartbeat alarm!\n" + "Item: " + item.label + " (" + item.name + ")\n" + "Last update: " + LastUpd.toString("dd.MM.yyyy HH:mm:ss (Z)"))
            }
        }
        else {
            if (Heartbeat_A_TriggerList.contains(item.name)) {
                //This one has been triggered and is now back here again. :-)
                logInfo("System_HeartbeatMonitoring3h", "Heartbeat alarm terminated :-) for " + item.label + " (" + item.name + "). Last update: " + LastUpd.toString("dd.MM.yyyy HH:mm:ss (Z)"))
                sendTelegram("JanOnly", "\ud83d\ude0a Heartbeat alarm ended.\n" + "Item: " + item.label + " (" + item.name + ")\n" + "Last update: " + LastUpd.toString("dd.MM.yyyy HH:mm:ss (Z)"))
                //Remove from trigger list
                Heartbeat_A_TriggerList.remove(item.name)
            }
            else {
                //Everything is fine
                //logInfo("System_HeartbeatMonitoring3h", "No heartbeat alarm :-) for " + item.label + " (" + item.name + "). Last update: " + LastUpd.toString("dd.MM.yyyy HH:mm:ss (Z)"))
            }
        }
    ]
end
I use this whole mechanism with several groups to implement different timeouts such as 1 hour, 3 hours, etc.
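As a rough illustration of the "one timeout per group" idea, the following plain Python sketch keeps the timeouts in a small table and reuses one check for every category. The 3 hours for category A come from the rule above. The 1 hour for category B, the item names and the function `overdue_items` are assumptions made up for this example.

```python
from datetime import datetime, timedelta
from typing import Dict, List

# Category A uses 3 hours as in the rule above; the value for B is assumed
TIMEOUTS: Dict[str, timedelta] = {
    "A": timedelta(hours=3),
    "B": timedelta(hours=1),
}


def overdue_items(last_updates: Dict[str, datetime],
                  category: str,
                  now: datetime) -> List[str]:
    """Return the sorted names of items whose last update is too old."""
    timeout = TIMEOUTS[category]
    return sorted(name for name, ts in last_updates.items()
                  if ts + timeout < now)


# Hypothetical last-update timestamps, checked at 12:00
now = datetime(2020, 1, 1, 12, 0)
updates = {
    "Temp_Living": datetime(2020, 1, 1, 10, 30),
    "Temp_Bath": datetime(2020, 1, 1, 8, 0),
}

print(overdue_items(updates, "A", now))  # ['Temp_Bath']
print(overdue_items(updates, "B", now))  # ['Temp_Bath', 'Temp_Living']
```

The same timestamps give different results depending on the category, which is exactly why I keep separate groups for sensors with different reporting intervals.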
Finally
I hope this thread is not a duplicate, and I would be glad if someone can use it to get some ideas.