Monitor the OpenHAB runtime state - what tool do you use?

Hi,

Now that OpenHAB has become more and more important to my daily life (it controls heating, hot water production, opening and closing the sunshades, charging the electric car, etc.), my tolerance for faults in my installation keeps shrinking. In general I’m very happy with the stability and reliability of OH; I even trust it more than HomeKit (automation there is a mystery).
I currently have a rule defined in OH that periodically checks whether any Thing is OFFLINE. It simply calls the REST API, gets the Things as JSON, iterates over them, and builds a list of whatever is offline. This approach works well, but it goes against the fundamental idea of monitoring: I’m using the system being monitored to monitor itself. If the system stops working, the monitoring stops working too.
Given that the way I monitor is pretty much standard (call a REST API and check the content), I’d like to ask whether anybody uses a better way of monitoring OpenHAB. Which tool do you use?
I’m running OH as a Docker container on my Linux-based NAS.
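
For reference, the same check can be sketched as a standalone script that runs outside OH (for example from cron on another machine), so it keeps working when OH does not. This is only a rough sketch: it assumes the `/rest/things` endpoint returns a JSON list whose entries carry `UID`, `label` and `statusInfo.status`, and that an API token is needed to read it. The function names, host name and token below are placeholders, not anything from my setup.

```python
import json
import urllib.request


def offline_things(things):
    """Return (UID, label) for every Thing whose status is OFFLINE."""
    result = []
    for thing in things:
        # Assumed shape: {"UID": ..., "label": ..., "statusInfo": {"status": ...}}
        status = thing.get("statusInfo", {}).get("status")
        if status == "OFFLINE":
            result.append((thing.get("UID"), thing.get("label")))
    return result


def fetch_things(base_url, token, timeout=10):
    """Fetch all Things as parsed JSON from the REST API."""
    request = urllib.request.Request(
        f"{base_url}/rest/things",
        headers={
            "Authorization": f"Bearer {token}",  # token is a placeholder
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def report(base_url, token):
    """Build a short text report; an unreachable server is itself a finding."""
    try:
        down = offline_things(fetch_things(base_url, token))
    except OSError as exc:  # URLError and HTTPError are subclasses of OSError
        return f"OpenHAB not reachable: {exc}"
    if not down:
        return "All Things online"
    return "\n".join(f"OFFLINE: {uid} ({label})" for uid, label in down)
```

Calling something like `print(report("http://nas.example:8080", "my-token"))` from a scheduler would give the same list my rule produces, plus a message when OH itself cannot be reached.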

You could use Thing Status Reporting [4.0.0.0;4.9.9.9], which reacts immediately when a Thing goes OFFLINE instead of polling the REST API. It’s simpler to implement too, since the Thing status change is handled for you and all you need to do is process the change.

I have a two-tiered monitoring system. But I also run a home lab, with OH being just one among many services that I want to keep track of.

Tier 1: OH monitors the devices and services relevant to home automation using the rule template I linked to above, the Network binding, and Threshold Alert and Open Reminder [4.0.0.0;4.9.9.9], which is configured to alert when a sensor that normally reports updates relatively frequently stops updating its Item.
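
The "stopped updating" check boils down to comparing each sensor's last update time against a maximum allowed age. The snippet below is just an illustration of that idea in plain Python, not how the rule template is implemented; `stale_items`, the Item names and the 30 minute limit are all made up for the example.

```python
from datetime import datetime, timedelta, timezone


def stale_items(last_updates, max_age, now=None):
    """Return the names of Items whose last update is older than max_age.

    last_updates maps an Item name to a timezone-aware datetime.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    return sorted(
        name for name, updated in last_updates.items() if now - updated > max_age
    )


# Hypothetical example data: one sensor reported recently, one went quiet.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
updates = {
    "Attic_Temperature": now - timedelta(minutes=5),
    "Garage_Humidity": now - timedelta(hours=3),
}
quiet = stale_items(updates, max_age=timedelta(minutes=30), now=now)
```

Here `quiet` contains only `Garage_Humidity`, which is the sensor that would trigger the alert.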

I use the semantic model, and each Equipment has a “Status” Item which gets set to OFF if any of the above checks indicate that the device has gone offline. I use an alerting rule and the Service Status Standalone Widget on my MainUI page to keep track of when something goes offline. But importantly, I have rules that behave differently depending on whether the needed services are online, based on that status Item (e.g. if I try to open the garage door but the garage door controller is offline, I’ll generate an alert to give feedback so the users know why the door isn’t opening).

Tier 2: I use Zabbix to monitor the overall health and status of all my machines and services. Zabbix sends me emails when there’s a problem (e.g. service is down, RAM is running out, disk space is low, etc.). If OH itself ever goes offline I’ll know about it from Zabbix.
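
If you don’t want to set up a full monitoring system, the "is OH itself up?" part can be approximated with a small external probe. This is a generic sketch, not what Zabbix does internally. It assumes the REST root at `/rest/` answers with HTTP 200 when OH is healthy; `openhab_is_up` and the `opener` parameter (there only so the function can be exercised without a network) are names I made up.

```python
import urllib.request


def openhab_is_up(base_url, timeout=5, opener=urllib.request.urlopen):
    """Return True if the REST root answers with HTTP 200, else False."""
    try:
        with opener(f"{base_url}/rest/", timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, DNS failures and HTTP errors.
        return False
```

Run it from a machine other than the one hosting OH (e.g. `openhab_is_up("http://nas.example:8080")`, host name hypothetical) and send yourself a notification when it returns False.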

About the only thing I can’t get alerts on is when the Internet connection as a whole goes down, but I get alerts from Nest, Resideo, and other sources when that happens, so I don’t feel the need to host Zabbix on a VPS off premises.


Thanks, I’ll look into this in more detail and maybe post more questions here if you don’t mind.