Need advice for monitoring solution

Oliver2 · May 23, 2021, 2:16pm

I am currently putting together a concept for a small monitoring solution, which I want to present here to get some feedback before I start. That way I avoid finding out later that I could have done better.
Monitored elements are

Things in general (i.e. if they are online or not)
some of these things will be monitored additionally by their item states like
- battery levels of smoke detectors
- smoke detector status (i.e. smoke alarm)
- device temperature (especially shelly devices)
- CPU, RAM, temperature of OH Server
- and some more

I defined two EVENT types:

FAULT (thing offline, CPU load exceeding threshold, etc)
ALARM (smoke detector alarm (true/false), internal temperature of shelly device >85 °C, etc)

Each item that can raise a fault or an alarm is a member of either a fault group or an alarm group.

Goal:
The goal is to have a MainUI element (cell, card) with two badges showing the number of items in fault state or alarm state. Clicking on that cell opens a popup with detailed information (item name, item state, etc).

No big deal so far.

The question now is how to create an event when an item (e.g. shelly internal temperature) is exceeding a threshold (85°C). Please bear in mind that the solution has to be somehow maintainable.

Sure, I could create a “huge” rule. Every time the value of one of those 2 group members changes, I can compare it to a threshold and .postUpdate a 2nd (proxy) item. But that would mean the rule is launched every time a temperature value changes (e.g. from 30 to 31 degrees).
On the other hand, I could use either item metadata (state description or command options?) or profiles (map or hysteresis?). But that means I will not be able to work with the original analogue value anymore.

This is where I am uncertain and look for some useful tips.

yfaway · May 23, 2021, 2:56pm

I don’t quite understand your second option.

Generally, a home automation system will outgrow your initial intention, so you want to be rather generic. For example, today you might just care about the temperature crossing a threshold, but what if tomorrow you have some other rules that depend on changes in temperature but don’t necessarily have anything to do with your thresholds?

There is nothing wrong with invoking a rule on every value change. That’s what rules are designed for. Now, certainly you can push some logic down to the layer below. For example, I have some ESP sensors where I hard-code the thresholds in the C/C++ code. Those sensors only trigger an alarm when the thresholds are crossed. They simplify the logic at the openHAB layer at the cost of less flexibility. If I need to change a threshold, I have to re-program the sensors.

My point is that there is no free lunch. In some cases you might want to generalize even further at the openHAB layer, especially as your system gets more complicated. Here are the events I have in my system.

Seaside · May 23, 2021, 3:15pm

Keep the rules as simple as possible. Trigger on some condition, send a command to another item that will trigger another rule etc.

Looping over groups and doing generic things using lambdas in Rules DSL is usually very hard to get right. If you find yourself in need of more advanced rules, I would look at some other options like Jython, HABApp etc.
The major drawback with Rules DSL imo is that it’s difficult to share code and solve things in a generic way.

/s

Seaside · May 23, 2021, 3:21pm

Also why do you need to loop over all things and check online status? I’m a bit curious what the use case is. If I have for instance a rpi as an example that I want to monitor I would rather have its online status mapped in a channel connected to an item rather than a thing.

Regards S

rossko57 · May 23, 2021, 3:41pm

It doesn’t sound huge.

rule "check overheat"
when
   Member of mySensors changed
then
   if (triggeringItem.state > 85|°C) {
       flashingLight.sendCommand(ON)
       logInfo("Alarm", "Device " + triggeringItem.name + " is on fire!")
   }
end

That’s fine; it’s an event-driven system and that is how to use it.

JustinG · May 23, 2021, 4:42pm

I’m not sure I understand the objection to using profiles here. You can have two different items connected to the same channel and have one with the profile and one without:

Temperature Channel A -> "Temperature" Number:Temperature item
Temperature Channel A -> Scale Profile (or js profile) -> "Temperature alarm" Switch item

That way you still have an item with the original state and a simple switch for the alarm.
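
A minimal sketch of this two-Item layout in an .items file. The channel UID, Item names, group names and the 80/85 °C limits are made up for illustration, and the `system:hysteresis` profile (the one Oliver2 asked about, shipped with openHAB 3.1 as far as I know) stands in here for the scale or JS profile:

```
// Raw reading, linked without a profile, so the analogue value stays available
Number:Temperature Dimmer1_Temp "Dimmer 1 temperature [%.1f %unit%]" (gTemperatures)
    { channel="shelly:shellydimmer2:abc123:device#internalTemp" }

// Same channel again, but the profile turns the reading into ON/OFF
Switch Dimmer1_Overheat "Dimmer 1 overheat" (gAlarm)
    { channel="shelly:shellydimmer2:abc123:device#internalTemp" [profile="system:hysteresis", lower="80 °C", upper="85 °C"] }
```

With these assumed settings the Switch would go ON at 85 °C or above and only return to OFF below 80 °C, so small fluctuations around the limit do not toggle the alarm.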

Oliver2 · May 23, 2021, 5:15pm

Many thanks for all your feedback.

Well, at first sight that sounds more complicated to me. Isn’t it easier for maintenance purposes (adding new items, deleting items, changing thresholds, …) to have the whole logic within one rule?

Take a smoke detector as an example. I am interested in three pieces of information:

is it present at all, i.e. available to OH?
is its state normal or alarm?
battery status
For the first piece of information I just need the thing’s status. How I get its state could also be part of this discussion. @rossko57 pointed out a cool solution where a rule is triggered if a thing changes its state. No need to create channels, items and maybe another rule.

It will if you add quite a lot of items, many of which have their own threshold at which an event should be raised.

Ok - to me an event is when a value (item state) exceeds a threshold, resulting in actions I need to take. Events are categorized as informational message, fault, pre-alarm or alarm. It is a kind of “logical layer” or interpretation of an event, compared to a technical “event” within OH when a value changes from 30 to 31.

Oliver2 · May 23, 2021, 5:27pm

Hey Justin,
you got my objection. As you pointed out, that would mean I have to create a 2nd item per channel.

That is what this thread is about (and more): creating events via rules (with all their pros & cons) or via a 2nd channel item (with all its pros & cons). Maybe there are some more ways?
May I ask you all what you would recommend? I have some experience with OH now, but currently I cannot see an advantage of one way over the other.

Oliver2 · May 23, 2021, 5:30pm

how do you do that?

rossko57 · May 23, 2021, 6:38pm

Going back to -

That greatly depends on what “it” is. There is no The Way solution. Some devices/services might be polled, so you can detect when they go missing, but maybe not in the same way for every device. Some devices/services send periodic reports, and you can detect when those stop coming. Some devices/services may send completely asynchronously, and you’ll have to make guesses and jump through hoops.

Really, this is what Items are about - you can collect all that disparate stuff, derived from several different methods, into Items uniformly representing “present”.

Alright, but you are going to have to adjust to openHAB’s much more simplistic view where practically everything is an event - e.g. updating a temperature reading. You get the choice whether to do anything about that, or only if it is a change, or only if the difference is blah, but the events happen anyway.

Well, I reckon that’s another line.

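// openHAB 3 package name
import org.openhab.core.model.script.ScriptServiceUtil

rule "check overheat"
when
   Member of mySensors changed
then
   // look up the companion Item that holds this sensor's threshold
   val thresholdItem = ScriptServiceUtil.getItemRegistry.getItem(triggeringItem.name + "_threshold")
   if (triggeringItem.state > (thresholdItem.state as Number)) {
       flashingLight.sendCommand(ON)
       logInfo("Alarm", "Device " + triggeringItem.name + " is on fire!")
   }
end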

That’s trading simplicity of rule against giving every sensor Item “blah” a companion Item “blah_threshold”.
It could be enhanced to use the companion if it exists, or use some default if it doesn’t.
There’s other ways to do this kind of thing, most with no great runtime or maintenance advantage.
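
A rough sketch of that enhancement, assuming openHAB 3, that the members of mySensors are Number:Temperature Items, and an arbitrary default of 85 °C. The rule name and the default are illustrative, not anything posted in this thread:

```
import org.openhab.core.model.script.ScriptServiceUtil

rule "check overheat with default threshold"
when
    Member of mySensors changed
then
    // Assumed fallback, used when there is no usable "<name>_threshold" Item
    var Number threshold = 85|°C
    try {
        val companion = ScriptServiceUtil.getItemRegistry.getItem(triggeringItem.name + "_threshold")
        // NULL and UNDEF are not Numbers, so the default survives in that case
        if (companion.state instanceof Number) threshold = companion.state as Number
    } catch (Exception e) {
        // getItem throws when the companion Item does not exist; keep the default
    }
    if (triggeringItem.state > threshold) {
        flashingLight.sendCommand(ON)
        logInfo("Alarm", "Device " + triggeringItem.name + " exceeded " + threshold.toString)
    }
end
```

Sensors without a companion Item then share the default, and only the ones that need a different limit get their own “blah_threshold” Item.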

Oliver2 · May 23, 2021, 6:52pm

True, but we are still not talking about the same thing. In the end I will have:
a couple of dimmers: temperature thresholdItemA
Systeminfo binding: in total 4-6 thresholdItems
Network devices (NAS, printer, router, switches, repeater): 20 more thresholdItems
and many many more I am currently not aware of

JustinG · May 24, 2021, 2:52am

At some point, even if you try to track the information about your devices by rules, those states will have to be in items, especially since you were originally talking about having notifications with the number of devices in various states. There is no point at all in re-inventing the wheel and trying to create your own aggregation method; you won’t do better than the built-in group aggregation functions, and those groups will have to have items representing those online/offline states as members. Yes, it seems like a lot of work up front depending on how many devices you’re talking about, but the benefits significantly outweigh the costs. Having all this information as items even makes whatever rules you eventually need more efficient, as rules can get specific item arrays based on tags or group membership.
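
As a sketch of what the built-in aggregation could look like for the two badges from the first post, here is an .items fragment. All Item and group names are hypothetical, and it assumes alarms are Switches that are ON when active and availability Items are Switches that are OFF when the device is gone:

```
// State of the group = number of members whose state matches the regex
Group:Number:COUNT("ON")  gAlarm "Active alarms [%d]"
Group:Number:COUNT("OFF") gFault "Devices in fault [%d]"

Switch SmokeKitchen_Alarm "Kitchen smoke alarm" (gAlarm)
Switch Dimmer1_Overheat   "Dimmer 1 overheat"   (gAlarm)
Switch NAS_Online         "NAS reachable"       (gFault)
Switch Printer_Online     "Printer reachable"   (gFault)
```

The badges would then only need to display the states of gAlarm and gFault, and the popup can list the group members.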

I already have a similar, limited setup. Each device or service that I wish to track has not only an extra online item but also an online timestamp item, so that I can see how long it has been offline if it goes offline. With tags, these are divided into critical and non-critical items, and I have a widget that shows when non-critical items (such as personal electronics) are online and goes into an alarm state when one of the critical items is offline.

Some of these items are directly from channels from the things themselves or channels with additional profiles, others have to be created via a different binding altogether such as the http or network binding. As Rich is fond of saying, in terms of system overhead, “Items don’t really cost you anything.”

Oliver2 · May 24, 2021, 9:28am

Many, many thanks to rossko and especially to you, Justin, for taking the time to answer.
Your answer is really valuable to me: if an expert like you goes down the path of creating a separate item for each piece of information (despite the fact that it is more work) instead of creating a complex rule, then I know this path cannot be that wrong.
Again, many thanks for sharing your best practices!

Seaside · May 24, 2021, 9:43am

I was about to write you an answer, but JustinG pretty much covered it.
How I track online status depends a bit on which binding it is. For instance, I have a binding for my wireless clients with a separate online status in a channel. For other things I have rules which update a timestamp every time an item is updated; I can then look at that timestamp. I also tend to have a lot of MQTT things, where in some cases I post the online status or timestamp directly from MQTT to the thing.

For instance for a timestamp update it could look like this:

rule "updateMyTimestamp"
when
    Item MyMonitoredItem received update 
then
    MyTimestamp.postUpdate(new DateTimeType())
end
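
One possible way to “look at that timestamp”, sketched under the assumption of openHAB 3 (where `now` is a java.time.ZonedDateTime). The Switch Item MyMonitoredItem_Online, the 5-minute check and the 30-minute limit are all made up for illustration:

```
rule "checkMyTimestamp"
when
    Time cron "0 0/5 * * * ?"   // every 5 minutes
then
    if (MyTimestamp.state instanceof DateTimeType) {
        val lastSeen = (MyTimestamp.state as DateTimeType).zonedDateTime
        // treat the device as gone after 30 minutes without an update
        MyMonitoredItem_Online.postUpdate(if (lastSeen.isBefore(now.minusMinutes(30))) OFF else ON)
    }
end
```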

Oliver2 · May 24, 2021, 9:47am

may I ask one more last question: is it possible to map a thing status directly to an item?
I know how to do it by rule but thought it might be possible by profile

Oliver2 · May 24, 2021, 9:47am

many thanks seaside. will include all in my small project

rossko57 · May 24, 2021, 9:57am

As a general comment, I notice that quite often newish users have a resistance to “making more Items” as though there’s some cost to it. I’m not sure why.

Of course things can get in a big jumble quickly - but that’s the same risk as making more rules, more UI pages, more Things.
The key to sanity I think is to envisage some structure to start with, e.g. simplistic naming conventions where “mySensor” has a partner “mySensor_available”. And be prepared to revisit and re-jig everything at a future date.

No.
If it was important, the binding author should provide you a channel for that purpose.

Be careful about interpreting what a Thing status means; look upon a Thing as the pathway to some device/service, not a model of the device.
Example;
a battery powered smoke alarm may sleep for days. openHAB’s Thing can and should show ONLINE throughout that. So far as we know, all is well.
Example;
Once an hour, we fetch a weather report from some remote service. In-between, the Thing remains ONLINE - as far as we know, everything is good.
Then your router catches fire, internet connection is lost. Now your Thing should show OFFLINE - it’s unusable, even though the remote service itself is fine.

Oliver2 · May 24, 2021, 12:36pm

true. I try to keep the system as lean as possible.

Thanks for your point of view - I agree with most of it. If a thing is online, it doesn’t necessarily mean that everything is working properly. However, if a thing is offline, it tells me that something is not working properly. That’s the way I see it.
And that is the reason why I am additionally monitoring a value item (like temperature, signal strength, etc).
I do not want to rely purely on items, as it is not certain that items go into a state like “UNDEF” when a thing becomes “OFFLINE”.
That’s where Justin’s suggestion comes in, which I definitely also want to implement.

rlkoshak · May 24, 2021, 4:08pm

Ultimately, openHAB is a “Home Automation Bus”. It’s designed to be first and foremost an automation system. Yes, it supports some great status UI elements and the like but where it excels is “something happened, do this in response” type systems. It’s going to require far more work and be far inferior as a monitoring system compared to systems actually designed to do this job (e.g. Zabbix, Prometheus, ELK stack, etc.)

This is likely the root of your overall objections. When one looks at how to make OH work for this sort of thing there is almost always a post like this saying “but that’s so much work! There has to be a better way!” Sadly, there isn’t. This is not the job that OH was designed to do. It can do it, but it’s going to be a lot of work.

My recommendation is and remains to only track the status of devices that actually impact the automations. For example, if I send a command to open the garage but the garage door controller is offline, return an alert telling me so, so that I know why the door didn’t open. For that all I need is a single Item and the Network binding. OH doesn’t need the % CPU use or CPU temperature or anything like that. All OH needs to know is if it’s online. Use something like Zabbix to monitor everything else. It’ll be less work to set up overall and provide a far superior set of visualizations and alerting.
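
A minimal sketch of the garage example, assuming a hypothetical Switch Item GarageController_Online linked to the `online` channel of a Network binding pingdevice Thing, and a hypothetical GarageDoor Item that receives the open/close commands:

```
rule "warn when garage controller is unreachable"
when
    Item GarageDoor received command
then
    // the command was sent, but the controller did not answer the last ping
    if (GarageController_Online.state != ON) {
        logWarn("Garage", "Command " + receivedCommand + " sent, but the garage door controller is offline")
    }
end
```

The log line is only a placeholder; any notification action could go in its place.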

No, it can only be done by a rule. There is no Channel that represents the Thing’s status, though that might be an elegant way to deal with the problem of tracking the status of Things. Maybe someone who has a need for tracking Thing statuses should file an issue and see what the devs think, or even better post a PR if they can code a bit.

It “feels” like it would be easier to add a default Channel to all Things (i.e. in the base classes) than to do something like creating the concept of Groups for Things and other ideas that have been floated over the years.
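
For the rule-based route, a minimal Rules DSL sketch that mirrors one Thing’s status into a Switch Item. The Thing UID and the Item name NAS_ThingOnline are hypothetical, and one such rule (or one trigger line) would be needed per Thing:

```
rule "mirror Thing status into an Item"
when
    Thing "network:pingdevice:nas" changed
then
    // ThingStatus is an enum; compare its name to keep the sketch import-free
    val status = getThingStatusInfo("network:pingdevice:nas").getStatus().toString
    NAS_ThingOnline.postUpdate(if (status == "ONLINE") ON else OFF)
end
```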

An even more important example is MQTT. By default, if the MQTT Broker Thing is online (i.e. in communication with the broker) all the Things will be ONLINE whether or not the actual devices on the other side of the broker are online. Now there is a way to use the LWT topics but that requires additional setup on both ends. So all your devices could be offline but as long as the broker is up OH will treat all the devices as online too. Because as rossko57 so aptly stated, the status means the communications channel is still up, not necessarily the end device.
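
A sketch of what the openHAB side of the LWT setup might look like in a .things file, using the availability parameters of the generic MQTT Thing. Broker address, topics and payloads are assumptions, and the device itself still has to be configured to publish these payloads and register its last will with the broker:

```
Bridge mqtt:broker:home [ host="192.168.1.2", secure=false ] {
    // The Thing follows the availability topic instead of just the broker connection
    Thing topic smoke1 "Smoke detector 1" [ availabilityTopic="smoke1/LWT", payloadAvailable="Online", payloadNotAvailable="Offline" ] {
        Channels:
            Type switch : alarm [ stateTopic="smoke1/alarm", on="1", off="0" ]
    }
}
```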

rossko57 · May 24, 2021, 4:12pm

Which I think would just fire up the endless “my device XX has exploded but the Thing is still ONLINE” traffic