Error Handling In OpenHAB

robconnolly · November 5, 2015, 3:32am

Not sure if this is the best place for this in terms of topics, but this location made the most sense to me.

My basic question is:

How are error cases handled within OpenHAB?

Let me propose an example from my own setup:

Let’s say that I have an item, which uses the HTTP binding to poll a remote device every few minutes and retrieve a value. How do I handle the case where the device is down and not responding to HTTP requests?

Currently I see the error printed to my log files, but I’d like to be able to handle this inside a rule so I could send a notification or take some other action.

The only two approaches I’ve been able to come up with so far are:

1. Poll the device separately with the NetworkHealth binding, which introduces more network traffic and may or may not reflect the real state of an HTTP request to the device.
2. Configure the OpenHAB logging system to send the error message via email (which should be doable in logback). However, this would send all error messages, not just the ones I’m interested in.

Any thoughts appreciated.

bob_dickenson · November 5, 2015, 10:08am

Just off the top of my head, here are a couple of ideas (i.e. I have not tested either, so they may not work, and more elegant ways likely exist):

1. Couple your periodic poll with a rule that uses the exec binding to run a script which “tails” the openHAB log where the error would show if it occurred, pipes the output to a grep/find for the offending phrase/error message, and uses the boolean result or count result to determine whether to send your notification. (I think this might get ugly though, particularly on surfacing the result of the grep/find back to OH.)
2. Another approach would be to handle the entire interaction with the device using the JSR-223 script binding.

If I had to pick one to try first, I think it would be the JSR approach since it requires less popping back and forth between levels of abstraction and would give you finer control as you develop the rule.

gersilex · November 5, 2015, 10:28am

You can use timers to react to not receiving an ON (Okay) status from the HTTP binding within a specified time.

Simply reset the Timer every time you receive an update on that polled Item. Depending on how important this is, you can give it some grace time before reacting to the failure. I recommend 1.5x the polling interval.

Could look like this:

var Timer tAlarm
var int iAlarmGraceTime = 90 // 90 seconds. Because I poll the HTTP item every minute

rule "HTTP Alarm Gateway error handler"
when Item myAlarmHTTPinterface received update
then
    if(tAlarm != null){
        println("Received update, rescheduling timer")
        tAlarm.reschedule(now.plusSeconds(iAlarmGraceTime))
    }
    else
    {
        tAlarm = createTimer(now.plusSeconds(iAlarmGraceTime))[|
            println("No update from Alarm Gateway for " + iAlarmGraceTime + " seconds! Alarming...")
            sendCommand(RedAlarmLights, ON)
            sendCommand(Siren, ON)
            notifyMyAndroid("Alarm System", "No update for " + iAlarmGraceTime + " seconds! Alarm was started!")
        ]
    }
end

Timers run in a separate thread. This ensures the timer body runs at the correct time. Timers do not persist through openHAB restarts, but the timer will be recreated as soon as the first update to your HTTP item is received.

KjetilA · November 5, 2015, 11:27am

In the world of industrial automation, it is quite common to use a concept called OPC for accessing data items in the PLC from a client (e.g. a PC running the HMI). In OPC every data item is a tuple (or triple?) of the following attributes: status, timestamp, and value. The status is typically GOOD, UNCERTAIN or BAD.

The concept of OPC is not directly applicable to the openHAB concept, however, the idea of having meta-information (like status and timestamp) associated with the data item (the value) is applicable and could be used by bindings to indicate the freshness/usefulness of the value.

I realize this is a change that cannot be quickly introduced - but it could maybe be considered in the context of openHAB2?
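To make the idea concrete, here is a minimal sketch in plain Python rather than openHAB code. The `Reading` type, the `Quality` enum and the `usable` helper are hypothetical names made up for illustration; nothing like them exists in openHAB today.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum


class Quality(Enum):
    GOOD = "GOOD"
    UNCERTAIN = "UNCERTAIN"
    BAD = "BAD"


@dataclass(frozen=True)
class Reading:
    """Hypothetical item state: the value plus OPC-style metadata."""
    value: float
    timestamp: datetime
    quality: Quality


def usable(reading: Reading) -> bool:
    # A rule would only act on values that the binding marked as GOOD.
    return reading.quality is Quality.GOOD


polled = datetime(2015, 11, 5, 11, 27, tzinfo=timezone.utc)
fresh = Reading(21.5, polled, Quality.GOOD)
# Same value, but the binding could not reach the device on the last poll.
failed = Reading(21.5, polled, Quality.BAD)
```

The point is that `fresh` and `failed` carry the same value, so only the status tells a rule whether it can be trusted.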

rlkoshak · November 5, 2015, 3:50pm

This is how I did it for a while. Eventually I moved to MQTT and now use the Last Will and Testament to tell me when a device is down. But when I was using NH, I would set a timer when NH said the device was down, and if it was still down when the timer went off I would send the alert and take remedial action.
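The "only alert if it is still down after a delay" logic can be sketched outside openHAB like this. It is plain Python, it is evaluated on each health check instead of using a real timer, and `DownDebouncer` and its methods are hypothetical names, not anything from the NetworkHealth binding.

```python
class DownDebouncer:
    """Raise an alert only if a device stays down for a full delay."""

    def __init__(self, delay_seconds):
        self.delay = delay_seconds
        self.down_since = None  # time of the first "down" report, if any
        self.alerted = False    # True once the alert for this outage was sent

    def report(self, is_up, now):
        """Feed one health result; return True when an alert should be sent."""
        if is_up:
            # Device recovered: drop any pending alert.
            self.down_since = None
            self.alerted = False
            return False
        if self.down_since is None:
            self.down_since = now
        if not self.alerted and now - self.down_since >= self.delay:
            self.alerted = True
            return True
        return False
```

A short blip (down, then up again before the delay has passed) never produces an alert, and a long outage produces exactly one.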

You can make this apply to just the HTTP binding’s errors if it is the direction you want to go.

If you are using the sendHttp* actions, you get the response back as a String, which you can check to see whether it is an error.

If using the HTTP binding, I like @gersilex 's approach.

rlkoshak · November 5, 2015, 3:54pm

From an openHAB perspective, how does one determine whether a value is GOOD, UNCERTAIN, or BAD? Do you do it based on the age of the data (i.e. an Item starts out GOOD when first updated, after a certain amount of time it transitions to UNCERTAIN if there hasn’t been an update, and as more time passes without an update it transitions to BAD)? Or are some other criteria used?

I like the concept but it seems like you would need to configure it for every item individually. And the bindings would need to support it, right?
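Just to pin down what I mean by the age-based interpretation, here is a rough sketch in plain Python. The function name and both thresholds are assumptions for illustration, and they are exactly the part that would need configuring per item.

```python
from datetime import datetime, timedelta


def quality_from_age(last_update, now, uncertain_after, bad_after):
    """Classify a value purely by how long ago it was last updated."""
    age = now - last_update
    if age >= bad_after:
        return "BAD"
    if age >= uncertain_after:
        return "UNCERTAIN"
    return "GOOD"


# Assumed thresholds for an item that is polled once a minute.
UNCERTAIN_AFTER = timedelta(seconds=90)
BAD_AFTER = timedelta(minutes=5)
```

With these numbers a value is GOOD for the first 90 seconds after an update, UNCERTAIN until five minutes have passed, and BAD after that.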

robconnolly · November 5, 2015, 8:04pm

That’s how I’m doing it for some other devices and it works great.

This seems like a nice way to approach the problem. The only issue is the need to create a new rule for each device. Perhaps I can use a lambda function to make it more generic.

Thanks for all the replies.

rlkoshak · November 5, 2015, 8:37pm

Absolutely you can create a lambda. And if you put your devices in a group you can consolidate this down to just one rule with no need for the lambda.

Items:

Group gHttpDevices
Item httpPollDevice1 ... (gHttpDevices) ...
Item httpPollDevice2 ... (gHttpDevices) ...

Rules:

import java.util.Map
import java.util.concurrent.locks.ReentrantLock

// one timer per device, keyed by Item name
val Map<String, Timer> timers = newHashMap
val ReentrantLock lock = new ReentrantLock

rule "HTTP Devices Updated"
when
    Item gHttpDevices received update
then
    try {
        lock.lock
        Thread::sleep(100) // give persistence time to save the update
        // updatedSince (not changedSince) so an unchanged value still counts as a sign of life
        gHttpDevices.members.filter(sw|sw.updatedSince(now.minusSeconds(1))).forEach[sw |
            if(timers.get(sw.name) == null) {
                timers.put(sw.name, createTimer(now.plusMinutes(5)) [ |
                    // Device stopped responding, send alert and/or take action
                ])
            }
            else timers.get(sw.name).reschedule(now.plusMinutes(5))
        ]
    } catch(Throwable t) {
        logError("Rule", "Error processing HTTP polling update: " + t.toString)
    } finally {
        lock.unlock
    }
end

The above is more of an outline than anything. I’m sure I messed something up. I usually do. But it should get you started.

The only warning is that, due to the way openHAB propagates changes to a group, this rule will be triggered multiple times for each update to an item in the group. But because all you are doing is keeping a timer from going off, it probably doesn’t matter. This is also why I put a lock around the logic, so only one instance of the rule can be executing at a time.

You may want to create a system started rule to initially create and kick off the timers to cover the case where one of your HTTP polling devices is down when openHAB first comes up.
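Here is why the startup case matters, sketched in plain Python rather than as a rule. `DeviceWatchdog` and its methods are hypothetical names, and times are plain seconds to keep it short.

```python
class DeviceWatchdog:
    """Track the last update per device and report the ones that went quiet."""

    def __init__(self, devices, timeout_seconds, started_at):
        self.timeout = timeout_seconds
        # Seed every device at startup so one that never reports still expires.
        self.last_seen = {name: started_at for name in devices}

    def update(self, name, now):
        # Call this whenever a device delivers a value.
        self.last_seen[name] = now

    def overdue(self, now):
        # Devices without an update for at least `timeout` seconds.
        return sorted(
            name for name, seen in self.last_seen.items()
            if now - seen >= self.timeout
        )
```

Without the seeding in `__init__`, a device that is already down when the system starts would never get an entry and so would never show up as overdue, which is the same gap the system started rule closes for the timers.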

robconnolly · November 5, 2015, 8:53pm

Nice, I like it. Thanks!