Modbus error management in openHAB

This is a more developed solution that automatically disables failed Modbus Things and re-enables them periodically to attempt recovery.
There are two parts to it.

But first, create a dummy Item to use as a communication tool between the rules

// catcher for Modbus Poller Thing status changes
String MBpollerChanged "Failed Modbus UID [%s]" <tools> {autoupdate="false", expire="20m,state= "}

I send commands here, rather than state updates, so that a flurry of incidents in quick succession will be handled in a nice queue and not overwrite each other.

Because autoupdate is disabled, commands leave the Item state untouched, so we can use that state to hold any message we like. We can also use expire to auto-clear the message if updates stop (i.e. the errors stop).

The Thing status catcher

I stole this from here, thank you Yannick @ysc

As mentioned earlier, ordinary DSL rules do not offer neat ways to deal with status changes from several Things. NGRE rules do - but the PaperUI Experimental rules editor still hides the useful methods we want to use here.
What we can do is use the NGRE rules importer feature to get round that. We write the rule in JSON in a file, and then import into openHAB rules.

So make a file mbthingstatus.json and put it where you like. We only need to import it once, but you might want it again for a later reinstall or something. I chose to put mine in conf/scripts/

[
    {
        "uid": "MBpollerStatusChanged",
        "name": "Modbus poller thing status changes",
        "description": "",
        "visibility": "VISIBLE",
        "enabled": true,
        "triggers": [
            {
                "id": 1,
                "label": "When a ThingStatusInfoChangedEvent is received",
                "configuration": {
                    "eventTopic": "smarthome/*",
                    "eventSource": "",
                    "eventTypes": "ThingStatusInfoChangedEvent"
                },
                "type": "core.GenericEventTrigger"
            }
        ],
        "conditions": [],
        "actions": [
            {
                "id": 2,
                "label": "execute a given script",
                "inputs": {},
                "configuration": {
                    "type": "application/javascript",
                    "script": "var thingid = event.topic.split('/')[2]; /* examine events for modbus:poller:xxx Things */ if (thingid.split(':')[0] == 'modbus' && thingid.split(':')[1] == 'poller') { var myjson = JSON.parse(event.payload); if (myjson[0].status == 'OFFLINE') { /* pass to DSL rule via designated Item */ events.sendCommand('MBpollerChanged', thingid.toString()); print('Modbus failure ' + thingid + ' ' + myjson[0].status); } }"
                },
                "type": "script.ScriptAction"
            }
        ]
    }
]

Clever folk could probably do more work in this rule, but I have chosen to pass the info to an ordinary DSL rule.
I would have thought we could filter for Modbus Things in the "eventTopic": "smarthome/*" line, but that doesn't work for me, so the filtering is done in the JavaScript code.
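To show what that filtering actually does, here is a standalone sketch of the same parsing logic run against a made-up event. The sample Thing UID and payload below are illustrative assumptions; real events come from the openHAB event bus, where the topic has the form smarthome/things/&lt;thingUID&gt;/... and the payload is a JSON array with the new status first.

```javascript
// Standalone sketch of the filter used in the NGRE script above.
// The sample event is made up for illustration only.
var event = {
    topic: 'smarthome/things/modbus:poller:slave1:poller1/statuschanged',
    payload: '[{"status":"OFFLINE","statusDetail":"COMMUNICATION_ERROR"},' +
             '{"status":"ONLINE","statusDetail":"NONE"}]'
};

// topic is smarthome/things/<thingUID>/..., so element 2 is the Thing UID
var thingid = event.topic.split('/')[2];

// only react to modbus:poller:xxx Things
var isModbusPoller = thingid.split(':')[0] == 'modbus' &&
                     thingid.split(':')[1] == 'poller';

// element 0 of the payload array is the new status
var status = JSON.parse(event.payload)[0].status;

console.log(thingid, isModbusPoller, status);
```

Running this prints the extracted UID, whether the event passed the filter, and the new status; only the OFFLINE case would be forwarded to the Item.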

The file needs importing by typing in the karaf console

smarthome:automation importRules /openhab/conf/scripts/mbthingstatus.json

Disable and re-enable Bridge Thing

So I did this in an ordinary DSL rule

var Integer slowRetry = 10  // minutes for Modbus recover attempt

rule "Modbus poller failure"
when
	Item MBpollerChanged received command
	// command contains poller Thing UID
	// issued by NGRE rule when Thing status changes to OFFLINE
	// note that can happen multiple times for one incident
then
	logInfo("MBerrmgr", "poller went offline " + receivedCommand)
		// fetch poller status
	val pollerjson = sendHttpGetRequest("http://localhost:8080/rest/things/" + receivedCommand, 3000)
	if (pollerjson !== null) {
			// extract owning TCP/serial bridge UID
		val slaveuid = transform("JSONPATH", "$.bridgeUID", pollerjson)
			// log it via the designated Item's state
		MBpollerChanged.postUpdate(slaveuid)
			// check if the bridge is still online
		val slavestatus = getThingStatusInfo(slaveuid).getStatus()
		logInfo("MBerrmgr", "slave " + slaveuid + " status " + slavestatus)
		if (slavestatus.toString == "ONLINE") {
				// set the TCP/serial bridge thing offline, not the poller
			logInfo("MBerrmgr", "Disabling slave " + slaveuid + " for " + slowRetry.toString + "mins")
			sendHttpPutRequest("http://localhost:8080/rest/things/" + slaveuid + "/enable", "application/json", "false")
				// now schedule a future retry
			createTimer(now.plusMinutes(slowRetry), [ |
				logInfo("MBerrmgr", "Setting slave ONLINE for retry " + slaveuid)
				sendHttpPutRequest("http://localhost:8080/rest/things/" + slaveuid + "/enable", "application/json", "true")
			])
		}
	} else {
		logInfo("MBerrmgr", "OOPS disaster fetching REST for " + receivedCommand)
	}
end

This will disable the Modbus slave (TCP/serial thing) that owns a failing poller, and then try to re-enable again in ten minutes. If it fails again, we will go around the two-rule loop again.
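The disable-and-retry mechanics boil down to two REST calls and a timer. Here is a minimal standalone JavaScript sketch of that loop with the HTTP call stubbed out so it can run anywhere; the endpoint path is the one the DSL rule above uses, while the sample slave UID and the recording of calls in an array are illustrative assumptions standing in for sendHttpPutRequest and createTimer.

```javascript
// Sketch of the disable-then-retry loop with HTTP stubbed out.
// In openHAB the real call is sendHttpPutRequest against
// http://localhost:8080/rest/things/<uid>/enable with body "false"/"true".
var calls = [];  // records (url, body) pairs in place of real HTTP PUTs

function putEnable(slaveuid, enabled) {
    var url = 'http://localhost:8080/rest/things/' + slaveuid + '/enable';
    calls.push({ url: url, body: String(enabled) });
}

function handlePollerFailure(slaveuid, retryMinutes, schedule) {
    // disable the TCP/serial bridge Thing, not the poller
    putEnable(slaveuid, false);
    // schedule a future retry (createTimer in the real DSL rule)
    schedule(retryMinutes * 60 * 1000, function () {
        putEnable(slaveuid, true);
    });
}

// exercise it with an immediate "scheduler" instead of a real timer
handlePollerFailure('modbus:tcp:slave1', 10, function (ms, fn) { fn(); });

console.log(calls);
```

The recorded calls show the sequence: first a PUT with body "false" to disable the bridge, then (after the delay, collapsed here) a PUT with body "true" to let polling resume.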

You can display a line on your sitemap

Text item=MBpollerChanged visibility=[MBpollerChanged!=""]

Note that there are a couple of limitations to this method.

If the device is broken at system boot time, the binding will likely have already polled and failed before your rules are available. There won’t be a Thing status change to trigger management when the rules are ready.

If the system is closed down while a device is broken, the Thing OFFLINE (DISABLED) setting will be “remembered”, and poll attempts will not resume upon system reboot because the timer to do that was lost.
