Lesson Learned: Put limits on the use of an external resource

So I went on a little trip this past weekend. It was a nice little getaway, but I returned to problems. Occasionally my zwave/zigbee controller/coordinator Things would go offline. I suspect it’s a problem with the hypervisor, but it doesn’t happen often enough to have made it worth investigating yet.

However, in planning for the investigation, I’ve created a simple rule to notify me when either goes offline. Both Things sit on the same HUSZB-1, so they are the same physical device, and I’d often get two messages at the same time when it does go offline. The notification is an email sent to and through my Gmail account.

And here’s the problem. Almost as soon as I left the house, the controller/coordinator started to rapidly flap online and offline several times a minute. When I came home the lights were not on, and when I finally looked at my logs I saw that I had reached the daily limit on the number of emails Gmail will allow to be sent. :scream:

BTW, the limit seems to be around 1000.

So, the lessons learned are:

  • even for temporary rules, put a limit on how much of an external resource they can use
  • don’t forget about these temporary rules and maybe disable them when you go on a trip.

For the curious, here’s the rule since adding the rate limiting and a debounce:

triggers:
  - id: "1"
    configuration:
      thingUID: zwave:serial_zstick:zw_controller
      status: OFFLINE
    type: core.ThingStatusChangeTrigger
  - id: "2"
    configuration:
      thingUID: zigbee:coordinator_ember:zg_coordinator
      status: OFFLINE
    type: core.ThingStatusChangeTrigger
conditions: []
actions:
  - inputs: {}
    id: "3"
    configuration:
      type: application/javascript
      script: >
        var OPENHAB_CONF = java.lang.System.getenv("OPENHAB_CONF");

        load(OPENHAB_CONF+'/automation/lib/javascript/personal/alerting.js');

        load(OPENHAB_CONF+'/automation/lib/javascript/community/rateLimit.js');

        load(OPENHAB_CONF+'/automation/lib/javascript/community/timerMgr.js');


        this.tm = (this.tm === undefined) ? new TimerMgr() : this.tm;

        this.rl = (this.rl === undefined) ? new RateLimit() : this.rl;


        var sendAlertGen = function(rl) {
          return function() { rl.run(function(){ sendAlert("The Zwave Controller or Zigbee Coordinator is OFFLINE!"); }, "24h"); }  
        };


        this.tm.check("zigbee", "1m", sendAlertGen(this.rl), true);
    type: script.ScriptAction

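timerMgr.js and rateLimit.js can be found at https://github.com/rkoshak/openhab-rules-tools. As written, the alert only goes out once the triggers have been quiet for a minute ("1m"), and at most once every 24 hours ("24h"). Prior to this, the rule had just the load of alerting.js and a one-line call to sendAlert, so an email was sent every time the rule triggered.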
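For anyone who wants to see the two ideas without pulling in the libraries, here is a rough sketch in plain JavaScript. It is not how timerMgr.js or rateLimit.js are actually written; makeDebounce, makeRateLimit and makeOfflineAlerter are made-up names, and the clock and timer functions can be passed in only so the behavior is easy to try out.

```javascript
// Illustrative helpers only; the names below are made up for this sketch and
// are not the API of timerMgr.js or rateLimit.js.

// Debounce: every call restarts the countdown, so fn only runs once the
// calls have stopped for delayMs.
function makeDebounce(delayMs, timers) {
  timers = timers || {
    set: function (fn, ms) { return setTimeout(fn, ms); },
    clear: function (handle) { clearTimeout(handle); }
  };
  var pending = null;
  return function (fn) {
    if (pending !== null) {
      timers.clear(pending); // drop the countdown started by the previous call
    }
    pending = timers.set(function () {
      pending = null;
      fn();
    }, delayMs);
  };
}

// Rate limit: run fn at most once per intervalMs; calls in between are dropped.
function makeRateLimit(intervalMs, now) {
  now = now || function () { return Date.now(); };
  var nextAllowed = -Infinity;
  return function (fn) {
    var t = now();
    if (t < nextAllowed) {
      return false; // still inside the quiet period, do nothing
    }
    nextAllowed = t + intervalMs;
    fn();
    return true;
  };
}

// Combine the two: wait for the flapping to settle, then send at most one
// alert per limitMs. Defaults mirror the "1m" and "24h" used in the rule.
function makeOfflineAlerter(send, opts) {
  opts = opts || {};
  var debounce = makeDebounce(opts.debounceMs || 60 * 1000, opts.timers);
  var limit = makeRateLimit(opts.limitMs || 24 * 60 * 60 * 1000, opts.now);
  return function onOffline() {
    debounce(function () { limit(send); });
  };
}
```

With something like var onOffline = makeOfflineAlerter(function () { sendAlert("OFFLINE!"); }); each trigger would call onOffline(). The debounce collapses a burst of flapping into one attempt, and the rate limit caps how many of those attempts ever reach the mail server, which is the part that protects the external resource.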

This is a question about your zigbee things going offline.

We have the same zigbee stick with the same flashed firmware.

I have 19 zigbee Things. On OH startup they all go online, then 4 go offline after some period of time. Restarting openHAB repeats this behavior, with the same 4 Things going offline.
The 4 Things are Leviton Zigbee dimmers. I have 13 other Leviton zigbee dimmers which never display this issue.

Did you work out why your zigbee Thing was going offline?

It’s the stick that goes offline, not the Things individually. Well, when the Coordinator thing does go offline all the other Zigbee Things go offline too but the root cause of that is the Coordinator. So the behavior I saw is different.

I never had a chance to look into the root cause, but I suspect it had something to do with Docker. Docker has been updated a couple of times since I last saw the problem, so until it occurs again I’m going to blame a bug in Docker.