Z-wave message timeouts

I have had a problem with Z-Wave since time immemorial. It’s normally not really a problem, but it’s finally bugging me since I need to be able to send a few switch commands in quick succession.

Currently, if I open BasicUI or the phone app or whatever and, say, change a dimmer from 90->10->90->10 a bunch of times in a row, the commands start timing out. It doesn’t matter which dimmer; just pick one and it’ll start timing out after a few iterations. As long as I don’t do that kind of thing it’s no problem, but I have a few scene controllers now, and sometimes multiple messages get sent at once, which is affecting my ability to use these switches.

I have a debug log available. No missing nodes, no failed nodes. Nightly heals at 2am.

Example of one of the dimmers having timeouts

The computer is plenty fast, something like a 3rd-gen i5 with 8 GB of RAM. The controller is an Aeotec Gen5 stick.

House is drywall and 2x4s. Nothing is more than say 50 feet from the stick
What other information can I provide?

There can be only one message requiring a response outstanding to a node at a time. The only full round trip visible in your log is callback 71, and it took nearly a second. It appears the previous TX (70) took 5.5 seconds, and the TX for callback 72 is still not complete after 4.079 seconds. So it appears to me that, despite thin walls and no zombie nodes, there is an issue there. If you make changes faster than the network can process them, they are just going to build up in the queue. The timeout for a message is usually 5 seconds.
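As a rough illustration of that queue build-up, here is a toy model in Python. It is not how the binding is implemented; it only assumes one outstanding message per node, a fixed round trip of about 1 second (the callback 71 figure), and a 5 second timeout. The 0.5 second command interval is a made-up number standing in for fast 90->10->90 toggling.

```python
def simulate(send_interval, round_trip, count):
    """Toy model: commands to one node are handled strictly one at a time.

    send_interval: seconds between user commands (assumed value)
    round_trip:    seconds for one command to complete (assumed fixed)
    Returns the latency of each command, measured from when it was issued.
    """
    latencies = []
    node_free_at = 0.0  # time at which the node can accept the next command
    for i in range(count):
        sent_at = i * send_interval
        started_at = max(sent_at, node_free_at)  # wait behind earlier commands
        done_at = started_at + round_trip
        node_free_at = done_at
        latencies.append(done_at - sent_at)
    return latencies


def first_timeout(latencies, timeout=5.0):
    """Index of the first command whose latency exceeds the timeout, or None."""
    for i, latency in enumerate(latencies):
        if latency > timeout:
            return i
    return None


if __name__ == "__main__":
    fast = simulate(send_interval=0.5, round_trip=1.0, count=20)
    slow = simulate(send_interval=2.0, round_trip=1.0, count=20)
    print("fast toggling, first timeout at command", first_timeout(fast))
    print("slow toggling, first timeout at command", first_timeout(slow))
```

In this model each command sent at twice the rate the node can handle waits half a second longer than the one before it, so the timeouts show up after a few iterations even though no single round trip is slow. Slower round trips, like the 5.5 seconds seen for TX 70, would make it happen sooner.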

Hope this helps some.

Bob

So I should be focusing on the first timeout, since after that it will just naturally get worse.
That was just a snippet. Here (hopefully) is the full log with just node 34.

Still kind of hard to read, but the device seems to be sending multiple reports with the same info. Some commands take so long that OH marks them as offline. What is the device, and can you adjust the parameters to suppress reports?

Bob

If you click on it, download it, and zoom in, you can read it.
Anyway, that’s a ZEN72, a 700 series Zooz paddle dimmer. It has these options:

It looks to me like your use case (haunted house lighting?) creates a denial of service attack on your Z-Wave network. Recognize that once the controller gets a command message it will literally flood the network with messages (one after another) at different speeds and over different routes until it gets an “ack”. The device will also keep sending reports at different speeds and over different paths until it gets an “ack”. A Zniffer would help verify, but it looks like the controller and device keep sending messages, and neither is “acking” the messages of the other because they are too busy sending their own. This may not be a technically correct answer, but it is what it “looks” like.

I’m not a radio transmission expert, but I have read in the forum that more than 10 frames a second might be a Z-Wave limit. I have 47 nodes and run about 10 frames a minute. I get there by suppressing command polls and redundant device reports, and by avoiding percentages as triggers (a killer with stray readings around zero).

Anyway, if you post the actual debug log file (.txt, unfiltered and unprocessed), I will take a look to see if other devices are contributing to the congestion or if anything else is chatty, but I’m not promising anything.
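For a quick first pass at spotting chatty devices before handing the log over, something like the following Python sketch could tally debug lines per node per minute. The regular expression is an assumption about the line shape (a `YYYY-MM-DD HH:MM:SS` timestamp at the start and a `NODE <number>:` tag later in the line), so check it against your own log. Log lines are only a rough proxy for radio frames, since one frame can produce several debug lines.

```python
import re
from collections import Counter

# ASSUMED line shape: "2024-01-01 02:00:01.000 ... NODE 34: ..."
# Adjust this pattern if your log is formatted differently.
LINE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}.*?\bNODE (\d+):")


def lines_per_node_minute(lines):
    """Count matching log lines per (node, minute). Non-matching lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            minute = match.group(1)
            node = int(match.group(2))
            counts[(node, minute)] += 1
    return counts


def busiest(counts, top=5):
    """Return the `top` busiest (node, minute) pairs with their line counts."""
    return counts.most_common(top)
```

To use it, open the log file, pass the file object to `lines_per_node_minute`, and print `busiest(...)`. A node that dominates minute after minute while nobody is touching a switch is a candidate for the congestion.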

Bob

Thanks Bob. I have 25? 30? Not a haunted house, it’s just that that’s a surefire way to create the condition where things go to hell. I have several energy-reporting nodes that I can look into as well to suppress their readings more (I’ve gone through them in the past to keep them quieter, but more may be needed).

I would look into a zniffer. It is very enlightening about traffic. My only issue is that after you fix everything there is nothing to do (OH & Zwave-wise) but try to help people on the forum :wink: I thought about converting back to a second back-up controller, but you never know when a need might arise.

Bob

Barring getting a Zniffer for now, I’m going to borrow another Z-Stick from Amazon, put just a few devices on it, and see how it reacts. Perhaps my expectations are the problem, and this may be a way to determine that.

Second Z-Stick installed and paired with one of the dimmers. I can play the scale like a piano and it handles it like a champ. So I guess I’m going to have to go device by device until I figure out which one is causing the problem.

Just out of curiosity, I ran the Zniffer with a dimmer (Jasco) and a switch (Zooz) and switched them quickly back and forth, and they seem to handle it with an “ack” in the low-millisecond range. So you are probably right about it being related to a specific device or some network congestion.

The Jasco dimmer is only Z-Wave basic, so the communication is at 40 kbit/s, versus 100 kbit/s for the Zooz. Command poll is disabled; the Zooz responds with a “report” after the “set”. Although the Zniffer does not show all the binding debug information, it is real time, since it runs on the PC and not on the RPi 3B running OH.
Bob


I have some Dome brand smart plugs which I kind of suspect as being the problem.

My network consists mostly of Zooz, Inovelli, and Aeotec with some other randoms sprinkled in

As noted above, stray watt readings on an “off” plug are a problem, if you have reports that run with a percentage change.
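To see why a percentage rule misbehaves around zero, here is a small Python sketch comparing a percent-change trigger with an absolute-change trigger. It is a simplified model, not any particular device's firmware, and the wattage readings and thresholds are made-up numbers.

```python
def percent_change_reports(readings, threshold_pct):
    """Readings a percent-change rule would report (the first is always sent)."""
    reported = [readings[0]]
    for value in readings[1:]:
        last = reported[-1]
        if last == 0:
            # Any move away from zero is an unbounded percent change.
            changed = value != 0
        else:
            changed = abs(value - last) / abs(last) * 100 >= threshold_pct
        if changed:
            reported.append(value)
    return reported


def absolute_change_reports(readings, threshold_watts):
    """Readings an absolute-change rule would report (the first is always sent)."""
    reported = [readings[0]]
    for value in readings[1:]:
        if abs(value - reported[-1]) >= threshold_watts:
            reported.append(value)
    return reported


if __name__ == "__main__":
    off_plug = [0.0, 0.1, 0.0, 0.2, 0.1, 0.0]   # hypothetical stray readings, in watts
    running = [100.0, 101.0, 99.0, 102.0]       # hypothetical steady load, in watts
    print(len(percent_change_reports(off_plug, 10)), "reports from the off plug at 10%")
    print(len(absolute_change_reports(off_plug, 5)), "report from the off plug at 5 W")
    print(len(percent_change_reports(running, 10)), "report from the running load at 10%")
```

With these numbers the "off" plug reports every single reading under the 10% rule, because jitter of a tenth of a watt is a huge percentage of almost nothing, while the steady 100 W load stays quiet. An absolute threshold in watts keeps the off plug quiet as well.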

Also I have replaced all my Dome motion sensors with Zooz as they are way too chatty.

Bob

ok… why would you do this?

Is there any legitimate reason you would want to do this?

So you are stress testing the network? I guess you found its limits.

What exactly were you expecting to happen? I’m not trying to be a smart a$$, really asking

My only question is after you do this, and the system has a little while to recover, does it return to normal operation?

@apella12

Thanks for your help, Bob. You’ve helped me understand the energy update settings (not directly, I suppose, but you got me pointed where I needed to be). A lot of this was set up ~4 years ago and it’s been working ever since.

Finally, after removing the 3 Dome switches, 2 Aeotec Nanos, and 2 Inovelli LZW36 switches, it’s responding really fast, like I would expect. While I’m not sure which one was the exact cause yet, I’m getting there. Of those 7 devices, none ever seemed particularly chatty on the network. The Nanos are getting replaced by newer Zooz switches (the Nanos can be sold for more than the Zooz switches cost, and I’m not using the Nanos to their full capability anyway). Then I’ll add the other stuff back and see what happens…
Thanks again Bob. I’ll probably update this if I figure out the exact device causing the issue.