Z-Wave network often goes intermittently silent for >10 minutes

I am running openHAB 4.3.3 on a Raspberry Pi 4B, installed through openHABian. I am running a Z-Wave network with about 50 devices (all mains-powered, including sensors with an adapter each). The controller is an Aeotec Z-Stick Gen 5.

I have a strange problem that started happening a few weeks ago. My Z-Wave network goes completely silent for extended periods of time before intermittently waking up again. The pattern is roughly 2-3 minutes of heavy activity, then 10-20 minutes of silence, then another 2-3 minutes of activity. I have read multiple blogs and carried out the steps as suggested in this blog.

I removed all the dead nodes (as far as I could identify), but that didn’t seem to help at all. In fact, I had a feeling that every time I unplugged the stick and put it back in the RasPi, it only became slower and more unresponsive.

I was wondering whether the ‘age’ of the controller could have something to do with it, since I have been running it for over 7 years now. So this morning, I backed up the EEPROM from the old stick and restored it on another Gen 5 stick that I had lying around. Unfortunately, while devices seem to work like a charm in the Simplicity Z-Wave PC Controller (I could only test some lights with the basic on/off feature, since I didn’t know how to send shutter commands in PC Controller), when I put the stick back in the RasPi, things were pretty much the same as with the old stick.

As you all probably know, on the RasPi 4B I am connecting the stick via a USB hub. Could the hub be the problem, ignoring or dropping packets to my controller?

I have enabled Z-Wave debug logging on the RasPi and placed a snippet here. As can be seen, at times the network is very active (I have a 3-phase smart meter that sends updated values every few seconds), but at other times it is completely silent. I tried using the log viewer, but I couldn’t really make much sense of the information there. One thing I noticed is that the queue seems to be about 40-50. Is that normal/expected?

I also noticed that in some forums/blogs one could see the latency of communication. In my log, I am unable to see any such thing. Should I enable something for that?

Thanks a lot everyone!

recent_Zwave_log.txt (813.2 KB)
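
For anyone wanting to quantify the silent periods in a debug log like this one, scanning for gaps between consecutive timestamps is one option. The following is only a minimal Python sketch, assuming each log entry starts with a timestamp of the form `YYYY-MM-DD HH:MM:SS.mmm` (adjust the pattern if your log is laid out differently). The function name `find_silent_gaps` and the 5-minute threshold are made up for illustration.

```python
import re
from datetime import datetime, timedelta

# Assumption: every log entry begins with "YYYY-MM-DD HH:MM:SS.mmm".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")


def find_silent_gaps(lines, min_gap=timedelta(minutes=5)):
    """Return (start, end) pairs where two consecutive timestamped
    lines are at least `min_gap` apart."""
    gaps = []
    previous = None
    for line in lines:
        match = TIMESTAMP.match(line)
        if not match:
            continue  # lines without a leading timestamp are skipped
        current = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous is not None and current - previous >= min_gap:
            gaps.append((previous, current))
        previous = current
    return gaps
```

Passing an open file object (for example the attached log) as `lines` returns the start and end of every stretch in which nothing was logged for at least `min_gap`.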

For a start, I would suggest disabling the nightly heal and then either waiting a few days or restarting OH.

On the meter: cut back the reporting to just what you need. Do not report on percent change, only on time. Also, are the endpoints (some are negative AC) needed? Don’t you just want the total? Lastly, what is that device?

The lack of latency information is because a lot of commands are failing. And no, a 40-50 command queue is not normal.

Thanks for your reply and suggestions!

I have disabled the nightly heal. I will let it run for a while for now, and see if it already helps. Every time I restart OH, it seems to take even longer for the Z-wave network to somewhat stabilise.

The device is a Qubino 3-phase meter. I have reduced it to report only every 5 minutes or on a change of 10%. Indeed, I only need the total values, but there is no parameter to disable reporting of the individual channels. I must add that until about two months ago, OH was working quite alright with 1% change reporting.

What can I do to fix it? Or what could be the reason for such a large command queue?

I don’t know. It was pretty jammed up in the log snippet, so I don’t know how it got there. Let’s see what happens and then run another log in a few days and see what the queue looks like. Longer term if you do restart OH (or just the ZW binding), switch the ZW to Debug first. Whatever is happening might be right at the start.

Thanks for the tip. I have set the log level to Debug using the Karaf console for now. Does this setting get preserved through OH/RPi restarts?
Edit: I found in the documentation that indeed, the settings are persistent.

Not to say it’s the problem, but the battery in my Gen 5 stick died after about that long. Two replacements were only $5 or $10 but you’ll need a small soldering iron to replace it. It’s a long shot, but maybe?

That’s indeed something that I considered, and that’s why I transferred all the settings to another stick with the exact same configuration that had never been used, just to rule out an ‘aging’ problem.

Both sticks with the same firmware seem to behave similarly erratically. I am wondering whether openhab has some settings of old ghost nodes stored somewhere that are causing the issue.

Nodes are stored on the Z-Stick itself, in NVM. You could do an inventory comparing the nodes in OH (including any ignored in the inbox) with what you physically have in the house.

I think most of the problem is node 86. The silence is because node 86 is pummeling the controller with meter updates. The other nodes are blocked from completing the heal, but we’ll see.
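
A rough way to check which node dominates a debug log is to count log lines per node. The sketch below is Python and assumes that node-related lines contain a tag of the form `NODE <id>:`. That pattern and the helper names `messages_per_node` and `busiest_nodes` are assumptions for illustration, so adapt the regular expression to whatever your log actually shows.

```python
import re
from collections import Counter

# Assumption: node-related debug lines carry a tag like "NODE 86:".
NODE_TAG = re.compile(r"\bNODE (\d+):")


def messages_per_node(lines):
    """Count how many log lines mention each node id."""
    counts = Counter()
    for line in lines:
        match = NODE_TAG.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def busiest_nodes(lines, top=5):
    """Return the `top` (node_id, line_count) pairs, busiest first."""
    return messages_per_node(lines).most_common(top)
```

If one node accounts for a large share of all lines, that is a hint (not proof) that its reporting settings are worth a look.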

Edit: I did notice that the device default for % change was 50 and the default report interval was 600. The OH poll interval for a powered node should be at least 86400, not 1200, especially since you are getting reports at least every 600 seconds anyway. I would still vote to disable the % parameter completely.
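
To put those interval numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It only turns the intervals quoted above (in seconds) into events per day for a single node. The function name `events_per_day` is made up for this sketch, and percent-change reports, which come on top, are not modelled.

```python
SECONDS_PER_DAY = 86_400


def events_per_day(interval_s):
    """Number of polls (or timed reports) per day at a fixed interval."""
    return SECONDS_PER_DAY / interval_s


# Intervals taken from the discussion above, all in seconds.
for label, interval in [
    ("poll every 1200 s", 1200),
    ("poll every 86400 s", 86_400),
    ("timed report every 600 s", 600),
]:
    print(f"{label}: {events_per_day(interval):.0f} per day")
```

So a 1200-second poll interval adds 72 polls per day on top of the 144 timed reports the node already sends at 600 seconds, while a poll interval of 86400 adds only one.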

Thanks for the inputs.

I decided to upgrade the firmware on one of the sticks and plug that one in.

That seems to have fixed the problem. The reaction is very fast.

I have now also increased the reporting threshold to 20% instead of 10% to reduce the traffic further.
As per your advice, I have also increased the poll interval to 86400.

I will keep monitoring it for a few days and report any strange behaviour.

Indeed, I did an inventory check and there are about two nodes that are off. I should probably remove them, but like they say, ‘if it ain’t broke, don’t fix it’. =) So as long as it is working at 95%, I will leave it for now. The next time I have to take out the stick, I will take care of the two ghost/duplicate nodes.

Keep the network heal disabled. Ghost nodes could have been why the ZW queue was so large (a lot of messages are queued in a heal and may have nowhere to go). Changing the stick to do the firmware upgrade cleared the queue. It could get slower over time with ghost nodes, but maybe it will be OK.

So far I have the heal disabled, and it all seems to be working properly. Thanks for all your inputs.
