Lately, my Z-Wave network has become a bit unreliable.
Essentially, sometimes commands are not sent, and I have little idea where the problem is.
I have a presence rule in openHAB: when the door opens (node 42) or motion is detected (node 14), the light (node 8) should turn on. Switching the light off is a bit more complex and involves a few steps and timers.
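The trigger side of this rule can be modelled as a plain event handler. The sketch below is a minimal Python model, not openHAB rule syntax; the class names and the state strings (`OPEN`, `ON`) are assumptions, and the switch-off steps and timers are left out because they are not described here. It mainly makes one point visible: every trigger becomes a separate command to node 8 that has to cross the mesh, so a burst of triggers means a burst of traffic.

```python
from dataclasses import dataclass, field

# Node IDs taken from the rule description; the state names below are
# assumptions (openHAB-style OPEN/CLOSED for contacts, ON/OFF for switches).
DOOR_NODE = 42
MOTION_NODE = 14
LIGHT_NODE = 8


@dataclass
class Event:
    node: int
    state: str


@dataclass
class PresenceRule:
    # Commands the rule would send, as (target node, command) pairs.
    commands: list = field(default_factory=list)

    def is_trigger(self, event: Event) -> bool:
        # Either the door opening or the motion sensor switching on triggers.
        return (event.node == DOOR_NODE and event.state == "OPEN") or (
            event.node == MOTION_NODE and event.state == "ON"
        )

    def on_event(self, event: Event) -> None:
        if self.is_trigger(event):
            # Each trigger becomes one Z-Wave command to the light,
            # which still has to be routed through the mesh.
            self.commands.append((LIGHT_NODE, "ON"))


rule = PresenceRule()
for ev in [Event(42, "OPEN"), Event(14, "OFF"), Event(14, "ON")]:
    rule.on_event(ev)
print(rule.commands)  # [(8, 'ON'), (8, 'ON')]
```

The model deliberately sends a command on every trigger, even if the light is already on: skipping "redundant" commands would reduce traffic, but it would also prevent a retry when an earlier command was lost.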
Most of the time, the setup works flawlessly; but then, every now and then, it “stops working”, with delays of up to 10-20 seconds.
This morning, for instance, I had the issue in the screenshot.
What really strikes me is the huge time difference between the COMMAND RECEIVED and the actual transmission.
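To see how often the gap between COMMAND RECEIVED and the actual transmission is large, rather than judging from a single screenshot, one option is to pair the two timestamps for each command and flag the slow ones. This sketch assumes you have already extracted those timestamp pairs from the log (how to match the lines depends on the log layout, so that part is left out); the `HH:MM:SS.mmm` format, the function names, the 1-second threshold and the example pairs are all assumptions.

```python
from datetime import datetime, timedelta

# Assumed timestamp format "HH:MM:SS.mmm"; adjust to match your own log layout.
TS_FORMAT = "%H:%M:%S.%f"


def delay_ms(received: str, transmitted: str) -> float:
    """Milliseconds between a command being received and being transmitted."""
    delta = datetime.strptime(transmitted, TS_FORMAT) - datetime.strptime(
        received, TS_FORMAT
    )
    if delta < timedelta(0):
        # Crossed midnight: assume the transmission happened the next day.
        delta += timedelta(days=1)
    return delta / timedelta(milliseconds=1)


def slow_commands(pairs, threshold_ms=1000.0):
    """Return (received, transmitted, delay_ms) for pairs above threshold_ms."""
    result = []
    for received, transmitted in pairs:
        d = delay_ms(received, transmitted)
        if d > threshold_ms:
            result.append((received, transmitted, d))
    return result


# Made-up example pairs of (COMMAND RECEIVED time, transmission time).
pairs = [("07:15:02.120", "07:15:14.500"), ("07:16:00.000", "07:16:00.200")]
for received, transmitted, d in slow_commands(pairs):
    print(f"{received} -> {transmitted}: {d:.0f} ms")
# prints: 07:15:02.120 -> 07:15:14.500: 12380 ms
```

Collected over a few days, the flagged entries show whether the long delays cluster around particular times or target nodes, which is easier to act on than a single occurrence.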
Is this something related to the binding/openHAB? Or to the controller? Or to something else?
Note that a few weeks ago I bought a second controller as a backup (the same model, Aeon Stick Gen5, but with a slightly newer firmware version), and I started using it after copying the data over.
The very first line in your screenshot is complaining about node 65 not getting an ACK after 7297 ms. Without looking at the full log file, this would suggest there might be an issue communicating with node 65 (which appears to be a battery-powered device).
Moving the controller will change all the routes that have been discovered. If you have newer devices, everything should sort itself out eventually, but Z-Wave performance may be slow in the meantime. Older devices may need a heal to get communications working again. The network-wide heal currently has some issues; I have mine disabled because it made my network completely unusable. If/when it is needed, I heal individual devices.
How many nodes are in your network and how many are mains powered vs battery?
I have about 40 nodes, 17 of which are mains-powered; every room has at least one mains-powered node. All of the nodes should be Z-Wave Plus, with the exception of the Danfoss thermostats.
Since the move I have healed every node, and since the self-heal was running every night at 2am, I would expect routing problems to be over by now.
The strange thing is that self-heal was always enabled, yet the problems seem to be worse now.
I cannot say whether this is due to the change of stick (the new one has newer firmware), the change of location (before, it was in the very center of the apartment), some change to the configuration, or a change in openHAB (honestly, I tend to rule out this last option).
In my experience, routing does not always happen the way you would expect.
I have obtained a UZB3 stick with Zniffer software to investigate my network further. Zombie or ghost nodes can cause havoc too.
Yes, at one point in the past I had a ghost node, and it was causing a lot of trouble.
Then I discovered how to remove it, and for a couple of months I was in (z-wave) bliss.
But now there are no ghosts, and everything seems OK from that point of view.
I am taking your advice, and I have just ordered a UZB3 stick.
Hopefully, what I learn from it will help in debugging the issues I am currently facing.
That said, it’s quite sad that to get a reliable Z-Wave mesh you have to buy this kind of equipment and get into this level of detail.
If I can’t solve the problems, I might decide to move the controller back; but I would like to do whatever I can to keep it in its current place.
What do you mean by fixing?
After moving the controller, I have healed every node manually, and on top of that, the nightly healing should still do its job. Is there anything else I should do to improve the situation?
Too late, I had already ordered it when I read your post.
Still, I’d like to think this is a good idea, since I wanted to keep the older Z-Stick around as a backup. Please don’t tell me otherwise.
After a restart it will normally be slow. This is because the binding interrogates all the devices during startup (this is called initialization). Initialization can take some time, especially if you have a large network. It can take even longer for the initialization to fully complete when you have battery-powered devices, as they need to wake up before they can be interrogated by the binding.
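The dependency on wake-ups can be put into rough numbers. A battery device can only be interrogated when it wakes up, so right after a restart, any node whose wake-up interval is longer than the time elapsed since startup may still be pending. This sketch assumes the wake-up intervals are known (the values below are hypothetical) and that a single wake-up is enough to complete interrogation, which is optimistic; the function names are made up for illustration.

```python
def may_still_be_initializing(wakeup_interval_s: dict, elapsed_s: float) -> list:
    """Battery nodes that may not have woken up yet since startup.

    Assumes each node wakes up on schedule and one wake-up is enough for the
    binding to interrogate it; in practice it can take longer.
    """
    return sorted(node for node, t in wakeup_interval_s.items() if t > elapsed_s)


def worst_case_full_init_s(wakeup_interval_s: dict) -> float:
    """Under the same assumptions, time until every battery node has woken once."""
    return max(wakeup_interval_s.values(), default=0)


# Hypothetical wake-up intervals in seconds, keyed by node ID.
intervals = {14: 3600, 42: 3600, 65: 86400}
print(may_still_be_initializing(intervals, elapsed_s=2 * 3600))  # [65]
print(worst_case_full_init_s(intervals))  # 86400
```

With a daily wake-up interval on even one device, full initialization can take up to a day, so diagnostics run shortly after a restart may reflect initialization traffic rather than the network's normal behavior.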
The command I referenced in the post is only meaningful when things are in steady state. Running this soon after a startup will not yield meaningful results.
Yes, during startup the system is slow, and that’s expected. On the other hand, you could see that as a load test.
And it is not far from situations I have seen: the network works fine, but then I have many guests, all the sensors send updates at the same time, and typically something stops working.
The problem I have is that the network is in a steady state most of the time, but the issues rarely show up while it is in that steady state.
I have followed some of the advice found in this topic and in the forum.
One node’s battery was dead, so the node was not responding; I have replaced it. Note for the future: the reported battery level is not always reliable, so I need some other way to detect offline nodes.
Reduced the amount of traffic on the network, both by increasing the wake-up interval and by limiting the amount of data reported back (e.g. energy consumption for some switches).
Stopped the nightly heal; I now enable it only when adding new nodes.
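Since the reported battery level turned out to be unreliable, one alternative is to watch when each battery node was last heard from: a node that misses several scheduled wake-ups is probably offline, whatever its last battery report said. This is a minimal sketch that assumes you can collect a last-seen timestamp per node (where that comes from, e.g. log parsing or item updates, is left open); the function name, the data layout and the 2× grace factor are assumptions.

```python
import time


def stale_nodes(last_seen: dict, wakeup_interval_s: dict, now=None, grace=2.0) -> list:
    """Battery nodes not heard from within `grace` wake-up intervals.

    last_seen: node ID -> UNIX time of the last frame received from the node.
    wakeup_interval_s: node ID -> configured wake-up interval in seconds.
    Nodes without a known wake-up interval (e.g. mains-powered) are skipped.
    """
    now = time.time() if now is None else now
    stale = []
    for node, seen in last_seen.items():
        interval = wakeup_interval_s.get(node)
        if interval is not None and now - seen > grace * interval:
            stale.append(node)
    return sorted(stale)


# Hypothetical data: node 65 last reported 10000 s ago, node 42 5000 s ago.
last_seen = {65: 0, 42: 5000}
intervals = {65: 3600, 42: 3600}
print(stale_nodes(last_seen, intervals, now=10000))  # [65]
```

There is a trade-off with the traffic reduction: a longer wake-up interval means fewer frames on the network, but also that a dead node is noticed later, because the threshold scales with the interval.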
Now, it is a bit too early to celebrate a success, but it looks like the situation already improved a lot.
Finally, the system has started to react quickly and reliably, every time.