Zwave mesh network instability

My Z-Wave network has become sluggish and unreliable. In the two screenshots you can see that before a reboot (bottom image) the network doesn’t look great, and after a reboot (top image) it looks better. Any ideas why this would be?

Neither screenshot shows what I would consider to be a mesh network. Does this look right for Z-Wave, though?

Running Windows 10
OH 2.5.0


Funnily enough I have just posted this

which may explain why there is more to it than this diagram. It has its uses but as you point out it is not the mesh.

I still have a few edits but it is close to finished.

Thanks for the link…

Is it right that a network nominates a node to be secondary?
In my first screengrab above, node17 seems to be nominated in this way?


No, that is not a concept that applies to slave nodes. You can add a secondary controller, but that is very different.

So now that you have read that, you understand that the binding is not controlling the routing. If you have issues with your network, they are not caused directly by the binding.

So you have to ask what has changed in your network, not what the binding is doing to cause this.

Any ideas what has changed in your network? Any configuration changes? Any nodes moved or added?

Hmmm, every now and then the 2 nodes in the garden shed just stop responding and appear as orphaned on the network map. I plug them in in the house for a few days and they sort themselves out, and then I put them back in the shed again… till next time.

Are you running a regular heal?

Nightly, at 2am.

I wonder why a reboot of the Windows server helped my network, though?

Possibly because, until today, the heal had an issue where a failed heal left threads hanging, and they were only released when the server was restarted.

Chris posted it was fixed a few hours ago.

Now, you may or may not gain from running a heal every day, but unless there are big changes in your network you may be better off not running it. Moving two nodes into the house and back is not great, though.

It might be that the heal and the moving of nodes are causing you more problems than they solve.

Marvellous, thanks :grinning: :grinning:

Read through my other posts. I have a few more edits to make, but after you have read them you might want to dig deeper.

As you are on Windows 10, if you want a true understanding of your mesh you could just install the PC Controller program from Silicon Labs and look at the actual network topology.

You can also run a network health check.

I’ll do that now, thanks

This is (or was, actually) known to NOT (fully) work, so most people had it disabled.
@chris just fixed an issue. I suggest installing that fix before proceeding.
You would need to install the latest SNAPSHOT to get it, though.

I doubt the fix resolved all the issues with the heal. That fix was focused on one very specific problem with the node neighbor update transaction.

Edit: I’m not suggesting that’s what you were saying. Just wanted to point out that issues still remain. :wink:

My understanding is that the fix was to extend the time the system listens for healing/update responses.

Not exactly. The specific issue that was identified was related to node neighbour updates, but the fix was to resolve the way that timers are used when waiting for responses. This could impact other transactions where there is a non-default response time used (although I don’t recall if there are others or not).

Ok. Fair point. I don’t know if there are other transactions that don’t use the default. However, back when I was looking closely at the heal process, I don’t recall seeing any transactions other than the neighbor update that had long response times.

No, I don’t believe this is accurate. The fix is actually broader than this (it involves the timeout on all requests), but it only affects transactions that don’t use the default timeout setting. @chris, if I have this wrong, please correct me, as I’d like to be clear about the scope/implications of the fix.

You are correct.

Most transactions use a standard timeout, so aren’t affected.

ZWave transactions using the SerialAPI are quite complex (unfortunately). There are different types of responses for different requests - it feels a bit like it was designed by experimentation rather than engineering :frowning: . For the new binding this is now completely rewritten, and I think it will also fix some of the CAN errors that people see (but that’s another discussion altogether).

Did a quick search of the code base (searched on withTimeout). There were only 2 results that explicitly set the timeout (RequestNodeInfo and RequestNodeNeighborUpdate). Could there be others that don’t use withTimeout to build the message?

The RequestNodeInfo timeout is 25000 ms, which is interesting. Isn’t that request used a lot?
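
To make the scope discussed above a bit more concrete, here is a rough Python sketch of the idea that only requests overriding the default timeout are affected by the fix. This is not the binding's code (the binding is Java). The `Transaction` class, `with_timeout` method, `ExampleDefaultRequest` name and the 5000 ms default are all assumptions for illustration. The 25000 ms value is the one quoted in this thread, and the neighbor update value is just a placeholder, since its real timeout is not stated here.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed default; the thread does not state the binding's real value.
DEFAULT_TIMEOUT_MS = 5000


@dataclass(frozen=True)
class Transaction:
    """Hypothetical stand-in for a serial API request."""
    name: str
    timeout_ms: Optional[int] = None  # None means "use the default timeout"

    def with_timeout(self, timeout_ms: int) -> "Transaction":
        # Return a copy of this request with an explicit response timeout.
        return Transaction(self.name, timeout_ms)

    def effective_timeout_ms(self) -> int:
        # Fall back to the default when no explicit timeout was set.
        return DEFAULT_TIMEOUT_MS if self.timeout_ms is None else self.timeout_ms

    def uses_default_timeout(self) -> bool:
        return self.timeout_ms is None


transactions = [
    Transaction("ExampleDefaultRequest"),  # hypothetical default-timeout request
    Transaction("RequestNodeInfo").with_timeout(25000),  # value quoted in this thread
    Transaction("RequestNodeNeighborUpdate").with_timeout(60000),  # placeholder value
]

# Only requests that override the default fall within the scope of the fix.
affected = [t.name for t in transactions if not t.uses_default_timeout()]
print(affected)
```

If other requests set a long response time without going through something like `with_timeout`, a search on that name alone would miss them, which is the open question in the post above.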