So, where can I find some documentation to interpret what’s going on here? I’ve done a test (in a different house, not the one having problems) with the same Simon Tech equipment. I simply clicked once to turn OFF and once more to turn ON, and this is the output:
I’d advise a methodical approach like your screenshot, or you will get overwhelmed. Originally, I tried to keep a table, but I’m kind of analytical. I did split my network so most nodes were direct to the controller. The last column is the health assessment from the Silabs tool. nodes.pdf (253.9 KB)
Overall, the test you did looks okay.
Starting from the left, the speed is 40K. Z-Wave Plus can reach 100K, but maybe these are older devices. The RSSI (signal strength) is okay but could be higher; if it were really low, the speed could drop to 9.6K. The delta is the time between frames; for 40K, 10-20 ms is good. The routing from 19 to 1 requires one hop (13). The only glitch is in lines 8 to 14: node 13 had to send the meter report 3 times (60 ms apart) before it was acknowledged by the controller. It also seems to have been acked twice, also about 60 ms apart. A minor note: you get a meter report with every switch. Is that needed? However, as noted above, there is no major issue. If they are all like that, you will be fine.
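If you end up with a lot of traces, a small script can do the same reading for you. This is only a rough sketch: the `Frame` fields, the 100 ms window, and the sample timestamps are my own assumptions for illustration, not the Zniffer export format, so you would have to adapt it to however you copy the data out. It also assumes node 1 is the controller.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    # Hypothetical record layout; NOT the actual Zniffer export format.
    time_ms: float     # capture time in milliseconds
    src: int           # sending node id
    dst: int           # destination node id
    speed_kbps: float  # 9.6, 40 or 100
    payload: str       # free-text label, e.g. "METER_REPORT" or "ACK"


def frame_deltas(frames):
    """Time between consecutive frames, in capture order."""
    ordered = sorted(frames, key=lambda f: f.time_ms)
    return [b.time_ms - a.time_ms for a, b in zip(ordered, ordered[1:])]


def find_retries(frames, window_ms=100.0):
    """Return ((src, dst, payload), count) for frames repeated within window_ms.

    window_ms is an assumed threshold; tune it to the gaps you actually see.
    """
    open_runs = {}  # key -> [last_time, count] for the run in progress
    repeated = []
    for f in sorted(frames, key=lambda f: f.time_ms):
        key = (f.src, f.dst, f.payload)
        run = open_runs.get(key)
        if run is not None and f.time_ms - run[0] <= window_ms:
            run[0] = f.time_ms
            run[1] += 1
        else:
            # Gap too long (or first sighting): close the old run, start a new one.
            if run is not None and run[1] > 1:
                repeated.append((key, run[1]))
            open_runs[key] = [f.time_ms, 1]
    for key, run in open_runs.items():
        if run[1] > 1:
            repeated.append((key, run[1]))
    return repeated


# Illustrative timestamps only, loosely modelled on the glitch described above.
sample = [
    Frame(0.0, 13, 1, 40.0, "METER_REPORT"),
    Frame(60.0, 13, 1, 40.0, "METER_REPORT"),
    Frame(120.0, 13, 1, 40.0, "METER_REPORT"),
    Frame(130.0, 1, 13, 40.0, "ACK"),
    Frame(190.0, 1, 13, 40.0, "ACK"),
]

if __name__ == "__main__":
    print(frame_deltas(sample))
    print(find_retries(sample))
```

On the sample it reports the meter report from node 13 as sent three times and the ack as sent twice. Any gap well above the 10-20 ms range is worth a note in your table.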
In the “Association Groups” for the “1: Lifeline” I always have only the “Controller” as the device. Since the slower devices are actually far away from the controller, should I also select here the nodes for the Repeaters I have closer to each slow device, to “force” it to pass through that repeater instead of going directly to the controller?
While checking the network though, for node 43, which is a switch far from the controller, I do see that it is supposedly already communicating with node 20, which is a repeater midway to the controller.
Now, for that node 20 to the controller, I do see a direct connection as well but all direct connections to the controller are unidirectional:
I will be able to physically be there in a few days and perform the Zniffer tests though.
As I want to be 100% sure where the problem is (whether it comes from the devices, the network or the setup, I just want this solved once and for all), what methodical approach do you suggest, and which steps should I follow, in order, to be able to create a proper report on this matter?
No. The node map is not useful for diagnosis. Also, having only the controller in Lifeline is correct. The mapping is a result of the find neighbors command. The controller is excluded from that command because it either bogs down (500 chip) or won’t work at all (700, 800 chips), so there will only be unidirectional arrows. On the controller UI page, under properties, it should only show itself as a neighbor.
I would suggest triggering the problem node(s), seeing what the Zniffer shows, and taking notes about hops and speeds. If you disable the nightly heal (my recommendation), you could just try to heal the nodes that are slow with many hops. The nightly heal would scramble the routes again, which is why it needs to be disabled. Don’t be surprised if your repeaters are not used in any of the routes.
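For the note-taking, something like the following could turn your observations into a short list of nodes to heal and repeaters that never show up in a route. Again, just a sketch under my own assumptions: the tuple layout, the thresholds (more than one hop and below 40K) and the sample values are made up for illustration, and node 21 is a hypothetical second repeater.

```python
from collections import Counter


def summarize(observations, repeaters, max_hops=1, min_speed_kbps=40.0):
    """Summarize hand-taken Zniffer notes.

    observations: list of (node_id, hops, speed_kbps), where hops is the list
                  of intermediate node ids seen in the route.
    repeaters:    node ids of the dedicated repeaters.
    The thresholds are assumptions, not Z-Wave rules.
    Returns (heal_candidates, unused_repeaters), both sorted.
    """
    hop_usage = Counter()
    heal_candidates = set()
    for node_id, hops, speed_kbps in observations:
        hop_usage.update(hops)
        # "Slow with many hops": both conditions must hold.
        if len(hops) > max_hops and speed_kbps < min_speed_kbps:
            heal_candidates.add(node_id)
    unused = sorted(r for r in repeaters if hop_usage[r] == 0)
    return sorted(heal_candidates), unused


# Made-up notes: (node, hops in the route, speed in kbps).
notes = [
    (19, [13], 40.0),
    (43, [20, 13], 9.6),
    (43, [20, 13], 9.6),
]

if __name__ == "__main__":
    to_heal, unused_repeaters = summarize(notes, repeaters=[20, 21])
    print("heal:", to_heal)
    print("repeaters never used as a hop:", unused_repeaters)
```

With these notes only node 43 qualifies for a heal, and the hypothetical repeater 21 is reported as unused. Redo the notes after each heal so you can compare before and after.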
Thank you once again for all your help. I now know that the issue actually lies with the devices for some reason.
The distance between some devices and the controller is also not ideal, as some are far away. But there are dozens of devices, and that is usually favorable because there are multiple routes to reach the controller with only 2 hops, even for the most distant device.
As I said before, even clicking the physical button sometimes does nothing, and other times the device acts in a very weird way. Some physical buttons even have broken pins and malfunction because of that: they come out a little bit and do not touch the back pins correctly, so they are sometimes completely off when touched.
The devices that go off are then obviously a bottleneck for the network: if one of them is included in a route, any device that tries to reach that node won’t be able to, causing congestion on the network.