There is a lot of misunderstanding about Z-Wave routing.
This is a very simplified, bite-size description. It does not explain how routes are established and managed, how preferences are defined, or the types of communication attempts that are made to get a command or acknowledgement through. It deals only with the simple, ideal source route example.
All of the clever things Z-Wave does to get the message through will be described in a later post.
Unlike routing in networks that some people may be more familiar with, there are no routers in a Z-Wave network that know where to send every message. Intermediate nodes are just repeaters. They listen for a reference to their Node ID in the route description. If they are not the end node, all they do is shout the message out so the next node in the list can pick it up if it is in range. The repeater has no idea whether the next node is in range or not.
The source node builds the route from information stored as a return route (slave device) or in the routing table (controller device). The underlying layer does tell the repeater whether the message was received, but that is as far as it goes. If the source asked the repeater to send again and delivery was still not possible, the repeater would just blindly try again and again.
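To make the "repeaters just reshout" idea concrete, here is a toy Python model. It is only a sketch of the behaviour described above, not the real Z-Wave frame format: the `Frame` class, the `deliver` function and the `links` set are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    route: list      # full node list chosen by the source, e.g. [1, 107, 2, 4]
    payload: str


def deliver(frame, in_range):
    """Walk the route hop by hop and return a log of what happened."""
    log = []
    for sender, receiver in zip(frame.route, frame.route[1:]):
        # The sender does not check range first; it simply transmits
        # to the next Node ID listed in the route.
        if frozenset((sender, receiver)) in in_range:
            log.append(f"{sender} -> {receiver}: heard")
        else:
            log.append(f"{sender} -> {receiver}: not heard")
            break  # no repeater can pick a different path, so the frame stops
    return log


# Hypothetical radio links: pairs of nodes that can hear each other.
links = {frozenset(p) for p in [(1, 107), (107, 2), (2, 4)]}

ok = deliver(Frame([1, 107, 2, 4], "switch on"), links)

# Node 2 has been moved out of range of 107, but the stored route is unchanged.
stale = deliver(Frame([1, 107, 2, 4], "switch on"), links - {frozenset((107, 2))})

print(ok)
print(stale)
```

In the second call the route itself is still "valid" as far as the source knows; it only fails because nothing updated the stored routing information after the node moved.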
This is why, if you move or remove a node, it is important that the controller and any slaves that reference that node have their routing information updated. The primary way to do this is a network heal.
The next thing to understand is that the controller application or binding is on the far side of a serial interface, with a separate API that has its own set of ACK, NAK and CAN for communication with the controller proper. With this API it is possible to get the routing table from non-volatile memory in the controller, but that is not what the neighbours list diagram in openHAB shows. Do not think that you can tell what routes will be used by looking at this diagram.
The diagram in openHAB tells you nothing more than that, during a heal at -6 dB, the nodes reported back that they were neighbours of the nodes they have lines to. It is neither a full list of neighbours, nor does it contain any information that helps you understand the actual routing. It gives you a list of nodes that may be able to be part of a route, no more.
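A small Python sketch shows why a neighbours list cannot tell you the route in use. The neighbour graph below is invented for illustration, and the cap of four repeaters per route is my understanding of Z-Wave rather than something taken from the diagram, so treat both as assumptions.

```python
def candidate_routes(neighbours, src, dst, max_repeaters=4):
    """List every loop-free path from src to dst through the neighbour graph."""
    routes = []

    def walk(path):
        node = path[-1]
        if node == dst:
            routes.append(path)
            return
        # len(path) - 1 is the number of repeaters used so far (source excluded).
        if len(path) - 1 > max_repeaters:
            return
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in path:
                walk(path + [nxt])

    walk([src])
    return routes


# Hypothetical neighbour lists, in the style of what the diagram draws as lines.
neighbours = {
    1: {2, 107},
    107: {1, 2, 4},
    2: {1, 107, 4},
    4: {2, 107},
}

routes = candidate_routes(neighbours, 1, 4)
for r in routes:
    print(" -> ".join(str(n) for n in r))
```

Even this tiny four-node graph allows several different routes from node 1 to node 4. The neighbour information says which of them are possible, not which one the controller has actually stored.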
If you want to see the topology table as an openHAB user, currently the only way is to connect a different program to your controller. The easiest one is the PC Controller Program, which is a free download from Silicon Labs.
Using the PC Controller Program you can also check some of the return routes stored on nodes. This will be described in a future post.
This is sort of useful, but it only gives you the static information that is stored in the nodes and does not tell you how well the routing is performing. The only way to see that is to get a zniffer and listen to your network.
A zniffer is nothing clever. It is just a Z-Wave USB stick with special firmware. When used with the zniffer application you can see all of the communication in range of your zniffer.
It is just a passive listener and will only receive messages from nodes that are in range. For this reason it is best used with a laptop, and currently there is only a Windows application.
This is a zniffer capture of a simple command to turn on an endpoint on a device with multiple endpoints.
When interpreting what I say, remember what I said about zniffer being a passive listener that only hears what is in range. Also remember there is a raw layer of communication below this, which has yet another layer of ACK that the zniffer firmware uses to know the command made it from one node to the next. I refer to this ACK as “raw acknowledged”.
The controller has worked out from the routing information it holds (this has nothing to do with any information in the binding) that the route to Node 4 is 1 → 107 → 2 → 4.
Looking line by line:
- The first line is the command from Node 1 (my controller), raw acknowledged by node 107.
- The second line is repeater 107 reshouting the command, raw acknowledged by node 2.
- The third line is repeater 2 reshouting the command, raw acknowledged by node 4, the destination.
The device, node 4, then immediately replies with an ACK using a route defined in memory in Node 4. This ACK just says the node has received the command, not that it has acted on it.
In this case it has used the reverse of the outbound route, but this is not always so, as the routing table and the return route are managed independently.
- The fourth is the ACK from node 4, raw acknowledged by node 2.
- The fifth is the ACK reshouted by node 2, raw acknowledged by node 107.
- The sixth is the ACK reshouted by node 107, raw acknowledged by the destination, node 1.
- The last just confirms the ACK made it from 4 back to 1.
We have success, and as long as the device has done what it was asked to do, it has switched on.
Clearly there is no guarantee that the device has switched, but as long as the device is well designed and functioning correctly it will have.
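The hop lines walked through above follow mechanically from the route. As a rough Python sketch (the function name is made up, and it assumes, as happened in this capture, that the return route is the exact reverse of the outbound route):

```python
def expected_hops(route):
    """Return (transmitter, raw_acknowledged_by) for each hop along a route."""
    return list(zip(route, route[1:]))


outbound = [1, 107, 2, 4]

command_hops = expected_hops(outbound)
# Assumption: the return route stored in node 4 mirrors the outbound route.
ack_hops = expected_hops(outbound[::-1])

for label, hops in (("command", command_hops), ("ACK", ack_hops)):
    for tx, rx in hops:
        print(f"{label}: sent by {tx}, raw acknowledged by {rx}")
```

If the return route stored on the device differed from the controller's route, the second list would use different repeaters, which is why the two directions have to be read separately in a trace.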
Next a report going from a device to another device. In this case a device reporting a change to the controller.
The device has the route to the controller stored as a return route. It would have the route to any other associated devices also.
Here is the zniffer trace.
In this case only one repeater but you can see a good return with ACK from the controller.
Now let's see the first level of retry, which happens at the raw communication level, as zniffer shows it.
So why the difference?
What you see here, I believe, is a retry at the base level. What I referred to earlier as the raw acknowledgement was not received, or there was some other issue with the communication. I find it hard to understand and explain the lines in zniffer, but I believe the transmission was retried and succeeded. This can be very confusing, as in the trace it looks like the ACK made it first time, but what zniffer picks up from the radio signal is not exactly what the nodes see. It is likely in this case that while zniffer picked up the raw acknowledgement, for some reason the node did not receive it in a timely way, so the ACK was sent again.
So this is the same route working perfectly and then, not long after, with a minor glitch that recovered. These low-level issues could be looked at with a lower-level tool, but they are generally transient, and if they are not, they show up as issues in the higher levels.
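The retry behaviour I think I am seeing can be sketched in Python like this. It is only a model of "send again if the raw acknowledgement was not heard"; the function name and the limit of three attempts are assumptions for illustration, not values taken from the Z-Wave specification.

```python
def send_hop(ack_heard_per_attempt, max_attempts=3):
    """Send one hop, repeating until the sender hears a raw acknowledgement.

    ack_heard_per_attempt: one bool per attempt, True if the *sender* heard
    the raw acknowledgement (a sniffer nearby may have heard something else).
    Returns (attempts_used, delivered).
    """
    attempts = 0
    for heard in ack_heard_per_attempt:
        attempts += 1
        if heard:
            return attempts, True
        if attempts == max_attempts:
            break
    return attempts, False


first_time = send_hop([True])            # the clean trace
one_retry = send_hop([False, True])      # the minor glitch that recovered
gave_up = send_hop([False, False, False, False])

print(first_time, one_retry, gave_up)
```

The important detail is whose view of the acknowledgement counts: the sending node decides whether to retry, so a trace can show an acknowledgement on the air and still be followed by a repeat.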
That is the first level of resilience in the Z-Wave network.
The next article will cover how Z-Wave routing tries to get the message through.