Z-Wave Routing Basics


There is a lot of misunderstanding about z-wave routing.

This is a very simplified, bite-size description. It avoids explaining how the routes are established and managed, how preferences are defined, and the types of communication attempts that are made to get a command or acknowledgement through. It deals only with the simple and ideal source route example.

All of the clever things z-wave does to get the message through will be described in a later post.

Unlike routing in networks that some people may be more familiar with, there are no routers in a z-wave network that know where to send every message. Intermediate nodes are just repeaters. They listen for a reference to their Node ID in the route description. If they are not the end node, all they do is shout the message out so the next node in the list can pick it up if it is in range. The repeater has no idea whether the next node is in range or not. The source node has built the route from information stored as a return route (slave device) or in the routing table (controller device). The underlying layer does inform the repeater whether the message was received, but that is as far as it goes. If the source asked the repeater to do it again and it was still not possible, the repeater would just blindly try and try again.
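The repeater behaviour described above can be sketched roughly as follows. This is only an illustration of the idea of source routing, not the real frame format: the `Frame` class, its field names and the `handle` function are all made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical, simplified routed frame (not the real Z-Wave layout)."""
    source: int
    destination: int
    repeaters: list  # ordered repeater Node IDs chosen by the source
    hop: int = 0     # index of the repeater expected to forward next


def next_hop(frame):
    """Node ID the current transmission is aimed at."""
    if frame.hop < len(frame.repeaters):
        return frame.repeaters[frame.hop]
    return frame.destination


def handle(node_id, frame):
    """What a node does when it hears a frame.

    Returns "ignore", "forward" or "deliver". A repeater makes no routing
    decision of its own: it only checks whether it is the next node named
    in the route and, if so, shouts the frame out again.
    """
    if next_hop(frame) != node_id:
        return "ignore"
    if node_id == frame.destination:
        return "deliver"
    frame.hop += 1  # hand over to the next node in the list
    return "forward"
```

For a route like 1 -> 107 -> 2 -> 4 you would build `Frame(1, 4, [107, 2])`. Node 2 ignores the first transmission because 107 is named first, and only node 4 ever delivers the command to its application.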

This is why, if you move or remove a node, it is important that the controller and any slaves that reference that node have their routing information updated. The primary way to do this is a network heal.

The next thing to understand is that the controller application or binding is on the far side of a serial interface with a separate API that has its own set of ACK, NAK and CAN for communication with the controller proper. It is possible with this API to get the routing table from non-volatile memory in the controller, but that is not what the neighbours list diagram in openHAB is. Do not think that by looking at this diagram you can tell what routes will be used.

The diagram in openHAB tells you nothing more than that, during a heal, the nodes reported back that at -6 dB they were neighbours of the nodes they have lines to. It is neither a full list of neighbours nor does it contain any information that helps you understand the actual routing. It gives you a list of nodes that may be able to be part of a route, no more.

If you want to see the topology table as an openHAB user, currently the only way is to connect a different program to your controller. The easiest one is the PC Controller program, which is a free download from Silicon Labs.

Using the PC Controller program you can also check some of the return routes stored on nodes. This will be described in a future post.

This is sort of useful, but it only gives you the static information that is stored in the nodes and does not tell you how well the routing is performing. The only way you will see that is to get a zniffer and listen to your network.

A zniffer is nothing clever. It is just a zwave USB stick with a special firmware. When used with the zniffer application you can see all of the communication in range of your zniffer.

It is just a passive listener and will only receive messages from nodes that are in range. For this reason it is best used in a laptop. It currently has only a Windows application.

This is a zniffer capture of a simple command to turn on an endpoint on a device with multiple endpoints.

When interpreting what I say, remember that zniffer is a passive listener that only hears what is in range. Also remember there is a raw layer of communication below this, which has yet another layer of ACK that the zniffer firmware uses to know the command made it from one node to the next. I refer to this ACK as “raw acknowledged”.

The controller (this has nothing to do with any information in the binding) has calculated, from the routing information it holds, that the route to Node 4 is 1 -> 107 -> 2 -> 4.

Looking line by line.

  1. The first line is the command from Node 1 (my controller), raw acknowledged by node 107.

  2. The second line is the repeater 107 reshouting the command and raw acknowledged by node 2.

  3. The third line is the repeater 2 reshouting the command and raw acknowledged by node 4 the destination.

The device, node 4, then immediately replies with an ACK using a route defined in memory in Node 4. This ACK just says the node has received the command, not that it has acted on it.

In this case it has used the reverse of the outbound route, but this is not always the case, as the routing table and the return route are independently managed.

  4. The fourth is the ACK from node 4, raw acknowledged by node 2.

  5. The fifth is the ACK reshouted by node 2 and raw acknowledged by node 107.

  6. The sixth is the ACK reshouted by node 107 and raw acknowledged by the destination.

  7. The last just confirms the ACK made it from 4 back to 1.
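As a rough aid for reading a capture like this, the hop lines you would expect for a given route can be generated with a small sketch. The function names and the text format of each line are invented for this example and do not match what the zniffer application prints. It covers only the per-hop lines, not the final confirmation line.

```python
def hops(route):
    """Split a route such as [1, 107, 2, 4] into (sender, receiver) pairs."""
    return list(zip(route, route[1:]))


def expected_trace(route, return_route=None):
    """List the hop lines expected for a command and its ACK.

    If no return route is given, the reverse of the outbound route is
    assumed. That is only a convenience: the routing table and the return
    route are managed independently, so the ACK may come back another way.
    """
    if return_route is None:
        return_route = route[::-1]
    lines = []
    for sender, receiver in hops(route):
        lines.append(f"CMD {sender} -> {receiver} (raw acknowledged by {receiver})")
    for sender, receiver in hops(return_route):
        lines.append(f"ACK {sender} -> {receiver} (raw acknowledged by {receiver})")
    return lines


for line in expected_trace([1, 107, 2, 4]):
    print(line)
```

If a line you expect is missing from your capture, remember the zniffer may simply have been out of range of that transmission.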

We have success, and as long as the device has done what it was asked to do, it has switched on.

Clearly there is no guarantee that the device has switched, but as long as the device is well designed and functioning correctly it will have.

Next, a report going from a device to another device. In this case it is a device reporting a change to the controller.

The device has the route to the controller stored as a return route. It would also have the route to any other associated devices.

Here is the zniffer trace.

In this case there is only one repeater, but you can see a good return with an ACK from the controller.

Now let's see the first level of retry, which is in the raw communication level as zniffer shows it.

So why the difference?

What you see here, I believe, is a retry at the base level. What I referred to earlier as the raw acknowledgement was not received, or there was some other issue with the communication. I find it hard to understand and explain the lines in zniffer, but I believe the transmission was retried and succeeded. This can be very confusing, as in the trace it looks like the ACK made it first time, but what zniffer picks up from the radio signal is not exactly what the nodes see. It is likely in this case that while zniffer picked up the raw acknowledgement, for some reason the node did not do so in a timely way, so the ACK was sent again.

So this is the same route working perfectly and, not long after, with a minor glitch that recovered. These low level issues could be looked at with a lower level tool, but they are generally transient, and if they are not they show as issues in the higher levels.
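My reading of that retry can be sketched like this. It is a toy model only: the function name, the `transmit` callback and the limit of three attempts are all assumptions for illustration, not values taken from the protocol.

```python
def send_with_retry(transmit, max_attempts=3):
    """Resend until the sender itself hears the raw acknowledgement.

    `transmit` is a hypothetical callable that sends the frame once and
    returns True only if the sending node heard the raw acknowledgement.
    `max_attempts=3` is an assumed limit for this sketch.

    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, max_attempts + 1):
        if transmit():
            return attempt
    return None


# The sender misses the first raw acknowledgement and hears the second.
# A zniffer sitting elsewhere may well have heard both, which is why the
# trace can look like the first attempt worked.
heard_by_sender = iter([False, True])
print(send_with_retry(lambda: next(heard_by_sender)))
```

The point is that the decision to retry is made on what the sending node heard, not on what the zniffer heard.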

That is the first level of resilience in the zwave network.

Next article on how zwave routing tries to get the message through.


A typical z-wave diagram from openHAB.

It is interesting that this map shows 16 as a neighbour of 16, which is an odd anomaly, but you could have islands of nodes and all sorts of things that look odd. It may or may not indicate an issue.

The map is a nice adornment but tells you little you should worry about. None of the data in it is ever used for routing.

A section of topology from my network.

Again it is interesting, and if there were any nodes with no neighbours it would be a concern, but even with many neighbours routing may not be good. It gives no indication of signal strength.

The actual routes that the controller has calculated for each node are also not identified. The controller will hold up to 2 calculated routes for each node.

Slave devices have a small number of calculated routes sent to them by the controller on inclusion and on heal. The maximum number will be 4 per node that has an association to the slave device.

This is a two node network, with a controller and one slave, that shows the simplest case.

The blue squares indicate the nodes are neighbours: 1->2 and 2->1.

TX and RX paths from device to devices are not necessarily symmetrical.

The most robust network would have all nodes in good direct range of the controller. As this map gives no idea of the quality of the link, you may see a direct link that is not the preferred route, or indeed is never used, because the routing algorithm in the controller prefers another route.

This only shows the topology as the controller sees the network. Even this will not show all possible routes.

Devices do not have a copy of this.

Any rows or columns that are all red are an issue. A resilient network would have 4 or more blue squares in every row and column, but the more the better, as some of those blue squares may not be very good links.
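If you copy the grid out by hand, that rule of thumb can be checked with a short sketch. The list-of-lists layout, the function name and the status labels are my own invention for this example. PC Controller does not export the grid in this form as far as I know.

```python
def neighbour_report(matrix, node_ids, minimum=4):
    """Count blue squares per node in a hand-copied neighbour grid.

    matrix[i][j] is True where the square in row i, column j is blue.
    The diagonal (a node against itself) is ignored.

    Returns {node_id: (row_count, column_count, status)} where status is
    "isolated" (a row or column with no blue at all), "thin" (fewer than
    `minimum` blue squares in the row or column) or "ok".
    """
    size = len(node_ids)
    report = {}
    for i, node in enumerate(node_ids):
        row = sum(1 for j in range(size) if j != i and matrix[i][j])
        col = sum(1 for j in range(size) if j != i and matrix[j][i])
        if row == 0 or col == 0:
            status = "isolated"
        elif min(row, col) < minimum:
            status = "thin"
        else:
            status = "ok"
        report[node] = (row, col, status)
    return report


# The two node network: controller 1 and slave 2 see each other.
two_nodes = [[False, True],
             [True, False]]
print(neighbour_report(two_nodes, [1, 2]))
```

A two node network will always come out as "thin" by this rule, which is fair: there is no alternative route at all. Even an "ok" node may still route badly, because the grid says nothing about link quality.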

You have shown a few maps like that. How do you read them?
I have a longer wait for my UZB-3 stick because they shipped me the wrong one :frowning:

That is a pity.

You do not need zniffer for this. It is in PC Controller. Just put your controller in and reload. It is read from the NVM in the controller.

It has all of the node numbers along the top and down the left. There is a white line on the diagonal, as this is the intersection of each node with itself. The chart gives a box representing each node beside every other node and highlights those that are neighbours.

Give it a try.

The best tool is the IMA tool inside PC Controller. This tests that every node has a route to the controller, or tests a route between nodes to check whether an association other than lifeline is OK.

It grades every route.

I cannot easily do that with my production controller. I may experiment with my test one later.

What controller do you use? Is it one of those ZooZ?

My production one is a Zooz but it is mapped through to a Hyper-V VM. My test one is a HUSBZB-1.

How do you backup your Zooz? If you are lucky and it is standard you can back it up with the PC Controller.

I do not back it up but access it through a Debian VM. I do not really have access to my son’s server to unmap and remove it easily.

Although there are a couple of production things this is still somewhat a hobby.

If you get a chance it is worth backing up the NVM. I understand it is difficult, but it is nice to have a backup so that if the stick does fail it costs you a $30 stick and not days of adding nodes back in as well.

One thing that cannot be retrieved from the controller is the strength of the signal between nodes, so it is hard to know the margin that a route may have.

While you can see the report in zniffer during an add or a change of routing due to heal or explorer, it is not made available from the controller. It is collected, though, and you would assume it is used in the controller's routing algorithm.

This is actually available on new controllers and I plan to add this in the new binding, but it won’t be available in the current version - sorry.

Thanks Chris

Even 2.5.2 has fixed a few little niggles. Every little release makes it all a bit smoother.

The extended slave stats would be great, so people can see when they have a load of retries in their network and what the latest working route is.

Do you have to expose a network health check screen to get certification? Nice functionality but a lot of work. Would openHAB interfaces even support it?

Yes, there are some IMA requirements that need to be covered.

Well, this is unfortunately the big question. ZWA has requirements on the UI, and currently I don’t think OH can support them in the core. I’ve created a PR for some notifications which is one of the requirements, but I need to wait to see if anyone responds.