[SOLVED] Z-Wave Network Heal.. when is it needed? does it always need to happen?

robmac · July 21, 2019, 12:40pm

In theory even if you add a node it should not be needed. Removing a node, or moving it to a very different location, is more likely to cause issues.

When you add the node, neighbours and node are added to the table and any devices associated to your new device get their routes set.

The node and controller should sort out the last working route and all being good always use that. No heal needed.

I am just a bit against a full network heal because on larger networks, certainly 100+ nodes, my experience and the reported experience of others is bad. Fibaro released a firmware for HC2 that issued a 2 a.m. heal every day, and that is why I am now here and not using HC2.

Night by night it destroyed my stable 144-node network, as it wiped out all of the LNR and caused a retry storm. When they turned it off the network did fix itself, but I had already decided to leave by the nearest exit.

Bruce_Osborne · July 21, 2019, 12:46pm

It looks like there is a difference of opinions here based on difference in interpretation of the standards.
For the sake of us poor users, please do not state something as fact if it is just your interpretation & not undisputed fact. Discussion among experts is good, but please be clear when responding to user queries.

Thank you very much for your caring to respond and your dedication.

robmac · July 21, 2019, 12:52pm

Is the full statement and is fact. If it does not all sort itself out when left to its own devices by all means heal but if it sorts itself out don’t do it for the sake of it. No opinion here but by all means use heal but you may well regret it.

Bruce_Osborne · July 21, 2019, 12:57pm

Fair enough.
I was just trying to sort out fact from personal interpretation or opinion.
Not singling you out at all.

I hope I did not offend you too much…

robmac · July 21, 2019, 1:00pm

Even Chris’s robust responses do not offend.

It is all healthy and we all learn. I am also not always very precise in my wording so if you do not understand my meaning please challenge.

Bruce_Osborne · July 21, 2019, 1:25pm

I currently have my small network (7 nodes?) set up to the default daily heal. I am just trying to learn from others as I slowly expand my network & my learning.

robmac · July 21, 2019, 2:21pm

My recommendation is turn periodic network heal off.

Use a controller with a reasonably recent version of Z-Wave and the SDK

Don’t use full network heal routinely when you add a node. When a node is added it will go through asking neighbours etc and the controller will add it into the topology. Unless the add fails for some reason you should not need to do anything.

If after leaving a good time the new node is not behaving well then try healing just that node. I always find it is best to let things settle before assuming it is not working. When you read that part of the developer guide you can see there is a lot that goes on and some of it can take some time to settle in my experience.

One thing I would say to anyone starting out and it is also in Chris’s guide. Start slow.

Plan so you do not need to remove or move devices. Those are the actions that may lead to needing to rebuild the network routing table.

After each change let things settle down and be sure the change has worked.

After your network is mature, I would recommend that you never add more than one node a day and only add the next when you are satisfied the last is good. Rebuilding 7 from scratch is not too bad; re-adding 140 takes a lot of time, and if things get really bad that may be what you have to do.

The full network heal is, in my opinion, the nuclear option. If you have a big network it probably won’t help, as it will escalate into armageddon on your network. If you have a small network it may work, but don’t do it until you have checked for other reasons that things are not working. If you can work out from the logs that it is a particular device, just treat that one; it is a lot safer.

Bruce_Osborne · July 21, 2019, 2:24pm

Thanks.
I have a HUSBZB-1. I think @chris indicated it is old and cannot be updated.

robmac · July 21, 2019, 2:36pm

We are going off topic but as it will avoid people hitting that nuclear button, you are going to spend a small fortune on devices so spending a little on a controller that can be updated with the latest SDK prior to the Z/IP tied SDKs is probably a good investment.

I just use a sigma designs UZB-3. It is a 500 chipset and can take the latest version of the firmware prior to the 700/Z/IP firmwares.

You can buy either the static controller or the bridge version, but you do need to flash it with the static controller firmware before using it with openHAB.

They are relatively cheap and last time I looked places like digikey and mouser have them in one or other flavour. If you buy two you can turn one into a zniffer and that is the best tool anyone can purchase to diagnose issues. Make sure you buy the correct frequency for your area. Other frequencies will sort of work but not very well.

Do not buy UZB-7.

Bruce_Osborne · July 21, 2019, 2:45pm

Thanks for the advice.
I have just started “dipping my toes” into Smart Home and my stick did not lock me into Z-Wave, Zigbee, or a software platform while minimizing expense while I learn.

leif · July 22, 2019, 4:32am

Wow, this thread got interesting real fast!

I had never heard of Z/IP before. Interesting concept, but I’m not sure I understand what the purpose is. We’ve already had IP-capable Z-Wave controllers forever, and it must certainly be easier to secure those than all of the individual Z-Wave packets? Seems like it’d make it trivial to hack the lights from the home network. Has there been any more discussion of Z/IP here? I’d love to read it and learn more.

Am I covered with these ones? It’s the z-wave.me version. I think I have enough backups at least.

I’ve had auto network heal disabled for a few days now and things seem to be working fine. It’ll probably be a while until I need to add more Z-Wave devices, since I now (through openHAB, its plethora of bindings, and rolling my own with ESP microcontrollers) have several viable alternatives.

robmac · July 22, 2019, 6:16am

They may well be, as z-wave.me does keep firmware releases up to date. Check the firmware and SDK versions and take a bit of advice on which is the most recent stable one.

That is a big advantage of being openHAB and at some point it might get a Z/IP binding.

Z/IP is a whole other game and a long story. At the most basic level it does not look to add a lot but when you play with it you can see some advantages and some disadvantages.

I can totally believe what Chris says that some companies will see it as too big a departure from what they are used to and go elsewhere. It would have to be another thread though or the focus of this thread will get lost completely.

In summary:

First read Chris’s guide about the binding.

then

You do not need to Heal all of the time and certainly not every night
Later SDKs are more resilient and have mechanisms that make healing less likely to be needed
Try to find the cause and treat that. Heal is not a magic solution that will fix everything. If a device is too far from the network it is too far and you will need to add nodes that connect it. If you have a device with bad firmware then healing will not fix that. If you have too many reports it will make the network less resilient and you will get nodes behaving strangely and dropping from the network etc.
If you decide nothing else is possible then heal but use individual device heal and do it one at a time with time to settle.
If you remove or move a node, a heal is necessary to correct the routing table, but even then just healing the nodes that used to route through that node may fix it. You may well find that the node will reconnect, as explorer frames will find the node if supported by the device and controller.

The last point is an interesting one. With a Zniffer you can see what routes are being used and which routes are not stable. If you have issues this is the best way to diagnose if the logs do not yield the answer.

Full network heal is the nuclear option and only when nothing else works would be my position. It is that tempting quick fix as it sounds like it is the magic solution but it is not.

The routing table has a few purposes and needs to be correct for these things to work.

The neighbour topology is used as a backup mechanism when the primary routing mechanism fails. If the topology is wrong this mechanism can not work and will add no value. Having it correct will add resilience.
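To make that concrete, here is a rough sketch of picking a route from a neighbour table. It is a toy model, not how the controller firmware actually does it: the table layout and the `find_route` name are made up for illustration, and the four-repeater limit is my understanding of the protocol.

```python
from collections import deque

MAX_REPEATERS = 4  # assumed limit on repeaters in a routed frame


def find_route(neighbours, source, destination, max_repeaters=MAX_REPEATERS):
    """Breadth-first search for the shortest repeater list.

    Returns the repeater node ids between source and destination
    (empty list for a direct link), or None if the table offers no
    route within the repeater limit.
    """
    queue = deque([(source, [])])
    visited = {source}
    while queue:
        node, repeaters = queue.popleft()
        for nxt in sorted(neighbours.get(node, ())):
            if nxt == destination:
                return repeaters
            if nxt not in visited and len(repeaters) < max_repeaters:
                visited.add(nxt)
                queue.append((nxt, repeaters + [nxt]))
    return None


# Hypothetical table: controller is node 1, node 5 is only reachable via 2 and 3.
table = {1: {2}, 2: {1, 3}, 3: {2, 5}, 5: {3}}
print(find_route(table, 1, 5))  # [2, 3]

# Same network, but the table is missing the 3 <-> 5 link: no backup route.
stale = {1: {2}, 2: {1, 3}, 3: {2}, 5: set()}
print(find_route(stale, 1, 5))  # None
```

The second call is the point: the radio link may be perfectly fine, but if the table does not know about it the backup mechanism has nothing to work with.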

When setting a device to device association, I believe the table is used to set the initial two values in the return route. If these are incorrect associating devices may not work.

Adding a device should not need a heal and certainly not a full network heal. When the device is added it discovers all of the neighbours and it is added to the table. If the add does not complete well then a level of heal may be necessary.
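The summary above can be written down as a small decision helper. This is only my position expressed as code; the `Action` names and the flags are invented for illustration and are not anything in the binding.

```python
from enum import Enum


class Action(Enum):
    NOTHING = "leave the network alone"
    WAIT = "let things settle and check again"
    FIX_CAUSE = "fix the underlying cause (range, firmware, report volume)"
    HEAL_NODE = "heal only the affected node, one at a time"
    FULL_HEAL = "full network heal (last resort)"


def recommend(*, working, settled=True, cause_found=False, node_heal_tried=False):
    """Return the least drastic step that has not been tried yet."""
    if working:
        return Action.NOTHING      # no heal for the sake of it
    if not settled:
        return Action.WAIT         # give a recent change time to settle
    if cause_found:
        return Action.FIX_CAUSE    # healing will not fix range or firmware
    if not node_heal_tried:
        return Action.HEAL_NODE    # individual heal before anything bigger
    return Action.FULL_HEAL        # the nuclear option


print(recommend(working=True).value)
print(recommend(working=False, settled=False).value)
print(recommend(working=False, node_heal_tried=True).value)
```

The ordering is the design choice: every branch before the last one is cheaper and safer than a full heal.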

Bruce_Osborne · July 22, 2019, 8:26am

Newbie question here. Doesn’t both the controller and device need to have the new sdk to receive the benefits?
I believe many devices are not upgradeable. You are then basically restricted to the lowest sdk on your network.

robmac · July 22, 2019, 9:30am

As you are new, hopefully most of the devices you can buy have the features. Worth making sure when you are shopping.

Bruce_Osborne · July 22, 2019, 9:39am

How do you tell that?
As I have discovered you cannot depend on model names to identify. You would need the IDs. Also due to the expense of Z-Wave I have been tending toward the lower cost Chinese devices. Even the supplied documentation may not match the device.

Celaeno1 · July 22, 2019, 10:06am

@robmac

Silabs SDK Version Overview

Series	Branch	SDK	API	Keil	Release Date	Life Cycle	Release Notes
500	6.8x	6.82.01	6.09.00	9.54a	23/04/2020	xxxxx	Z-Wave 500 Series SDK v6.82.01
500	6.8x	6.81.06	6.07.00	9.54a	19/07/2019	xxxxx	Z-Wave 500 Series SDK v6.81.06
500	6.8x	6.81.05	6.06.00	9.54a	07/05/2019	xxxxx	Z-Wave 500 Series SDK v6.81.05
500	6.8x	6.81.03	6.04.00	9.54a	xxx	xxxxx
500	6.8x	6.81.01	6.02.00	9.54a	xxx	xxxxx	Z-Wave 500 Series SDK v6.81.01
500	6.8x	6.81.00	6.01.00	9.54a	27/09/2017	Monitored
500	6.7x	6.71.03	5.02.00	9.54a	08/01/2018	Monitored	Z-Wave 500 Series SDK v6.71.03
500	6.7x	6.71.02	5.02.00	9.54a	13/07/2017	Monitored
500	6.7x	6.71.01	4.61	9.54a	01/03/2017	Monitored
500	6.7x	6.71.00	4.60	9.54a	20/01/2017	Monitored
500	6.6x	6.61.01	4.62	9.54a	06/04/2017	Active By Request
500	6.6x	6.61.00	4.33	9.54a	21/04/2016	Monitored
500	6.5x	6.51.10	4.54	9.54a	09/02/2017	Active By Request
500	6.5x	6.51.09	4.38	9.54a	01/07/2016	Monitored
500	6.5x	6.51.08	4.34	9.54a	04/05/2016	Monitored
500	6.5x	6.51.07	4.24	9.53	19/02/2016	Monitored
500	6.5x	6.51.06	4.05	9.53	26/06/2015	Monitored
500	6.5x	6.51.05	4.05	9.53	02/12/2014	Monitored
500	6.5x	6.51.04	4.01	9.53	08/05/2014	Monitored
500	6.5x	6.51.03	3.99	9.51a	18/07/2014	Monitored
500	6.5x	6.51.02	3.95	9.51a	08/05/2014	Monitored
500	6.5x	6.51.01	3.92	9.51a	07/04/2014	Monitored
500	6.5x	6.51.00	3.83	9.51a	13/12/2013	Monitored

Above overview shows the SDK versions.

Which of these is a prerequisite for it to work (“without daily heal”)?

Latest SDK version of zwave.me UZB1 is afaik: 6.81.01

Latest SDK version of Aeotec ZW090 is afaik: 6.51.06 (see here). How can I determine which version is on this stick? Maybe somebody knows? And there is no new firmware available (see here)!?
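One possible way to cross-reference, as a sketch only: if the stick reports a protocol version, and assuming that number corresponds to the API column of the table above (I have not confirmed this), a lookup like the following narrows down the SDK. The rows are a subset copied from the table; the function names are made up.

```python
# (SDK version, API column) pairs copied from the overview table (subset).
SDK_TABLE = [
    ("6.82.01", "6.09.00"),
    ("6.81.06", "6.07.00"),
    ("6.81.01", "6.02.00"),
    ("6.71.03", "5.02.00"),
    ("6.71.02", "5.02.00"),
    ("6.51.10", "4.54"),
    ("6.51.06", "4.05"),
    ("6.51.05", "4.05"),
]


def version_key(text):
    """Turn '4.05' or '5.02.00' into a comparable 3-tuple of ints."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))  # pad so '5.02' equals '5.02.00'
    return tuple(parts)


def sdk_candidates(reported_api):
    """SDK versions whose API column matches the reported version."""
    wanted = version_key(reported_api)
    return [sdk for sdk, api in SDK_TABLE if version_key(api) == wanted]


def is_at_least(sdk, minimum):
    """True if one SDK version is the same as or newer than another."""
    return version_key(sdk) >= version_key(minimum)


print(sdk_candidates("4.05"))               # ['6.51.06', '6.51.05']
print(is_at_least("6.81.01", "6.51.06"))    # True
```

Note that one API value can map to more than one SDK release, so the lookup returns candidates rather than a single answer.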

robmac · July 22, 2019, 11:15am

All 500 should have explorer so could find a route even when the neighbours in table are not correct.

Some manufacturers may have also implemented on older chips.

All versions from way back only use neighbours during retry so could work even with some incorrect entries in the table.
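As I read it, the order of attempts looks roughly like the sketch below. This is a simplification for discussion, not the firmware logic, and the function and labels are invented for illustration.

```python
def delivery_attempts(last_working_route, table_routes, explorer_supported):
    """List the routes a sender would try, in order, under this simplified model.

    last_working_route: repeater list that worked last time, or None.
    table_routes: repeater lists derived from the neighbour table.
    explorer_supported: whether device and controller both support explorer frames.
    """
    attempts = []
    if last_working_route is not None:
        attempts.append(("last working route", last_working_route))
    for route in table_routes:
        # Neighbour-derived routes only come into play on retry.
        if route != last_working_route:
            attempts.append(("neighbour table route", route))
    if explorer_supported:
        # Explorer frames can discover a route even if the table is wrong.
        attempts.append(("explorer frame", None))
    return attempts


for label, route in delivery_attempts([2, 3], [[2, 3], [4]], True):
    print(label, route)
```

If the last working route keeps working, nothing further down the list is ever touched, which is why a table with a few incorrect entries can go unnoticed.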

Do be careful not to interpret what I am saying as meaning that it is not good to have the routing table correct, or that you should ignore things that are not working and always expect the network to repair itself.

In my experience the root cause of a lot of things that look like routing issues is not fixed by a network Heal. Best understand what is causing the issues before healing.

In the end your network is always going to be more resilient when the table is good but just because you see symptoms that look like routing issues the table is not necessarily bad.

An experiment for you to try.

Take a single node with power monitoring and install at the far end of your network where it will definitely route via only one or two nodes. Turn on all power monitoring and reports to the highest level the device allows. Now switch it on and off 10 times in quick succession and report back what happens.

I would predict that the first switch command is good and then things get slightly odd. You will get pauses and even a lockup. A few CAN on the serial interface possibly and the device will probably be marked dead.

You see it is dead and you assume the routing table is bad so you reach for heal? I promise you it is not and do not run heal. If the routing was good before it is still good now.

Now imagine you have got 10 of those devices with sensible parameters set but still all coming back to the controller through one or two nodes. Switch them all in a script ON then OFF with no delay 10 times. Is your routing table wrong, or does it all sort itself out when that transient load has finished? And don’t expect anything else in the network to behave when you are doing that.
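A toy model of that experiment, to show the shape of what I expect. All the numbers are invented (frames per tick, repeater capacity) and real radio behaviour is far messier; the only point is that a backlog can build and drain while the routes never change.

```python
def simulate_backlog(arrivals_per_tick, capacity_per_tick):
    """Return the queue length at a bottleneck repeater after each tick."""
    backlog = 0
    history = []
    for arriving in arrivals_per_tick:
        # Frames that cannot be forwarded this tick wait for the next one.
        backlog = max(0, backlog + arriving - capacity_per_tick)
        history.append(backlog)
    return history


# Hypothetical load: 10 devices x 3 report frames per tick for 5 ticks,
# then silence. The repeater is assumed to forward 12 frames per tick.
burst = [30] * 5 + [0] * 10
print(simulate_backlog(burst, 12))
# [18, 36, 54, 72, 90, 78, 66, 54, 42, 30, 18, 6, 0, 0, 0]
```

While the backlog is high you would see timeouts and nodes marked dead, yet once the burst ends the queue empties with no heal at all.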

chris · July 22, 2019, 12:16pm

Sorry for the slow response - I’ve been travelling and just catching up on emails…

Hmmm - I’m not sure it’s really right to make such rash statements. I think it’s your interpretation of the documentation, which is (in fact) quite poor (I mean the documentation - not your interpretation).

I’m not really sure how this will significantly help. It will NOT help slaves with their routing at all and that is one of the main purposes of the heal - to provide slaves routes to the controller and other nodes to which they are associated.

Also, many people are using Aeon controllers which have VERY old firmware. When making recommendations, I would strongly suggest to consider all users and not just those running the latest hardware.

On what basis do you make such claims? Clearly this is not the case as many people run the nightly heal with 100+ devices. Notwithstanding that there is a problem with heal at the moment, it does not cause “armageddon”. It does however take a lot of bandwidth, which is why it’s recommended to do it during the night (ie at 2AM). The binding has implemented such a heal for the last 6 years, so I think such statements are a little over the top.

I would respectfully support the point made by @Bruce_Osborne - if you are making a statement like this, it would be a good idea to make it clear it’s just your view. There are a lot of less experienced people on the forum who want to learn and we should be careful about making sure that statements are marked as personal views if they are not supported by fact. (I seem to recall only a couple of months back you were also a newbie in this area so I’m sure you understand.)

Apologies for the robust responses, but glad to hear they don’t offend as this is never intended. I’ve been doing this a long time now (nearly 8 years working on Z-Wave and ZigBee with the last 3+ years being commercial). I certainly don’t consider that I know everything there is about these systems, and unfortunately Z-Wave is a standard that is ever evolving so I’m always learning. Opinions are always welcome, but I would suggest to consider the wider community.

The Explorer frames are a software feature - it is not linked at all to the chip series. As you say, some older 300 series will include Explorer, but also some 500 series chips do not include Explorer. It is down to the SDK firmware being used, and not the chip series.

Cheers
Chris

robmac · July 22, 2019, 1:22pm

So are you saying if a node successfully adds with no issues you need to heal as the routing table will not update? That is what I am saying and I believe that is fact not interpretation because that is what should happen when you add a node.

If you move or delete a node that is a different ball game.

To my knowledge the only thing that may not get updated to a better option is a direct association may now have another possible route. But again if it was working before you added a new node why would you run a heal as the route is good and the slave will only hold two possible return routes.

Do you run 100+ nodes and heal every night?

I am still a newbie on OpenHAB but I have used zwave for over 10 years with various controllers so also doing it a long time but always as a hobby. The only thing that has ever caused me issues is running a heal that was not needed but if it works for anyone let them continue. I will not take the risk of corrupting the routing table.

chris · July 22, 2019, 1:26pm

In some instances, YES. As I have said, the routes are defined by the controller using a command that is sent from the binding.

Possibly with newer devices, this may not be needed - I’m not 100% sure of this as the documentation is poor - hence my point.

Yes - 141 nodes at the moment.

Great, and your experience is for sure an asset. But just because someone knows how to drive a car, doesn’t make them a mechanic.