[SOLVED] Z-Wave Network Heal: when is it needed? Does it always need to happen?

In theory, a heal should not be needed even when you add a node. Removing a node, or moving one to a very different location, is more likely to cause issues.

When you add a node, the node and its neighbours are added to the routing table, and any devices associated with your new device get their routes set.

The node and controller should work out the last working route and, all being well, keep using it. No heal needed.

I am somewhat against a full network heal because on larger networks (certainly 100+ nodes) my experience, and the experience reported by others, has been bad. Fibaro released firmware for the HC2 that ran a heal at 2 a.m. every day, and that is why I am now here and no longer using the HC2.

Night by night it destroyed my stable 144-node network: it wiped out all of the LNR and caused a retry storm. When they turned it off, the network did fix itself, but by then I had already decided to leave by the nearest exit.

It looks like there is a difference of opinion here, based on differing interpretations of the standards.
For the sake of us poor users, please do not state something as fact if it is just your interpretation and not undisputed fact. Discussion among experts is good, but please be clear when responding to user queries.

Thank you very much for taking the time to respond, and for your dedication.

That is the full statement, and it is fact. If the network does not sort itself out when left to its own devices, by all means heal, but if it does sort itself out, don’t heal just for the sake of it. No opinion here: by all means use heal, but you may well regret it.

Fair enough.
I was just trying to sort out fact from personal interpretation or opinion.
Not singling you out at all.

I hope I did not offend you too much…

Even Chris’s robust responses do not offend.

It is all healthy and we all learn. I am also not always very precise in my wording, so if you do not understand my meaning, please challenge it.

I currently have my small network (7 nodes?) set to the default daily heal. I am just trying to learn from others as I slowly expand my network and my knowledge.

My recommendation is to turn periodic network heal off.

Use a controller with reasonably recent Z-Wave firmware and SDK.

Don’t routinely run a full network heal when you add a node. When a node is added it goes through asking its neighbours and so on, and the controller adds it to the topology. Unless the add fails for some reason, you should not need to do anything.

If, after leaving it a good while, the new node is not behaving well, try healing just that node. I always find it is best to let things settle before assuming a node is not working. When you read that part of the developer guide you can see there is a lot going on, and in my experience some of it can take time to settle.

One thing I would say to anyone starting out (and it is also in Chris’s guide): start slow.

Plan so you do not need to remove or move devices. Those are the actions that may lead to needing to rebuild the network routing table.

After each change let things settle down and be sure the change has worked.

After your network is mature, I would recommend that you never add more than one node a day, and that you only add the next when you are satisfied the last is good. Rebuilding 7 nodes from scratch is not too bad; re-adding 140 takes a lot of time, and if things get really bad, that may be what you have to do.

In my opinion, the full network heal is the nuclear option. If you have a big network it probably won’t help, as it will escalate into armageddon on your network. If you have a small network it may work, but don’t do it until you have checked for other reasons why things are not working. If you can work out from the logs that a particular device is the problem, just treat that one; it is a lot safer.

Thanks.
I have a HUSBZB-1. I think @chris indicated it is old and cannot be updated.

We are going off topic, but since it will help people avoid hitting that nuclear button: you are going to spend a small fortune on devices, so spending a little on a controller that can be updated with the latest SDK prior to the Z/IP-tied SDKs is probably a good investment.

I just use a Sigma Designs UZB-3. It has a 500-series chipset and can take the latest firmware prior to the 700-series/Z/IP firmwares.

You can buy either the static controller or the bridge version, but you do need to flash it with the static controller firmware before using it with openHAB.

They are relatively cheap, and last time I looked, places like Digi-Key and Mouser had them in one flavour or the other. If you buy two you can turn one into a Zniffer, which is the best tool anyone can buy to diagnose issues. Make sure you buy the correct frequency for your region; other frequencies will sort of work, but not very well.

Do not buy UZB-7.

Thanks for the advice.
I have just started “dipping my toes” into smart home tech, and my stick let me avoid locking into Z-Wave, Zigbee, or a software platform while keeping costs down as I learn.

Wow, this thread got interesting real fast!

I had never heard of Z/IP before. Interesting concept, but I’m not sure I understand its purpose. We’ve had IP-capable Z-Wave controllers forever, and surely it is easier to secure the controller than all of the individual Z-Wave packets? It seems like it would make it trivial to hack the lights from the home network. Has there been any more discussion of Z/IP here? I’d love to read it and learn more.

Am I covered with these? It’s the z-wave.me version. At least I think I have enough backups 🙂.

I’ve had auto network heal disabled for a few days now and things seem to be working fine. It’ll probably be a while until I need to add more Z-Wave devices, since I now (through openHAB, its plethora of bindings, and rolling my own with ESP microcontrollers) have several viable alternatives.

They may well be, as z-wave.me does keep its firmware releases up to date. Check the firmware and SDK versions, and take a bit of advice on which is the most recent stable release.

That is a big advantage of openHAB, and at some point it might get a Z/IP binding.

Z/IP is a whole other game and a long story. At the most basic level it does not appear to add much, but when you play with it you can see both advantages and disadvantages.

I can totally believe what Chris says, that some companies will see it as too big a departure from what they are used to and go elsewhere. That would have to be another thread, though, or the focus of this one will get lost completely.

In summary:

First read Chris’s guide about the binding.

then

  1. You do not need to heal all the time, and certainly not every night.
  2. Later SDKs are more resilient and have mechanisms that make healing less likely to be needed.
  3. Try to find the cause and treat that. Heal is not a magic solution that will fix everything. If a device is too far from the network, it is too far, and you will need to add nodes that connect it. If you have a device with bad firmware, healing will not fix that. If devices send too many reports, the network becomes less resilient and you will see nodes behaving strangely, dropping from the network, etc.
  4. If you decide nothing else is possible, then heal, but heal individual devices, one at a time, with time to settle in between.
  5. If you remove or move a node, a heal is necessary to correct the routing table, but even then, just healing the nodes that used to route through that node may fix it. You may well find that the node reconnects on its own, as explorer frames will find it if both the device and the controller support them.
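
Point 4 (heal individual devices one at a time, with time to settle) can be sketched as a small helper. This is a minimal sketch under assumptions: `heal_node` and `node_ok` are hypothetical callables standing in for whatever your controller software provides (they are not openHAB or Z-Wave binding APIs), and the 10-minute settle time is an assumed default, not a value recommended in this thread.

```python
import time
from typing import Callable, Iterable, List


def heal_one_at_a_time(
    node_ids: Iterable[int],
    heal_node: Callable[[int], None],  # hypothetical: triggers a heal of ONE node
    node_ok: Callable[[int], bool],    # hypothetical: True if the node is behaving
    settle_seconds: float = 600.0,     # assumed settle time; tune for your network
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Heal misbehaving nodes strictly one at a time, settling between heals.

    Nodes that are already behaving are left alone. Returns the IDs of nodes
    that still misbehave after their heal and settle period.
    """
    still_bad = []
    for node_id in node_ids:
        if node_ok(node_id):
            continue  # Only treat nodes that actually have a problem
        heal_node(node_id)
        sleep(settle_seconds)  # Let routing settle before judging the result
        if not node_ok(node_id):
            still_bad.append(node_id)
    return still_bad
```

Passing `sleep` in as a parameter keeps the helper testable; in real use the default `time.sleep` waits out the settle period before the next node is touched, so only one heal is ever in flight.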

The last point is an interesting one. With a Zniffer you can see which routes are being used and which are not stable. If you have issues and the logs do not yield the answer, this is the best way to diagnose them.

My position is that full network heal is the nuclear option, to be used only when nothing else works. It is the tempting quick fix because it sounds like a magic solution, but it is not.

The routing table has a few purposes and needs to be correct for these things to work.

The neighbour topology is used as a backup mechanism when the primary routing mechanism fails. If the topology is wrong, this mechanism cannot work and adds no value. Having it correct adds resilience.
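
The backup role of the neighbour topology can be illustrated with a toy model. This is not the Z-Wave routing algorithm: it assumes the table is simply a map from node ID to the set of neighbours the controller believes that node can hear, and it uses a breadth-first search with an assumed limit of four repeaters per route. It shows how a stale entry (for example after a node is moved) leaves no fallback route in the table even though the radios could still reach each other.

```python
from collections import deque
from typing import Dict, List, Optional, Set

NeighbourTable = Dict[int, Set[int]]


def backup_route(
    table: NeighbourTable, source: int, dest: int, max_repeaters: int = 4
) -> Optional[List[int]]:
    """Shortest route from source to dest according to the neighbour table.

    Returns the node IDs along the route, or None if the table holds no route
    using at most max_repeaters intermediate nodes.
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dest:
            return path
        # A route continuing from here would have len(path) - 1 repeaters.
        if len(path) - 1 > max_repeaters:
            continue
        for nxt in sorted(table.get(node, set())):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


# Node 1 is the controller; the radios physically form a chain 1 - 2 - 3 - 4.
correct = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
stale = {1: {2}, 2: {1}, 3: {4}, 4: {3}}  # the 2 <-> 3 link is missing from the table

print(backup_route(correct, 4, 1))  # [4, 3, 2, 1]
print(backup_route(stale, 4, 1))    # None
```

With the correct table the fallback search finds a route through nodes 3 and 2; with the stale table it finds nothing, which is the "adds no value" case described above.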

When setting up a device-to-device association, I believe the table is used to set the initial two values in the return route. If these are incorrect, the associated devices may not work.

Adding a device should not need a heal, and certainly not a full network heal. When the device is added, it discovers all of its neighbours and is added to the table. If the add does not complete cleanly, then some level of heal may be necessary.

Newbie question here. Don’t both the controller and the device need the new SDK to get the benefits?
I believe many devices are not upgradeable, so you are then basically restricted to the lowest SDK on your network.

As you are new, hopefully most of the devices you can buy will have these features. It is worth checking when you are shopping.

How do you tell?
As I have discovered, you cannot depend on model names to identify a device; you would need the IDs. Also, due to the expense of Z-Wave, I have been tending toward lower-cost Chinese devices, and even the supplied documentation may not match the device.

@robmac

Silabs SDK Version Overview

| Series | Branch | SDK | API | Keil | Release Date (DD/MM/YYYY) | Life Cycle | Release Notes |
|---|---|---|---|---|---|---|---|
| 500 | 6.8x | 6.82.01 | 6.09.00 | 9.54a | 23/04/2020 | xxxxx | Z-Wave 500 Series SDK v6.82.01 |
| 500 | 6.8x | 6.81.06 | 6.07.00 | 9.54a | 19/07/2019 | xxxxx | Z-Wave 500 Series SDK v6.81.06 |
| 500 | 6.8x | 6.81.05 | 6.06.00 | 9.54a | 07/05/2019 | xxxxx | Z-Wave 500 Series SDK v6.81.05 |
| 500 | 6.8x | 6.81.03 | 6.04.00 | 9.54a | xxx | xxxxx | |
| 500 | 6.8x | 6.81.01 | 6.02.00 | 9.54a | xxx | xxxxx | Z-Wave 500 Series SDK v6.81.01 |
| 500 | 6.8x | 6.81.00 | 6.01.00 | 9.54a | 27/09/2017 | Monitored | |
| 500 | 6.7x | 6.71.03 | 5.02.00 | 9.54a | 08/01/2018 | Monitored | Z-Wave 500 Series SDK v6.71.03 |
| 500 | 6.7x | 6.71.02 | 5.02.00 | 9.54a | 13/07/2017 | Monitored | |
| 500 | 6.7x | 6.71.01 | 4.61 | 9.54a | 01/03/2017 | Monitored | |
| 500 | 6.7x | 6.71.00 | 4.60 | 9.54a | 20/01/2017 | Monitored | |
| 500 | 6.6x | 6.61.01 | 4.62 | 9.54a | 06/04/2017 | Active By Request | |
| 500 | 6.6x | 6.61.00 | 4.33 | 9.54a | 21/04/2016 | Monitored | |
| 500 | 6.5x | 6.51.10 | 4.54 | 9.54a | 09/02/2017 | Active By Request | |
| 500 | 6.5x | 6.51.09 | 4.38 | 9.54a | 01/07/2016 | Monitored | |
| 500 | 6.5x | 6.51.08 | 4.34 | 9.54a | 04/05/2016 | Monitored | |
| 500 | 6.5x | 6.51.07 | 4.24 | 9.53 | 19/02/2016 | Monitored | |
| 500 | 6.5x | 6.51.06 | 4.05 | 9.53 | 26/06/2015 | Monitored | |
| 500 | 6.5x | 6.51.05 | 4.05 | 9.53 | 02/12/2014 | Monitored | |
| 500 | 6.5x | 6.51.04 | 4.01 | 9.53 | 08/05/2014 | Monitored | |
| 500 | 6.5x | 6.51.03 | 3.99 | 9.51a | 18/07/2014 | Monitored | |
| 500 | 6.5x | 6.51.02 | 3.95 | 9.51a | 08/05/2014 | Monitored | |
| 500 | 6.5x | 6.51.01 | 3.92 | 9.51a | 07/04/2014 | Monitored | |
| 500 | 6.5x | 6.51.00 | 3.83 | 9.51a | 13/12/2013 | Monitored | |

The overview above shows the SDK versions.

Which of these versions is a prerequisite for it to work (“without daily heal”)?

As far as I know, the latest SDK version for the zwave.me UZB1 is 6.81.01.

As far as I know, the latest SDK version for the Aeotec ZW090 is 6.51.06 (see here). How can I determine which version is on this stick? Maybe somebody knows? And there is no new firmware available (see here)!
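
Comparing these version strings by eye, or as plain text, is easy to get wrong. A minimal sketch, assuming dotted versions like those in the table: parse each into a tuple of integers and compare the tuples. The helper names are illustrative, and since this thread does not settle which SDK is the minimum for running without a daily heal, the example only compares against the newest 500-series entry in the table.

```python
from typing import Tuple


def parse_sdk(version: str) -> Tuple[int, ...]:
    """Turn a dotted SDK version such as '6.81.01' into (6, 81, 1)."""
    return tuple(int(part) for part in version.split("."))


def is_at_least(version: str, minimum: str) -> bool:
    """True if `version` is the same as or newer than `minimum`."""
    return parse_sdk(version) >= parse_sdk(minimum)


# Newest 500-series SDK listed in the overview table above.
LATEST_500_SERIES = "6.82.01"

# SDK versions quoted in this thread (as far as the poster knows).
sticks = {"zwave.me UZB1": "6.81.01", "Aeotec ZW090": "6.51.06"}

for name, sdk in sticks.items():
    print(f"{name}: SDK {sdk}, at least {LATEST_500_SERIES}: {is_at_least(sdk, LATEST_500_SERIES)}")
```

Tuple comparison works field by field, so a hypothetical "6.9" correctly sorts below "6.81", whereas comparing the raw strings would put it above.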

All 500-series devices should have explorer frames, so they could find a route even when the neighbours in the table are not correct.

Some manufacturers may also have implemented it on older chips.

All versions going way back only use neighbours during retries, so they could work even with some incorrect entries in the table.

Do be careful not to read this as me saying that a correct routing table does not matter, or that you should ignore things that are not working and always expect the network to repair itself.

In my experience, the root cause of a lot of things that look like routing issues is not fixed by a network heal. It is best to understand what is causing the issues before healing.

In the end, your network is always going to be more resilient when the table is good, but symptoms that look like routing issues do not necessarily mean the table is bad.

Here is an experiment for you to try.

Take a single node with power monitoring and install it at the far end of your network, where it will definitely route via only one or two nodes. Turn on all power monitoring and reports at the highest level the device allows. Now switch it on and off 10 times in quick succession and report back what happens.

I would predict that the first switch command works and then things get slightly odd. You will get pauses and even a lockup, possibly a few CANs on the serial interface, and the device will probably be marked dead.

You see it is dead, assume the routing table is bad, and reach for heal? I promise you it is not, so do not run a heal. If the routing was good before, it is still good now.

Now imagine you have 10 of those devices with sensible parameters set, but all still coming back to the controller through one or two nodes. In a script, switch them all ON then OFF 10 times with no delay. Is your routing table wrong, or does it all sort itself out once that transient load has finished? And don’t expect anything else in the network to behave while you are doing that.
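
A toy queue model shows why the experiment above points at load rather than routing. The numbers are made up for illustration, not measured Z-Wave figures: a burst of frames (switch commands plus the reports they trigger) must pass through a single relay node that can forward only a fixed number per tick. The backlog grows during the burst and drains once it stops, with no change to any route.

```python
from typing import List


def simulate_backlog(arrivals: List[int], capacity_per_tick: int) -> List[int]:
    """Return the relay's queue length at the end of each tick.

    Toy model: arrivals[t] frames must pass through one relay node during
    tick t, and the relay forwards at most capacity_per_tick frames per tick.
    The route itself never changes.
    """
    backlog = 0
    history = []
    for arriving in arrivals:
        backlog = max(0, backlog + arriving - capacity_per_tick)
        history.append(backlog)
    return history


# Assumed numbers: a three-tick burst of 12 frames per tick, relay capacity 5 per tick.
burst = [12, 12, 12, 0, 0, 0, 0, 0, 0]
print(simulate_backlog(burst, capacity_per_tick=5))
# [7, 14, 21, 16, 11, 6, 1, 0, 0]
```

Frames stuck in the queue for several ticks are the ones that time out and make a node look dead; once the burst is over the queue empties and the same route works again, which is why a heal is the wrong response here.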

Sorry for the slow response - I’ve been travelling and am just catching up on emails…

Hmmm - I’m not sure it’s really right to make such rash statements 🙂. I think it’s your interpretation of the documentation, which is (in fact) quite poor (I mean the documentation - not your interpretation 😉).

I’m not really sure how this will significantly help. It will NOT help slaves with their routing at all, and that is one of the main purposes of the heal: to provide slaves with routes to the controller and to the other nodes with which they are associated.

Also, many people are using Aeon controllers, which have VERY old firmware. When making recommendations, I would strongly suggest considering all users and not just those running the latest hardware.

On what basis do you make such claims? Clearly this is not the case, as many people run the nightly heal with 100+ devices. Notwithstanding that there is a problem with heal at the moment, it does not cause “armageddon”. It does, however, take a lot of bandwidth, which is why it’s recommended to do it during the night (i.e. at 2 AM). The binding has implemented such a heal for the last 6 years, so I think such statements are a little over the top 😉

I would respectfully support the point made by @Bruce_Osborne - if you are making a statement like this, it would be a good idea to make it clear it’s just your view. There are a lot of less experienced people on the forum who want to learn, and we should be careful to make clear when statements are personal views not supported by fact. (I seem to recall that only a couple of months back you were also a newbie in this area, so I’m sure you understand 😎)

Apologies for the robust responses, but I’m glad to hear they don’t offend, as that is never the intention. I’ve been doing this a long time now (nearly 8 years working on Z-Wave and ZigBee, with the last 3+ years being commercial). I certainly don’t consider that I know everything there is about these systems, and unfortunately Z-Wave is an ever-evolving standard, so I’m always learning. Opinions are always welcome, but I would suggest considering the wider community 😉.

Explorer frames are a software feature; they are not linked to the chip series at all. As you say, some older 300-series chips will include Explorer, but some 500-series chips do not. It is down to the SDK firmware being used, not the chip series.

Cheers
Chris

So are you saying that if a node is added successfully with no issues, you still need to heal because the routing table will not update? My point is that it does update, and I believe that is fact, not interpretation, because that is what should happen when you add a node.

If you move or delete a node, that is a different ball game.

To my knowledge, the only thing that may not get updated to a better option is a direct association, which may now have another possible route. But again, if it was working before you added the new node, why would you run a heal? The route is good, and the slave will only hold two possible return routes.

Do you run 100+ nodes and heal every night?

I am still a newbie on openHAB, but I have used Z-Wave for over 10 years with various controllers, so I have also been doing this a long time, albeit always as a hobby. The only thing that has ever caused me issues is running a heal that was not needed, but if it works for anyone, let them continue. I will not take the risk of corrupting the routing table.

In some instances, YES. As I have said, the routes are defined by the controller using a command that is sent from the binding.

Possibly with newer devices, this may not be needed - I’m not 100% sure of this as the documentation is poor - hence my point.

Yes - 141 nodes at the moment.

Great, and your experience is for sure an asset. But just because someone knows how to drive a car doesn’t make them a mechanic 😉