Two thirds of Z-wave network remain uninitialized

After a reboot yesterday (Ubuntu kernel upgrade), something like two-thirds of my Z-wave network is dysfunctional. Looking at the log, the Z-wave controller is ignoring messages from my nodes because they are uninitialized:

2018-05-26 22:10:24.245 [WARN ] [ssage.ApplicationCommandMessageClass] - NODE 28: Not initialized yet, ignoring message.

There are normal log entries from the few nodes that are functioning. For NODE 28 that is exemplified here, the only other log entry I can find is

2018-05-26 23:15:57.332 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Polling...
2018-05-26 23:15:57.338 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Polling deferred until initialisation complete
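To get an overview of which nodes are affected, the log can be scanned for these two messages. The following is only a minimal sketch, assuming the log line format shown in the excerpts above; the function name `uninitialized_nodes` and the `sample` list are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the tail of lines such as:
#   ... [WARN ] [ssage.ApplicationCommandMessageClass] - NODE 28: Not initialized yet, ignoring message.
NODE_LINE = re.compile(r"- NODE (\d+): (.*)$")

# Message fragments taken from the log excerpts above.
UNINITIALIZED_MARKERS = (
    "Not initialized yet, ignoring message",
    "Polling deferred until initialisation complete",
)


def uninitialized_nodes(lines):
    """Return {node_id: number of log lines marking the node as uninitialized}."""
    counts = defaultdict(int)
    for line in lines:
        match = NODE_LINE.search(line.rstrip())
        if match and any(marker in match.group(2) for marker in UNINITIALIZED_MARKERS):
            counts[int(match.group(1))] += 1
    return dict(counts)


# The three log lines quoted above, used as sample input.
sample = [
    "2018-05-26 22:10:24.245 [WARN ] [ssage.ApplicationCommandMessageClass] - NODE 28: Not initialized yet, ignoring message.",
    "2018-05-26 23:15:57.332 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Polling...",
    "2018-05-26 23:15:57.338 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 28: Polling deferred until initialisation complete",
]

print(uninitialized_nodes(sample))  # prints {28: 2}
```

For a real check, the lines of the actual openHAB log file would be passed in instead of `sample`; the location of that file depends on the installation.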

Since that reboot ~2 days ago, this has been the situation. I have tried rebooting multiple times, restarting OpenHAB (official version 2.2), and waiting days for initialization to happen, all to no avail. I managed to get one Fibaro Dimmer 2 up and running by triple-clicking the switch, after which it apparently managed to initialize, but I could not repeat this on the next dimmer. I did a full re-inclusion on a Fibaro Wall Plug, and that one is functioning.

The question is what to do next? Can this be fixed, or do I need to rebuild the entire Z-wave network from scratch (this will be many days of work)? Can I do something to understand why it happened, to ensure it will not happen again?

Looking at the log, the uninitialized nodes seem to send messages (node 28, e.g., is an Aeotec Multisensor 6, reporting the temperature), but they are ignored by the controller (Aeotec Z-Stick Gen 5).

Any help possible??

Are the nodes that aren’t initialising all battery devices? Are they waking up at all? If not, and the persistence file was somehow deleted, they won’t initialise. If this is the case, you should wake up the sleeping devices (possibly a few times) so that they can be initialised.

I think only three of the uninitialized nodes are battery devices, the rest are mains-connected.

OpenHAB persistence is accomplished through InfluxDB, and that is functioning well - is there another persistence concept for the z-wave binding?

Yes - the binding stores its own data internally so it doesn’t have to rediscover the devices each time.

Also attached is a 1000-line z-wave debug excerpt (from which the above log view is taken).

I also guess the Network view in my first post shows something quite strange: the nodes should not be daisy-chained from the controller. Previously, when the network was working fine, almost all nodes had a direct connection with the controller, even though the controller is non-optimally placed in the house (in a concrete garage attached to the house).

Not much to comment on with this - there’s only a single message being sent in this log, and it gets a response. I guess this short log is after the initialisation, and there are no commands being sent so very little is happening.

I did a restart of OpenHAB and captured the log from the start, available here:

Possibly of interest is that the controller seems to report that it has only 14 nodes (probably the few that are working):

In /var/lib/openhab2/zwave I have XML files for all nodes, not only the functioning ones. All files seem ok.
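A cross-check of those files against the controller's node list could be scripted roughly as below. This is a sketch under assumptions: it assumes the node number is the last run of digits before `.xml` in each file name (e.g. `node28.xml`), which may not match every binding version, and the function names are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the node number is the last run of digits directly before ".xml",
# e.g. "node28.xml". Adjust the pattern if the files are named differently.
NODE_FILE = re.compile(r"(\d+)\.xml$")


def nodes_with_xml(directory):
    """Return the set of node numbers that have an XML file in `directory`."""
    found = set()
    for path in Path(directory).glob("*.xml"):
        match = NODE_FILE.search(path.name)
        if match:
            found.add(int(match.group(1)))
    return found


def missing_from_controller(xml_nodes, controller_nodes):
    """Return the nodes that have an XML file but are not reported by the controller."""
    return sorted(set(xml_nodes) - set(controller_nodes))
```

Calling `nodes_with_xml("/var/lib/openhab2/zwave")` and passing the result to `missing_from_controller` together with the node IDs the controller reports (entered by hand from the log) would list the nodes the binding still has files for but the stick no longer knows.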

If the controller is reporting it only knows about 14 nodes, then only these 14 nodes will work. This probably explains the problem you’re seeing.

I don’t think there’s any way out if the controller doesn’t know about the devices any more - they will likely need to be reincluded. Of course the big question is why it changed - maybe the memory in the controller has an issue - who knows…

Thanks for the comments. I guess I’ll bite the bullet and rebuild the network after a factory reset of the stick.

If the stick is not to be trusted, I guess a mitigating action would be to buy a second spare stick, and keep a backup of the stick data (using the Aeotec software on some Windows computer).

One time I had a sequential block of nodes (224-232) drop off the network. I was ready to rebuild the network, but on a whim I tried just putting the controller and devices into inclusion mode and they all rejoined. I can’t explain how/why this worked, but it never reoccurred and everything has worked fine since. Maybe this would work for you.

Thanks for the hint. Did you do that with the stick unplugged, or was it something you accomplished from the software side?

I had initiated the inclusion through OH, which is the safest way to do inclusion. Otherwise, you run the risk of an unhealthy mesh, which, as I understand it, cannot be healed through OH unless you are running the development zwave binding. Still, it can take a while to straighten it out.

I have never done inclusion through OH, always with the stick unplugged, maybe because I didn’t manage it in the first place. Trying to do it now, I am getting errors that the binding/thing does not support discovery:

 karaf> discovery start zwave:serial_zstick:ff1568d3  
 log: 16:23:42.857 [WARN ] [internal.DiscoveryServiceRegistryImpl] - No discovery service for thing type 'zwave:serial_zstick:ff1568d3' found!

 karaf> discovery start 242 # for the z wave bundle
 log: 16:25:18.248 [WARN ] [internal.DiscoveryServiceRegistryImpl] - No discovery service for binding id '242' found!

And I get a similar red error message box from habmin while trying to click the magnifying glass with a ‘+’ sign in it.

Now that 2.3 is out, I will first upgrade to that.

The upgrade to 2.3 and the rebuild of the Z-wave network were done many days ago. The topology looks much better now. Inclusion via the binding worked for most devices, but not all. Thanks for that tip, it saved me a LOT of time.

Not worried about those red nodes - they are currently without power…

If you plan to have nodes powered off for a long while, consider excluding them and reincluding when you want to use them. Or just leave them powered up. I’ve run into routing issues when I’ve powered devices off for a while… like some power monitors that I basically use for testing. Your node 15 looks like it could be such a problem.

Glad you got it cleaned up!

No, it is not intentional to keep them offline. Some are waiting until I have time for them, at least one is awaiting an electrician for final installation, and some red nodes are not really dead; it seems the binding just temporarily lost contact with them.