Z-Wave Stops Responding After Nightly Heal

  • Platform information:
    • Hardware: Supermicro server, 512 GB RAM, 24 x 8 TB drives
    • OS: Arch Linux kernel 5.4.6-arch3-1 with java-rxtx (2.2pre2-6)
    • Java Runtime Environment: OpenJDK 1.8.0_232-b09
    • openHAB version: 2.5.0 from openhab2 AUR package (no VM or container)
    • Z-Wave bundle: 2.5.0
    • Serial Binding bundle: 1.14.0

My ~85 node Z-Wave setup consists mostly of mains or USB powered devices. It had been working relatively well under Hubitat, but I wanted more power, so I moved the Aeon Z-Stick Gen5 to an OpenHAB install 2 days ago. The move went fairly well, with all devices immediately showing up as Things and their manufacturer and device info being fetched over the following hours. I could control the devices and it all seemed fine.

On the first night I left it running and by morning most Things had gone to an offline mode. I tried sending commands but they didn’t respond. Over the following few hours they progressively came back online. I assumed they’d just been marked offline as they hadn’t been heard from for so long, so to avoid it happening again I set all Z-Wave devices to a “polling period” of 30 minutes.

On the second night (last night) I left it running again. When I went to bed I could control lights via Alexa and all was fine. There are no other rules or add-ons.

When I woke up today I noticed all devices were again offline. However, this time they did not progressively come back online, and I could not control any of them. I restarted the OpenHAB systemd service, and debug-level logging showed a flood of Z-Wave traffic from several devices. Two Aeon MultiSensor 6 units seemed to cause most of the traffic, so I attempted to change them to “selective reporting” (a setting that had been overlooked since the Hubitat era). However, none of the attempts to send that configuration change were accepted. I did a soft reset on the serial Thing, but then OpenHAB didn’t talk to the Z-Stick at all. I assumed I needed to do a network heal but couldn’t find a manual option to kick it off, so I set the next heal time to 2 pm (the next upcoming hour) and cold restarted the server (and the attached Z-Stick).

Nearly 6 hours have now elapsed since restart. A small amount of traffic is received, but control is either impossible or significantly delayed (minutes to turn a light on or off, if it ever does). All Things have a status of “Online Node initialising: PING”.

I have uploaded the debug-level log to https://ufile.io/16mh2097. A look at the log via https://www.cd-jackson.com/index.php/openhab/zwave-log-viewer does not show anything obvious like a device causing a flood.
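
For anyone who wants a quick cross-check outside the log viewer, a minimal sketch like the following could tally debug lines per node to see whether one device dominates the log. It assumes the Z-Wave binding tags node-related lines with a `NODE <id>:` token; the function names and the log path passed in are hypothetical.

```python
import re
from collections import Counter

# Assumption: node-related debug lines contain a token such as "NODE 12:".
NODE_RE = re.compile(r"NODE (\d+):")


def count_lines_per_node(lines):
    """Return a Counter mapping node id -> number of log lines mentioning it."""
    counts = Counter()
    for line in lines:
        match = NODE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def top_talkers(path, limit=10):
    """Read a log file (hypothetical path) and return the busiest nodes."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return count_lines_per_node(log).most_common(limit)
```

Calling `top_talkers` on the debug log returns `(node, line_count)` pairs, busiest first. A node with far more lines than the rest would be a candidate for closer inspection, though line counts are only a rough proxy for radio traffic.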

Any suggestions what to try next?

Welcome to openHAB.
This is a known issue for some Z-Wave users. Disable the nightly heal temporarily.

Thanks. I disabled nightly heal and restarted, but it continued to not make any progress.

Because I need the ability to turn the lights off, I shut down the server and moved the Z-Stick back to the Hubitat. It booted up and was able to control everything immediately (including receiving reports from motion sensors etc.). Since Hubitat could control the network straight away, I do not believe the Z-Wave network itself is overloaded or defective.

What is the correct “polling period” to use when I have mostly mains or USB powered devices? Is there a requirement for polling at all?
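
For a sense of scale, here is a rough estimate of the polling load from the 30-minute setting. It assumes each node is polled once per period and that polls are spread evenly, which may not match what the binding actually does (nodes with several channels would generate more requests); the function name is hypothetical.

```python
def polling_load(node_count, period_minutes):
    """Estimate average poll rate, assuming one poll per node per period."""
    polls_per_minute = node_count / period_minutes
    seconds_between_polls = (period_minutes * 60) / node_count
    return polls_per_minute, seconds_between_polls


# ~85 nodes polled every 30 minutes (figures from this thread).
per_minute, gap_seconds = polling_load(85, 30)
print(f"{per_minute:.1f} polls/min, one poll every {gap_seconds:.0f} s on average")
```

Under those assumptions this works out to about 2.8 polls per minute, or one poll roughly every 21 seconds.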

If I deactivate nightly healing, all things go offline and nothing works anymore. What can I do there?

Start troubleshooting as described in the binding docs.

I will just warn that it may be difficult for us to troubleshoot since that is not an official installation package and has been patched by the AUR PKGBUILD. Just be aware of our potential limitations.
Apparently Christoph Scholz has been maintaining that build.

https://aur.archlinux.org/packages/openhab2/ provides links to the patches. They’re just a couple of system startup and config files. I struggle to see how these patches would contribute to a Z-Wave network issue (based on over 20 years of Java programming experience). I chose OpenHAB over alternatives because it’s written in Java and I want to contribute. But I need to be able to control my lights first so the family are happy. :grinning:

OK, fair enough.
I found out recently that the Linux installation documentation is a mess. I think the only other path would be to use Docker. The AUR package's changes have moved directories around so that they differ from all of our defaults, and the startup command differs from the one used in our systemd examples.

I doubt those changes would impact your issue, but they make it more difficult for us to point you to particular files and locations for troubleshooting. I was mainly alerting others here to those differences.