How do I get the Z-Wave binding to recognise that a device is dead?

waider · January 6, 2025, 10:55pm

Short version: I have some flaky devices that periodically fail and need to be reintegrated with the network. However, the Z-Wave binding seems convinced that they’re still alive, even when the devices themselves are powered down or factory-reset. How do I get the binding to accept that such a device is, in fact, dead, allowing me to invoke the replace-failed-node workflow?

OpenHAB 4.3.1 using a ~~Aeon~~ Aeotec Z-Stick (v5, I think)

Longer version: I have a number of POPP/Danfoss/Devolo TRVs[1] which seem prone to losing their connection to the Z-Wave network when the battery fails. I’m generally not able to recover them using OpenHAB; instead, I’ve used the (now-defunct) Open ZWave Control Panel (ozwcp) to execute the following sequence of actions:

1. factory-reset the TRV - this usually isn’t necessary as it seems to get into this state itself
2. shut down OpenHAB
3. allow ozwcp to initialise the Z-Wave controller. It will try to initialise the broken TRV as part of this, but will generally not mark it as failed.
4. select the broken TRV
5. run ozwcp’s “has device failed” command
6. ozwcp will flag the device as DEAD (this may take a couple of attempts, since my understanding is that (a) Z-Wave prefers not to assume battery devices are dead and (b) the process of marking something as DEAD requires it to fail to respond to some number of communication attempts)
7. run ozwcp’s “replace failed device” command
8. press the inclusion button on the device
9. ta-da! device is back on the network
10. restart OpenHAB
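
The core of the sequence above (repeat “has device failed” until the controller agrees, then run “replace failed device”) can be sketched roughly as follows. This is only an illustration: `FakeController`, `has_node_failed`, `replace_failed_node` and `recover_node` are hypothetical names made up for the sketch, not part of ozwcp or the OpenHAB binding.

```python
import time


class FakeController:
    """Hypothetical stand-in for a Z-Wave controller tool (not a real API).

    It reports the node as failed only after a number of checks, mimicking
    the observation that "has device failed" may need a couple of attempts.
    """

    def __init__(self, checks_needed=3):
        self.checks_needed = checks_needed
        self.checks = 0
        self.replaced = []

    def has_node_failed(self, node_id):
        self.checks += 1
        return self.checks >= self.checks_needed

    def replace_failed_node(self, node_id):
        self.replaced.append(node_id)
        return True


def recover_node(controller, node_id, max_attempts=5, delay_s=0.0):
    """Retry the failed-node check, then start replacement once it succeeds."""
    for _ in range(max_attempts):
        if controller.has_node_failed(node_id):
            # The controller now agrees the node is dead, so replacement is allowed.
            return controller.replace_failed_node(node_id)
        time.sleep(delay_s)
    # The controller never flagged the node as failed; nothing was replaced.
    return False
```

In the real workflow the inclusion button on the device still has to be pressed after the replace command is issued; the sketch only covers the controller side.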

Obviously having to shut down OpenHAB to do this is a bit annoying and I’d rather be able to do it from within OpenHAB. I’m currently looking through the code for the binding and it looks like both “has device failed” and “replace failed device” are implemented, but it looks like “has device failed” is only run as part of the “Set device as FAILed” action, and is run after “replace failed device”; there’s no independent way I can see to trigger the “has device failed” query. I’m honestly not even sure if this is needed, but it does seem to be a necessary part of the ozwcp process to get the controller to admit that the device is in fact dead.
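
For reference, here is a rough sketch of what a standalone “has device failed” request might look like on the wire. It assumes the classic Z-Wave Serial API framing (SOF, length, type, function ID, payload, XOR checksum) and the function ID `0x62` that OpenZWave names `FUNC_ID_ZW_IS_FAILED_NODE_ID`; both are stated from memory and should be checked against the Serial API documentation. `build_request` is a hypothetical helper, not a function from the binding.

```python
SOF = 0x01      # start-of-frame marker
REQUEST = 0x00  # frame type: request from host to controller

# Assumption: function ID as named in OpenZWave's definitions.
FUNC_ID_ZW_IS_FAILED_NODE_ID = 0x62


def build_request(func_id, payload=b""):
    """Build a serial request frame: SOF, length, type, function ID, payload, checksum."""
    # The length counts itself, the type byte, the function ID and the payload,
    # but not the SOF or the checksum.
    body = bytes([len(payload) + 3, REQUEST, func_id]) + bytes(payload)
    checksum = 0xFF
    for b in body:
        checksum ^= b
    return bytes([SOF]) + body + bytes([checksum])


# Ask the controller whether (hypothetical) node 5 is on its failed-node list.
frame = build_request(FUNC_ID_ZW_IS_FAILED_NODE_ID, bytes([5]))
print(frame.hex())
```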

(chasing further: “replace failed device” calls “requestSetFailedNode”, which sends “ReplaceFailedNodeMessageClass”, which is a tiny bit confusing - is there a reason it’s done this way?)

[1] This guy: POPP Wireless Thermostatic Valve TRV (010101) - popp.eu; regardless of the branding the devices all appear to be the same.

apella12 · January 7, 2025, 1:08am

Short version: It is nearly impossible to remove a dead battery device with the Z-Wave binding.

Because they are thought to be “sleeping”, the “check if failed” message will not be sent. You could try putting the controller into exclusion mode and pressing the button on the battery device. However, if you have already included it again, that will likely remove the new one. I’d recommend using Silabs Simplicity Studio, which doesn’t care whether it is a battery device. Also note that nodes are stored on the controller, not in OH. That is why, if you try “Delete Thing”, they will pop back up when you scan.

waider · January 7, 2025, 9:09pm

That doesn’t really help, I’m afraid, since it leaves me with the problem of having to turn off OpenHAB in order to fix the network, which is what I’m trying to avoid having to do. Is this a known bug in the driver? As described, I’m able to use a different tool to get the desired outcome, and you’ve suggested a third tool, so that suggests that it’s definitely possible to do this, but that the OpenHAB implementation doesn’t support it?

apella12 · January 7, 2025, 11:02pm

In theory it is supported, in practice it is not. Also, the problem is with battery devices, not powered nodes. If powered nodes do not respond they can be removed.

waider · January 7, 2025, 11:04pm

Can you elaborate on “in theory it is supported”? How is it supposed to be used, so I can at least try that out?

(also, again, “the problem is with battery devices, not powered nodes” doesn’t really help. I’m not having a problem with powered nodes, as I indicated in the original post. Don’t get me wrong, I appreciate your desire to help.)

apella12 · January 7, 2025, 11:14pm

What has to happen is that the battery node fails to wake up within 2x the wake-up interval (this is from memory; I’m not the developer, but I have some experience with the binding). However, if OH is restarted at any point within that period, the timer is stopped. Basically the node has to be declared dead by missing its wake-ups; sending the “check if failed” message will never work for a battery device with the OH Z-Wave binding, sorry.

You could try a test; include a battery device, make sure it is fully configured, then pull the battery. It should get marked as dead in a day or two. Don’t stop or restart OH during the test period. Then it can be deleted.
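
The rule as described above can be written down as a small check. This is a sketch of the rule as recalled in this post, not the binding’s actual code: `is_presumed_dead` is a hypothetical function, the factor of 2 comes from the description above, and treating a restart as moving the timer’s starting point is one possible reading of “the timer is stopped”.

```python
from datetime import datetime, timedelta


def is_presumed_dead(last_wakeup, wake_interval, now, binding_started=None, factor=2):
    """Return True if the node has missed its wake-ups for longer than factor * interval.

    Assumption: a restart of the binding resets the reference point, so only
    time elapsed since the later of last wake-up and last start is counted.
    """
    reference = last_wakeup
    if binding_started is not None and binding_started > reference:
        reference = binding_started
    return now - reference > factor * wake_interval


# Hypothetical values: a one-hour wake-up interval and a node last seen on 27 December.
last_seen = datetime(2024, 12, 27)
now = datetime(2025, 1, 7, 23, 22)
interval = timedelta(hours=1)

print(is_presumed_dead(last_seen, interval, now))
print(is_presumed_dead(last_seen, interval, now, binding_started=datetime(2025, 1, 7, 23, 0)))
```

Under these assumptions the first call reports the node as dead, while the second does not, because the restart 22 minutes earlier has not yet been followed by two full wake-up intervals.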

waider · January 7, 2025, 11:22pm

Ok, so one of these devices has been offline since December 27th so I think that’s fairly reliably proving this doesn’t work. I guess I can file an issue and see what happens.

apella12 · January 8, 2025, 12:56am

Ok. Just check under device properties for the last wake-up, since that is what the timer is supposed to trigger off of. Also note the wake-up frequency in the issue. Lastly, if the UI doesn’t have the “reinitialize device” bar at the bottom, the node is not fully configured. Provide all that with the issue.

waider · January 8, 2025, 2:34pm

Thanks, I’ll try and get a reproducible failure and maybe do a bit of code diving before I file the issue as it’s not a lot of help otherwise.

waider · February 9, 2025, 11:15am

So I can sort of see the problem here. As best I can tell: If a device isn’t responding, then it never gets out of initialisation, and that seems to block any further attempts at communicating with the device, which means the binding never gets to the point where it considers the device dead. This may only apply to devices which are already dead when OpenHAB starts up, so it may be an edge case, but my experience seems to be that these devices fall off the network when they run out of power and the binding never detects that. It’s possible the heal process interferes here - healing requires the device to wake up and do a bunch of work, and I’ve seen that tip devices over from “barely enough power to operate” to “dead”, meaning they never complete the healing process, and again it seems like that may leave them in a state where the binding’s dead node logic never actually kicks in. Given I currently have two dead devices but no “nearly dead” devices it’s a little difficult for me to verify this right now but I’ll see what I can do with it.
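
To make the suspected failure mode concrete, here is a toy state model of the hypothesis above. It is not taken from the binding’s source: `NodeState` and `step` are hypothetical, and the only point is that if the dead-node check is reachable only from a fully initialised state, a node that never answers during initialisation can never be marked dead.

```python
from enum import Enum, auto


class NodeState(Enum):
    INITIALISING = auto()
    ALIVE = auto()
    DEAD = auto()


def step(state, responded, wake_timeout_expired):
    """Advance the toy model by one event."""
    if state is NodeState.INITIALISING:
        # Initialisation only completes when the node answers; there is no
        # path from here to DEAD, which is the suspected problem.
        return NodeState.ALIVE if responded else NodeState.INITIALISING
    if state is NodeState.ALIVE and wake_timeout_expired:
        return NodeState.DEAD
    return state


# A node that is already dead at startup: it never responds, so it stays stuck.
state = NodeState.INITIALISING
for _ in range(10):
    state = step(state, responded=False, wake_timeout_expired=True)
print(state.name)
```

In this model a node that initialises successfully and later misses its wake-ups does reach `DEAD`, which matches the suggestion that the problem may be limited to devices that are already unresponsive when OpenHAB starts.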

cidi · March 1, 2025, 8:28am

Same problem with the same valves (popp or devolo with Aeotec Z-Stick) here.
Do you have another solution or does it always work as described in your 1st post above?

waider · March 1, 2025, 8:43am

It always works as described - sometimes I need to hit “has device failed” a couple of times. I still don’t have an alternative to this. I think I can see how I might make it work with the ZWave handler, but my concern is more that I’d create some new problem or incompatibility with how it currently works, and its current operation is based on years of running “in the wild” with all manner of devices whereas I’ve just got a dozen or so devices on my network to test against.