Z-Wave battery units randomly lost

Hi, I’ve been experiencing problems with my Z-Wave devices. Once every few weeks, my Z-Wave network loses touch with one random battery-operated device. I have an Aeotec Z-Wave v5 stick and several mains-powered Z-Wave devices, mainly Qubino switches and Qubino and Fibaro plugs. They have no problems. I also have Fibaro Motion Sensors and Sunricher button panels, which are battery operated. Every couple of weeks, one random battery-operated unit loses touch with the network. Sometimes this is triggered by the battery dying, but often it happens while the battery is still OK.

With the Zensys tools I am sometimes able to “replace failed” a unit with itself, but most of the time I have to re-include the sensor, which gives it a new nodeId in the stick, and then I need to re-configure the Thing in openHAB.

I’ve added extra power plugs all over the house to make the Z-Wave network more robust, because I suspected the nodes were too far from the stick. But this does not seem to make any difference: battery-operated units less than 3 meters away from the stick also lose contact sometimes.

I’ve configured the zwave things in a zwave.things file, which makes the configurations read-only in openHAB. The network heal time setting is set to 2.

Does anyone recognise this, or have a hint for how I can prevent the sensors from being lost? It’s really bad for the “family acceptance” that every few weeks, part of the house cannot be illuminated… :wink:

thanks,
Iwan

There is an algorithm in the zwave binding that sets a device to failed. I think it triggers when the controller has not heard from the device for something like 2x or more of the expected interval. This is not a fix, but reducing polling and not running the heal may at least spread out the problem. If you are not adding devices to the network, healing is not really needed and can disrupt routing (IMO). I have the heal disabled with 47 nodes (13 battery). They will figure it out on their own. Battery devices are not involved in routing traffic anyway (except their own).
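
To make the "2x" idea concrete, here is a minimal sketch of that kind of check. It is not the binding's actual code: the function name, the choice of the wakeup interval as the "expected" interval, and the default factor of 2 are all assumptions for illustration.

```python
from datetime import datetime, timedelta


def is_presumed_dead(last_heard, wakeup_interval_s, now, factor=2.0):
    """Return True when a node has been silent for longer than
    factor * wakeup interval (hypothetical rule, not the binding's code)."""
    silence = now - last_heard
    return silence > timedelta(seconds=wakeup_interval_s * factor)


# Example: a sensor with a once-a-day wakeup (86400 s) that was last heard
# three days ago exceeds the two-day threshold.
now = datetime(2024, 1, 10)
print(is_presumed_dead(datetime(2024, 1, 7), 86400, now))  # True
```

With a rule like this, a longer wakeup interval also means it takes longer before a silent node is flagged.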

Bob

Thanks for replying.
I think the units are not marked failed in the stick. That is why I am having trouble with the “replace failed” function: it only works on nodes that are marked failed. Will the stick no longer accept packets from a node that was formerly known but is now marked failed? When I trigger the movement sensor or press a button, I can see the sensor responding with its LED, but with a different pattern, indicating there is a problem communicating. The debug log shows no packets coming in from that node on the server side.
If it is the stick that prevents the communication, do you think restoring a backup of the stick will solve the issue?

Iwan

I’m not following completely.

I have used the Zensys/Silabs PC Controller, but I do not have it running right now. From memory, to remove nodes that I could not remove using the controller UI page & “Exclude Devices”, I first did the “(Check) if failed” and then, I think, “remove failed”. I do not recall a “replace failed” and am not at all sure what that would do, but again, that’s from memory.

If the node is still on the stick there should be debug logs, if the radio waves are getting to the controller.
It does sound like a communication problem from your original post. If OH “hears” from a node I think it will adjust the status from “not communicating” to “online”.

Bob

Hi. The Zensys Z-Wave PC Controller software has an “is failed”, a “remove failed” and a “replace failed” button.

“is failed” shows “Device is functioning normally” for the unit that no longer works, which is strange, because the Zensys log also shows no messages from that node. “remove failed” does nothing, because the node is not marked as failed.

“replace failed” lets you replace an existing node with a new piece of hardware; I assume it is meant for hardware that really died. The new device takes over the node-id of the existing node. This sometimes works: the stick goes into inclusion mode, and if I set the sensor to inclusion mode too, it will sometimes pick it up and “replace” the node with itself. Given that the unit had stopped working, this is the best outcome, because I do not need to adjust openHAB configurations. Movement events from the unit come in with the same old node-id and the thing/items/channels keep working.

But most of the time I need to use “remove” and “add” on the no-longer-working unit, which makes it work again but with a new node-id, so I need to change openHAB configurations too. openHAB also marks the no-longer-working unit as online, but channels like motion_detected, battery_level etc. are not updated anymore.

But the main issue remains: every few weeks a random battery-operated unit stops getting its messages through to openHAB, so the rules no longer turn on the lights. The unit still has battery and still works, given that it gives me feedback, e.g. when it sees movement or when I press a button. Using the above procedure I can make it work again, but in the meantime I get frowned upon by the other family members because “it is failing again” ;).

sounds like a formula for creating zombie nodes

Yep. Exactly that. I have around 50 nodes, but I’m already at nodeId 75. If you see another way of keeping the nodes … as stated, the problem is that they stop functioning in the first place.

I did have occasional issues with battery motion detectors (mostly zooz). It seemed they stopped reporting motion around the wakeup time. After a while they did seem to reset, if left alone (so if it wakes in the middle of the night, it is never observed). To reduce this, I did set both the polling & wakeup to 86400, basically just to get a once-a-day battery reading. Also, as noted above, heal is disabled. Lastly, I found I could fix the stalled motion by taking out and then replacing the battery. As noted above, not exactly fixes, but it rarely happens now.

If I have a new node number for an excluded one, I link it to the old item, so the rules do not have to be changed. For instance, I have a thermostat node that has been 14, 23 and 63, but the item still has the 14 in it. If I had to do it over, I would not put the node number in the item.
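
The idea of not baking the node number into the item can be sketched as a small lookup table. This is only an illustration of the indirection, not openHAB configuration: the dictionary, the `reinclude` helper and the `"thermostat"` key are hypothetical names.

```python
# Hypothetical registry: stable logical name -> current Z-Wave node ID.
# Rules would refer to the logical name, never to the node ID itself.
node_ids = {"thermostat": 14}


def reinclude(registry, name, new_node_id):
    """Record the node ID a device received after exclude/re-include
    and return the ID it had before."""
    previous = registry[name]
    registry[name] = new_node_id
    return previous


# The thermostat from the post above: node 14, then 23, then 63.
reinclude(node_ids, "thermostat", 23)
reinclude(node_ids, "thermostat", 63)
print(node_ids["thermostat"])  # 63
```

Only the single mapping entry changes on each re-inclusion; anything keyed on the logical name stays untouched.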

Bob

I also have frustrating issues with battery devices. Not as frequent as your issues, but very similar. I have a Homeseer door contact sensor in which the battery ran down. Replacing the battery resulted in a device that is “online”, but no reports came in. The light on the sensor blinks as if it is alive, but nothing is getting to OH. I had to exclude and re-include. Same issue with a Homeseer leak sensor.

And just this last weekend I had trouble with an Ecolink door/window sensor. I think the battery got too low. Replacing the battery didn’t help. Eventually I had to exclude/re-include, but now I have a ghost node left over from this. Of course this sensor is under the house where it is hard to get to. For this device I am using a dry contact, so I am going to move its location and use a long wire instead of trying to keep the sensor close to the device it is monitoring. Then it should be easier in the future to replace the battery and exclude/include if needed.

Anyway, just to point out that this seems to be a weak spot with Zwave battery devices. I think it is a waste of batteries, but I may put them on a schedule and replace batteries early to avoid these troubles.

Hi guys, thanks for sharing your experiences.

In my experience, once lost, the units will not recover automatically. I suspect low battery has some influence, but not always. Most of the time when this happens, the devices still give LED feedback, so the battery is not completely dead. The problem with the Fibaro sensors is that they have a very bad battery sensor: they report 100% for over a year and then go down suddenly. I have a new house, and all dimmers for the whole house are in one technical room. I have zero wired switches, so I have relatively many battery units (around 20, all buttons and movement sensors). The sheer numbers might be why I’m having more problems than you guys. That is also the reason I have several dummy Z-Wave power plugs around the house, to make sure there is good coverage for routing events.

I expected the heal to be a good thing, in that it would try to find lost units, but I will try disabling the heal and setting a longer polling interval.

Iwan

I have also observed this with battery fibaro sensors

Random thoughts;
I converted a static controller to a Zniffer. I found I had wasted some money on powered nodes that I thought would help routing; once I could see the traffic, it turned out they did not.

I did replace some battery devices with motion sensors that had optional USB power. Not as neat with the wire, and a compromise on the placement, but fewer problems, and it adds a repeater.

I invested in rechargeable 3 V CR123A batteries, so if readings get low, I just swap one out.

Lastly, less healing and longer wakeup intervals will also save some battery life, so that should help there as well.

Best wishes

Bob

Lost another node today after disabling the heal. This is a node whose battery I replaced a week ago. It worked after that, but now (a few days later) the node has stopped working.

What is the best way to remove nodes from the controller that are no longer there? Most of the battery units already lost are still a node in the nodelist with their previous nodeID(s). I’m unable to remove these nodeIDs with the Zensys tool, mostly because I cannot mark them failed. The tool keeps saying the nodes are functioning normally even though there hasn’t been a message for months. I cannot “remove failed” (because they aren’t), and “remove node” wants me to set the device to exclusion mode which I cannot due to obvious reasons.

I currently have a lot of ghost nodes (old nodes that currently already have a new nodeID). I do not know if this is bad, I read somewhere that it affects routing, but these are battery nodes without routing function. Second worry is that I will reach maxNodeID someday because I use a new nodeId every few weeks, and cannot re-use the nodeIDs while they are in use. Third worry is that I am currently the only one knowing what to do, so I need to live here forever…
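
For the second worry, a rough back-of-the-envelope estimate. Classic Z-Wave allows node IDs up to 232. The sketch below assumes the worst case, that IDs are only ever handed out upwards and never reused, and takes "every few weeks" as one new ID per 3 weeks; both are assumptions, and the function name is made up for this example.

```python
MAX_NODE_ID = 232  # classic Z-Wave node ID limit


def weeks_until_exhausted(highest_used_id, weeks_per_new_id,
                          max_node_id=MAX_NODE_ID):
    """Weeks left before node IDs run out, assuming IDs are never reused
    (a pessimistic assumption for illustration)."""
    remaining = max(0, max_node_id - highest_used_id)
    return remaining * weeks_per_new_id


# Currently at nodeId 75, assuming one new ID every 3 weeks.
print(weeks_until_exhausted(75, 3))  # 471
```

Under those assumptions that is roughly nine years, so the ghost nodes are more of a housekeeping problem than an imminent limit.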

I have never had a problem marking a battery device as failed using the Silabs tool, since by definition it is not listening. The OH Z-Wave binding will not do it, because it will not send a message to a sleeping node. Can you use the Remove option?

Bob

edit: Perhaps Aeotec support has an idea. They are quite good.

Did anyone solve this? I have the same issues, and it is very frustrating.

I disabled the auto-heal. That seems to help a bit, but not 100%. I have now started replacing failing Z-Wave switches with Friends of Hue switches connected to the Philips Hue hub. These units haven’t failed so far, and they are also quicker. With Z-Wave, I sometimes have delays of up to 20 seconds between movement detected and lights on.

Replacing the units is however less than ideal, because the Hue hub is getting more and more important. There is no seamless replacement strategy for the Hue hub when it dies, other than re-pair everything and update the things in openHAB, which means significant downtime. It’s also expensive, to replace units that are not technically broken with new ones. But it saves a lot of frustration.

I have similar experiences with Fibaro motion sensors: they disconnect after some weeks of working, although the battery is still OK. Even replacing the battery and activating the sensor (by pressing the button) does not help. Once disconnected, they stay disconnected. I never found out the reason. I can recover them by resetting to factory defaults and re-including them, but that’s not a real solution.
I found out that not all brands have this issue. I replaced my Fibaro sensors with Neo Coolcam devices and did not run into similar troubles. Although these sensors have fewer features, they do not disconnect. Reliability is more important than fancy colors and earthquake sensing.
The first Neo sensor I used has now been working for more than a year without trouble.
Other brands may work as well, but my personal experience is with the Neo devices.
It would be interesting to know whether users of other automation systems, like Home Assistant, face the same issues. If so, the issue sits between device and controller, which is what I strongly assume. If not, there is probably an issue with the binding.

Hi,

Same issue in my smart home: battery-powered devices die and I have to re-include them.
But it has not been like this from the beginning. I have had this sensor for 3 years and never had a problem. The problems started a few months ago. I replaced the sensor with a new one: same problem. Did something change in the binding between 3.1.0 and 3.3.0? Maybe there is a bug? @chris, if I’m right, you know the binding very well?!