Zigbee and Zwave configuration fails after reboot

What I mean by 32 devices is think about it like a network switch if it only has 24 ports that you can plug in network cables to then it only handles 24 directly attached devices you can of course have more then 24 devices passing traffic through the switch by hooking another switch to one of those ports and all the traffic from the other switch will also be handled but only on the one port it comes in from on other switch and goes out on to whatever your next hop is such as a gateway or controller or some other network device etc… Check with your original device vendor of the usb stick they will state the number of directly attached devices the stick can support. as an example, if you look at the specs for a sonoff usb dongle E that is a ember based and fully listed/advertised as OPenHab compatible they state that the max number of direct connections is limited to 32 many other controller stick also have same limits. Hopefully that clears up what I mean by only ~32 devices directly attached.
Also keep in mind in may not be a Enviromental change that occurred in your house if you have a nearby neighbor they could have added something in their home that is causing interference with your air space.
With regard to Zwave slow coming up I have also seen that in the past on my systems and found it to be a race condition between the Zigbee binding and the Zwave binding and a 3rd party binding I use all trying to initialize at same time. I have seen this behavior going all the way back to version 3 but I have 2 separate USB devices so not quite the same as your configuration so I only mention this as an antidotal thought, seem recall other folks have run into that same race condition back in early versions of OH and may chime in,on that topic, as well. Also, many folks moved over to zigbee2mqtt as a way to avoid that situation among many other reasons from expanded device support and so on.

1 Like

So I brought the spare Pi-4 up with the same hardware, it came up with all zigbee ONLINE, then a few hours later, a similar set of Zigbee devices started dropping to OFFLINE. I checked 7 of them and the last 2 log entries for each was:

2024-02-02 14:07:14.663 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker Timeout has been reached for thingUID=zigbee:device:ZigBee_Bridge:282c02bfffe899c4
2024-02-02 14:07:14.665 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:device:ZigBee_Bridge:282c02bfffe899c4' changed from ONLINE to OFFLINE

Which might be perfectly normal for a device that is not responding for some reason.

All of the ones that went OFFLINE that I looked at had the same final end.

The logs previous to this occurring for this above device were:

cat /var/log/openhab/openhab.log.6 | grep -i "282c02bfffe899c4"
2024-02-02 14:03:01.014 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Polling 6 channels
2024-02-02 14:03:01.017 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Polling zigbee:device:ZigBee_Bridge:282c02bfffe899c4:282C02BFFFE899C4_1_batteryvoltage
2024-02-02 14:03:01.018 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=0, state=WAITING, sendCnt=0, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=--, identifiers=[32]]]
2024-02-02 14:03:01.021 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=3, state=WAITING, sendCnt=0, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=04, identifiers=[32]]]
2024-02-02 14:03:05.742 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4723, state=FAILED, sendCnt=1, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=04, identifiers=[32]]]
2024-02-02 14:03:05.744 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:05.745 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4727, state=WAITING, sendCnt=1, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=04, identifiers=[32]]]
2024-02-02 14:03:05.747 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4729, state=WAITING, sendCnt=1, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=04, identifiers=[32]]]
2024-02-02 14:03:10.468 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9450, state=FAILED, sendCnt=2, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=04, identifiers=[32]]]
2024-02-02 14:03:10.470 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:10.471 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete exceeded max retries 2
2024-02-02 14:03:10.472 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9453, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=04, identifiers=[32]]]
2024-02-02 14:03:10.473 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Polling zigbee:device:ZigBee_Bridge:282c02bfffe899c4:282C02BFFFE899C4_1_iaslowbattery
2024-02-02 14:03:10.473 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9455, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=04, identifiers=[32]]]
2024-02-02 14:03:10.475 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2024-02-02 14:03:10.478 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=0, state=WAITING, sendCnt=0, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=--, identifiers=[2]]]
2024-02-02 14:03:10.480 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=2, state=WAITING, sendCnt=0, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=07, identifiers=[2]]]
2024-02-02 14:03:15.202 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4725, state=FAILED, sendCnt=1, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=07, identifiers=[2]]]
2024-02-02 14:03:15.204 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:15.205 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4728, state=WAITING, sendCnt=1, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=07, identifiers=[2]]]
2024-02-02 14:03:15.207 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4730, state=WAITING, sendCnt=1, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=07, identifiers=[2]]]
2024-02-02 14:03:19.930 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9452, state=FAILED, sendCnt=2, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=07, identifiers=[2]]]
2024-02-02 14:03:19.931 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:19.932 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete exceeded max retries 2
2024-02-02 14:03:19.933 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9456, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=07, identifiers=[2]]]
2024-02-02 14:03:19.934 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9457, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=07, identifiers=[2]]]
2024-02-02 14:03:19.934 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Not polling zigbee:device:ZigBee_Bridge:282c02bfffe899c4:282C02BFFFE899C4_1_batteryalarm - channel is not linked
2024-02-02 14:03:19.935 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Polling zigbee:device:ZigBee_Bridge:282c02bfffe899c4:282C02BFFFE899C4_1_contact1
2024-02-02 14:03:19.936 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2024-02-02 14:03:19.938 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=0, state=WAITING, sendCnt=0, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=--, identifiers=[2]]]
2024-02-02 14:03:19.940 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=2, state=WAITING, sendCnt=0, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=09, identifiers=[2]]]
2024-02-02 14:03:24.664 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4726, state=FAILED, sendCnt=1, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=09, identifiers=[2]]]
2024-02-02 14:03:24.666 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:24.667 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4728, state=WAITING, sendCnt=1, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=09, identifiers=[2]]]
2024-02-02 14:03:24.668 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4730, state=WAITING, sendCnt=1, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=09, identifiers=[2]]]
2024-02-02 14:03:29.407 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9469, state=FAILED, sendCnt=2, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=09, identifiers=[2]]]
2024-02-02 14:03:29.409 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:29.410 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete exceeded max retries 2
2024-02-02 14:03:29.412 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9473, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=09, identifiers=[2]]]
2024-02-02 14:03:29.413 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9474, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [IAS Zone: 0000/0 -> A444/1, cluster=0500, TID=09, identifiers=[2]]]
2024-02-02 14:03:29.413 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Not polling zigbee:device:ZigBee_Bridge:282c02bfffe899c4:282C02BFFFE899C4_1_tamper - channel is not linked
2024-02-02 14:03:29.414 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Polling zigbee:device:ZigBee_Bridge:282c02bfffe899c4:282C02BFFFE899C4_1_batterylevel
2024-02-02 14:03:29.415 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2024-02-02 14:03:29.418 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=0, state=WAITING, sendCnt=0, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=--, identifiers=[33]]]
2024-02-02 14:03:29.420 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=2, state=WAITING, sendCnt=0, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=0A, identifiers=[33]]]
2024-02-02 14:03:34.141 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4722, state=FAILED, sendCnt=1, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=0A, identifiers=[33]]]
2024-02-02 14:03:34.142 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:34.144 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4725, state=WAITING, sendCnt=1, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=0A, identifiers=[33]]]
2024-02-02 14:03:34.146 [DEBUG] [transaction.ZigBeeTransactionManager] - A444/1: Sending ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=4727, state=WAITING, sendCnt=1, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=0A, identifiers=[33]]]
2024-02-02 14:03:38.868 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9450, state=FAILED, sendCnt=2, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=0A, identifiers=[33]]]
2024-02-02 14:03:38.870 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:03:38.871 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete exceeded max retries 2
2024-02-02 14:03:38.872 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9454, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=0A, identifiers=[33]]]
2024-02-02 14:03:38.874 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=282C02BFFFE899C4 queueTime=9455, state=CANCELLED, sendCnt=2, command=ReadAttributesCommand [Power Configuration: 0000/0 -> A444/1, cluster=0001, TID=0A, identifiers=[33]]]
2024-02-02 14:03:38.875 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 282C02BFFFE899C4: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2024-02-02 14:06:44.659 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker LastChance Timeout has been reached for thingUID=zigbee:device:ZigBee_Bridge:282c02bfffe899c4
2024-02-02 14:06:44.661 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Polling stopped
2024-02-02 14:06:44.662 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 282C02BFFFE899C4: Polling initialised at 1862388ms
2024-02-02 14:07:14.663 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker Timeout has been reached for thingUID=zigbee:device:ZigBee_Bridge:282c02bfffe899c4

The entire log file can be found here: https://drive.google.com/file/d/1-5T3YLUpdMC4MoOXI96yeW4LnjvhKf38/

So at this point I believe I have ruled out PI-4 hardware (and USB Ports), the Nortek HUSBZB-1 stick, the original SD card, going from 3.4.x to 4.1.1 Openhabian … all have pretty much the same results.

Clearly something changed. Could there be a bad zigbee device that is generating noise and killing other communication? The fact that these devices are not all in the same part of the house makes me doubt that. But it is a similar set of devices that are going offline (range of mfg’ers).

I should also add, I turned off debug log for zigbee because it was filling up the log files rapidly, and then later (~30 mins) I noticed all the devices had now come back ONLINE. But no logs for this.

Sure, not literally because of noise (zigbee is on 2.4GHz just like wifi) but by disturbing the mesh-topology. Almost all AC-powered zigbee devices in a mesh act as routers. This cannot be changed as it’s written in the firmware of each of those devices. If one router behaves badly (e.g. faulty zigbee implementation, hardware problem, switched off…) the connected devices (battery powered end devices or other routers) drop off. They should find other routers but depending on the quality of the zigbee implementation some never do.

I guess you are using the zigbee binding? What could be done for testing is to install zigbee2mqtt via openhabian (don’t want to start a discussion which is better…). Ember-sticks are not perfectly supported afaik but for testing it should be enough. The coordinator has to be disabled in openhab of course. Then the zigbee-map in the http-frontend can be checked: Which are your routers? Is one missing? If you find the villain you can easily remove zigbee2mqtt again without rebooting.

1 Like

Hi,
I tried to look at your log file but it was locked prompting for a user and password so I could not review it.
I think you next step is to bounce the Zigbee network it could be you have a device that is starting to fail transceiver is losing receive or transmit power due to a failing component internally. It may generate RF noise or just simply be flapping (coming up to power then dropping off.
This could trigger a healing by the coordinator, and it starts to reroute from that disappearing device. I included a few articles that explain mesh networks pretty good so you can better visualize what may be occurring. hope they help. Both are pretty good and informative reads.

1 Like

Thanks Larsen -

So, will I have both ember and zigbee2mqtt up? or disable the Ember while I am testing with the other?

I have more than a dozen AC powered zigbee devices. I am leaning towards this being the issue, currently removing a group of them at a time to see if the problem reoccurs. it’s not clear how often if happens so difficult to know how long to test, but its at least something to try. I did have 2 AC Powered zigbee devices that were behaving differently (not working consistently), so starting with those.

I’ll look at the zigbee2mqtt to see what can be learned as well. Thanks.

Thanks Justan - very helpful.

And sorry, I fixed the access rights on the log file.

There are two ways to run zigbee in openhab: Either directly via zigbee binding or zigbee2mqtt (which transfers to mqtt and then to openhab).
Ember is just the chipset. Most zigbee sticks have either ember (from silicon labs) or TI chipsets. You have to disable the stick in openhab if you want to try with zigbee2mqtt, e.g. by removing the binding (you can re-add it again later).

1 Like

Yes, mine is from Silicon Labs.

So the steps are:

  1. uninstall the Zigbee Binding,
  2. Install the zigbee2mqtt binding.
  3. See what I can learn.
  4. Uninstall zigbee2mqtt, reinstall Zigbee

I’ll give it a shot.

Now that I have a reliable way to bring up Zwave and Zigbee I can do many reboots as necessary to experiment :slight_smile: Really surprised I am the only one that sees that issue. I must be lucky :slight_smile:

zigbee2mqtt is not a binding. It’s installed via openhabian:

  • ssh -l openhabian your_IP
  • sudo openhabian-config
  • 20 optional components
  • 2A zigbee2mqtt
1 Like

This in itself is just one more reason to approach this as a device in the network that is failing and not a software version or OH related issue.

Yes, but to be clear, in the past, after I got Zwave to come up, both were very stable for many many months. I pretty certain this particular issue I am dealing with is a new issue.

these log snippets clearly indicate that you have a device in a failed state and 13 attempts to heal the routes it was a member of "occured

.2024-02-02 14:06:54.149 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: ManagementRoutingRequest returned CommandResult [TIMEOUT]
2024-02-02 14:06:54.149 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=18901, state=CANCELLED, sendCnt=2, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=30, startIndex=0]]
2024-02-02 14:06:54.150 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: request ROUTES failed. Retry 11, wait 20007ms before retry.
2024-02-02 14:06:54.151 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2024-02-02 14:07:14.158 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: running ROUTES
2024-02-02 14:07:14.159 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=0, state=WAITING, sendCnt=0, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=--, startIndex=0]]
2024-02-02 14:07:14.159 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=0, state=WAITING, sendCnt=0, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=--, startIndex=0]]
2024-02-02 14:07:14.162 [DEBUG] [transaction.ZigBeeTransactionManager] - 2AD6/0: Sending ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=2, state=WAITING, sendCnt=0, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:18.882 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4723, state=FAILED, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:18.884 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:07:18.885 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4725, state=WAITING, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:18.885 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4725, state=WAITING, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:18.887 [DEBUG] [transaction.ZigBeeTransactionManager] - 2AD6/0: Sending ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4728, state=WAITING, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:23.606 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=9446, state=FAILED, sendCnt=2, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:23.608 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:07:23.609 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete exceeded max retries 2
2024-02-02 14:07:23.610 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=9451, state=CANCELLED, sendCnt=2, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:23.611 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: ManagementRoutingRequest returned CommandResult [TIMEOUT]
2024-02-02 14:07:23.611 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=9452, state=CANCELLED, sendCnt=2, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=3B, startIndex=0]]
2024-02-02 14:07:23.612 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: request ROUTES failed. Retry 12, wait 24453ms before retry.
2024-02-02 14:07:23.613 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2024-02-02 14:07:48.066 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: running ROUTES
2024-02-02 14:07:48.068 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=0, state=WAITING, sendCnt=0, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=--, startIndex=0]]
2024-02-02 14:07:48.068 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=0, state=WAITING, sendCnt=0, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=--, startIndex=0]]
2024-02-02 14:07:48.070 [DEBUG] [transaction.ZigBeeTransactionManager] - 2AD6/0: Sending ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=2, state=WAITING, sendCnt=0, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:52.810 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4743, state=FAILED, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:52.812 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:07:52.814 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4746, state=WAITING, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:52.814 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: Added transaction to queue, len=1, transaction=ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4746, state=WAITING, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:52.816 [DEBUG] [transaction.ZigBeeTransactionManager] - 2AD6/0: Sending ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=4748, state=WAITING, sendCnt=1, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:57.535 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=9468, state=FAILED, sendCnt=2, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:57.537 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete, state=FAILED, outstanding=0
2024-02-02 14:07:57.538 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete exceeded max retries 2
2024-02-02 14:07:57.539 [DEBUG] [zigbee.transaction.ZigBeeTransaction] - Transaction cancelled: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=9471, state=CANCELLED, sendCnt=2, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:57.540 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: ManagementRoutingRequest returned CommandResult [TIMEOUT]
2024-02-02 14:07:57.540 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=00124B001CD5243E queueTime=9472, state=CANCELLED, sendCnt=2, command=ManagementRoutingRequest [0000/0 -> 2AD6/0, cluster=0032, TID=45, startIndex=0]]
2024-02-02 14:07:57.541 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: request ROUTES failed after 13 attempts.
2024-02-02 14:07:57.542 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - 00124B001CD5243E: transactionComplete but not outstanding, state=CANCELLED, outstanding=0
2024-02-02 14:07:57.542 [DEBUG] [iscovery.ZigBeeNodeServiceDiscoverer] - 00124B001CD5243E: Node SVC Discovery: complete
2024-02-02 14:07:57.543 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - 00124B001CD5243E: Updating node NWK=2AD6
2024-02-02 14:07:57.544 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - 00124B001CD5243E: Node update. NWK Address=2AD6
2024-02-02 14:07:57.545 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - 00124B001CD5243E: Node 2AD6 is not updated from ZigBeeNode [state=UNKNOWN, IEEE=00124B001CD5243E, NWK=2AD6, endpoints=[]]
2024-02-02 14:07:57.545 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - 00124B001CD5243E: Node 2AD6 is not updated from ZigBeeNode [state=UNKNOWN, IEEE=00124B001CD5243E, NWK=2AD6, endpoints=[]]```

Thanks … not sure if your reply was cut off … what indicates an attempt? and how many tries does it normally do before giving up?

this is a set of attempts end of on attempt and beginning of next try.
The self-healing will keep trying until the mesh network is healed and all devices are reporting or until the next scheduled mesh network update occurs then it “should” remove the failed device.

1 Like

Just an update after removing 5 AC powered zigbee devices the network was stable but overnight, and all within about 10 minutes, the similar set of zigbee end sensors (contacts and motion) went OFFLINE. So if it is an interference issue, it wasn’t that set of units. I am removing another set to test.

One question: if a devices goes OFFLINE, should I expect that device to return to ONLINE after some time? Seems like the answer has to be yes (its not that unusual for a device to go offline for a few minutes and then return).

My assumption is (maybe incorrect) that if I had a bad zigbee device, after removing it, I would expect the OFFLINE devices to return to ONLINE. But how long should I wait? (EDIT: other sites seem to indicate it could take 24 hours or when the device is woken up, if I activate the sensor, it is woken up but doesnt rejoin)

Classic answer: It depends. Zigbee is optimized for minimal battery consumption, so network traffic is intended to be minimal.
Zigbee implementations are quite different, many of the cheap ones (and some expensive ones too) do not fully adhere to the specification

  • Regular messages
    • My laziest battery-devices send messages at least every 5 hours. But most devices send more frequent
    • AC-powered devices send messages every few minutes
  • Recovery after a router is missing
    • some never find a new router, they have to be put in connection mode manually
    • it can last quite long for a device to find a new router

To sum it up: Reliable routers are very important for a stable zigbee network.

1 Like

Thanks. So to be clear, “router” here refers to an AC Powered zigbee device that also acts as a Router i…e properties shows: zigbee_logicaltype ROUTER

Yes, if this is related to a ROUTER device going bad then it would explain this behavior. As I mentioned, the Zigbee network has been very stable for a few years with minimal issues (but of course devices go bad) … So I restarted Openhab and it seems that if the failure happens it is within the first 2 hours. So we’ll see. (Edit: I removed another set of AC devices)

As long as all of those devices have a new route back and they stay on long enough to discover that new route (many battery-operated devices may take much longer to accomplish that). No, it is unusual for a device to go offline for a few minutes then return That is not normal. Can you elaborate more on what you see there. I have lots of ZigBee devices and once the system is up and stable, I do not ever see then go offline and I review my logs in a log reader constantly.

That is correct it can take up to 24 hours and 2 things come into play. First what is your mesh network update interval.
Second Based on what is your child aging timeout is set at.
You might also want to consider as you test things if the device that is having issues is coming online then after a period time going off line and that device is also router then every time that happens the mesh network will begin a heal process and every device that was using that router a hop will require a new route and also since battery devices come on line so short a duration that will take longer for them to get that update. Also, that assumes another device is close enough to hear the request and the failing device has already stayed offline long enough for the coordinator to reevaluate the RSSI and update what other devices should begin to accept as a new threshold. once for it being there second when it drops off and again when it comes back online (The flapping condition) as I mentioned previously. As you can see it is not as simple as some folks lead it to be to troubleshoot a mesh network lots of moving parts to consider.

correct (at least for the devices I know). You can look it up here:

Agreed. It is very rare, I should have emphasized that … just didn’t want to sound too absolute. I too watch my log regularly and I have a lot of code that monitors devices for OFFLINE and temperature reports etc that I use to determine when a device has gone dead (battery). I get Telegram messages on critical changes. Having said this, I do have 7 Keen vents that go OFFLINE then can be brought back ONLINE by disable/enable sequences. But these devices are really bad in a lot of ways. They are battery powered but I believe the battery connection disconnects when the unit gets hot (inside the duct). But thats a differnet story :slight_smile:

Mine is set to 1 Hour. Child aging is set to 24 hours.

All the devices that are going OFFLINE are battery powered end devices.

Regarding the Mesh … yes definitely a concern. I have removed 9 AC which I believe are all routing so reaching devices now is different than yesterday. But to be honest, most devices are within 200 feet of the coordinator.