Zigbee Re-Joining After Child Aging Timeout

chris · March 3, 2019, 5:23pm

Currently, your device does not rejoin. This could be because the network key is lost, or it could be another reason. I would suggest to sniff the network to see what is happening.

This isn’t a binding issue though as the binding has no control over this, so I think it will be a config issue in your device.

One question - do you persist the network or do you reinitialise it after the power cycle?

flashbac · March 21, 2019, 10:27pm

I just checked my Xbee and the network PAN ID is lost when I reboot the device.

I have configured the XBee as per this link from Digi: Zigbee Home Automation

flashbac · March 21, 2019, 10:30pm

Between 14:20 and 14:23 I was testing the delay between clicking on the on/off switch in the Control page of PaperUI and when OH sent the TX CMD to the zigbee device.

I’m still noticing this delay. I can reproduce it every time. This does not happen with OH 2.4.

flashbac · March 29, 2019, 3:51am

Quick update, the issue happens in Control panel in PaperUI and it also happens in Habpanel.

I dont’ know what else to do… Anybody?

chris · March 29, 2019, 10:53am

Unless anything has changed from what you provided, then the same statement as I made above is still relevant. The issue is not with the binding and is with the device trying to join.

If you are unclear, then you should get a sniffer log - it’s the only way to see what is happening at the 802.15.4 layer.

I don’t really understand this? It’s not related at all to UI - or maybe I’m not understanding the problem you are now reporting?

flashbac · March 29, 2019, 1:07pm

Hi Chris,

Sorry for the confusion. Towards the beginning of the thread, you asked me to upgrade the library 1.1.11 to see if the rejoin issue persisted. After doing so, I noticed a few issues. For two of these issues you asked me to provide logs for:

So for the rejoin issue we left it at that I need to provide you sniffer logs, which I’m still working on.

The delay issue is what remains and it’s the one I was referring in the last couple of posts. This issue happens when the device has happily joined the ZigBee network.
At your request I have provided logs from different occasions as I can reproduce it every time. So I’m not sure if it’s a UI issue, a library or a binding issue, etc… I have done some more testing to try to narrow down the root cause.

Thanks.

flashbac · March 29, 2019, 3:08pm

Another update, the delay issue only happens with zigbee library 1.1.11-SNAPSHOT. I tried 1.1.6 and 1.1.10 (both on OH2.5.0.M1) and they both work fine without any delays.

chris · March 29, 2019, 4:53pm

Do you know how old the snapshot is you are using? If it’s not the latest, please use the latest and provide a new log. It’s possible that this was fixed already as it was reported a month ago and I did fix an issue like this since.

flashbac · April 22, 2019, 1:53pm

Hi Chris,

So last week I reinstalled the latest 1.1.11 snapshot but I just got around to test it today. There is a big improvement but there is an initial delay of about 10 secs but after that, OH is responding in real-time.

Logs attached. Initial on/off command triggered at 2019-04-22 09:42:XXevents - Copy.log (17.8 KB)
openhab - Copy.log (252.7 KB)

I’m going to retest after 5 or 10 or 30 mins to see after what elapsed time the 10 sec delay occurs.

flashbac · April 22, 2019, 2:12pm

actually, i just retested and the 10sec delay occurs after 10mins of inactivity. Logs attached, same as before but with the additional testing… issue occurs at 2019-04-22 09:42:07 and 2019-04-22 09:55:59.events - Copy (2).log (35.9 KB)
openhab - Copy (2).log (409.5 KB)

chris · April 22, 2019, 6:52pm

It looks like there might be a bug with the retrying of transactions, but it looks like there’s a problem with the device in that it is not responding to attribute reads for the level control cluster. These always time out, and this might then be triggering a bug, but you should also look at why the device doesn’t respond.

Here we see the ON_OFF request gets a response, and the LEVEL CONTROL is ACK’d by the device, so it received it, but it never sends the attribute update.

If you do a search for LEVEL_CONTROL, you will find that this happens at the times you list above, and is the trigger for the problem. Even if there is a bug in the transaction handling (and there might be - I need to look at this further) there will still be delays if the device is not responding to messages like this.

flashbac · April 23, 2019, 8:21am

Hi Chris,

The device is not responding because the Level Control EP is disabled in software, and I haven’t implemented a default response for when the destination EP is not found. But I can easily do that.

Another thing I noticed is that the when coordinator sends a ReadAttributesCommand or the On/OffCommand, it says that the Source EP is 0 but when it sends the frame, it sends the Source EP as 1.

From the logs:

2019-04-23 02:28:24.554 [DEBUG] [transaction.ZigBeeTransactionManager] - 28270/2: Sending ZigBeeTransaction [queueTime=11, state=WAITING, sendCnt=0, command=ReadAttributesCommand [On/Off: 0/0 → 28270/2, cluster=0006, TID=51, identifiers=[0]]]

2019-04-23 02:28:24.557 [DEBUG] [transaction.ZigBeeTransactionManager] - addTransactionListener: ZigBeeTransaction [queueTime=13, state=WAITING, sendCnt=0, command=ReadAttributesCommand [On/Off: 0/0 → 28270/2, cluster=0006, TID=51, identifiers=[0]]]

2019-04-23 02:28:24.562 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX CMD: ReadAttributesCommand [On/Off: 0/0 → 28270/2, cluster=0006, TID=51, identifiers=[0]]

2019-04-23 02:28:24.569 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX APS: ZigBeeApsFrame [sourceAddress=0/1, destinationAddress=28270/2, profile=0104, cluster=0006, addressMode=DEVICE, radius=31, apsSecurity=false, apsCounter=9E, payload=00 51 00 00 00]

2019-04-23 04:16:03.578 [DEBUG] [transaction.ZigBeeTransactionManager] - 28270/2: Sending ZigBeeTransaction [queueTime=5, state=WAITING, sendCnt=0, command=OffCommand [On/Off: 0/0 → 28270/2, cluster=0006, TID=62]]

2019-04-23 04:16:03.581 [DEBUG] [transaction.ZigBeeTransactionManager] - addTransactionListener: ZigBeeTransaction [queueTime=8, state=WAITING, sendCnt=0, command=OffCommand [On/Off: 0/0 → 28270/2, cluster=0006, TID=62]]

2019-04-23 04:16:03.587 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX CMD: OffCommand [On/Off: 0/0 → 28270/2, cluster=0006, TID=62]

2019-04-23 04:16:03.594 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - TX APS: ZigBeeApsFrame [sourceAddress=0/1, destinationAddress=28270/2, profile=0104, cluster=0006, addressMode=DEVICE, radius=31, apsSecurity=false, apsCounter=AF, payload=01 62 00]

chris · April 23, 2019, 10:05am

Then I would recommend not to support the cluster, as fundamentally you are non-compliant to ZigBee.

I think this is just a logging issue and it’s an issue with where in the chain different information is added.

An endpoint of 0 is invalid anyway.

flashbac · April 23, 2019, 2:25pm

ok, I’ll implement that shortly.

I’m glad that it’s a cosmetic/logging thing since the device is replying back to 0/1 instead of 0/0.

chris · April 23, 2019, 4:45pm

Endpoint 0 is ZDP - not ZCL, so it would interpret the cluster IDs completely differently if it were sending to endpoint 0.

flashbac · April 24, 2019, 12:00pm

Agreed. Let me know when those bugs are fixed and I’ll retest.

chris · April 24, 2019, 12:08pm

What bugs? What we’ve discussed above is the device should either not report that it supports the LEVEL_CONTROL cluster, or, the device should support the LEVEL_CONTROL cluster. These need to be implemented in the device.

flashbac · April 24, 2019, 1:58pm

I was talking about this:

and this:

Although, having the device reply did get rid of the delay.

chris · April 24, 2019, 1:59pm

Ok, but it’s not a bug is it? I’m not sure there will be any change.

Again - I’ve not confirmed that there is any bug. If there is, I will fix it, but at the moment the issue I saw is that the device is non-compliant to the ZigBee standard, and the lack of response was the reason for your delays.

flashbac · April 24, 2019, 2:09pm

About the logging issue, my thought is that If the log says it’s about to send a command where the source EP is 0/0 but then it sends a frame with source EP 0/1, then it’s conflicting information. Shouldn’t the logs be consistent with what the software is doing? Also this can be confusing when troubleshooting/debugging, and like you said EP 0/0 is an invalid EP anyways.

About the other issue, if you do determine that it should be addressed, let me know so that I can help test it. Just offering my help