Hi @chris, this issue fell off my radar for a while because I got busy with some other things. I updated my ticket with another example of the behavior. It would be great if you could take a look.
To summarize, I have several battery-powered nodes that get into a weird state after the nightly heal. Node 109 is the specific example in the log file linked to below. The DeleteSUCReturnRoute fails and never seems to recover. The device is also never polled after that.
The log file is posted here. Node 109 is the one to look at. It seems to behave normally until the heal, then gets into a weird state. The log file starts at binding startup.
Edit: I should add that I’m not using the latest binding. I don’t think anything has changed in this area of the binding, but I can upgrade to the latest if you want. I’ve been hesitant to do this because it’s my production system…
We did fly straight over DC and NYC on the way down on Sunday, but I didn’t spot you
What is a “weird state”? It looks like the device is reporting correctly (?) but there are some issues with the binding I think.
I see lots of reporting - probably normally - in the morning (0635 and after), but the binding is still trying to heal the device, so there is no polling.
I don’t see anything that is otherwise “weird” though?
I do see something strange with the transactions in the heal (see the PR below), and maybe the heal should time out sooner on battery devices than it does. But I’d like to understand whether you see anything wrong other than this?
However, starting at 03:24:07.870, when the first failure occurs, the device is never polled again. Whenever the device wakes up, the binding immediately sends a WAKE_UP_NO_MORE_INFORMATION, and the node goes back to sleep. Then after the binding sends the WAKE_UP_NO_MORE_INFORMATION, the binding sends the AssignSucReturnRoute, which fails because the device is asleep. It does this over and over, and never recovers.
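To make the sequence concrete, here is a toy model of what I think I'm seeing in the log. This is only a sketch in Python, not the binding's code: `SleepyNode` and `handle_wake_up` are names I made up, and the model simply assumes a battery node accepts messages only between its wake-up and the WAKE_UP_NO_MORE_INFORMATION.

```python
from dataclasses import dataclass, field


@dataclass
class SleepyNode:
    """Toy battery node (hypothetical): awake only until NO_MORE_INFORMATION."""
    awake: bool = False
    delivered: list = field(default_factory=list)

    def wake_up(self):
        self.awake = True

    def receive(self, message):
        """Return True if the message was delivered, False if the node was asleep."""
        if not self.awake:
            return False
        self.delivered.append(message)
        if message == "WAKE_UP_NO_MORE_INFORMATION":
            self.awake = False  # node goes back to sleep immediately
        return True


def handle_wake_up(node, messages):
    """Wake the node, then send the messages in the given order."""
    node.wake_up()
    return [(m, node.receive(m)) for m in messages]


# Order seen in the log: the heal message arrives after the node is asleep.
observed = handle_wake_up(
    SleepyNode(), ["WAKE_UP_NO_MORE_INFORMATION", "AssignSucReturnRoute"])

# Hypothetical alternative order: heal message first, then let the node sleep.
reordered = handle_wake_up(
    SleepyNode(), ["AssignSucReturnRoute", "WAKE_UP_NO_MORE_INFORMATION"])
```

In this model the first ordering fails on every wake-up, which matches the "over and over" pattern in the log, while the second ordering gets the heal message through.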
Polling is disabled during initialisation and heal to try to reduce traffic and allow the heal to work without interference. This is why I said that maybe we could time out or cancel the heal sooner in such cases.
Or you can try and convince me it’s a good idea to poll during a heal - I can see some benefit, but it is likely to cause more delays to the heal.
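As a rough sketch of what cancelling the heal sooner on battery devices might look like: the Python below is illustrative only and is not how the binding is written. `NodeHeal`, its methods, and both failure limits are assumptions picked for the example.

```python
from dataclasses import dataclass

# Assumed limits, chosen only for illustration.
MAX_HEAL_FAILURES_BATTERY = 3
MAX_HEAL_FAILURES_MAINS = 10


@dataclass
class NodeHeal:
    """Hypothetical per-node heal tracker."""
    battery: bool
    failures: int = 0
    healing: bool = True

    def record_failure(self):
        """Count a failed heal transaction and give up once the limit is hit."""
        self.failures += 1
        limit = MAX_HEAL_FAILURES_BATTERY if self.battery else MAX_HEAL_FAILURES_MAINS
        if self.failures >= limit:
            self.healing = False  # cancel the heal for this node

    def polling_allowed(self):
        """Polling stays disabled while the heal is still in progress."""
        return not self.healing


battery_node = NodeHeal(battery=True)
mains_node = NodeHeal(battery=False)
for _ in range(3):
    battery_node.record_failure()
    mains_node.record_failure()
```

With these assumed limits, the battery node gives up after three failed transactions and polling resumes, while the mains node keeps healing. It keeps polling off during the heal, as now, and only changes how long a stuck heal can block it.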