Just a short thing that I figured out a few minutes ago for bulbs connected to the mains.
If I disable the Thing (the bulb) and enable it again, it remains in “Unknown”.
Now if I turn on the mains, it immediately comes online from “Unknown”.
It means in the device - there is no table in the coordinator.
That is not related to the binding table - it is related to the size of the routing tables. If you have a larger network, then you can try increasing the size, however most firmware has not been compiled correctly and doesn’t increase the memory, so this can cause problems.
How large is your network?
Hue bulbs definitely do send reports.
Yes. Before setting a device to offline the binding also tries to poll, so if it has not sent a report, it still should answer a poll to stay online.
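To make the sequence concrete, here is a minimal sketch of the "report, then poll, then offline" idea described above. It is not the binding's actual code: the class name, the field names and the timeout values are all made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AliveTracker:
    """Hypothetical model of the report -> poll -> offline sequence."""

    report_timeout_s: float            # assumed: how long to wait for a report
    last_chance_s: float               # assumed: grace period after the poll
    poll: Callable[[], None]           # callback that sends a poll to the device
    last_seen_s: float = 0.0
    poll_sent_at_s: Optional[float] = None
    online: bool = True

    def on_message(self, now_s: float) -> None:
        # Any report or poll response counts as a sign of life.
        self.last_seen_s = now_s
        self.poll_sent_at_s = None
        self.online = True

    def tick(self, now_s: float) -> None:
        if self.poll_sent_at_s is None:
            # No report within the expected time: give the device a last chance.
            if now_s - self.last_seen_s >= self.report_timeout_s:
                self.poll()
                self.poll_sent_at_s = now_s
        elif now_s - self.poll_sent_at_s >= self.last_chance_s:
            # The poll went unanswered as well: mark the device offline.
            self.online = False
```

With, say, `report_timeout_s=600` and `last_chance_s=30` (illustrative numbers only), a device that stays silent is polled once at 600 s and marked offline at 630 s, while any message in between resets the tracker.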
Apologies it has taken me a while to look at these logs - I’ve been quite busy over the past 10 days or so.
I’ve just had a quick look, and all the OFFLINE devices in the log occur right at the beginning during the initialisation. It seems that they are not communicating - at least commands sent to them do not receive a response as far as I can see.
One interesting thing I noted is one of the devices (a battery device, but I don’t know what it is - 0017880104B777D4) is continually changing parent - so it leaves and rejoins the network. When it leaves, it gets marked offline by the framework, and then it normally rejoins with a new parent. This might just be that the device is “half way” (in an RF sense) between multiple devices, and as the RF environment changes, it swaps parent. This happens over 100 times in a 1 hour log, so something doesn’t really seem right here but I can’t really say what. It does seem strange as the device uses quite a few different parents so maybe there is something else going on, or maybe the device has a problem. Really a device should stick with a single parent unless the parent is not contactable - this device seems to change every 30 or so seconds, so I think it might have a problem.
This device failed to send the reports, AND it failed to respond to the “last chance” poll:
2021-12-19 12:55:41.615 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker LastChance Timeout has been reached for thingUID=zigbee:device:62ec522f14:00178801042f8978
2021-12-19 12:55:41.617 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 00178801042F8978: Polling stopped
2021-12-19 12:55:41.618 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 00178801042F8978: Polling initialised at 941807ms
2021-12-19 12:55:58.952 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker LastChance Timeout has been reached for thingUID=zigbee:device:62ec522f14:001788010204b546
2021-12-19 12:55:58.954 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 001788010204B546: Polling stopped
2021-12-19 12:55:58.955 [DEBUG] [ng.zigbee.handler.ZigBeeThingHandler] - 001788010204B546: Polling initialised at 943865ms
2021-12-19 12:56:11.620 [DEBUG] [.zigbee.handler.ZigBeeIsAliveTracker] - IsAlive Tracker Timeout has been reached for thingUID=zigbee:device:62ec522f14:00178801042f8978
I don’t see the EZSP TX frames in your log so I can’t verify 100% that the poll was sent, but the “last chance poll” is sent by the binding when it has failed to receive a report from the device within the expected time - it then tries to poll the device, and if it still doesn’t receive a response it marks the device as offline.
I do see this with a few devices - it looks like the binding is working correctly here, but devices are not responding for some reason.
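If you want to check which devices hit this in your own log, something like the following could do it. It is only a sketch: the regular expression is modelled on the IsAlive tracker lines quoted above, and the function name `summarise_timeouts` is my own invention, not part of the binding.

```python
import re
from collections import Counter

# Pattern modelled on the IsAlive tracker debug lines quoted in this thread.
ALIVE_RE = re.compile(
    r"IsAlive Tracker (LastChance Timeout|Timeout) has been reached "
    r"for thingUID=(\S+)"
)


def summarise_timeouts(lines):
    """Count last-chance and final timeouts per thing UID."""
    last_chance = Counter()
    final = Counter()
    for line in lines:
        match = ALIVE_RE.search(line)
        if not match:
            continue  # not an IsAlive tracker timeout line
        kind, uid = match.groups()
        if kind == "LastChance Timeout":
            last_chance[uid] += 1
        else:
            final[uid] += 1
    return last_chance, final
```

Calling `summarise_timeouts(open("openhab.log"))` (the file name is just an example) returns two counters. A device that shows up in the second one did not answer the last chance poll in time.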
Sorry - this isn’t particularly helpful I know, but it does seem that devices aren’t responding on the network, so the binding is correct (IMHO) in marking them as offline. A sniffer log might provide more information about what is happening down in the network layer (the binding doesn’t get to see this as it’s managed by the NCP) - or the sniffer log might just confirm that devices aren’t talking at the network layer either (which seems likely based on the information here)…
My guess is that maybe there’s a rogue device in your system - possibly the battery device that is changing parents every 30 seconds or so, but this could be a consequence of something else.
It would be interesting to see the full output from the ncpscan command to see if you have an RF issue - possibly running this a few times to try and get the energy values (unfortunately I don’t think it’s possible to just request the energy scan).
But on the other hand the Motion sensor is not too far (2 walls on the same level) away from the coordinator and really close to a Zigbee panel that is always connected to the MAINS.
In theory it should work when the system is running - there are two scans - the active scan (which may give errors when the system is running - and this is what you see) and the energy scan, which doesn’t seem to give an error, but also doesn’t print data.
I recall you provided a debug log in the past for this, but can I ask for another one please so I can take a second look at this second part.
No - the problem is much earlier than this - it’s to do with the way the handler associates the responses to this command as it receives multiple responses to a single command. I need to look into this more as it’s normally only used before the network is online and I suspect it’s getting another message being received when it’s waiting for the scan responses.
I’ve found the cause of the scan problem. It is a timeout in the scan commands - it’s something I’d raised an issue on a while ago but then forgot about.
This will require an update of the libraries in the binding - I’ll look to do this in the next week or so.
I guess this is an issue that happens quite often with the new learning of the Button.
To recap, the only thing that made the device work again (it did not work during the parent change) was to reset it.
This thread is not really relevant - as far as I understand you are using the Ember chipset (or am I wrong)? This thread is about the CC2531 and it’s not really possible to use source routing as the NCP does all the routing.
Thanks for the update here.
Yes I’m using an Ember chipset EFR32MG21.
Okay, I’m still in the learning phase and not yet aware of what does what, and when.
But the symptoms are exactly the same here. The device disconnects and creates trouble in the network.
The weirdest thing here is that I have 3 SML001 Hue sensors.
One of them had to be learned multiple times (0017880104B75CAC) and also shows up twice by name below, but it has been working fine for about 14 days now.
The second one is the one whose log you’ve already seen (it disconnects daily).
And the third one is also disconnecting nearly daily.
For this one I took a closer look in the file below.
Everything worked until 20:18; then the sensor no longer reported data and stayed stuck at “ON”.
The next transaction was then 10 minutes later at 20:28.
I guess one defective device could be a hardware issue, but two is very unlikely.
But of course I would also be willing to exchange the sensors.
Is there a known sensor with better compatibility?
And one more thing
I found this statement. Is it really true that Osram devices can prevent re join of Hue devices?
Did you have some Osram plugs in your network? In #1474 it’s described that Osram plugs prevent a re-join of some devices (mainly Hue motion sensors). I myself have two Osram plugs in the network, and Hue motion and Hue dimmer are randomly disconnected. After I removed the plugs from the network it seems to be stable.
It looks like my devices have the latest firmware.
I use the “Plug 01” ones as engine heater switches. I have not found other outdoor wallplugs, so they are a bit difficult to replace.
But it looks like I have to. I have Hue outdoor sensors I planned to mount next to them…
Whether or not it is related to the Hue sensor problem, it is not clear to me how this bug is problematic for the other devices in the mesh.
Edit to add:
It seems like the Osrams do not have a good radio.
My bulb is only a couple of meters from a bunch of Hues and Ikeas.
As for the "plug 01"s, they are at the edge of the mesh, but LQI values are lower than expected.
I want to give you a short update on this topic (Maybe also for other guys that read this article)
I’ve replaced all of my Osram Smart Plug+ with a TS011F Plug (Neo NAS-WR01B).
Since then I have never needed to re-pair the Hue motion sensors!
@chris: Thanks again for the great analysis that figured out there is an issue with the ZigBee motion sensor.
@NilsOF Thanks a lot for sharing your experience with the Smart+ Plug.
Just for the record:
These Hue motion sensors seem to have problems taking some Ikea bulbs as ‘parents’ too.
It is obvious when sniffing: the motion sensors will try several times before they give up and look for another parent. This cannot be good for battery life…