Zigbee -- IKEA and other EFR32MG1P based products occasionally crashing


There appears to be a SiLabs bug affecting most EFR32MG1P based products on the market, where the device will occasionally (if rarely) crash. After the crash, the device no longer responds to APS frames, but will still MAC ACK anything sent to it. Group/broadcast frames, including OnOff (0x6), still work fine. The device also keeps broadcasting link status frames, but they report zero links.
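If you want to spot the "zero links" symptom in a capture, a rough Python sketch is below. It assumes you already have the NWK command payload bytes (starting at the command identifier) extracted from your sniffer; the function names are my own and not part of any tool. The layout used is the Zigbee NWK Link Status command: identifier 0x08, an options byte whose low 5 bits are the entry count, then 3 bytes per neighbor.

```python
from dataclasses import dataclass
from typing import List

NWK_CMD_LINK_STATUS = 0x08  # NWK command identifier for Link Status


@dataclass
class LinkEntry:
    neighbor: int       # 16-bit short address of the neighbor
    incoming_cost: int  # bits 0-2 of the link status byte
    outgoing_cost: int  # bits 4-6 of the link status byte


def parse_link_status(payload: bytes) -> List[LinkEntry]:
    """Parse a NWK command payload that starts at the command identifier."""
    if len(payload) < 2 or payload[0] != NWK_CMD_LINK_STATUS:
        raise ValueError("not a link status command")
    count = payload[1] & 0x1F  # bits 0-4 of the options byte: entry count
    body = payload[2:]
    if len(body) < 3 * count:
        raise ValueError("truncated link status list")
    entries = []
    for i in range(count):
        lo, hi, status = body[3 * i:3 * i + 3]
        entries.append(
            LinkEntry(lo | (hi << 8), status & 0x07, (status >> 4) & 0x07)
        )
    return entries


def reports_zero_links(payload: bytes) -> bool:
    """Hypothetical helper: True when a router advertises no neighbors."""
    return len(parse_link_status(payload)) == 0
```

A router that keeps sending link status frames for which `reports_zero_links` is true, while unicast APS traffic to it goes unanswered, matches the symptoms described above. A single empty link status on its own is not proof of a crash.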

End devices joined as children of the crashed device will not find a new parent; they seem to think the parent is still working fine. The crashed router forwards only some of the frames from its end devices. For example, an Aqara motion sensor will send an occupancy attribute report which is lost, while its lux attribute report is successfully delivered.

This bug has been confirmed by manup, one of the deconz developers, and also by some of the Home Assistant ZHA developers. It seems more prevalent in large (100+ device) networks, but even then devices can run anywhere from a few days to a few months before crashing.

I’m posting this here just to help spread the word. I’ve tried to send a few messages to the IKEA Embedded System Engineering group to notify them of the issue. This bug may be fixed in EmberZNet 6.7.3, as the symptoms match perfectly, but the triggering conditions do not, since ECC/SmartEnergy is not being used:

I know Chris is an excellent EZSP resource (thanks for the amazing sniffer, I use it daily) and didn’t know if he had any other thoughts he could share.

Thank you!

Thanks for pointing this out. If I understand this correctly it will not be a problem for any users here since CBKE is not used, and I would expect that the ECC libraries are not compiled into any normal user code.

The ECC libraries are not provided by Silabs as standard - they require additional registration, and are only normally used for SmartMeter (SEP) systems. We use this with one customer who is an energy supplier in the USA, but no other systems that I know of that are being used with OH have the ECC libs compiled into their firmware.

Thanks @chris – It does actually affect all users of a normal HA 1.2 / ZB 3.0 network. I have confirmed cases on deconz, zigbee2mqtt, home assistant ZHA and hue bridges so far – none of which are using CBKE/ECC or SmartEnergy. I also have some packet captures using your tool showing the issue.

The bug I picked out is just the closest match so far in the EmberZNet release notes for the symptoms.

Things we have tried so far to “rescue” a crashed IKEA device:

  • ZDP leave request (with rejoin) unicast to the IEEE address
  • ZDP leave request (with rejoin) broadcast to all mains routers (0xFFFE) for the IEEE address
  • NWK leave request (with rejoin)
  • Spoofing NWK link status with a valid entry for the device, to hopefully get the coordinator into its neighbor table
  • Artificially increasing frame counters
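For anyone wanting to repeat the leave-with-rejoin attempts, here is a minimal sketch of how the ZDO Mgmt_Leave_req payload (cluster 0x0034) is laid out. This only builds the bytes; sending them depends on your stack. The helper names and the example address are made up for illustration, and this is not necessarily how the frames above were generated.

```python
ZDO_MGMT_LEAVE_REQ = 0x0034   # ZDO cluster ID for Mgmt_Leave_req
LEAVE_REMOVE_CHILDREN = 0x40  # bit 6 of the options byte
LEAVE_REJOIN = 0x80           # bit 7 of the options byte


def ieee_to_bytes(ieee: str) -> bytes:
    """Convert 'aa:bb:..:hh' (MSB first) to the 8 little-endian wire bytes."""
    raw = bytes.fromhex(ieee.replace(":", ""))
    if len(raw) != 8:
        raise ValueError("IEEE address must be 8 bytes")
    return raw[::-1]


def mgmt_leave_req(tsn: int, ieee: str, rejoin: bool = True,
                   remove_children: bool = False) -> bytes:
    """Build the ZDO payload: sequence number, target IEEE, options byte."""
    options = 0
    if rejoin:
        options |= LEAVE_REJOIN
    if remove_children:
        options |= LEAVE_REMOVE_CHILDREN
    return bytes([tsn & 0xFF]) + ieee_to_bytes(ieee) + bytes([options])


if __name__ == "__main__":
    # Hypothetical device address, for illustration only.
    print(mgmt_leave_req(1, "00:0d:6f:00:11:22:33:44").hex())
```

Note that none of the rescue attempts in the list above revived a crashed device, so this is only useful for reproducing what was tried.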

If interested the deconz issues chasing these down are here:


Ok, I was simply going on what I read in the image you highlighted above which distinctly talks about the ECC libraries which should not be incorporated in most devices as far as I know (certainly they aren’t included in any of the dongles).

I obviously can’t confirm this, but I would doubt that the Ikea bulbs include the ECC libraries - unless you know differently?

It sounds like the issue you’re reporting isn’t therefore related to the CBKE/ECC fix that you highlighted, which is a little confusing, and makes it a little hard to understand the issue :wink:

I’m happy to take a look - it might help me understand the issue at least :slight_smile:

Thanks.

I went back and found that I’ve trashed the cleanest examples of a device falling off the network. I’ll try to recapture and post a PCAP file later.

The issue is definitely exacerbated when a large number of devices rejoin the network (for example, if you power off a room full of bulbs and then power them back on).

Finally identified the bug - there is a low level stack bug in the SiLabs EFR32 used by IKEA devices that causes them to crash.

This bug is fixed in EmberZNet 6.7.7 and higher from SiLabs, but a majority of IKEA devices are running earlier 6.0 to 6.5 releases, all of which are affected by this bug.
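If you keep an inventory of stack versions, a small check against the fixed release could look like the sketch below. It assumes you already know the EmberZNet version a device's firmware is built on (which is not the same as the IKEA firmware version number), and the function names are my own.

```python
FIXED_IN = (6, 7, 7)  # first EmberZNet release reported above to contain the fix


def parse_version(text: str) -> tuple:
    """Turn '6.5' or '6.7.8.0' into a 3-part tuple, padding with zeros."""
    parts = [int(p) for p in text.strip().split(".")]
    return tuple((parts + [0, 0, 0])[:3])


def has_fix(text: str) -> bool:
    """True if the given EmberZNet version is at or above the fixed release."""
    return parse_version(text) >= FIXED_IN


if __name__ == "__main__":
    for v in ("6.0", "6.5.1", "6.7.7", "6.10.0"):
        print(v, "fixed" if has_fix(v) else "likely affected")
```

Comparing tuples of integers rather than strings avoids sorting "6.10" below "6.7".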

The bug occurs when a bulb is processing a ZDO parent announcement and other traffic arrives. It seems to trigger more often on large networks.

Have some test code that does a ZDO parent flood and can now reproduce on demand.
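For reference, the frame involved is the ZDO Parent_annce (cluster 0x001F), whose payload is a child count followed by the children's 64-bit IEEE addresses. The sketch below is not the test code mentioned above; it only shows how one such payload is encoded, with hypothetical helper names and a made-up address. Only experiment with this on a network you own.

```python
from typing import List

ZDO_PARENT_ANNCE = 0x001F  # ZDO cluster ID for Parent_annce


def ieee_to_bytes(ieee: str) -> bytes:
    """Convert 'aa:bb:..:hh' (MSB first) to the 8 little-endian wire bytes."""
    raw = bytes.fromhex(ieee.replace(":", ""))
    if len(raw) != 8:
        raise ValueError("IEEE address must be 8 bytes")
    return raw[::-1]


def parent_annce(tsn: int, children: List[str]) -> bytes:
    """Build the ZDO payload: sequence number, child count, child IEEE list."""
    if len(children) > 255:
        raise ValueError("child count must fit in one byte")
    out = bytearray([tsn & 0xFF, len(children)])
    for ieee in children:
        out += ieee_to_bytes(ieee)
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical child address, for illustration only.
    print(parent_annce(5, ["00:0d:6f:00:11:22:33:44"]).hex())
```

In practice the number of children per frame is also limited by the maximum APS payload size, which this sketch does not enforce.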

Anyone know which is the latest version of EmberZNet FW for IKEA Trådfri Signal Repeater E1746?

That is, which version of EmberZNet is the latest official OTA update from IKEA for the E1746 based on?