Zigbee binding stopped working since 2.5 M4

More FWIW on the CC2531 (in particular)

The “default” zigbee firmware on zigbee2mqtt pages I mentioned above is 1.2 HA

I’ve just moved mine to the 3.0.x firmware - it seems to be binding much more easily. The firmware is available from the same site; I grabbed and used CC2531_20190425.zip, which contains CC2531ZNP-with-SBL.hex.

It also appears to be more stable

HOWEVER: There are some caveats

  • 1: by default Zigbee 3.0 only allows binding for 180 seconds after startup
  • 2: The discovery timeout is still too short at times (busy mesh and/or interference)
  • 3: Initializing aqara devices is problematic - the pressure sensors in particular
  • 4: mesh devices don’t seem to reliably rejoin automatically after a restart

I’ll address these in order. Some of this may be down to impatience on my part but not having a fully “OK” mesh an hour after restarting OH doesn’t seem right.

  • 1: enforced zigbee timeouts

    • Adding a device outside the 3 minute startup period doesn’t seem to “simply” require going into paperui inbox and scanning as usual.

    • I’m having to use openhab-cli and issue the following commands if I want to add devices without starting zigbee, restarting the stick (or restarting OH)

         openhab> zigbee join 180 
         openhab> smarthome:discovery zigbee
      

      and then click in the GUI

    • (NB: 180 is the default. It can’t be set higher than 199 - this is a zigbee3 security setting)

    • If the device isn’t added, then you need to start over with those commands until it happens (this is a PITA with Aqaras)

  • 2: Until a device is fully discovered, about the only way to keep things going (especially Aqaras) is to keep firing the discovery command in openhab-cli whilst clicking on their button, until they show back up in HabMin or paperui as “discovered” (if already configured) or in the inbox with their fully identified device name.

    • NB: “zigbee nodes” will show the device as discovered, online and OK long before the OH interfaces say it’s working properly.

    • The time taken to do a discovery is definitely related to distance from the stick (signal strength) and wifi interference. One device latched onto a Tradfri repeater in another room and took more than 20 minutes to become active. That’s a lot of button presses.

  • 3: Aqara sensors usually show up as “uninitialized” in paperui or “zigbee_device_initialised false” in Habmin

    • I’m using paperui to initialise but regularly see this:

      [ERROR] [r.ZigBeeConverterAtmosphericPressure] - 00158D0005404958: Error 0xffff setting server binding
      [INFO ] [ng.zigbee.handler.ZigBeeThingHandler] - 00158D0005404958: Channel zigbee:device:562cd04e:00158d0005404958:00158D0005404958_1_pressure failed to initialise device
      

      it’s always the pressure sensor that fails to initialise. The odd part is that even with the failure, I’m seeing pressure changes.

      [vent.ItemStateChangedEvent] -  zigbee_device_562cd04e_00158d0005404958_00158D0005404958_1_temperature changed from 29.73 °C to 29.83 °C
      [vent.ItemStateChangedEvent] - zigbee_device_562cd04e_00158d0005404958_00158D0005404958_1_humidity changed from 42.83 to 54.94
      [vent.ItemStateChangedEvent] - zigbee_device_562cd04e_00158d0005404958_00158D0005404958_1_pressure changed from 1014.8 hPa to 1014.9 hPa
      
    • Library issue perhaps?

  • 4 - this is problematic:

    Network  Addr  IEEE Address      Logical Type  State      EP   Profile                    Device Type                Manufacturer     Model          
          0  0000  FFFFFFFFFFFFFFFF  COORDINATOR   UNKNOWN  
          0  0000  FFFFFFFFFFFFFFFF  COORDINATOR   UNKNOWN  
       1449  05A9  00158D0004A06BB6  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
       2901  0B55  00158D0005404958  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
       4152  1038  00158D00052DBD59  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
       7898  1EDA  00158D00052DBE17  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
       8368  20B0  680AE2FFFE4D8651  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      15793  3DB1  680AE2FFFEF8F56E  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      16769  4181  680AE2FFFEBE4AB6  ROUTER        ONLINE      1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      27063  69B7  00158D00047EC063  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      29420  72EC  00158D00047D0AC9  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      29711  740F  00158D00045C1557  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      31712  7BE0  00158D00045C9424  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      32869  8065  00158D00049FD8E1  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      33143  8177  00158D00054043BF  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      35502  8AAE  00158D0004521FD5  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      39619  9AC3  00158D000486718D  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      42070  A456  00158D000485A106  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      46200  B478  00158D0004A06BC8  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      46440  B568  00158D00052DBD98  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      46537  B5C9  680AE2FFFE9ADECC  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      47266  B8A2  00158D0004832B6B  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     DOOR_LOCK                  LUMI             lumi.vibration.aq1
                                                               2  ZIGBEE_HOME_AUTOMATION     5F02                                                       
      50550  C576  00158D000321890B  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      50853  C6A5  00158D00053174F4  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      54326  D436  00158D00047EBF73  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      58104  E2F8  00158D0005406815  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      63104  F680  00158D00045C0A72  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
    
    • Before restarting OH, ALL of these devices were online.

    • “zigbee node N” or “zigbee info N/1” shows they’re actually there, but the only way that they can be convinced to show up as “state online” in the CLI is to power cycle the tradfri lights

      • shut down for 10+ seconds
      • they come back into the mesh instantly
      • more importantly - until you do this, they won’t act as routers
    • the only way I’ve found that works for the Aqara devices is to re-pair them, which is painful. Just bouncing on the button to force a poll isn’t enough - and popping the battery out doesn’t work either

    • Even despite this, all these devices are showing as connected in paperui/habmin
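To put numbers on the table above, here is a small sketch that tallies the State column from pasted “zigbee nodes” output. It assumes the column layout shown above (network address, short address, IEEE address, logical type, state); the function name is made up for illustration and none of this is part of openHAB or the binding.

```python
import re
from collections import Counter

# A node row starts with: decimal network address, 4-hex short address,
# 16-hex IEEE address, logical type, state. The header and the continuation
# rows (extra endpoints) do not match this pattern and are skipped.
NODE_ROW = re.compile(
    r"^\s*(\d+)\s+([0-9A-Fa-f]{4})\s+([0-9A-Fa-f]{16})\s+(\w+)\s+(\w+)"
)

def count_states(cli_output):
    """Return {logical_type: Counter({state: n})} for pasted CLI output."""
    tally = {}
    for line in cli_output.splitlines():
        match = NODE_ROW.match(line)
        if match is None:
            continue
        logical_type, state = match.group(4), match.group(5)
        tally.setdefault(logical_type, Counter())[state] += 1
    return tally

# Three node rows and one continuation row copied from the table above.
sample = """\
   2901  0B55  00158D0005404958  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather
   8368  20B0  680AE2FFFE4D8651  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                         242  A1E0                       ZGP_PROXY_BASIC
  16769  4181  680AE2FFFEBE4AB6  ROUTER        ONLINE      1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
"""

for logical_type, states in sorted(count_states(sample).items()):
    print(logical_type, dict(states))
```

Running it against the full listing before and after a restart would give a quick per-type count of ONLINE versus UNKNOWN without eyeballing every row.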

There’s one other problem - which I think needs a separate ticket - the Aqara vibration sensors have 2 endpoints in them, but paperui/habmin is only showing the “door lock” one (whatever that is). Endpoint 2 (device type 5F02) doesn’t show up. See device 47266 in the table above.

The binding defines this - from memory the binding uses 60 seconds.

The maximum is 254 seconds.

This should not be required - it won’t really do anything with the binding or the device. Pushing the button on the device is what is required to keep things moving. Continuing to request join through the coordinator will potentially slow things down.

Don’t worry about the status here - this has nothing to do with openHAB status.

The binding will show all endpoints that provide clusters it supports. Presumably if there’s a second endpoint it doesn’t provide any features known by the binding. Alternatively it was just not discovered properly.

The binding defines this - from memory the binding uses 60 seconds.

OK, I was working from other docs

The maximum is 254 seconds.

The CC2531 3.0 docs point to a 199 limit for this device. I did think it was strange, with most limits being defined by binary transitions. Presumably someone at TI just arbitrarily chopped it…
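For what it’s worth, a 254 second ceiling does line up with a binary boundary: the permit-join duration in the ZigBee management request is a single byte, where 0x00 disables joining and 0xFF is reserved (historically “no timeout”), leaving 0x01 to 0xFE (1 to 254) for timed windows. A minimal sketch of that range check follows. The helper name is hypothetical, it does not talk to the binding or the stick, and the 199 figure from the TI docs is not modelled.

```python
MAX_TIMED_JOIN = 0xFE  # 254: largest one-byte value that still means "N seconds"

def clamp_join_duration(seconds):
    """Clamp a requested join window to the timed range 1..254 seconds."""
    if seconds < 1:
        raise ValueError("join duration must be at least 1 second")
    return min(seconds, MAX_TIMED_JOIN)

for requested in (60, 180, 199, 254, 300):
    print(requested, "->", clamp_join_duration(requested))
```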

about the only way to keep things going (especially Aqaras) is to keep firing the discovery command in openhab-cli

This should not be required - it won’t really do anything with the binding or the device

Should, but I was finding that just bouncing on the button wasn’t enough

Before restarting OH, ALL of these devices were online.

Don’t worry about the status here - this has nothing to do with openHAB status.

Confusing that it’s in the CLI then. Hopefully anyone in future will see this thread and be less worried

the Aqara vibration sensors have 2 endpoints in them

The binding will show all endpoints that provide clusters it supports.

Let’s set up a separate ticket for this. Right now the endpoint that is being detected shows up as an electronic lock, which isn’t right for a vibration/rotation and temperature sensor :slight_smile:

Another FWIW: You’ve said that battery powered ZB devices are supposed to wake up every ~6-8 seconds. We already know that Xiaomi/Aqara devices are out of spec for this, but it looks like they’re out by a factor of 10 - waking up about once every 80 seconds when nothing’s happening if my endpoint query responses are anything to go by

The Aqara devices also report battery voltage and signal quality as part of their reporting (zigbee2mqtt already implements this). It’d be nice to be able to extract this to know when batteries are getting low and keep an eye on network quality etc.

I would strongly recommend not doing this. It will cause additional traffic that is likely to delay sending of data needed for the discovery. I think this is one of those things where you do something, and it happens to work, so you inadvertently attribute it to that event.

The CLI is not written for openHAB. It is a general CLI used for testing the low level ZigBee framework - it is just imported into OH for the same purpose. The binding uses different means to decide if a device is online.

If it’s confusing we can remove the CLI?

I can only reiterate what I’ve said. The binding will systematically search through all endpoints looking for services that it can support through a channel. Either the discovery was incomplete, or the device doesn’t support any functions that the binding can use.

That’s not uncommon. There are many different systems employed by battery devices and some are not necessarily respecting the rules…

I believe that battery is reported in a custom cluster so is not supported by the binding - you’d have to ask Xiaomi why they didn’t use standard functions here.

Likewise, the signal integrity is monitored by the binding using standard functions - and not the Xiaomi specific function.

The binding uses different means to decide if a device is online.
If it’s confusing we can remove the CLI?

I think that will cause even more problems than it solves.

A note in the general help that the online/unknown state in the CLI may differ to the GUI should be sufficient

(Today 3 of the Ikea bulbs were showing online in the CLI and offline in the GUI. Sigh… They came back by toggling the devices disabled/enabled in paperui so they can’t have been too far gone but it’s a little worrying - and just to add to the confusion they were showing updates in the logfile despite the GUI thinking they were offline)

[re battery devices and long wakeup intervals]

That’s not uncommon.

Is there any way to accommodate this? Ideally whilst flagging that the device implementation is non-compliant but it’s being accepted anyway?

I think the Robustness principle/Postel’s law applies here (RFC 1122)
[yes, I know the counterpoints to this, having contributed to the Internet drafts. Martin Thomson’s point is valid. Flagging noncompliance both helps ensure people don’t make assumptions and puts pressure on suppliers to fix things]

I believe that battery is reported in a custom cluster so is not supported by the binding

I’m not entirely sure what you mean by a custom cluster

The way I read the coverage is that it’s reported as part of the return line for each of the sensors. zigbee2mqtt shows battery status/battery mV/signalquality/sensor for each sensor at every sensor update and it appears to be embedded in there when looking at the raw return string.

Let’s break this part out into a new ticket

Please feel free to propose a PR.

Sorry - I don’t actually know what the issue is. Does it matter if it doesn’t wake up every 7 seconds? Normally not. It will simply mean you need to manually wake up the device if you want to change the configuration - on these devices, that is normally approximately never, so it should not impact the device’s support.

Or do I misunderstand what you mean here?

It’s non-standard - ie not defined by the ZigBee standard. There is a standard way to report battery level, and this device does something non-standard.

This has been discussed previously - unfortunately at the moment it is not something that I personally intend to support as I simply have too much else on my plate so I try to focus on supporting standard ZigBee devices at the moment. If someone else wants to add this support, and it doesn’t compromise the structure of the binding, then that is of course great :slight_smile:

Noted regarding busy-ness but if there’s a ticket then someone might look at it. :slight_smile:

I assume for the purposes of sorting the things out on a longer term basis that an official xiaomi specification would be the best thing you (or anyone else) could lay your hands on?

It was looked at in the past and discussed here -:

The problem is that this is really non-standard. The binding is designed to handle non-standard attributes, but unfortunately in this case they’ve not even used standard ZigBee concepts, so it would probably require a custom handler, which could be done through a completely new binding extension.

The issue is primarily the door/window switches - they’re “dropping out” and being reported as offline after a while (3-6 hours), which makes my noddy scripts using their status go loopy (I’m intending to use these to control radiator valves - open window == no point in attempting to heat the room)

The things rejoin instantly when the status does change from ON to OFF or vice versa. The problem is that in the meantime the magnet sensor is reporting neither ON nor OFF

Unlike cheaper magnet sensors which simply report a “state change” when a door/window is opened (nothing when the door/window is closed), these units actually give “on/off” results, which means keeping them online or caching their results is relatively important.

The thermometers are less critical if they drop out - they do rejoin instantly when things change. The big problem I’m encountering at the moment is initializing them - until “device initialised” is true, they simply display NaN in the PaperUI control panel and they’re consistently failing on pressure sensor initialization. They may eventually synchronise, but it’s taking anywhere between 5 and 40 minutes of attempts to make this happen, and the more sensors there are the harder it seems to become (I’m up to 12 of these units in the mesh and likely to want to add another 8).

Understood - but having the document is a reasonable step to understanding how badly they’ve borked things without having to rely on guesswork

My experience dealing with Chinese designers is that once you get past the “Why on earth are you crazy enough to want to do that?” barrier they can be extremely helpful (especially when they realise they can usually ship more product by publishing their specs without NDA encumbrances, or by moving to “standardised” specs in firmware updates), but they tend to start with a mindset of not thinking about interoperability. The number of reinvented wheels you run into is staggering even at companies like Huawei in their network kit (Lots of custom SNMP MIBs here instead of using standard ones, then when customers complain loudly about not being able to use standard NMS software the response is to “use our software”)

Sure. I referred you to the discussion above if you want to understand it?

Just a reminder that a Thing OFFLINE/ONLINE status is not an accurate mirror of target device status.
A sleeping battery device may be ONLINE, as might one smashed with a hammer.
An OFFLINE Thing can pass along communications (else how would we ever find out a device is alive?)

I spoke too soon.

2 restarts later and the json for the CC2531 in org.eclipse.smarthome.core.thing.Thing.json is now reset to what appears to be factory defaults - and won’t take changes (alterations take in the gui, but they don’t survive a restart)

I’ve lost control of PANID, Extended PANID, channel and mesh update period

Surprisingly the network key is unchanged - but of course with all the above having changed I’ve lost the entire mesh. A re-paired device won’t stay paired after an OH restart (but will survive the stick being pulled out/reinserted, etc whilst OH is running - this is pointing towards an OH issue rather than a device issue.)

Altering the json whilst openhab is stopped doesn’t take either

I’ve never set any manual files for things in /etc/openhab2/ so I’m not sure what’s going on.

Yes but losing the existing sensor status when this happens is breaking other stuff.

Door/window sensors are the kind of items where data may not change for days on end.

Whilst the Things in question reappear instantly when a door/window is opened or closed, the problem is that the presumed current status of that object can’t be read

Why is this a problem though? We normally do not read the state - these devices must instead report their state autonomously.

We don’t tend to poll devices - they are configured to report. You don’t want to be polling a device every 10 seconds, or the battery will go flat. Additionally, you don’t want to be polling a door sensor every 10 seconds anyway as the door can be opened and closed in that time, so you’d miss it.

Devices need to report their state through reporting - instantly.
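To illustrate the point about polling missing events, here is a toy simulation with invented timestamps (not real device data): a door opens at t=3 s and closes again at t=7 s. A poller sampling every 10 s sees “closed” every time, while a reporting device delivers both transitions.

```python
# Door state changes as (time_in_seconds, new_state); invented for illustration.
EVENTS = [(3, "OPEN"), (7, "CLOSED")]

def state_at(t, initial="CLOSED"):
    """State of the door at time t, given the event list above."""
    state = initial
    for when, new_state in EVENTS:
        if when <= t:
            state = new_state
    return state

def poll(interval, duration):
    """What a poller sampling every `interval` seconds would observe."""
    return [(t, state_at(t)) for t in range(0, duration + 1, interval)]

def report():
    """What a reporting device sends: one message per state change."""
    return list(EVENTS)

polled = poll(interval=10, duration=20)
print("polled:  ", polled)
print("reported:", report())
print("poller saw OPEN:", any(state == "OPEN" for _, state in polled))
```

The poller never observes the door open, even though it was open for four seconds, and it would have cost the battery a radio wake-up for every sample.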

We’d expect an Item state to remain valid in between “stuff happening” e.g. if a battery device periodically reports “closed”, our Item retains “closed” between the reports.

As I understand @Stoatwblr’s problem, after lots of sidetrack about USB communication and Thing status, they’re now saying Items are not behaving like that? (Though I’ve no idea what they are doing instead.)

You’re right - we have been kind of drifting between issues and maybe it’s worth taking a step back. @Stoatwblr can you describe the issue you’re having? It’s probably best to start a new thread as well if it’s no longer related to this original issue.

Because it’s not being stored/cached to be fed to other scripts. When the device goes “offline” the presumed status is lost

This has nothing to do with polling the device and is about how the code keeps results.

Ok, so you’re talking about within openhab - not the binding? So you mean when a device is OFFLINE, you can no longer see the status?

I guess then we need to work out why your devices are going OFFLINE. This discussion started by you asking about the polling of the devices every 6 to 8 seconds (or the devices waking up every 6 to 8 seconds, which to me is the same). That has nothing to do with the online status of a device - we need to work out why your devices are going offline then.

I’d suggest to check the logs to see what is going on. There will be logging in the binding logs about the alive tracker. Let’s see what that says.

Apologies for the confusion - we are changing topics a lot here. I’d suggest to start a new thread, clearly stating what the issue is.

Where o where? You haven’t mentioned Item at all. Item.state is where we expect any presumed status to be held, regardless of any communication churning about. What state do you get instead?

To recap; Things are about managing comms with devices, and know nothing about meaningful data e.g. temperature readings.
That’s the business of channels, interpreting data.
But channels are stateless, passing messages only.
Items hold data with duration over time, between messages.
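A toy model of that split, purely for illustration: the class names mirror the openHAB concepts but none of this is openHAB’s actual API. The Thing only tracks link status, the channel is a stateless function that turns one raw message into a value, and the Item is the only place a value persists.

```python
class Item:
    """Holds the last value it was given until a new one arrives."""
    def __init__(self, name):
        self.name = name
        self.state = None  # nothing received yet

    def update(self, value):
        self.state = value

class Thing:
    """Tracks communication status only; knows nothing about readings."""
    def __init__(self):
        self.status = "ONLINE"

def contact_channel(raw_message):
    """Stateless: interprets one raw message and returns a value."""
    return "OPEN" if raw_message["on_off"] else "CLOSED"

thing = Thing()
window = Item("window_contact")

# The device reports "closed"; the channel interprets it, the Item stores it.
window.update(contact_channel({"on_off": False}))

# Hours pass with no traffic and the Thing is marked OFFLINE...
thing.status = "OFFLINE"

# ...but the Item still holds the last reported value.
print(thing.status, window.state)
```

In this model a script that reads the Item keeps getting the last reported value regardless of what the Thing status says, which is the behaviour described above.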