Zigbee binding stopped working since 2.5 M4

Perhaps not to you. Post unfiltered debug logs here for Chris to look at.

I have had the exact same experience with my Zigbee devices for, well, a couple of years now.
The only way to get a device back online is to remove it and add it again (you probably need to reset it, scan, and then add it again). And it's not a matter of which devices are in use… It happens to all the devices I've got, battery powered or mains powered. Some devices last longer than others, but eventually they all go offline. It can take anywhere from a few days to a couple of months. Restarting OH shortens that time, and normally results in some of the devices never coming back online.

This is my main reason for giving up on Zigbee. I wish there were better ways to troubleshoot it and get it running stably, but that has not been possible in my situation. :frowning:

Would it be better to use a Xiaomi gateway instead of the Zigbee stick? The gateway would then be used with the Xiaomi binding as a bridge for the sensors.

I’ve just bought the USB stick and one Aqara door sensor to test functionality/reliability, but I’m very disappointed by this issue with devices going offline.

If you only plan to use Xiaomi devices, then it is probably better to use a Xiaomi hub.

I’m afraid that attitude is not helpful.

The CC2531 is widely available - perhaps the most widely available USB zigbee device, thanks to banggood and aliexpress - and at $5 or less, it’s effectively ubiquitous. People don’t know about the problems until after they acquire the things.

If you don’t have one or more CC2531s and it would be helpful to have 'em, then please contact me and I’ll order a couple for you. They really are dirt cheap.

NB: RF interference

One of the issues I ran into in getting things running in the first instance was the problem with overlap of zigbee and Wifi channel numbers (they’re not the same, although they occupy the same space) and a diagram showing this would help explain why channels 11/15/20/25 are the best ones to try

Zigbee being wiped out by Wifi is becoming more and more of a problem. There are 30 different 2GHz APs in earshot of my hub. People need to be aware they’ll need to use a spectrum analyser (“wifi analyser” on android works reasonably well) to find the least crowded spot to put their zigbee stuff (I’d suggest channel 15 == wifi channel 3, or channel 20 == wifi channel 9 - See metageek - ZigBee and WiFi Coexistence for the reason why)
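To put numbers on the channel overlap, here is a minimal sketch of the arithmetic. It assumes the usual 2.4GHz channel plans (Zigbee channels 11-26 at 5MHz spacing from 2405MHz, wifi channels 1-13 at 5MHz spacing from 2412MHz) and treats a wifi channel as 22MHz wide and a Zigbee channel as 2MHz wide. Those widths are simplifying assumptions, and the function names are made up for illustration.

```python
# Assumed channel widths in MHz (simplified; real spectral masks are not brick walls).
ZIGBEE_WIDTH_MHZ = 2
WIFI_WIDTH_MHZ = 22


def zigbee_centre_mhz(channel: int) -> int:
    """Centre frequency of a 2.4GHz Zigbee (IEEE 802.15.4) channel, 11-26."""
    if not 11 <= channel <= 26:
        raise ValueError("2.4GHz Zigbee channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_centre_mhz(channel: int) -> int:
    """Centre frequency of a 2.4GHz wifi channel, 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers wifi channels 1 to 13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel: int, wifi_channel: int) -> bool:
    """True if the two channels share spectrum under the assumed widths."""
    gap = abs(zigbee_centre_mhz(zigbee_channel) - wifi_centre_mhz(wifi_channel))
    return gap < (ZIGBEE_WIDTH_MHZ + WIFI_WIDTH_MHZ) / 2


def clear_zigbee_channels(wifi_channels):
    """Zigbee channels that overlap none of the given wifi channels."""
    return [z for z in range(11, 27)
            if not any(overlaps(z, w) for w in wifi_channels)]


if __name__ == "__main__":
    # Wifi channels seen in a site survey go in here.
    print(clear_zigbee_channels([1, 6, 11]))
```

Under these assumptions `overlaps(11, 1)` is true, so Zigbee channel 11 sits inside wifi channel 1. Feed in whichever wifi channels your own analyser shows as busy, rather than relying on the 1/6/11 example.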

Sorry - there was not an “attitude” applied - it was an honest response and I’m sorry if you took offence.

The question that was asked was “Would it be better if using a xiaomi gateway instead of the zigbee stick?”, and I simply stated that if the user was only using Xiaomi devices, then yes, it seems sensible to use their hub. I’m confused why you think this was not helpful and came with “attitude”?

I have many of them, but I do not tend to use them - most of my testing is done with other chipsets and there’s only so much that I can do by myself.

This shouldn’t really be a big issue. I worked with a large hotel chain - they implemented ZigBee networks in every hotel room (hundreds of rooms in the hotels) and also have 2.4GHz Wifi. It should work ok, but of course you’re right, there can be issues.

However, I’m not sure what that has to do with the Xiaomi issue that you seem to have taken offence at? Xiaomi devices tend to be a little problematic - we’ve seen that. They do work ok, but many people have problems and I still stand by my statement above.

@Stoatwblr my question is why are you running a version older than 2.5M4?

If you are not doing that, why thread crap to denigrate our expert developer?

Xiaomi like many vendors “bend” the standards in an attempt to lock you in to their products and cloud services. Your gripe is with them!

Bruce, I’m running the latest release version. I’m commenting on the point about CC2531 specifically as I’m seeing the same issue

I’m aware of the Xiaomi issues but that’s different to losing the USB device - and comments about losing the CC2531 are legion

So start a new thread then!

EDIT: You can legitimately say I should have reported your post as off-topic to the thread, I try to avoid doing that.

I wasn’t responding to the Xiaomi vs USB question but one further up the chain about USB devices. For some reason it wasn’t threaded properly.

WRT zigbee in every hotel room (wifi and zigbee hubs/routers everywhere) that’s somewhat different to dense european housing where there is likely to be only one zigbee hub on site, but wifi hubs on all sides.

What I found was that on zigbee’s default channel 11, with my battery devices 1 internal brick wall away, the neighbours either side on wifi channel 1 wiped things out to the extent that handshaking took about 3-5 minutes if it completed at all. Moving to zigbee 15 mostly solved the issue. My own 2GHz wifi is on channel 11

The 2GHz band had over 60 visible wifi networks in a typical London block of flats I was at yesterday, with neighbouring units coming in only 20dB lower than the local one(*) - and people are wondering why throughput is so rotten

(*) Actually they appeared to be across the courtyard - transmitting through glass front windows rather than the brick walls of adjacent units.

The topic is “Zigbee binding stopped working since 2.5M4” and specifically mentions what I’m seeing - which is the USB device disappearing after several hours and only being recoverable by replugging it and disabling/reenabling the device in openhab (or restarting OH)

This does not happen using zigbee2mqtt

So whatever was wrong, seems to STILL be wrong.

FWIW: I purchased the CC2531 (and a hardware debugger) in the last couple of months because it was listed as supported in the Openhab documentation and was reasonably priced (the whole kit was cheaper than any Ember device), whilst everything points at not buying proprietary hubs due to lockin issues

If there are issues with them - as you pointed out back in December 19 (which is the posting I responded to) then noting them in the docs and pointing towards better supported devices would head off a lot of queries at the pass

I’ve spent a few weeks banging my head against RF interference issues with both zigbee2mqtt and the native zigbee bindings, resorting to using a spectrum analyser to confirm suspicion that crowded 2GHz bands and low power battery Zigbee devices (my use case is a bunch of thermometers/hygrometers and door sensors) are a bad combination.

Even the ‘problematic’ Xiaomi and Aqara sensors setup and work fine when parked next to a CC2531 on most zigbee channels. It’s getting through walls that’s the issue even when the absolute maximum distance here is only 8 metres - signal is degraded enough to make handshaking/staying connected “difficult” with zigbee2mqtt but it stays working for weeks. Having the usb device dropout entirely under the native OH binding is just an added complication - but of course doing it natively is preferable to using a 3rd party shim (no matter what I’ve tried, I can’t quite get the configurations identical between zigbee2mqtt/OHzigbee, so changing between them requires repairing everything.)

It’s a shame Zigbee isn’t channel-agile as it would probably solve a bunch of issues, but battery devices will always aim to be flea-power entities. The real “arrgh” moment is that I just ordered a bunch of Ikea bulbs to act as routers before reading the woes being encountered with them (at £7 each, they’re relatively cheap repeaters/boosters)

Regarding the “nothing changed” comments about the driver: There are so many interactions in just about everything that just because Chris’ code hasn’t changed doesn’t mean that it hasn’t been affected by something else (the issue could always be a firmware bug locking up the USB device, but the fact that it’s not showing as going offline isn’t encouraging in that respect)

Chris: The test box here is a x86 monster with enough horsepower to debug effectively penalty-free. I’m happy to try out stuff if you want to get to the bottom of it.

To be honest, I’m not 100% clear what the problem is. What are you actually seeing? I guess devices are dropping off the network or something?

If you have a log of what’s happening - both OH and Wireshark logs would be good - then I’m very happy to take a look. If the issue is poor links, then I’m not sure what I can do to help that.

I know there are different firmware versions out there for the 2531 - I personally don’t use this so it’s hard for me to support it. Most of the customers we support commercially are using Ember based systems and that’s what I spend most of my time supporting.

To be a little picky :wink: , this isn’t really a very specific description. Often the external effect that people see has very different underlying cause, so it would really be good if you can describe what you see - it will help me better help you. If you can provide the logs mentioned above, that might help, but also please describe what you see and what time things happen so that I can try and correlate the problem you’re having with the different logs.

I’m not familiar with Ikea bulb problems - they should work in a similar way to any other ZigBee bulb. What problem are you, or others having?

What I’ve been seeing is simply that communication with the USB device appears to stop, but OH doesn’t recognise it’s gone. All the downstream devices are offline (of course).

Current firmware on the CC2531 is CC2531_DEFAULT_20190608.zip (CC2531ZNP-Prod.hex), grabbable from https://www.zigbee2mqtt.io/getting_started/flashing_the_cc2531.html

(https://github.com/Koenkk/Z-Stack-firmware/raw/master/coordinator/Z-Stack_Home_1.2/bin/default/CC2531_DEFAULT_20190608.zip)

Lsof showed that /dev/ttyACM0 was no longer held open (which might be a big clue)

I got it back by replugging the USB and then disabling/reenabling the CC2531EMK Coordinator in paperUI’s thing configuration.

WRT the bulb problems: various complaints that after such events, they don’t re-handshake until everything’s power cycled. I’ll know more when they show up.

Ok, I’m not sure what that would be - if you can get a binding log with debug enabled we can see if that shows anything. I won’t need a sniffer log for this. I can’t think of anything in the binding that could cause this sort of behaviour - but maybe something in the wider system could contribute to this of course…

Strange - I’ve not seen such reports. We have some here in the test system and they have worked fine. Often it’s hard to pinpoint the root cause of such problems, so it may not be the Ikea bulb issue.

To be honest I’m inclined to blame most 2.4GHz issues on band crowding/marginal signals unless proven otherwise. Firmware in a lot of stuff is quite manky.

I’ve found at least part of the reason for my issues

If the CC2531 well and truly crashes it will actually disconnect from the USB bus.

At this point OH is still holding onto /dev/ttyACM0 (verified with lsof), so udevd assigns it to /dev/ttyACM1 when it reconnects.

OH only lets go of /dev/ttyACM0 later on during the reset process - hence why I saw it not attached with lsof several hours later - then gripes that /dev/ttyACM0 doesn’t exist before giving up

I’ve added a udevd ruleset to setup a symlink and repointed OH to that.

/etc/udev/rules.d/99-usb-zigbee.rules 

# TI CC2531 Stick (Zigbee)
KERNEL=="ttyACM*", ATTRS{idVendor}=="0451", ATTRS{idProduct}=="16a8", MODE="0666", SYMLINK+="usbzigbee"
# TI CC1353R Devboard (Zigbee)
KERNEL=="ttyACM*", ATTRS{idVendor}=="0451", ATTRS{idProduct}=="bef3", ENV{ID_USB_INTERFACE_NUM}=="00", MODE="0666", SYMLINK+="usbzigbee3"

SUBSYSTEM=="tty", ATTRS{idVendor}=="0451", ATTRS{idProduct}=="16a8", ATTRS{serial}=="__0X00124B001CD4xx", SYMLINK="ttyUSB.CC2531-01", OWNER="openhab"

#  udevadm control --reload-rules && udevadm trigger

OH won’t follow the symlink change when the CC2531 reconnects, but when it disconnects from the serial port it should then follow the changed link to the correct ttyACM device
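One way to spot the moment the stick has re-enumerated is to compare what the symlink resolves to now against what it resolved to when the port was opened. This is only a sketch of that check. The helper names are hypothetical, and it is not something the binding does itself.

```python
import os


def resolve_device(link_path: str) -> str:
    """Return the real device node a udev symlink currently points at."""
    return os.path.realpath(link_path)


def link_moved(link_path: str, opened_target: str) -> bool:
    """True if the symlink now resolves somewhere other than the node opened earlier."""
    return resolve_device(link_path) != opened_target


if __name__ == "__main__":
    link = "/dev/usbzigbee"  # symlink name taken from the udev rule above
    opened = resolve_device(link)
    print(f"{link} currently resolves to {opened}")
    # Later, e.g. from a cron job or watchdog script:
    if link_moved(link, opened):
        print("stick has re-enumerated; the coordinator needs disabling/re-enabling")
```

If `link_moved` returns true while OH still holds the old node open, that matches the ttyACM0/ttyACM1 situation described above.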

As for why the stick is crashing… no idea at the moment.

(setup last week was 8 Aqara window sensors and 8 Aqara thermometers

I’ve added 3 Ikea 470lm WS Tradfri lights as repeaters since then and am falling over on trying to add a 4th lamp, 4 more Aqara thermometers and 2 vibration sensors - I can get one of the above connected but it then fails discovery. On top of that I keep getting initialization failures on the atmospheric pressure sensors of the known devices)

Is there any way of increasing the discovery period?

As I pointed out last week in an interference-prone environment (and I assume a busy mesh), when using zigbee2mqtt I could see discovery of even a single zigbee device going from 5-10 seconds under ideal conditions (clear channel and endpoint next to the coordinator) to 3 minutes (co-channel interference from next-door’s wifi plus 3 metre separation of coordinator/endpoint with a brick wall in the way - I think zigbee was relying on multipath propagation via an open doorway)

More FWIW on the CC2531 (in particular)

The “default” zigbee firmware on zigbee2mqtt pages I mentioned above is 1.2 HA

I’ve just moved mine to the 3.0.x firmware - it seems to be binding much more easily (firmware available from the same site. I grabbed and used CC2531_20190425.zip, which contains CC2531ZNP-with-SBL.hex)

It also appears to be more stable

HOWEVER: There are some caveats

  • 1: by default Zigbee 3.0 only allows binding for 180 seconds after startup
  • 2: The discovery timeout is still too short at times (busy mesh and/or interference)
  • 3: Initializing aqara devices is problematic - the pressure sensors in particular
  • 4: mesh devices don’t seem to reliably rejoin automatically after a restart

I’ll address these in order. Some of this may be down to impatience on my part but not having a fully “OK” mesh an hour after restarting OH doesn’t seem right.

  • 1: enforced zigbee timeouts

    • Adding a device outside the 3 minute startup period doesn’t seem to “simply” require going into paperui inbox and scanning as usual.

    • I’m having to use openhab-cli and issue the following commands if I want to add devices without starting zigbee, restarting the stick (or restarting OH)

         openhab> zigbee join 180 
         openhab> smarthome:discovery zigbee
      

      and then click in the GUI

    • (NB: 180 is the default. It can’t be set higher than 199 - this is a zigbee3 security setting)

    • If the device isn’t added, then you need to start over with those commands until it happens (this is a PITA with Aqaras)

  • 2: Until a device is fully discovered, about the only way to keep things going (especially Aqaras) is to keep firing the discovery command in openhab-cli whilst clicking on their button, until they show back up in HabMin or paperui as “discovered” (if already configured) or in the inbox with their fully identified device name.

    • NB: “zigbee nodes” will show the device as discovered, online and OK, long before the OH interfaces say it’s working properly.

    • The time taken to do a discovery is definitely related to distance from the stick (signal strength) and wifi interference. One device latched onto a Tradfri repeater in another room and took more than 20 minutes to become active. That’s a lot of button presses.

  • 3: Aqara sensors usually show up as “uninitialized” in paperui or “zigbee_device_initialised false” in Habmin

    • I’m using paperui to initialise but regularly see this:

      [ERROR] [r.ZigBeeConverterAtmosphericPressure] - 00158D0005404958: Error 0xffff setting server binding
      [INFO ] [ng.zigbee.handler.ZigBeeThingHandler] - 00158D0005404958: Channel zigbee:device:562cd04e:00158d0005404958:00158D0005404958_1_pressure failed to initialise device
      

      it’s always the pressure sensor that fails to initialise. The odd part is that even with the failure, I’m seeing pressure changes.

      [vent.ItemStateChangedEvent] -  zigbee_device_562cd04e_00158d0005404958_00158D0005404958_1_temperature changed from 29.73 °C to 29.83 °C
      [vent.ItemStateChangedEvent] - zigbee_device_562cd04e_00158d0005404958_00158D0005404958_1_humidity changed from 42.83 to 54.94
      [vent.ItemStateChangedEvent] - zigbee_device_562cd04e_00158d0005404958_00158D0005404958_1_pressure changed from 1014.8 hPa to 1014.9 hPa
      
    • Library issue perhaps?

  • 4 - this is problematic:

    Network  Addr  IEEE Address      Logical Type  State      EP   Profile                    Device Type                Manufacturer     Model          
          0  0000  FFFFFFFFFFFFFFFF  COORDINATOR   UNKNOWN  
          0  0000  FFFFFFFFFFFFFFFF  COORDINATOR   UNKNOWN  
       1449  05A9  00158D0004A06BB6  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
       2901  0B55  00158D0005404958  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
       4152  1038  00158D00052DBD59  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
       7898  1EDA  00158D00052DBE17  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
       8368  20B0  680AE2FFFE4D8651  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      15793  3DB1  680AE2FFFEF8F56E  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      16769  4181  680AE2FFFEBE4AB6  ROUTER        ONLINE      1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      27063  69B7  00158D00047EC063  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      29420  72EC  00158D00047D0AC9  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      29711  740F  00158D00045C1557  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      31712  7BE0  00158D00045C9424  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      32869  8065  00158D00049FD8E1  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      33143  8177  00158D00054043BF  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      35502  8AAE  00158D0004521FD5  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      39619  9AC3  00158D000486718D  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      42070  A456  00158D000485A106  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      46200  B478  00158D0004A06BC8  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      46440  B568  00158D00052DBD98  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      46537  B5C9  680AE2FFFE9ADECC  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC                                            
      47266  B8A2  00158D0004832B6B  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     DOOR_LOCK                  LUMI             lumi.vibration.aq1
                                                               2  ZIGBEE_HOME_AUTOMATION     5F02                                                       
      50550  C576  00158D000321890B  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      50853  C6A5  00158D00053174F4  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      54326  D436  00158D00047EBF73  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
      58104  E2F8  00158D0005406815  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
      63104  F680  00158D00045C0A72  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather   
    
    • Before restarting OH, ALL of these devices were online.

    • “zigbee node N” or “zigbee info N/1” shows they’re actually there, but the only way that they can be convinced to show up as “state online” in the CLI is to power cycle the tradfri lights

      • shut down for 10+ seconds
      • they come back into the mesh instantly
      • more importantly - until you do this, they won’t act as routers
    • the only way I’ve found that works for the Aqara devices is to re-pair them, which is painful. Just bouncing on the button to force a poll isn’t enough - and popping the battery out doesn’t work either

    • Even despite this, all these devices are showing as connected in paperui/habmin
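For anyone wanting to track how many nodes the console reports in each state after a restart, a listing like the one above can be tallied with a few lines of script. This is a rough sketch that assumes the column order shown in the listing (network address, short address, IEEE address, logical type, state). The sample rows are copied from it, and the function names are made up.

```python
import re
from collections import Counter

# A 64-bit IEEE address printed as 16 hex digits.
IEEE_RE = re.compile(r"^[0-9A-Fa-f]{16}$")

SAMPLE = """\
    Network  Addr  IEEE Address      Logical Type  State      EP   Profile                    Device Type
          0  0000  FFFFFFFFFFFFFFFF  COORDINATOR   UNKNOWN
       1449  05A9  00158D0004A06BB6  END_DEVICE    UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     5F01                       LUMI             lumi.sensor_magnet.aq2
       2901  0B55  00158D0005404958  END_DEVICE    ONLINE      1  ZIGBEE_HOME_AUTOMATION     TEMPERATURE_SENSOR         LUMI             lumi.weather
       8368  20B0  680AE2FFFE4D8651  ROUTER        UNKNOWN     1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC
      16769  4181  680AE2FFFEBE4AB6  ROUTER        ONLINE      1  ZIGBEE_HOME_AUTOMATION     COLOR_TEMPERATURE_LIGHT    IKEA of Sweden   TRADFRI bulb E14 WS 470lm
                                                             242  A1E0                       ZGP_PROXY_BASIC
"""


def parse_nodes(text: str):
    """Extract one record per node row, skipping the header and extra-endpoint rows."""
    nodes = []
    for line in text.splitlines():
        fields = line.split()
        # A node row starts with a decimal network address and has an IEEE address third.
        if len(fields) >= 5 and fields[0].isdigit() and IEEE_RE.match(fields[2]):
            nodes.append({
                "network": int(fields[0]),
                "ieee": fields[2],
                "type": fields[3],
                "state": fields[4],
            })
    return nodes


def count_states(nodes):
    """Tally nodes by (logical type, state)."""
    return Counter((n["type"], n["state"]) for n in nodes)


if __name__ == "__main__":
    for (logical_type, state), count in sorted(count_states(parse_nodes(SAMPLE)).items()):
        print(f"{logical_type:12} {state:8} {count}")
```

Running it against a capture taken before a restart and another taken after makes it easy to see which routers and end devices changed state.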

There’s one other problem - which I think needs a separate ticket - the Aqara vibration sensors have 2 endpoints in them, but paperui/habmin is only showing the “door lock” one (whatever that is). Endpoint 2 (device type 5F02) doesn’t show up. This is device 47266 in the table above.

The binding defines this - from memory the binding uses 60 seconds.

The maximum is 254 seconds.

This should not be required - it won’t really do anything with the binding or the device. Pushing the button on the device is what is required to keep things moving. Continuing to request join through the coordinator will potentially slow things down.

Don’t worry about the status here - this has nothing to do with openHAB status.

The binding will show all endpoints that provide clusters it supports. Presumably if there’s a second endpoint it doesn’t provide any features known by the binding. Alternatively it was just not discovered properly.

The binding defines this - from memory the binding uses 60 seconds.

OK, I was working from other docs

The maximum is 254 seconds.

CC2531 3.0 docs point to a 199 limit for this device. I did think it was strange, with most limits being defined by binary transitions. Presumably someone at TI just arbitrarily chopped it…
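On the 254 vs 199 question, here is a minimal sketch of how the two limits could be reconciled when requesting a join window. It assumes the join duration travels as a single unsigned byte with 0xFF reserved, which would explain a ceiling of 254. The 199 figure is just the per-device limit quoted from the TI docs and is not verified here. The function name is hypothetical.

```python
# Assumption: the permit-join duration is one unsigned byte and 0xFF is reserved,
# so the largest usable value is 0xFE (254 seconds).
MAX_PERMIT_JOIN_SECONDS = 0xFE


def clamp_join_duration(seconds: int, device_limit: int = MAX_PERMIT_JOIN_SECONDS) -> int:
    """Limit a requested join window to what the protocol and the stick will accept."""
    if seconds < 0:
        raise ValueError("join duration cannot be negative")
    return min(seconds, device_limit, MAX_PERMIT_JOIN_SECONDS)


if __name__ == "__main__":
    print(clamp_join_duration(180))                    # within both limits
    print(clamp_join_duration(300))                    # capped by the one-byte field
    print(clamp_join_duration(300, device_limit=199))  # capped by the quoted stick limit
```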

about the only way to keep things going (especially Aqaras) is to keep firing the discovery command in openhab-cli

This should not be required - it won’t really do anything with the binding or the device

Should, but I was finding that just bouncing on the button wasn’t enough

Before restarting OH, ALL of these devices were online.

Don’t worry about the status here - this has nothing to do with openHAB status.

Confusing that it’s in the CLI then. Hopefully anyone in future will see this thread and be less worried

the Aqara vibration sensors have 2 endpoints in them

The binding will show all endpoints that provide clusters it supports.

Let’s set up a separate ticket for this. Right now the endpoint that is being detected shows up as an electronic lock, which isn’t right for a vibration/rotation and temperature sensor :slight_smile:

Another FWIW: You’ve said that battery powered ZB devices are supposed to wake up every ~6-8 seconds. We already know that Xiaomi/Aqara devices are out of spec for this, but it looks like they’re out by a factor of 10 - waking up about once every 80 seconds when nothing’s happening if my endpoint query responses are anything to go by

The Aqara devices also report battery voltage and signal quality as part of their reporting (zigbee2mqtt already implements this). It’d be nice to be able to extract this to know when batteries are getting low and keep an eye on network quality etc.