A few Zigbee bulbs not receiving commands when triggering a scene

I have 14 zigbee bulbs (innr). They are all working fine.

I have a strange minor issue with them: when I launch a scene for all 14 bulbs, for example to switch them all OFF, some of them (typically 1 or 2) are not switched off. It is not always the same bulbs that have the issue. If I trigger the scene again, the Zigbee messages are resent and the remaining bulbs go OFF. So each time I want to trigger a scene, I have to trigger it twice, because each time 1 to 3 bulbs do not receive the command. If I wait a few minutes, the bulbs that did not respond sometimes finally receive the message (because openHAB resends it). So my guess is that the original message was never received.

So I investigated the quality of the network. All my bulbs are in the same room, but their LQI is around 120, and I see in openHAB that the coordinator reports an LQI of 43 (which seems low). So I bought an Aeotec Range Repeater Zi to improve the coverage and the routing. But I think it was useless: even with the repeater in the room (LQI of 190 with the coordinator), the issue remains and some bulbs do not receive the message.

Because the issue appears only when triggering a scene (sending a single command to one bulb always works fine), I have the feeling that the issue is linked to the fact that openHAB is sending a lot of Zigbee messages in a short time and that, because of the routing between the bulbs, collisions cause some of the last messages to be lost. But I would be surprised if this were the explanation, as triggering a scene with a lot of bulbs is a common use case that has probably been tested.

At some point, I suspected my openHAB configuration to be the problem: currently, I have linked one item to the channels of several things (for example, 1 dimmer item controls the 4 dimmer channels of 4 bulbs). I have seen other users doing that in the forum, but I am not sure whether it is a good idea.

As an example, I just did a short test by triggering a scene that switches almost all bulbs off. The command is issued at 13:58, and 00158d0004b0ef99 does not go OFF. Then, at 14:05, it finally goes off (probably because openHAB has retried). The filtered log is below:

filtered_log_00158d0004b0ef99.txt (3.3 KB)

I have attached the full log here:
zigbee.log (50.0 KB)

Here is the list of nodes and neighbors:
nodes.txt (4.8 KB)
neighbors_coordinator.txt (2.5 KB)
neighbors_5447.txt (2.0 KB)

If you have any idea … :slight_smile:

It’s almost certainly this, in my opinion, which makes it a Zigbee issue.

Scenes aren’t the problem. You’re likely just noticing the issue for the first time due to how many commands the scenes are sending on top of each other.

Keep in mind that scenes have one job: send commands to items. They don’t know the nature of those items (e.g. Zigbee, Z-Wave, WiFi), so they can’t adapt their behaviour to different bindings/things. Once commands are sent, the scenes end and are no longer involved.

Where scenes may have an impact is that the commands might be sent simultaneously (I don’t know how they work in the back end).

I suspect that if you put all of the devices into a group and then toggle the group item, you might see similar behaviour. But I’m not 100% certain of that.

If you search the community, you might find someone talking about this with respect to Zigbee limitations.

If you don’t find anything, then the hack solution would be to remove some of your devices from the scene and then add in a proxy item. When the scene switches on the proxy item, you could then run a rule to trigger the remaining items. If you still get a collision, you could add a delay to the rule.

A few years ago I did some work for a German company (a manufacturer of zigbee bulbs) and we tested switching 50 bulbs and this worked “fine”. I say “fine” because unless you use a zigbee group (not the same as an OH group) then the binding will send individual commands to every device and this takes a finite amount of time. From memory, switching 50 bulbs took something like 1.5 seconds, and in my test environment, you could clearly see all the bulbs switching at different times… Still, they all turned on (or off) correctly - the only problem was the delay from the first to last bulb.
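
As a rough back-of-the-envelope check of those figures, the sketch below derives the average time per unicast command from the 50-bulb test and scales it to 14 bulbs. It assumes the time grows linearly with the number of devices, which is a simplification, and the function names are made up for this illustration.

```python
def per_command_ms(total_seconds: float, device_count: int) -> float:
    """Average time per unicast command, in milliseconds."""
    return total_seconds * 1000.0 / device_count


def estimated_spread_s(per_cmd_ms: float, device_count: int) -> float:
    """Estimated delay from first to last device, assuming linear scaling."""
    return per_cmd_ms * device_count / 1000.0


avg_ms = per_command_ms(1.5, 50)            # figures quoted from memory in the post
spread_14 = estimated_spread_s(avg_ms, 14)  # scaled to a 14-bulb scene
print(f"{avg_ms:.0f} ms per command, about {spread_14:.2f} s for 14 bulbs")
```

Under that assumption, the 50-bulb figure works out to roughly 30 ms per command, so a 14-bulb scene would be expected to finish in well under a second.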

I will add that this wasn’t with openHAB, but it uses the same Java ZigBee library as OH and it’s the library that actually manages the zigbee network and communicating with devices, retries etc.

I’m not sure exactly what this link is for since an LQI should be between two devices, but yes, in general anything below 180 may cause issues. That said, different manufacturers do report LQI differently, and I’ve seen LQIs in the 50-100 range working quite well.

There is no such thing as simultaneous… Ultimately, there are items and channels in OH - and ultimately the binding will receive a command on a channel. In this case it will receive a lot of commands in quick succession, and the binding will queue all these commands and send them when it can - meeting the various constraints imposed by the zigbee protocol (and other protocols along the way).
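
To picture the queueing described here, the following toy model releases queued commands in rounds while capping how many are outstanding at once. It is only an illustration, not the binding's code: the `drain` function and the cap of 4 are assumptions chosen for the example, and a real Zigbee stack applies its own limits.

```python
from collections import deque


def drain(commands, max_in_flight):
    """Release queued commands in order, at most max_in_flight per round."""
    if max_in_flight < 1:
        raise ValueError("max_in_flight must be at least 1")
    queue = deque(commands)
    rounds = []
    while queue:
        batch_size = min(max_in_flight, len(queue))
        rounds.append([queue.popleft() for _ in range(batch_size)])
    return rounds


bulbs = [f"bulb{n}" for n in range(1, 15)]  # 14 hypothetical bulbs
rounds = drain(bulbs, max_in_flight=4)      # cap of 4 is an assumed value
print(len(rounds), [len(r) for r in rounds])
```

The point of the model is that every command is still sent, in order, just not all at once, which matches the observation that bulbs switch at slightly different times.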

Correct - using an OH group will make no difference - unicast commanding is still required. OH breaks out the individual item commands so individual commands will still be sent out to each device separately.

This won’t really work - or at least not well. The rule has no way to know about network congestion etc. Also, the rule will potentially just add more congestion - the zigbee binding already performs 3 retries (well, 3 tries - 2 retries) so for a device not to respond there’s probably something else wrong. Using a rule to send the command again will add another 3 commands into the queue (ie the 3 tries) and given zigbee has limitations on the number of commands that can be “in flight” at any point in time, this will just add to congestion.
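
The arithmetic behind that concern can be sketched as follows, taking the 3 tries per command from the post and the 14 devices from this thread. These are worst-case counts (every try is needed), and the helper name is invented for the example.

```python
TRIES_PER_COMMAND = 3  # 1 initial try + 2 retries, as stated in the post


def worst_case_transmissions(command_count: int, tries: int = TRIES_PER_COMMAND) -> int:
    """Upper bound on transmissions if every command uses all of its tries."""
    return command_count * tries


scene_once = worst_case_transmissions(14)       # one scene run over 14 devices
scene_twice = worst_case_transmissions(14 * 2)  # the same scene re-sent by a rule
print(scene_once, scene_twice)
```

Re-sending the whole scene doubles the worst-case load from 42 to 84 transmissions, which is why repeating commands from a rule tends to add congestion.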

As above, the way to resolve this “properly” is to use zigbee groups. This uses a broadcast system, so all bulbs can switch at the same time (so long as they all receive the broadcast!). There are limitations in that broadcasts are not routed, so this sets a maximum physical distance that can be supported for a group (eg a single room). Unfortunately groups aren’t really supported - the binding does have group support, but IIRC there’s no way to hook a group to an item, and configuring groups is problematic.

So, back to the original issue - in theory, this should work fine in that all bulbs should switch ok with the limitation that they won’t all switch at exactly the same time. Without a full debug log it’s difficult to say what is happening here - the log provided just doesn’t have a lot of info in it and we’d likely need to see logging from the com.zsmartsystems.zigbee package at least. I’m happy to take a look if you can provide a better log as it’s always interesting to see how things perform in some of these stress situations :slight_smile: .


Minor clarification. I meant that the scene would command maybe half of the Zigbee devices, plus the proxy item. The proxy item would then trigger a rule to command the remaining Zigbee devices (which would be removed from the scene).

I was thinking this would give some space to reduce the congestion, rather than repeating the commands again. But it’s an ugly hack. :wink:
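
The idea of splitting the scene can be sketched as below. This is plain Python for illustration only, not an openHAB rule: `send_in_batches`, the `send` callback, the batch size, and the 0.5 s pause are all hypothetical, and as noted earlier in the thread, a fixed delay cannot see actual network congestion.

```python
import time


def send_in_batches(devices, send, batch_size, delay_s, sleep=time.sleep):
    """Send a command to devices in batches, pausing between batches."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(devices), batch_size):
        for device in devices[start:start + batch_size]:
            send(device)
        if start + batch_size < len(devices):  # no pause after the last batch
            sleep(delay_s)


sent, pauses = [], []
devices = [f"bulb{n}" for n in range(1, 15)]  # 14 hypothetical bulbs
# Record calls instead of touching a real network or actually sleeping.
send_in_batches(devices, sent.append, batch_size=7, delay_s=0.5, sleep=pauses.append)
print(len(sent), pauses)
```

With two batches of 7, each device is commanded exactly once and there is a single pause between the halves, so nothing is repeated.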

What is the QOS of the device set to?

This is not related to the zigbee binding - this looks like MQTT QoS…

Thank you for all your comments.

I did a new test with more info from com.zsmartsystems.zigbee:
zigbee.log (745.3 KB)

Two devices had problems during the test:

  • 804B50FFFE422D94 (a bulb): never gets updated to OFF (even after 5 minutes of waiting)
  • 00158D000736C5C7 (a dimmer controller): finally gets updated to OFF after 5 minutes

I understand from the discussion that my 14 devices should work (compared to the 50 bulbs of chris…) and a small delay would of course be acceptable. It is true that the scene does not know what is behind the items and sends a lot of commands to the zigbee binding, but chris seems to say that everything goes through a queue, so there is no reason for the binding to generate collisions (and yes, I do not use MQTT, so the QoS setting is not available).

Please can you provide a list of all 14 bulbs with their IEEE address (or thing UID) and their zigbee network address? You can use the console command openhab:zigbee nodes to do this.

Already provided in my first post :slight_smile: . See nodes.txt.

Thanks - I managed to reconstitute this from the log…

So… Firstly, thanks for actually providing a complete set of data, and a nice, clear example of the problem - it’s not always the case, and especially when the issue is a bit more convoluted like this is (ie with a lot going on).

Give me a day or two to think about this. There definitely seems to be something wrong with the queue management which is strange to find since there are customers with tens of thousands of systems out there using this library and I’ve not seen this reported before. From my first look, it seems that the commands get queued, but the queues for those 2 devices are never processed. The device that updates after 5 minutes or so has its queue processed because the system receives an attribute update, and that triggers the sending of the queued commands - possibly the other device would be the same if/when there is an attribute update…
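
The symptom described here can be pictured with a toy model: a per-device queue that is only drained when something triggers processing. This is not the library's code and says nothing about the real cause. The `DeviceQueue` class and its `kick` flag are invented purely to show how a queued command could sit idle until a later event, such as an attribute update, drains the queue.

```python
from collections import deque


class DeviceQueue:
    """Toy per-device command queue (hypothetical, for illustration only)."""

    def __init__(self):
        self.pending = deque()
        self.sent = []

    def enqueue(self, command, kick=True):
        """Queue a command; kick=False models a missed processing trigger."""
        self.pending.append(command)
        if kick:
            self.process()

    def process(self):
        """Send everything that is currently waiting, in order."""
        while self.pending:
            self.sent.append(self.pending.popleft())

    def on_attribute_update(self):
        """Any later event that touches the queue also drains it."""
        self.process()


q = DeviceQueue()
q.enqueue("OFF", kick=False)  # command is queued but nothing processes it
stranded = list(q.pending)    # still waiting at this point
q.on_attribute_update()       # a later attribute report drains the queue
print(stranded, q.sent)
```

In this model the command is never lost, only delayed until the next event, which is consistent with one device switching off about 5 minutes late.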

I need to look through this in more detail over the next day or two, but for sure will be following this up…


You’re welcome. Do not hesitate to ask for more tests or logs if needed. Note also that it does not always happen to the same devices. Most of the time 2 devices are affected, but it’s random among all devices. It also appears when switching them ON with my other scene.

Thanks. Probably I’ll look to distribute a test binding if that’s ok? It will probably be in the “KAR” format - so a single binary that you just drop in your ‘addons’ folder.

My first problem is to remember how this part of the code works as I wrote it around 5 years ago now :wink:

Hey @jfl. So today I started to take a look at this… I resurrected my “box of bulbs” that I used for testing this a bunch of years ago and populated that with 27 bulbs. I ran the test turning all bulbs on and off multiple times and didn’t find any failures… In my test it seems to take around 2 seconds to turn all 27 bulbs on or off.

(sorry for the poor quality of the video - I had to cut it back to create an animated GIF that fitted into 1MB).

My feeling is that this issue is related to multi-threading. In theory, the library should be safe to use from multiple threads, but some of the queue management may be the source of your problem. In this respect, my test may not be 100% representative of the OH system since I think all commands will come from a single thread.

So, a question for you, and possibly some homework…

Question: What are you running OH on? Is it a Pi, or something “more capable”?

Homework: Can you jump on the OH console and try the following command:

openhab:zigbee on *

and see what happens. You can also use the off command in place of on. This will cycle through all devices that support the OnOff commands and send the command to them in a short time. I’m interested to see if this misses some lights as you’re seeing when running through an OH scene/group/rule…


Hi @chris

Sorry for the long delay. The bug is still there, do not worry :slightly_smiling_face:

So I just did my homework (on the raspberry pi):

openhab> openhab:zigbee off *                                                                                                                     
[Endpoint: C036/1] Command has been successfully sent
[Endpoint: 02B0/1] Command has been successfully sent
[Endpoint: 6BDE/1] Command has been successfully sent
[Endpoint: D788/1] Command has been successfully sent
[Endpoint: 1547/1] Command has been successfully sent
[Endpoint: D31A/1] Command has been successfully sent
[Endpoint: 7829/1] Command has been successfully sent
[Endpoint: 0FA2/1] Command has been successfully sent
[Endpoint: 2AA7/1] Command has been successfully sent
[Endpoint: 179A/1] Command has been successfully sent
[Endpoint: B86E/1] Command has been successfully sent
[Endpoint: 13E6/1] Command has been successfully sent
[Endpoint: 6D86/1] Command has been successfully sent
[Endpoint: 2E36/1] Command has been successfully sent

openhab> openhab:zigbee on *                                                                                                                     
[Endpoint: C036/1] Command has been successfully sent
[Endpoint: 02B0/1] Command has been successfully sent
[Endpoint: 6BDE/1] Command has been successfully sent
[Endpoint: D788/1] Command has been successfully sent
[Endpoint: 1547/1] Command has been successfully sent
[Endpoint: D31A/1] Command has been successfully sent
[Endpoint: 7829/1] Command has been successfully sent
[Endpoint: 0FA2/1] Command has been successfully sent
[Endpoint: 2AA7/1] Command has been successfully sent
[Endpoint: 179A/1] Command has been successfully sent
[Endpoint: B86E/1] Command has been successfully sent
[Endpoint: 13E6/1] Command has been successfully sent
[Endpoint: 6D86/1] Command has been successfully sent
[Endpoint: 2E36/1] Command has been successfully sent

All the lights are switched off/on correctly. I tested several times. The whole process is faster than with my scene. So it’s working fine with this command, but the problem is still there when the commands come from a rule/scene…

Thanks. That’s useful to confirm…

I’ve kinda lost track of my thinking with this - I made some low level changes to the library back in November so I need to remind myself what my thinking was, and then see if I can create a debug version for you to test…