A few Zigbee bulbs do not receive commands when a scene is triggered

I have 14 zigbee bulbs (innr). They are all working fine.

I have a strange minor issue with them: when I launch a scene for all 14 bulbs, for example to switch them all OFF, some of them (typically 1 or 2) are not switched off. It is not always the same bulbs that have the issue. If I trigger the scene again, the zigbee messages are resent and the remaining bulbs go OFF. So each time I want to trigger a scene, I have to trigger it twice, because each time 1 to 3 bulbs do not receive the command. If I wait a few minutes, the bulbs that did not respond sometimes finally receive the message (because openHAB resends it). So my guess is that the original message was never received.

So I investigated the quality of the network. All my bulbs are in the same room, but the LQI is around 120, and I see in openHAB that the coordinator reports an LQI of 43 (which seems low). I therefore bought an Aeotec Range Repeater Zi to improve coverage and routing. But I think it was useless: even with the repeater in the room (LQI of 190 with the coordinator), the issue remains and some bulbs do not receive the message.

Because the issue appears only when triggering a scene (sending a single command to one bulb always works fine), I have the feeling that it is linked to openHAB sending a lot of zigbee messages in a short time, and that, because of the routing between the bulbs, collisions cause some of the last messages to be lost. But I would be surprised if this were the explanation, as triggering a scene with a lot of bulbs is a common use case that has probably been tested.

At some point, I suspected my openHAB configuration was the problem: currently, I have linked one item to the channels of several things (for example, 1 dimmer item controls the 4 dimmer channels of 4 bulbs). I have seen other users doing that in the forum, but I am not sure whether it is good practice.

As an example, I just did a short test by triggering a scene that switches almost all bulbs off. The command is issued at 13:58, and 00158d0004b0ef99 does not go OFF. Then, at 14:05, it finally goes off (probably because openHAB has retried). The filtered log is below:

filtered_log_00158d0004b0ef99.txt (3.3 KB)

I have attached the full log here:
zigbee.log (50.0 KB)

Here are the lists of nodes and neighbors:
nodes.txt (4.8 KB)
neighbors_coordinator.txt (2.5 KB)
neighbors_5447.txt (2.0 KB)

If you have any idea … :slight_smile:

It's almost certainly this, in my opinion, which makes it a Zigbee issue.

Scenes aren't the problem. You're likely just noticing the issue for the first time due to how many commands the scenes are sending on top of each other.

Keep in mind that scenes have one job: send commands to items. They don't know the nature of those items (e.g. Zigbee, Z-Wave, WiFi), so they can't adapt their behaviour to different bindings/things. Once commands are sent, the scenes end and are no longer involved.

Where scenes may have an impact is that the commands might be sent simultaneously (I don't know how they work in the back end).

I suspect that if you put all of the devices into a group and then toggle the group item, you might see similar behaviour. But I'm not 100% certain of that.

If you search the community, you might find someone talking about this with respect to Zigbee limitations.

If you don't find anything, then the hack solution would be to remove some of your devices from the scene and then add in a proxy item. When the scene switches on the proxy item, you could then run a rule to trigger the remaining items. If you still get a collision, you could add a delay to the rule.

A few years ago I did some work for a German company (a manufacturer of zigbee bulbs) and we tested switching 50 bulbs and this worked "fine". I say "fine" because unless you use a zigbee group (not the same as an OH group) then the binding will send individual commands to every device and this takes a finite amount of time. From memory, switching 50 bulbs took something like 1.5 seconds, and in my test environment, you could clearly see all the bulbs switching at different times… Still, they all turned on (or off) correctly - the only problem was the delay from the first to last bulb.
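As a rough back-of-the-envelope check of those numbers, here is a minimal sketch. It assumes the switching time scales linearly with the number of bulbs, which is an assumption for illustration and not something measured in this thread; the function name is hypothetical.

```python
def estimated_switch_time(bulbs, ref_bulbs=50, ref_seconds=1.5):
    """Scale the remembered 50-bulb / 1.5 s figure linearly (assumption)."""
    per_bulb = ref_seconds / ref_bulbs  # roughly 30 ms per unicast command
    return bulbs * per_bulb

# Estimate for the 14 bulbs discussed in this thread.
print(f"{estimated_switch_time(14):.2f} s")
```

Under that assumption, 14 bulbs would need well under a second, so a visible ripple is expected but a bulb that stays on for minutes is not.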

I will add that this wasn't with openHAB, but it uses the same Java ZigBee library as OH and it's the library that actually manages the zigbee network and communicating with devices, retries etc.

I'm not sure exactly what this link is for since an LQI should be between two devices, but yes, in general anything below 180 may cause issues. That said, different manufacturers do report LQI differently, and I've seen LQIs in the 50-100 range working quite well.

There is no such thing as simultaneous… Ultimately, there are items and channels in OH - and ultimately the binding will receive a command on a channel. It will in this case receive a lot of commands in short succession, and the binding will queue all these commands and send them when it can - meeting the various constraints imposed by the zigbee protocol (and other protocols along the way).
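To illustrate the idea of queueing a burst of commands and releasing them under an "in flight" limit, here is a minimal sketch. It is not the library's actual code: the function name and the limit of 4 are made up for the example, and real transmissions complete at different times rather than in neat rounds.

```python
from collections import deque

def schedule(commands, max_in_flight):
    """Release queued commands in order, at most max_in_flight per round.

    max_in_flight is a hypothetical stand-in for the protocol's limit on
    outstanding commands; the real value depends on the stack.
    """
    queue = deque(commands)
    rounds = []
    while queue:
        size = min(max_in_flight, len(queue))
        rounds.append([queue.popleft() for _ in range(size)])
    return rounds

# A scene sends 14 OFF commands "at once"; they still leave in order.
rounds = schedule([f"bulb{i} OFF" for i in range(14)], max_in_flight=4)
for number, batch in enumerate(rounds, start=1):
    print(number, batch)
```

With these example values the 14 commands leave in four rounds (4, 4, 4 and 2), which is why the bulbs switch one after another rather than together.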

Correct - using an OH group will make no difference - unicast commanding is still required. OH breaks out the individual item commands so individual commands will still be sent out to each device separately.

This won't really work - or at least not well. The rule has no way to know about network congestion etc. Also, the rule will potentially just add more congestion - the zigbee binding already performs 3 retries (well, 3 tries - 2 retries) so for a device not to respond there's probably something else wrong. Using a rule to send the command again will add another 3 commands into the queue (ie the 3 tries) and given zigbee has limitations on the number of commands that can be "in flight" at any point in time, this will just add to congestion.

As above, the way to resolve this "properly" is to use zigbee groups. This uses a broadcast system, so all bulbs can switch at the same time (so long as they all receive the broadcast!). There are limitations in that broadcasts are not routed, so this sets a maximum physical distance that can be supported for a group (eg a single room). Unfortunately groups aren't really supported - the binding does have group support, but IIRC there's no way to hook a group to an item, and configuring groups is problematic.

So, back to the original issue - in theory, this should work fine in that all bulbs should switch ok with the limitation that they won't all switch at exactly the same time. Without a full debug log it's difficult to say what is happening here - the log provided just doesn't have a lot of info in it and we'd likely need to see logging from the com.zsmartsystems.zigbee package at least. I'm happy to take a look if you can provide a better log as it's always interesting to see how things perform in some of these stress situations :slight_smile: .


Minor clarification. I meant that the scene would command maybe half of the Zigbee devices, plus the proxy item. The proxy item would then trigger a rule to command the remaining Zigbee devices (which would be removed from the scene).

I was thinking this would give some space to reduce the congestion, rather than repeating the commands again. But it's an ugly hack. :wink:

What is the QOS of the device set to?
Screenshot from 2024-11-11 08-30-40

This is not related to the zigbee binding - this looks like MQTT QoS…

Thank you for all your comments.

I did a new test with more info from com.zsmartsystems.zigbee:
zigbee.log (745.3 KB)

Two devices had problems during the test:

  • 804B50FFFE422D94 (a bulb): never gets updated to OFF (even after 5 minutes of waiting)
  • 00158D000736C5C7 (a dimmer controller): finally gets updated to OFF after 5 minutes

I understand from the discussion that my 14 devices should work (compared to chris's 50 bulbs…) and a small delay would of course be acceptable. It is true that the scene does not know what is behind the items and sends a lot of commands to the zigbee binding, but chris seems to say that everything goes through a queue, so there is no reason for the binding to generate collisions (and yes, I do not use MQTT, so the QoS setting is not available).

Please can you provide a list of all 14 bulbs with their IEEE address (or thing UID) and their zigbee address? You can use the console command openhab:zigbee nodes to do this.

Already provided in my first post :slight_smile: . See nodes.txt.

Thanks - I managed to reconstitute this from the log…

So… Firstly, thanks for actually providing a complete set of data, and a nice, clear example of the problem - it's not always the case, and especially when the issue is a bit more convoluted like this is (ie with a lot going on).

Give me a day or two to think about this. There definitely seems to be something wrong with the queue management, which is strange to find since there are customers with tens of thousands of systems out there using this library and I've not seen this reported before. From my first look, it seems that the commands get queued, but the queues for those 2 devices are never processed. The device that updates after 5 minutes or so has its queue processed because the system receives an attribute update, and that triggers the sending of the queued command - possibly the other device would be the same if/when there is an attribute update…
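To make the suspected failure mode concrete, here is a toy model of a per-device queue that is only drained when something "kicks" it. This is purely a hypothetical sketch of the symptom described above, not the library's real queue code; the class, method names and the `kick` flag are invented for illustration.

```python
class DeviceQueue:
    """Toy per-device command queue (hypothetical, not the real library)."""

    def __init__(self):
        self.pending = []  # commands waiting to be sent
        self.sent = []     # commands that actually left the queue

    def enqueue(self, command, kick=True):
        self.pending.append(command)
        if kick:
            # Normal path: queueing a command also triggers processing.
            self.process()
        # kick=False models the suspected bug: the trigger is lost and
        # the command just sits in the queue.

    def process(self):
        while self.pending:
            self.sent.append(self.pending.pop(0))

    def on_attribute_report(self):
        # Any later traffic from the device processes the queue again.
        self.process()


bulb = DeviceQueue()
bulb.enqueue("OFF", kick=False)  # command is queued but never sent
print(bulb.sent)
bulb.on_attribute_report()       # minutes later, a report arrives
print(bulb.sent)
```

In this model the OFF command is never lost, only stuck, which would match a bulb that switches off minutes later when it next reports an attribute.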

I need to look through this in more detail over the next day or two, but for sure will be following this up…


You're welcome. Do not hesitate to ask for more tests or logs if needed. Note also that it does not always happen for the same devices. Most of the time 2 devices are affected, but it's random among all devices. It also appears when switching them ON with my other scene.

Thanks. Probably I'll look to distribute a test binding if that's ok? It will probably be in the "KAR" format - so a single binary that you just drop in your 'addons' folder.

My first problem is to remember how this part of the code works as I wrote it around 5 years ago now :wink:

Hey @jfl. So today I started to take a look at this… I resurrected my "box of bulbs" that I used for testing this a bunch of years ago and populated that with 27 bulbs. I ran the test turning all bulbs on and off multiple times and didn't find any failures… In my test it seems to take around 2 seconds to turn all 27 bulbs on or off.

(sorry for the poor quality of the video - I had to cut it back to create an animated GIF that fitted into 1MB).

My feeling is that this issue is related to multithreading. In theory, the library should be thread-safe, but some of the queue management may be the source of your problem. In this respect, my test may not be 100% representative of the OH system since I think all commands in my test come from a single thread.
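For readers wondering what thread-safe queue management can look like, here is a minimal sketch of one common pattern: the decision "is someone already draining the queue?" is made under the same lock that protects the queue itself, so a command added from another thread cannot be left behind. This only illustrates the general technique; the class and its fields are hypothetical and are not taken from the Java library.

```python
import threading

class SafeQueue:
    """Hypothetical command queue that tolerates concurrent callers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = []
        self._sending = False  # True while one thread is draining
        self.sent = []

    def enqueue(self, command):
        with self._lock:
            self._pending.append(command)
            if self._sending:
                return  # the active drainer will pick this command up
            self._sending = True
        self._drain()

    def _drain(self):
        while True:
            with self._lock:
                if not self._pending:
                    # Checked and cleared under the lock, so a command
                    # cannot slip in between the check and the reset.
                    self._sending = False
                    return
                command = self._pending.pop(0)
            self.sent.append(command)  # stand-in for transmitting


queue = SafeQueue()
threads = [
    threading.Thread(target=queue.enqueue, args=(f"bulb{i} OFF",))
    for i in range(14)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(queue.sent))
```

If the empty check and the reset of the `_sending` flag were done without the lock, a command queued from another thread at exactly that moment could stay pending until something else triggers processing.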

So, a question for you, and possibly some homework…

Question: What are you running OH on? Is it a Pi, or something "more capable"?

Homework: Can you jump on the OH console and try the following command:

openhab:zigbee on *

and see what happens. You can also use the off command in place of on. This will cycle through all devices that support the OnOff commands and send the command to them in a short time. I'm interested to see if this misses some lights as you're seeing when running through an OH scene/group/rule…

IMG_9679 (1) (1)