Sending off command to group of lights results in some lights staying on, with missing "changed to" log message

Tags: #<Tag:0x00007f5c986b0368> #<Tag:0x00007f5c986b0138> #<Tag:0x00007f5c986b0020>

I have a group of lights that corresponds to All First Floor lights (FF_All_Lights) I have this for Basement, and Second floor also, the problem happens on all groups.

I am running openHAB M5. On the RPI 4.

Problem: When send a group command OFF some of the lights stay on. The lights that stay on varies. In the sitemap the group is shown as OFF, and all lights are shown as OFF. The log file NEVER shows that whatever lights are still on has changed to OFF. See log below for comments.

In the below sample FF_Kitchen_Group1_Light stays on
SENDING ALL LIGHTS OFF TO GROUP

2019-11-19 20:35:45.657 [ome.event.ItemCommandEvent] - Item 'FF_All_Lights' received command OFF
2019-11-19 20:35:45.675 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_Group1_Light' received command OFF
2019-11-19 20:35:45.691 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_Group2_Light' received command OFF
2019-11-19 20:35:45.707 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_Group3_Light' received command OFF
2019-11-19 20:35:45.721 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_Group4_Light' received command OFF
2019-11-19 20:35:45.735 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_External1_Dimmer' received command OFF
2019-11-19 20:35:45.751 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_External2_Dimmer' received command OFF
2019-11-19 20:35:45.760 [ome.event.ItemCommandEvent] - Item 'FF_Utility_Group1_Light' received command OFF
2019-11-19 20:35:45.768 [ome.event.ItemCommandEvent] - Item 'FF_Utility_Group2_Light' received command OFF
2019-11-19 20:35:45.775 [ome.event.ItemCommandEvent] - Item 'FF_Office_Group1_Light' received command OFF
2019-11-19 20:35:45.784 [ome.event.ItemCommandEvent] - Item 'FF_Office_Group2_Light' received command OFF
2019-11-19 20:35:45.792 [ome.event.ItemCommandEvent] - Item 'FF_Bathroom_Group1_Light' received command OFF
2019-11-19 20:35:45.807 [nt.ItemStatePredictedEvent] - FF_Kitchen_Group1_Light predicted to become OFF
2019-11-19 20:35:45.824 [nt.ItemStatePredictedEvent] - FF_Kitchen_Group2_Light predicted to become OFF
2019-11-19 20:35:45.842 [nt.ItemStatePredictedEvent] - FF_Kitchen_Group3_Light predicted to become OFF
2019-11-19 20:35:45.859 [nt.ItemStatePredictedEvent] - FF_Kitchen_Group4_Light predicted to become OFF
2019-11-19 20:35:45.877 [nt.ItemStatePredictedEvent] - FF_Kitchen_External1_Dimmer predicted to become OFF
2019-11-19 20:35:45.903 [nt.ItemStatePredictedEvent] - FF_Kitchen_External2_Dimmer predicted to become OFF
2019-11-19 20:35:45.927 [nt.ItemStatePredictedEvent] - FF_Utility_Group1_Light predicted to become OFF
2019-11-19 20:35:45.948 [nt.ItemStatePredictedEvent] - FF_Utility_Group2_Light predicted to become OFF
2019-11-19 20:35:45.965 [nt.ItemStatePredictedEvent] - FF_Office_Group1_Light predicted to become OFF
2019-11-19 20:35:45.988 [nt.ItemStatePredictedEvent] - FF_Office_Group2_Light predicted to become OFF
2019-11-19 20:35:46.015 [nt.ItemStatePredictedEvent] - FF_Bathroom_Group1_Light predicted to become OFF
2019-11-19 20:35:46.023 [GroupItemStateChangedEvent] - FF_All_Lights changed from OFF to ON through FF_Kitchen_Group1_Light
2019-11-19 20:35:46.026 [vent.ItemStateChangedEvent] - FF_Kitchen_Group3_Light changed from ON to OFF
2019-11-19 20:35:46.027 [vent.ItemStateChangedEvent] - FF_Kitchen_Group4_Light changed from ON to OFF
2019-11-19 20:35:46.036 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_External1_Dimmer_Color2' received command 74
2019-11-19 20:35:46.037 [vent.ItemStateChangedEvent] - FF_Kitchen_External1_Dimmer changed from 100 to 0
2019-11-19 20:35:46.038 [GroupItemStateChangedEvent] - FF_All_Lights changed from ON to OFF through FF_Kitchen_External1_Dimmer
2019-11-19 20:35:46.049 [nt.ItemStatePredictedEvent] - FF_Kitchen_External1_Dimmer_Color2 predicted to become 74
2019-11-19 20:35:46.055 [vent.ItemStateChangedEvent] - FF_Kitchen_External1_Dimmer_Color2 changed from 0 to 74

Notice in the above that it NEVER says FF_Kitchen_Group1_Light changed from ON to OFF, some of the other lights also do not get a changed message, but they were already off, and showing as OFF in the sitemap.

Obviously I can Manually turn the missing light on

2019-11-19 20:37:10.596 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_Group1_Light' received command ON
2019-11-19 20:37:10.642 [nt.ItemStatePredictedEvent] - FF_Kitchen_Group1_Light predicted to become ON
2019-11-19 20:37:10.655 [vent.ItemStateChangedEvent] - FF_Kitchen_Group1_Light changed from OFF to ON
2019-11-19 20:37:10.668 [GroupItemStateChangedEvent] - FF_All_Lights changed from OFF to ON through FF_Kitchen_Group1_Light

And manually turn the missing light off

2019-11-19 20:37:20.768 [ome.event.ItemCommandEvent] - Item 'FF_Kitchen_Group1_Light' received command OFF
2019-11-19 20:37:20.807 [nt.ItemStatePredictedEvent] - FF_Kitchen_Group1_Light predicted to become OFF
2019-11-19 20:37:20.819 [vent.ItemStateChangedEvent] - FF_Kitchen_Group1_Light changed from ON to OFF
2019-11-19 20:37:20.822 [GroupItemStateChangedEvent] - FF_All_Lights changed from ON to OFF through FF_Kitchen_Group1_Light

What binding is linked to these Items?

Do you have autoupdate=true or false? Based on the log it looks like autoupdate=false. That means that the command from openHAB is getting lost among the barrage of commands from openHAB to the devices so the light never changes and therefore the Item never changes. If this is the case, you need to spread out the commands to the Items so they don’t all occur within milliseconds of each other. See Design Pattern: Gate Keeper.

Thanks for looking into it. This used to work without issues.

I have not explicitly set the autoupdate (so it must be the default state?) all of the items are bound to KNX. Some lights are also bound to zigbee or zwave devices (but always with a KNX device, since all lights in the house must be able to be controlled by a push button somewhere) I have not experienced that the zwave or zigbee devices not turn off.

The only change recently is that KNX is no longer going through KNXD (local serial device) but directly through its own serial bridge thing.

How would I spread out items commands when I send a command to the group?

There are a few things I find weird here:

  • First why would the lights not turn of by the group command
    (You answered this: maybe the commands are comming too fast for the KNX binding through its own Serial Bridge, and KNXd was better at handling that) which again leads to to new questions
    • How would I fix this immediately
    • How would I go about finding the root cause in the KNX binding and sending a PR back (who is the active maintainer on the KNX binding?)
  • Second: Why does the item report being off when it does not show a log entry for changing to OFF from ON?

No. autoupdate=true is the default state. And now that I look again, I see the “predicted” log statements.

You don’t. You send the command to a proxy Item and use the Gatekeeper DP I linked to above to add a gap between each command.

I don’t use KNX nor know anything about it so can’t help with that stuff.

As for your second, without more experimentation and tracing the state of the specific Item through the events.log I couldn’t say. events.log doesn’t show updates so it’s possible that the Item isn’t actually changing to OFF but the UI switched it to showing that it’s OFF in response to the command.

That is a good question, actually. While a UI might update a graphic switch in response to click, pre-empting a “real” change of Item state - that isn’t going to happen for a click on a Group master Item, I’m sure.

Not being funny … but was this Item OFF beforehand? That would explain a logged prediction of OFF having no apparent consequence.

It was definitly ON beforehand.

I have tested, and retested again and again. I can keep reproducing the issue (it happens with different lights but same issue)

I have not much going on besides this. I means there is not a lot of openHAB work in the background. And as I said this is the latest RPI4 with 4GB ram so the system is not working hard.

Okay, take your word for that - this example is just two minutes after the supposed missing “changed to OFF”

I feel this is part of the clue worth exploring.

I think we can trust the logging service - I’ve never seen any serious suggestion that it drops messages, either on highly loaded or high performance systems.

Autoupdate’s predictions can be vetoed by bindings. I think the mechanism is something like autoupdate asking each linked binding for its “vote”. Potentially that’s gone wrong here - perhaps because the binding is flooded with such requests.
That would explain the missing “changed” - but it appears the change did at least partially happen.

If autoupdate worked normally, we would expect to see a “changed to OFF” log, whether or not the command successfully got to the device.

It’s then possible for the binding to change it back, in the case of a status or response back from the device “still ON”. You’d expect that to take a finite time, slower than autoupdate, so if that happened correctly we should see both OFF - ON changes logged.

It’s easy to see how a flurry of commands through one binding can lose one somewhere, depending on the many moving parts. While it’s “nice” to have the binding queue up such commands and manage them cleanly, it’s not imperative - some technologies simply will never handle this well.
Which is to say, it’s easy to see how a random light gets left on, and not that surprising. The mystery is where the invisible state change occurs.

Hmm, multiple channels per Item with mixed technologies.

And a box with undetermined buffering skills.

We might usefully know more about that.

This is a common issue in question of knx (but there are other bindings for which this is true, too). By sending a command to a group Item, this is serialized to many Items. Each Item is commanded at once, but sadly, the Binding does not buffer the commands and so the commands are sent too quick (and this results in loosing some commands).
It’s best practice to configure knx items to use a state return group address. To ensure that the Item always show the real state, you have to set the item to autoupdate=“false”.
Channel

Type switch : ch1 "My switch" [ga="1/0/1+<1/0/2"]  // 1/0/1 : set actuator to ON/OFF, 1/0/2 : get state of actuator

Item:

Switch MySwitch "My Switch [%s]" { channel="knx:bridge:mything:ch1", autoupdate="false" }

If there are lost commands, the common way would be to use a proxy item and a rule instead of the group Item directly:

rule "send all off"
when
    Item MyProxyItem received command OFF
then
    MyGroupItem.members.filter[ m|m.state != OFF ].forEach[ i|  // each Light which isn't OFF
        i.sendCommand(OFF)                                      // switch a light OFF
        Thread::sleep(100)                                      // wait 0.1 Seconds
    ]
end

This way the commands are sent with a little delay.

1 Like

Hmm, that is quite interesting.

Do you know where the commands are lost? I never experienced this issue when I used the knx binding with KNXD only after giving the binding direct access to the knx serial device.

On the zigbee binding I notice a delay in changing many lights. Perhaps that buffers commands?

Well I had command loss regularly with my roller shutters. But I changed the roller shutter actuators (the relays, not the motors) to models which allow direct positioning.
After that upgrade I also changed my rules to control all shutters via direct positioning instead of simple UP/DOWN group commands, so I have no command loss any more, but I do know that there are several people out there with a similar problem.

Do you use KNX through direct serial bridge thing or ip bridge thing to KNXD / ip hardware?

I’m using a Weinzierl 730 knx/IP Interface.

Just to follow up.

My plan is to go back and look at calimero and see if I can recreate the flooding of commands there, I think that would be the place to handle it. I already have some experience with that library since I have managed to port the latest version to Java 8 from Java 11 in order to support cEMI over FT12 (I really do not like KNXD)

Until then I was lazy and your suggestion (@Udo_Hartmann) sounds like the perfect and correct way to do it, the way KNX was intended, but it just sounded like too much work.

So, I decorated all groups that have the problem with another group X_Force (X are my special groups) and created a simple rule to just push the command sent to the group again, with a slight delay between them.

This is not how it should be done, but for now it works until I have more time.

rule "Force Lights"
    when 
        Member of X_Force received command
    then        
        val group = triggeringItem as GroupItem
        
        group.members.forEach[ memberItem|         
            memberItem.sendCommand(receivedCommand)
            Thread::sleep(200) 
            ]
end

I have now been in contact with the maintainer of Calimero the library that the KNX addon is using, he has implemented a fix in the latest source in order to fix this.

I have setup a CI that builds the KNX addon with the latest Calimero lib here:

KNX Addon + Calimero build service status latest build

The direct link to the latest jars is here:
knx-addon.bundle.zip

And it works with that build now. Without the “Force Lights” group hack.

1 Like

That’s pretty cool!

Must have lost this thread at some point, but better a late answer than no answer… :wink:

Did you already consider to make an issue for that? an update of calimero in the official sources would be the best solution…