[LCN] OH3 fails to receive status messages after some time

Hello,

As there don’t seem to be many LCN users on the German openhabforum.de, where I have already asked for help on this topic ([LCN] OH3 empfängt nach einiger Zeit keine Statusmeldungen mehr? - openhabforum.de), I am now looking for assistance here.

OH3 runs on a Raspberry Pi 4B. In addition to the LCN binding I run deCONZ with a ConBee stick to manage some motion sensors, lamps and humidity sensors.
Generally the LCN binding works: modules have been auto-discovered, outputs can be set via OH, binary sensor states appear in the OH logs, and rules react to those state changes.
But after a certain time (sometimes a few minutes, sometimes hours, sometimes a few days) the following warnings appear in the logs and no status messages from the modules come in any more.

2022-11-15 21:40:26.397 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M008: Module did not respond to command: A1DI000000
2022-11-15 21:46:51.715 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M009: Failed to receive status message: Binary Sensors: Failed finally after 3 tries
2022-11-15 21:47:25.750 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M008: Failed to receive status message: Output 1: Failed finally after 3 tries

In this state it is still possible, e.g., to set the outputs of a module, but the status feedback is missing.
On the LCN side, i.e. in the bus log (LCN-Pro and LinHK), the affected modules are still sending status messages; only OH has stopped receiving them.
I can remedy this state by disabling and re-enabling the PCK gateway Thing, until, after some time, the warnings reappear.
Increasing the ‘connection timeout’ from 3 s to 5 s or 15 s in the PCK gateway Thing doesn’t change the behaviour.
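For reference, a textual configuration of the PCK gateway Bridge would look roughly like the following sketch. The parameter names (in particular timeoutMs) are taken from my reading of the LCN binding documentation and should be verified against the binding README; host, credentials and the timeout value are placeholders:

Bridge lcn:pckGateway:myGateway [ hostname="192.168.1.10", port=4114, username="lcn", password="lcn", mode="native200", timeoutMs=15000 ]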
In the original post in the German forum I wondered about one module sending status messages with segment address ‘4’, which I suspected might be a reason for the LCN binding not interpreting the status messages correctly.

2022-11-15 22:11:18.128 [TRACE] [g.lcn.internal.connection.Connection] - Received: ':M004009Bx008'
2022-11-15 22:11:23.631 [TRACE] [g.lcn.internal.connection.Connection] - Received: ':M004009Bx000'

This turned out to be caused by status messages being set to ‘global’ in the property settings of that module (LCN-Pro); ‘global’ messages are sent to the special segment ‘4’.
But setting this back to ‘local’ didn’t change the issue.
My LCN modules are older ones with firmware versions 13 and 14.

In the German bus-profi forum I found a discussion (Feb. 2016, OH1 or OH2?) about delayed forwarding of (dimming) commands to the LCN bus.
The LCN binding was suspected (and criticised) to work based on acknowledge messages (Quittungsmeldungen), i.e. every command sent to the bus requests an acknowledgement.
The problem is that these acknowledge (ack) messages cannot be attributed to a specific command, as they do not contain any reference to the command that caused them.
Furthermore, the loss of such ack messages (e.g. due to bus collisions), or the fact(?) that no ack and status messages are sent if a command does not cause a change of a module’s state (e.g. output or relay state), could, if not handled adequately in the binding, lead to unnecessary resending of commands or to being stuck in timeouts waiting for ack messages.
When ack messages are missing, the binding repeatedly sends the same commands to the bus while increasing the timeout between two commands.
Since subsequent commands are queued during these timeouts, the user gets the impression of a blocked/unresponsive system. (This could happen, e.g., when slowly moving a dimming slider: the binding repeatedly sends identical commands that are never acknowledged because no state changes.)
It was suggested to rely on status messages instead (tracking the intended changes of an output or relay) rather than waiting for unspecific ack messages; a rough sketch of that idea follows below.
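To make that suggestion concrete, here is a minimal sketch of such status-based confirmation (hypothetical names, not the binding’s actual code): the sender remembers the intended target value per output and treats a later status telegram reporting that value as confirmation, so only unconfirmed commands would ever be resent.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Confirms commands via status telegrams instead of unspecific ack telegrams.
class StatusBasedConfirmation {
    // Intended target values, e.g. "S000M008/OUTPUT1" -> dimming value in percent.
    private final Map<String, Integer> pending = new ConcurrentHashMap<>();

    // Called whenever a command is sent to the bus.
    void commandSent(String outputId, int targetValue) {
        pending.put(outputId, targetValue);
    }

    // Called for every decoded status telegram from the bus; a reported value
    // matching the recorded intent confirms the command and clears it.
    void statusReceived(String outputId, int reportedValue) {
        pending.remove(outputId, reportedValue);
    }

    // A periodic job would resend only commands that were never confirmed.
    // Caveat: if a command causes no state change, the module may send no
    // status telegram at all, so a real implementation still needs a timeout.
    boolean needsResend(String outputId) {
        return pending.containsKey(outputId);
    }
}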

I don’t know whether these assumptions about the working principles (still) hold true for the current LCN binding, but could there be a relation to the issue of missing status messages described above?

Could it be that a module is ‘switched off’ in some way by the binding after a number of timeouts?
(NB: the module is still sending status messages (cf. the LCN bus log); after disabling/enabling the gateway Thing, the module’s status messages are ‘live’ again.)

Best,
Jo.

If you set the log level for the LCN binding to TRACE, you should see all telegrams that you also see in the LCN-PRO bus monitor. Is that still the case when the lock-up you described happens?
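(For reference, the log level can be set on the openHAB (Karaf) console with: log:set TRACE org.openhab.binding.lcn)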

The binding should also react to status messages sent to segment 4. As you describe it, that seems like a bug, but it should have no relation to your actual issue.

Hello Fabian,

thanks for replying.

I have set the log level to ‘TRACE’.
In the locked-up state I see the following in the log:

2022-11-23 11:08:48.197 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M011: Failed to receive status message: Output 2: Failed finally after 3 tries
2022-11-23 11:08:51.200 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M008: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:08:51.202 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M007: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:08:51.203 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M012: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:08:52.206 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M012: Failed to receive status message: Output 2: Failed finally after 3 tries
2022-11-23 11:09:02.216 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M009: Failed to receive status message: Binary Sensors: Failed finally after 3 tries
2022-11-23 11:18:58.706 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M011: Failed to receive status message: Output 2: Failed finally after 3 tries
2022-11-23 11:19:01.710 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M008: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:19:01.711 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M007: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:19:01.712 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M012: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:19:02.715 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M012: Failed to receive status message: Output 2: Failed finally after 3 tries
2022-11-23 11:19:12.725 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M009: Failed to receive status message: Binary Sensors: Failed finally after 3 tries
2022-11-23 11:24:17.870 [TRACE] [g.lcn.internal.connection.Connection] - Sent: ^ping1636
2022-11-23 11:25:17.870 [TRACE] [g.lcn.internal.connection.Connection] - Sent: ^ping1637
2022-11-23 11:27:17.872 [TRACE] [g.lcn.internal.connection.Connection] - Sent: ^ping1639
2022-11-23 11:28:17.873 [TRACE] [g.lcn.internal.connection.Connection] - Sent: ^ping1640
2022-11-23 11:28:54.197 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000011.SMA2
2022-11-23 11:28:57.200 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000008.SMA1
2022-11-23 11:28:57.203 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000007.SMA1
2022-11-23 11:28:57.205 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000012.SMA1
2022-11-23 11:28:58.202 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000012.SMA2
2022-11-23 11:28:59.203 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000011.SMA2
2022-11-23 11:29:02.206 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000008.SMA1
2022-11-23 11:29:02.208 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000007.SMA1
2022-11-23 11:29:02.211 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000012.SMA1
2022-11-23 11:29:03.207 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000012.SMA2
2022-11-23 11:29:04.208 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000011.SMA2
2022-11-23 11:29:07.211 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000008.SMA1
2022-11-23 11:29:07.213 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000007.SMA1
2022-11-23 11:29:07.216 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000012.SMA1
2022-11-23 11:29:08.212 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000009.SMB
2022-11-23 11:29:08.214 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000012.SMA2
2022-11-23 11:29:09.215 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M011: Failed to receive status message: Output 2: Failed finally after 3 tries
2022-11-23 11:29:12.221 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M008: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:29:12.223 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M007: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:29:12.224 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M012: Failed to receive status message: Output 1: Failed finally after 3 tries
2022-11-23 11:29:13.227 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000009.SMB
2022-11-23 11:29:13.227 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M012: Failed to receive status message: Output 2: Failed finally after 3 tries
2022-11-23 11:29:17.874 [TRACE] [g.lcn.internal.connection.Connection] - Sent: ^ping1641
2022-11-23 11:29:18.233 [TRACE] [g.lcn.internal.connection.Connection] - Sent: >M000009.SMB
2022-11-23 11:29:23.237 [WARN ] [ding.lcn.internal.connection.ModInfo] - S000M009: Failed to receive status message: Binary Sensors: Failed finally after 3 tries

Status messages are requested, but no replies are received: the binding never logs a corresponding ‘Received:’ line. The LCN bus log (LinHK), however, shows both the requests arriving on the bus and the modules answering them:

|11:29:40.196326|9|>M000009.A1DI000002|
|11:29:18.406066|9|:M000009Bx000|
|11:29:18.258825|4|>M000009.SMB|
|11:29:13.405956|9|:M000009Bx000|
|11:29:13.253437|4|>M000009.SMB|
|11:29:08.407503|9|:M000009Bx000|
|11:29:08.237001|4|>M000009.SMB|
|11:29:07.404117|9|&M000009.MST2SC=139581;TS=...|
|11:28:17.271235|9|&M000009.MST122d,16:55:39;...|
|11:27:39.093486|9|&M000009.MW000000119600000|
|11:27:27.187909|9|=M000009.TLAAAAAAAAAAAANNNN|
|11:26:37.209731|9|%M000009.S101165|
|11:25:40.232769|9|>M000009.A1DI000002|
|11:24:06.152660|9|&M000009.MST2SC=139572;TS=...|
|11:23:40.243483|9|>M000009.A1DI000002|
|11:23:15.931493|9|&M000009.MST122d,16:50:37;...|
|11:22:39.170322|9|&M000009.MW000000119600000|
|11:22:26.133891|9|=M000009.TLAAAAAAAAAAAANNNN|
|11:21:40.260233|9|>M000009.A1DI000002|

Best,
Jo

PS: the LCN log was filtered with the expression ‘m000009’.

Interesting behavior. I’ll have a look at the weekend.


I looked into the code but couldn’t find any particular reason that could cause these lock-ups. As several users run openHAB with PCHK, one could conclude that it’s something related to LinHK.

I have had feedback that the binding isn’t working with the DOMIQ gateway due to some line-ending issues. Unfortunately the communication with that user stalled, so I couldn’t investigate further.

Can you post the logs at TRACE level covering the moment when the transition from the working to the locked-up state happens? It could be that there’s a deadlock in the connection state machine, especially if something unusual happens, which could be caused by LinHK.
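If the receive path can really get stuck, one conceivable guard (purely an illustrative sketch with made-up names, not the binding’s actual code) is a watchdog that forces a reconnect when no telegram has been received for a while; that is essentially what the manual disable/enable of the gateway Thing achieves:

import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Forces a reconnect if no telegram has been received for maxSilence,
// mirroring the manual disable/enable workaround described above.
class ReceiveWatchdog {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Duration maxSilence = Duration.ofMinutes(2);
    private volatile Instant lastReceived = Instant.now();

    // reconnect should tear down and re-open the PCK connection.
    void start(Runnable reconnect) {
        scheduler.scheduleAtFixedRate(() -> {
            if (Duration.between(lastReceived, Instant.now()).compareTo(maxSilence) > 0) {
                lastReceived = Instant.now(); // avoid an immediate re-trigger
                reconnect.run();
            }
        }, 30, 30, TimeUnit.SECONDS);
    }

    // To be called from the receive path for every decoded telegram
    // (every "Received: ..." line in the TRACE log).
    void telegramReceived() {
        lastReceived = Instant.now();
    }
}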

I validated that the openHAB binding processes status messages addressed to segment 4 correctly. This is necessary if an LCN module is configured to send its status messages globally to all segments.