Modbus binding, problems with silently disconnected items

:smile:
No, thanks - Wago is connected and working, IP was set by Wago BootP-Server.
It's just that openHAB sometimes loses the connection, for whatever reason.

Well, it’s your problem. You don’t have to investigate it. What would you like to happen next?

Sorry, I think there’s been a bit of a misunderstanding.
I understood stamate_viorel’s contributions to mean that he wanted to show me how to make the connection in general.

If that was not what was meant, please explain what was actually meant.

I wanted to wait for the next incident (hoping it never happens again, since firstly the parameter “timeBetweenTransactionsMillis” has been added and secondly I have changed something in the wiring).

It is for sure something in your configuration (wiring or timing).

The basic rule would be not to try to read values faster than necessary.
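As an illustration of such a gentler polling setup (all UIDs, the host address and the register ranges below are made-up placeholders, not taken from this thread), a .things file could look roughly like this:

```
// Hypothetical example - UIDs, host and registers are placeholders.
// timeBetweenTransactionsMillis adds a pause between queries so a slow
// slave is not flooded with back-to-back transactions.
Bridge modbus:tcp:myPLC [ host="192.168.178.104", port=502, timeBetweenTransactionsMillis=100, connectMaxTries=1 ] {
    // Poll 10 holding registers only every 5 seconds instead of as fast as possible
    Bridge poller holding [ start=0, length=10, refresh=5000, type="holding" ] {
        Thing data temperature [ readStart="0", readValueType="int16" ]
    }
}
```

Raising refresh and timeBetweenTransactionsMillis is usually the first thing to try when an industrial device drops connections under aggressive polling.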

I use Modbus TCP and have no problem with communication stopping. I also wouldn't doubt the Wago, since it is an industrial device and has been tested extensively in the field.

Hello, just some quick feedback:
No problems for 10 days


That’s good, sounds like configuring the TCP to be a bit more gentle to the WAGO has avoided some issues.

I appreciate this conversation is a good 5 months old.
I am experiencing the exact same symptom/issue, hence I wanted to share some additional insights and pointers.
Just like Duke_Jones reported, I ran into the same issue over the past 24h, with all items connected to modbus things staying “flat”, not changing value anymore.

Setup:
A remarkably similar setup, as inferred from Duke_Jones's description and logs:

  • modbus tcp things via modbus binding
  • a central boiler /heating system being queried
  • system connected to a network where a FritzBox provides DHCP and DNS service
    Only difference - the control unit of my KWB boiler supports modbus tcp out of the box (no WAGO/ESP serial interface)

Additional insights:
From the openHAB logs it seems that after the last, terminal attempt to contact the modbus server, the complete modbus tcp binding - more precisely the ModbusSlaveConnectionFactory - erases/loses the connection object:

2023-03-20 21:44:52.966 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - Connect reached max tries 1, throwing last error: connect timed out. Connection TCPMasterConnection [m_Socket=Socket[unconnected], m_Timeout=3000, m_Connected=false, m_Address=bge-ef2-ps8.fritz.box/192.168.178.104, m_Port=502, m_ModbusTransport=net.wimpi.modbus.io.ModbusTCPTransport@7b5007b5, m_ConnectTimeoutMillis=10000, rtuEncoded=false]. Endpoint ModbusIPSlaveEndpoint [address=bge-ef2-ps8.fritz.box, port=502]
2023-03-20 21:44:52.968 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - Error connecting connection TCPMasterConnection [m_Socket=Socket[unconnected], m_Timeout=3000, m_Connected=false, m_Address=bge-ef2-ps8.fritz.box/192.168.178.104, m_Port=502, m_ModbusTransport=net.wimpi.modbus.io.ModbusTCPTransport@7b5007b5, m_ConnectTimeoutMillis=10000, rtuEncoded=false] for endpoint ModbusIPSlaveEndpoint [address=bge-ef2-ps8.fritz.box, port=502]: connect timed out

After this terminal attempt, we can see java.lang.InterruptedException reporting null pointers/objects:

2023-03-20 21:46:35.356 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - KeyedPooledModbusSlaveConnectionFactory: Unknown host: bge-ef2-ps8.fritz.box: Temporary failure in name resolution. Connection creation failed.
2023-03-20 21:46:35.359 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - Error connecting connection null for endpoint ModbusIPSlaveEndpoint [address=bge-ef2-ps8.fritz.box, port=502]: null

This is the point in time from which onwards all further connection attempts stop; no attempts are reported in the log anymore.
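Since the trigger here is the resolver rather than Modbus itself, it may be worth checking name resolution on the openHAB host independently (hostname taken from the logs above; `getent` queries the same system resolver path that Java uses):

```shell
# Check whether the endpoint hostname resolves via the system resolver.
# getent exits non-zero when resolution fails.
getent hosts bge-ef2-ps8.fritz.box || echo "resolution failed"
```

If this intermittently fails while the FritzBox is recycling a WiFi client, using the numeric IP in the bridge's `host` parameter would sidestep DNS entirely.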

Attempt to restart bridge

  1. Attempted restart via disable-enable:
    Trying to get the modbus bridge (or the poller bridges) restarted by toggling the bridges disabled->enabled does NOT reactivate a proper connection.
    The poller bridges seem to fire up again, but all are reporting a connection exception:
[INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'modbus:poller:XXX:YYY' changed from OFFLINE (COMMUNICATION_ERROR): Error with read: org.openhab.core.io.transport.modbus.exception.ModbusConnectionException: Error connecting to endpoint ModbusIPSlaveEndpoint
  2. Attempted restart by re-reading the updated .things file:
    Modifying the .things file (e.g. by modifying a timer of the modbus bridge, just to trigger re-reading) is immediately detected by openHAB, however it also does NOT reestablish a proper connection; HANDLER_MISSING_ERROR:
2023-03-21 10:00:02.033 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'modbus:poller:KWBCombifire2:universal' changed from ONLINE to UNINITIALIZED
2023-03-21 10:00:02.038 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'modbus:poller:KWBCombifire2:universal' changed from UNINITIALIZED to OFFLINE
2023-03-21 10:00:02.042 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'modbus:poller:KWBCombifire2:universal' changed from OFFLINE to UNINITIALIZED (HANDLER_MISSING_ERROR)

At a minimum I would have expected from the binding that, once the .things file is re-read, the complete binding/bridge is re-initialised from scratch (all objects re-created, retry counters reset to 0, etc.).

My modest conclusion:

  • The triggering error is a temporary unreachability of the modbus server OR the openHAB system. More particularly, what my log reports is a clear DNS resolution error:
    the initial error was that openHAB could not resolve the hostname.
    In any case, the FritzBox lost its connection to the modbus server or to the openHAB system.
    As both are connected via WiFi, this loss detection is rather fast (radio client lost), and not unlikely to occur.

  • Still, there is odd behaviour in the modbus binding. I would say there is a bug:
    While on one hand one could argue that, after the maximum retries of the modbus bridge have been reached (e.g. Bridge:modbus:tcp:... [...connectMaxTries=1]), running into a permanent/terminal error is the correct, configured behaviour,
    on the other hand, when the .things file is re-read, or the bridge is re-enabled (after disable), it certainly should completely reinitialise and start over again. And it does not.
    It seems the ModbusSlaveConnectionFactory does not “recover” (somehow in interplay with KeyedPooledModbusSlaveConnectionFactory).

  • As the item does not change anymore (it is not updated by the binding), it is also clear that the item holds the last value it was set to indefinitely.
    It is further debatable → it could be a feature request for a new configuration option: whether, upon a binding/bridge error, all connected items should be updated and set to NaN.
    This way an item behind a stalled connection would clearly differ from an item (connected to a thing) that is online but not changing its value.
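As a sketch of such a workaround at the rules level today (the poller UID is taken from the logs above; the item name MyBoilerTemperature is a made-up placeholder), a rule could push items to UNDEF whenever the poller leaves ONLINE:

```
// Sketch only - MyBoilerTemperature is a hypothetical item name.
rule "Invalidate items when modbus poller leaves ONLINE"
when
    Thing "modbus:poller:KWBCombifire2:universal" changed
then
    val info = getThingStatusInfo("modbus:poller:KWBCombifire2:universal")
    if (info === null || info.getStatus().toString() != "ONLINE") {
        // UNDEF makes a stalled value distinguishable from a merely constant one
        MyBoilerTemperature.postUpdate(UNDEF)
    }
end
```

This does not fix the stalled connection, but it at least makes the staleness visible in the UI and to other rules.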

I hope this is useful to someone. I welcome your feedback. If the community agrees with my conclusions, I will look into proposing a feature and filing a bug report.

Best regards,

Hi, I am the maintainer of the binding.

Much appreciated, the logs give some clear insights.

I think this is indeed a bug, and a real corner case - connection interruptions are generally well handled by the binding; it recovers well in my experience.

Where do you have the InterruptedException - I see none. Perhaps you are missing some log lines?

Can you output any relevant log lines that follow this?

UPDATE: potential fix here [modbus] Fix error handling with DNS resolution / Unknown host errors by ssalonen · Pull Request #3490 · openhab/openhab-core · GitHub – I think it might be an unhandled exception

Hi, many thanks for your response and reaction to my post.
Please find below relevant excerpts from log with the reported Exception.

2023-03-21 10:00:02.536 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Error getting a new connection for endpoint ModbusIPSlaveEndpoint [address=bge-ef2-ps8.fritz.box, port=502]. Error was: java.lang.InterruptedException null 
2023-03-21 10:00:02.538 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Could not connect to endpoint ModbusIPSlaveEndpoint [address=bge-ef2-ps8.fritz.box, port=502] -- aborting request ModbusReadRequestBlueprint [slaveId=1, functionCode=READ_INPUT_REGISTERS, start=8197, length=59, maxTries=3] [operation ID 14e6d99f-c13e-4b7b-8e86-3615c813ca1f]

To clarify the exact chronology:
The original event reported above occurred at around 21:44-21:46 in the logs, leading to a stalled modbus bridge/connection. The system remained in that state until I discovered it the next morning.
These Exceptions were then logged 12h14min later, at 10:00 sharp, at the very moment I tried to restart the binding via the CLI (stop/start)

bundle:stop org.openhab.binding.modbus
bundle:start org.openhab.binding.modbus

and/or by updating the .things file to trigger re-loading, respectively.
The binding would not reinitialise. The connection was not reestablished. This I concluded to be related to the reported Exception.

Finally I restarted the complete service (sudo systemctl restart openhab.service), which made the modbus bridge come up properly again.

Ok thanks for the extra info.

The fix above ought to help, I hope. I could see that with this error (DNS resolution failing), there will eventually be an unhandled exception. The unhandled exception might be the reason the modbus transport bundle (not the binding) goes into an unrecoverable error state.

I suspect you are ok to wait for the next release with the fix?

Thanks a lot Sami,
if there was an option to just swap the four compiled .class files from this pull request's build, that would be lovely, since the connection dies every second day.


2023-03-30 15:25:27.307 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 1 out of 3 failed when executing request (ModbusReadRequestBlueprint [slaveId=1, functionCode=READ_INPUT_REGISTERS, start=8608, length=98, maxTries=3]). Will try again soon. Error was I/O error, so resetting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: SocketTimeoutException Read timed out [operation ID 0468c4b1-a6f6-4d4c-b6e0-4ca919893944]
2023-03-30 15:25:43.536 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - KeyedPooledModbusSlaveConnectionFactory: Unknown host: REMOVED.fritz.box: Temporary failure in name resolution. Connection creation failed.
2023-03-30 15:25:43.539 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - Error connecting connection null for endpoint ModbusIPSlaveEndpoint [address=REMOVED.fritz.box, port=502]: null
2023-03-30 15:25:43.540 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Could not connect to endpoint ModbusIPSlaveEndpoint [address=REMOVED.fritz.box, port=502] -- aborting request ModbusReadRequestBlueprint [slaveId=1, functionCode=READ_INPUT_REGISTERS, start=8573, length=28, maxTries=3] [operation ID 6d1d043e-5c99-44c4-a386-0b9ec6a07a5a]

Else I will survive till next release.

-> EDITED 2023-04-01: I rather chose not to fiddle and completely upgraded to the 4.0 SNAPSHOT build. It has worked flawlessly over the past 14h. Will keep monitoring and post back after a while. Many thanks again!

BR I

Unfortunately the patch did not resolve the issue.
The connection dies after a temporary name resolution error.
While it feels like a more “controlled” dying of the connection - “Cannot connect because "connection" is null” - the result is the same:

  • There is no recovery
  • According to the logs, the connection dies after the first connection attempt, with no retries although they are configured:
2023-04-02 18:07:35.248 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 1 out of 3 failed when executing request (ModbusReadRequestBlueprint [slaveId=1, functionCode=READ_INPUT_REGISTERS, start=9497, length=2, maxTries=3]). Will try again soon. Error was I/O error, so resetting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: SocketTimeoutException Read timed out [operation ID dc83ce11-9579-432c-a58c-7e2a14cd323a]
2023-04-02 18:07:51.286 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - KeyedPooledModbusSlaveConnectionFactory: Unknown host: REMOVED.fritz.box: Temporary failure in name resolution. Connection creation failed.
2023-04-02 18:07:51.289 [WARN ] [ing.ModbusSlaveConnectionFactoryImpl] - Error connecting connection null for endpoint ModbusIPSlaveEndpoint [address=REMOVED.fritz.box, port=502]: Cannot invoke "net.wimpi.modbus.net.ModbusSlaveConnection.isConnected()" because "connection" is null
2023-04-02 18:07:51.290 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Could not connect to endpoint ModbusIPSlaveEndpoint [address=bge-ef2-ps8.fritz.box, port=502] -- aborting request ModbusReadRequestBlueprint [slaveId=1, functionCode=READ_INPUT_REGISTERS, start=8708, length=70, maxTries=3] [operation ID 0d6a4f65-5b7b-48df-8fb9-50f94fadefd2]

BR I

Where did you get the snapshot build? The pull request has not yet been merged, thus it is not available via normal means.

I am on Build #3390 from 31.03.2023

   ___   ___   ___   ___  | | | |   / \   | __ )
  / _ \ / _ \ / _ \ / _ \ | |_| |  / _ \  |  _ \
 | (_) | (_) |  __/| | | ||  _  | / ___ \ | |_) )
  \___/|  __/ \___/|_| |_||_| |_|/_/   \_\|____/
       |_|       4.0.0-SNAPSHOT - Build #3390

As I’m testing on a RasPi, I was simply using openhabian-config to go to the latest snapshot. My understanding was/is that it pulls directly from Artifactory at https://openhab.jfrog.io/

BR I

As @ssalonen pointed out, the patch / PR has not been merged, therefore it is not included in the latest SNAPSHOT.

Thanks a lot for this confirmation.
I’ve seen the build run through ok ([modbus] Fix error handling with DNS resolution / Unknown host errors · openhab/openhab-core@905d5ae · GitHub), hence I wrongly concluded that this was already merged and part of the snapshot.

Apologies for this. And thanks again to all for your responsiveness.
BR I
