[SOLVED] Can not get modbus tcp to work

jesperdk · March 2, 2019, 4:54pm

Platform information:
- Hardware: Pi 3 B+
- OS: Rasberian 4.14.79, a few months old
- Java Runtime Environment: from openhabian, a few months old
- openHAB version: 2.4.0-1
Issue of the topic:
I can successfully read registers with QModMaster, as you can see below:

QModMaster583×433 15.4 KB
Please post configurations (if applicable):
I am trying to read the same with OpenHAB.
The Modbus TCP slave:

OpenHABmodbus1983×220 11 KB

And the poller:

If logs where generated please post these here using code fences:
I get the following in the log:

2019-03-02 17:36:50.095 [hingStatusInfoChangedEvent] - 'modbus:poller:4df4df10' changed from UNINITIALIZED to OFFLINE
2019-03-02 17:36:50.108 [hingStatusInfoChangedEvent] - 'modbus:poller:4df4df10' changed from OFFLINE to UNINITIALIZED (DISABLED)
2019-03-02 17:36:51.316 [hingStatusInfoChangedEvent] - 'modbus:tcp:003bb2e0' changed from ONLINE to UNINITIALIZED
2019-03-02 17:36:51.323 [hingStatusInfoChangedEvent] - 'modbus:tcp:003bb2e0' changed from UNINITIALIZED to UNINITIALIZED (DISABLED)
2019-03-02 17:38:28.220 [hingStatusInfoChangedEvent] - 'modbus:tcp:003bb2e0' changed from UNINITIALIZED (DISABLED) to INITIALIZING
2019-03-02 17:38:28.230 [hingStatusInfoChangedEvent] - 'modbus:tcp:003bb2e0' changed from INITIALIZING to ONLINE
2019-03-02 17:39:10.526 [me.event.ThingUpdatedEvent] - Thing 'modbus:poller:4df4df10' has been updated.
2019-03-02 17:39:21.043 [hingStatusInfoChangedEvent] - 'modbus:poller:4df4df10' changed from UNINITIALIZED (DISABLED) to INITIALIZING
2019-03-02 17:39:21.064 [hingStatusInfoChangedEvent] - 'modbus:poller:4df4df10' changed from INITIALIZING to ONLINE
2019-03-02 17:39:34.614 [ERROR] [wimpi.modbus.io.ModbusTCPTransaction] - execute try 1/1 error: I/O exception: SocketTimeoutException Read timed out. Request: net.wimpi.modbus.msg.ReadMultipleRegistersRequest@f96306 (unit id 1 & transaction 67). Address: /192.168.86.221:502
2019-03-02 17:39:34.618 [ERROR] [wimpi.modbus.io.ModbusTCPTransaction] - execute reached max tries 1, throwing last error: I/O exception: SocketTimeoutException Read timed out. Request: net.wimpi.modbus.msg.ReadMultipleRegistersRequest@f96306 (unit id 1 & transaction 67). Address: /192.168.86.221:502
2019-03-02 17:39:34.622 [ERROR] [rt.modbus.internal.ModbusManagerImpl] - Last try 1 failed when executing request (ModbusPollerThingHandlerImpl.ModbusPollerReadRequest@c6fa41[slaveId=1,functionCode=READ_MULTIPLE_REGISTERS,start=8195,length=4,maxTries=1]). Aborting. Error was I/O error, so reseting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: SocketTimeoutException Read timed out [operation ID 9f4b1d46-1940-4784-b6ee-8d282b2b089d]
2019-03-02 17:39:34.641 [hingStatusInfoChangedEvent] - 'modbus:poller:4df4df10' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Error with read: org.openhab.io.transport.modbus.internal.ModbusSlaveIOExceptionImpl: Modbus IO Error with cause=ModbusIOException, EOF=false, message='I/O exception: SocketTimeoutException Read timed out', cause2=null
2019-03-02 17:39:45.102 [hingStatusInfoChangedEvent] - 'modbus:poller:4df4df10' changed from OFFLINE (COMMUNICATION_ERROR): Error with read: org.openhab.io.transport.modbus.internal.ModbusSlaveIOExceptionImpl: Modbus IO Error with cause=ModbusIOException, EOF=false, message='I/O exception: SocketTimeoutException Read timed out', cause2=null to ONLINE

I am clueless about what to do and would appreciate any help, thank you

rossko57 · March 2, 2019, 5:48pm

Thing settings look reasonable.

That’s the key bit, no response.

Did you run QModMaster on the same pi box? Thinking ethernet pathway issue.

Might be out-by-one addressing/numbering; try polling just one register 8196.

jesperdk · March 2, 2019, 5:54pm

I have tried different registers, does not seem to change the response.

I did run the QModMaster on my WIndows pc, but they are on the same WiFi. A ping works fine on the Pi. Does modbus tcp use any kinds of broadcast? Because straight TCP should have no trouble reaching the device.

Do you know of any tool I might try to use for test on the Pi?

And thanks for the reply!

rossko57 · March 2, 2019, 7:29pm

It’s just TCP.

Assuming you haven’t got the tcp bridge parameter connectTimeoutMillis set to something daft, this isn’t a simple poll-timeout event.
Here’s my log for a TCP poll of a completely non-existent slave (OH 2.4)

2019-03-01 00:19:26.647 [ERROR] [ing.ModbusSlaveConnectionFactoryImpl] - connect try 1/2 error: connect timed out. Connection TCPMasterConnection@7e9356[socket=Socket[unconnected]]. Endpoint ModbusTCPSlaveEndpoint@a462ce[address=192.168.1.12,port=502]
2019-03-01 00:19:34.697 [ERROR] [ing.ModbusSlaveConnectionFactoryImpl] - connect try 1/2 error: connect timed out. Connection TCPMasterConnection@2b36a4[socket=Socket[unconnected]]. Endpoint ModbusTCPSlaveEndpoint@d35dd4[address=192.168.1.14,port=502]
2019-03-01 00:19:36.708 [ERROR] [ing.ModbusSlaveConnectionFactoryImpl] - connect try 2/2 error: connect timed out. Connection TCPMasterConnection@7e9356[socket=Socket[unconnected]]. Endpoint ModbusTCPSlaveEndpoint@a462ce[address=192.168.1.12,port=502]
2019-03-01 00:19:36.709 [ERROR] [ing.ModbusSlaveConnectionFactoryImpl] - re-connect reached max tries 2, throwing last error: connect timed out. Connection TCPMasterConnection@7e9356[socket=Socket[unconnected]]. Endpoint ModbusTCPSlaveEndpoint@a462ce[address=192.168.1.12,port=502]
2019-03-01 00:19:36.710 [ERROR] [ing.ModbusSlaveConnectionFactoryImpl] - Error connecting connection TCPMasterConnection@7e9356[socket=Socket[unconnected]] for endpoint ModbusTCPSlaveEndpoint@a462ce[address=192.168.1.12,port=502]: connect timed out
2019-03-01 00:19:36.711 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Could not connect to endpoint ModbusTCPSlaveEndpoint@a462ce[address=192.168.1.12,port=502] -- aborting request ModbusPollerThingHandlerImpl.ModbusPollerReadRequest@190d1e8[slaveId=24,functionCode=READ_MULTIPLE_REGISTERS,start=20,length=2,maxTries=3] [operation ID 1206a95b-d10c-4967-81c3-1109d2abe653]
2019-03-01 00:19:36.712 [ERROR] [ernal.handler.ModbusDataThingHandler] - Thing modbus:data:th2h:th2h_hold:th2h_h20 'Modbus data' had ModbusConnectionException error on read: ModbusConnectionException(Error connecting to endpoint=ModbusTCPSlaveEndpoint@a462ce[address=192.168.1.12,port=502])
2019-03-01 00:19:36.714 [ERROR] [ernal.handler.ModbusDataThingHandler] - Thing modbus:data:th2h:th2h_hold:th2h_h21 'Modbus data' had ModbusConnectionException error on read: ModbusConnectionException(Error connecting to endpoint=ModbusTCPSlaveEndpoint@a462ce[address=192.168.1.12,port=502])
2019-03-01 00:19:44.759 [ERROR] [ing.ModbusSlaveConnectionFactoryImpl] - connect try 2/2 error: connect timed out. Connection TCPMasterConnection@2b36a4[socket=Socket[unconnected]]. Endpoint ModbusTCPSlaveEndp

It’s quite different. It’s a windows host but that wouldn’t affect the general scheme of things. Something funny going on.

Make sure you shut down your diagnostic on the other PC, in case it’s hogging the only socket at your slave.

jesperdk · March 3, 2019, 10:13am

yeah, I did try that. I also tried letting my diagnostic run on the pc - as expected that blocks the connection, and gives an appropriate error in the log. Something else is going on. I am fairly certain that nobody body blocks the port and that OH can do the connect - it seems to be the read that times out.
I have tried reboot of both my pc and the Pi, but no change - I am at loss, but will try enabling a higher log level.

rossko57 · March 3, 2019, 11:06am

None of the usual ‘finger trouble’ here, decimal/hex address confusion, register types etc. What is the target device?

Confident there are currently no other modbus Things in OH to interfere? Did you have a few goes at creating these (thinking something unusable cached somewhere)

Some debug logging from the binding should help. Let’s take “timeout” as description with a pinch of salt for now.

jesperdk · March 3, 2019, 11:28am

Ok, good progress. Unfortunately I did two things at the same same, and not sure which gave the progress.
The modbus binding seems to default to disconnect after each read. I got the idea to change that. However, OpenHAB started to give me 500/502 errors when trying to change the modbus things.

I removed all my modbus things and redid them with disconnect after 10min. And it seems to work now! I have gotten a single read error exactly like the once I got all the time before, but it has been running for 30 min with two errors (I poll each 30sec).

Ahh, just saw that I get an error after each 10 minutes. Seems like my Smart-me interface to my Kamstrup smartmeter is quite particular about some timing. I am confident I will get it to work good enough now.

@rossko57 thank you for your support!

ps: modbus is a crasy protocol; registers, coils, offsets, holdings - must be the invention of some cray electrical engineer

rossko57 · March 3, 2019, 11:40am

Exactly right. This is intended to stop OH hogging the slave socket in a shared system
As you’ve found, you can configure to hold a connection open.

Okay, so something gets fumbled at connect - or maybe disconnect - but it’s non lethal. Interesting.
As you say, it might just be a timing thing. There are a few TCP tweaks available.
If you can get a debug log of that event our binding guru might take a look ? @ssalonen

It is what it is Remember it comes from a time before PCs existed! and is designed for slow serial connections.

jesperdk · March 3, 2019, 12:34pm

If you can get a debug log of that event our binding guru might take a look ?

I have a debug log, but it does not say any more. Just a timeout on the first read after a connect. I also occasionally get a timeout on later reads while connected. Seems to be a problem with the smartmeter and not with the binding. Only problem is it gives some noise in my log.

modbus is a crasy protocol

It is what it is Remember it comes from a time before PCs existed! and is designed for slow serial connections.

I started to do programming and to implements protocols quite some years before the PCs existed (see the grey beard on my avatar pic ), and modbus still seems crasy

rossko57 · March 3, 2019, 2:13pm

Maybe a delay between connect and first read poll would help.
I’m not sure that timeBetweenTransactionsMillis thing parameter would have any effect on that, but it might.

jesperdk · March 3, 2019, 2:39pm

I did try that, and don’t think it did any difference.

mohsh86 · May 9, 2019, 10:52am

Looks like a tcp session issue

I’ve used multiple modbus simulators:

http://www.plcsimulator.org/

this one limits the number of concurrent connections to 10.
using openhab binding, the connections would quickly fill to 10 when it stops responding and i start getting that error on openhab, on TCP view, the simulator would show CLOSE_WAIT status

WinTECH Software Download Page

this one does now represent any connection limit, on TCP view, it looks like it handles port closure successfully, i.e. only one instance of the of the simulator is showing on TCP view.

I would sayit is based on how the slave is programmed.

On a side note, I was wondering why there isn’t a receiveTimeoutMillis parameter on the bridge tcp while there is on the serial one? there could be an instance where a serial slave is residing behind Modbus-RTU to Modbus-TCP gateway (not just raw medium converter), such as a Moxa MGate.

rossko57 · May 9, 2019, 11:23am

Like other TCP parameters, this is effectively a host system wide attribute. The binding doesn’t have low level control of the stack.
You might introduce a shorter-than-system timeout in the binding -not convinced of the usefulness of that.

As for gateways, Modbus is simplistic and has no view of what is on the “other side” - it’s up to the gateway to manage the serial bus or radio net or whatever it might be.
The gateway might provide access to it’s management controls via some other route - SNMP, HTTP, blah..

mohsh86 · May 9, 2019, 11:34am

thanks for that, i just recalled setting serial timeouts on the modem/gateway.

back to the main issue, is there a way to keep the connection always open?

I’ve just tried reconnectAfterMillis and made it twice the refresh poll thinking that it will utilise existing session but it doesn’t, it creates a new connection anyways.

This might be problematic for some slaves as it expects the Modbus-TCP session to be established and kept establish and session closing time might be longer than the needed refresh poll making the slave eventually run out of allowed concurrent sessions.

Or at least this is the conclusion i came to

rossko57 · May 9, 2019, 11:39am

I’m not very TCP knowledgable at all, but that sounds like correct behaviour from the binding?
If the slave doesn’t respond, there’s not much else we can do but try to connect again.
The slave lets failed connections pile up, up to its limit.
The question is whether the binding can do anything about that. Is there a missing disconnect, before connect? Are some TCP targets smart enough to recognize reconnect attempts, and tidy up?

mohsh86 · May 9, 2019, 1:07pm

I’ve done some home work and debugging with wireshark, it seems that the modbus simulaotr (the problematic one mod_RSsim) failed to complete the 4 way hand shake for termination a TCP connection

As when i terminate /close the modbus connection from the master (Modscan) that runs locally on the same machine, Modscan.exe goes into FIN_WAIT2 and mod_RSsim.exe goes into CLOSE_WAIT.

as per this link and this it seem it’s the server/slave fault for not terminating the connection properly.

Given that many modbus devices might be using in house programming, it could be the case of the slave failing to terminate the connection.

the following diagram was obtained from the second link:

which explains that the server fails to go from CLOSE_WAIT to LAST_ACK and send a final FIN to the client/master, hence server is stuck at CLOSE_WAIT and client stuck at FIN_WAIT2.

monitoring the connection they’ll eventually terminate after a while, but that is longer than the polling period.

a workaround or enhancement of the binding is to allow the connection to be kept open.

rossko57 · May 9, 2019, 1:08pm

It means what it says, i.e. it will reconnect after that time, not after that time idle. You’d generally want it many times longer than poll rate for effect, certainly longer than twice. Minutes

rossko57 · May 9, 2019, 1:13pm

Okay, are you happy the binding is doing the right thing to (try to) tidily close a failed connection? I think the author has been all around this before, but it doesn’t hurt to look closely at error handling when you can reproduce a problem.

mohsh86 · May 9, 2019, 1:19pm

It wasn’t a failed connection to begin with, it closes the connection after every poll, that isn’t common behavior in the modbus world, or at least the industrial one.

Sorry i wasn’t paying attention, is there a post what the author referred to ?

Reproducing the problem is easy on the simulator that i’ve used (mod_RSsim), if have an old boiler that talks modbus and behaves the same, it be better to have a steady TCP connection by default, and the terminate after each poll as an optional parameter

@ssalonen @davidgraeff any ideas?

rossko57 · May 9, 2019, 1:33pm

reconnectAfterMillis

It’s not completely unlimited, I don’t know what the upper limit is - days?

Reconnect after each poll is the bindings standard behaviour because most TCP devices ‘should’ deal with it nicely. This is thought desirable because some devices only allow one connection, and OH should not hog it and block other users. Pros and cons either way, but has to be one way or the other, with winners and losers.