[SOLVED] Can not get modbus tcp to work

None of the usual ‘finger trouble’ here, decimal/hex address confusion, register types etc. What is the target device?

Are you confident there are currently no other modbus Things in OH to interfere? Did you have a few goes at creating these? (I’m thinking something unusable may be cached somewhere.)

Some debug logging from the binding should help. Let’s take “timeout” as description with a pinch of salt for now.

Ok, good progress. Unfortunately I did two things at the same time, and I am not sure which one gave the progress.
The modbus binding seems to default to disconnecting after each read. I got the idea to change that. However, openHAB started to give me 500/502 errors when trying to change the modbus Things.

I removed all my modbus Things and redid them with disconnect after 10 min. And it seems to work now! I have gotten a single read error exactly like the ones I got all the time before, but it has been running for 30 min with two errors (I poll every 30 sec).

Ahh, just saw that I get an error every 10 minutes. Seems like my Smart-me interface to my Kamstrup smartmeter is quite particular about some timing. I am confident I will get it to work well enough now.

@rossko57 thank you for your support!

ps: modbus is a crazy protocol; registers, coils, offsets, holdings - must be the invention of some crazy electrical engineer :wink:

Exactly right. This is intended to stop OH hogging the slave socket in a shared system.
As you’ve found, you can configure to hold a connection open.

Okay, so something gets fumbled at connect - or maybe disconnect - but it’s non lethal. Interesting.
As you say, it might just be a timing thing. There are a few TCP tweaks available.
If you can get a debug log of that event our binding guru might take a look ? @ssalonen

It is what it is :crazy_face: Remember it comes from a time before PCs existed! and is designed for slow serial connections.

If you can get a debug log of that event our binding guru might take a look ?

I have a debug log, but it does not say anything more. Just a timeout on the first read after a connect. I also occasionally get a timeout on later reads while connected. Seems to be a problem with the smartmeter and not with the binding. The only problem is that it puts some noise in my log.

modbus is a crazy protocol

It is what it is :crazy_face: Remember it comes from a time before PCs existed! and is designed for slow serial connections.

I started programming and implementing protocols quite some years before PCs existed (see the grey beard on my avatar pic :zombie:), and modbus still seems crazy :stuck_out_tongue_closed_eyes:

Maybe a delay between connect and first read poll would help.
I’m not sure that timeBetweenTransactionsMillis thing parameter would have any effect on that, but it might.

I did try that, and don’t think it made any difference.
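For anyone wanting to test the "pause between connect and first read" idea outside openHAB, here is a minimal sketch using a raw socket. The function names, the unit id 1, and the register range are assumptions for illustration only; the frame layout is the standard Modbus TCP header followed by a "read holding registers" request.

```python
import socket
import struct
import time


def build_read_holding_request(transaction_id, unit_id, start_address, quantity):
    """Build a Modbus TCP 'read holding registers' (function 3) request frame."""
    # MBAP header: transaction id, protocol id (always 0), length of what
    # follows (unit id + function code + 4 data bytes = 6), then the unit id.
    return struct.pack(">HHHBBHH", transaction_id, 0, 6, unit_id, 3,
                       start_address, quantity)


def read_after_delay(host, port, delay_s, timeout_s=3.0):
    """Connect, wait delay_s seconds, then send one read request."""
    with socket.create_connection((host, port), timeout=timeout_s) as sock:
        time.sleep(delay_s)  # give the device time to settle after connect
        # Hypothetical request: unit 1, two registers starting at address 0.
        sock.sendall(build_read_holding_request(1, 1, 0, 2))
        # A single recv is enough for a sketch; it raises a timeout error
        # if the device stays silent for timeout_s seconds.
        return sock.recv(260)
```

Calling `read_after_delay` with a few different delays against the real device would show whether the first-read timeout depends on the pause at all.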

Looks like a tcp session issue

I’ve used multiple modbus simulators:

  1. http://www.plcsimulator.org/
  • This one limits the number of concurrent connections to 10.
  • Using the openHAB binding, the connections quickly fill up to 10, at which point the simulator stops responding and I start getting that error in openHAB. On TCP view, the simulator shows CLOSE_WAIT status.

  2. https://www.win-tech.com/html/demos.htm
  • This one does not present any connection limit. On TCP view, it looks like it handles port closure successfully, i.e. only one instance of the simulator is showing.

I would say it depends on how the slave is programmed.

On a side note, I was wondering why there isn’t a receiveTimeoutMillis parameter on the TCP bridge while there is one on the serial bridge. There could be a case where a serial slave sits behind a Modbus-RTU to Modbus-TCP gateway (not just a raw medium converter), such as a Moxa MGate.

Like other TCP parameters, this is effectively a host system wide attribute. The binding doesn’t have low level control of the stack.
You might introduce a shorter-than-system timeout in the binding, though I’m not convinced of the usefulness of that.

As for gateways, Modbus is simplistic and has no view of what is on the “other side” - it’s up to the gateway to manage the serial bus or radio net or whatever it might be.
The gateway might provide access to its management controls via some other route - SNMP, HTTP, blah…

Thanks for that, I just recalled setting serial timeouts on the modem/gateway.

Back to the main issue: is there a way to keep the connection always open?

I’ve just tried reconnectAfterMillis and made it twice the refresh poll, thinking that it would reuse the existing session, but it doesn’t; it creates a new connection anyway.

This might be problematic for some slaves, as they expect the Modbus-TCP session to be established and kept established. The session closing time might be longer than the refresh poll interval, so the slave eventually runs out of allowed concurrent sessions.

Or at least this is the conclusion i came to :roll_eyes:
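The reasoning above can be checked with a toy model. This is only a sketch under stated assumptions: every poll opens and immediately closes one connection, and the slave keeps each closed session occupying a slot for a fixed linger time. The function name and the parameters are made up for illustration.

```python
def polls_until_exhausted(poll_interval_s, linger_s, max_connections):
    """Return the 1-based poll number that finds no free slot, or None."""
    release_times = []  # when each lingering session finally frees its slot
    now = 0
    for poll in range(1, 10_000):
        # Forget sessions the slave has released by now.
        release_times = [t for t in release_times if t > now]
        if len(release_times) >= max_connections:
            return poll  # slave is full: this poll cannot connect
        # The poll connects and closes; the slot stays busy for linger_s.
        release_times.append(now + linger_s)
        now += poll_interval_s
    return None  # the slave always had a free slot


# 1 s polls, sessions lingering 60 s, limit of 10 sessions
print(polls_until_exhausted(1, 60, 10))  # 11
# Sessions released well within the limit never exhaust the slave
print(polls_until_exhausted(1, 5, 10))   # None
```

In this model the slave runs out of slots only when the linger time covers at least `max_connections` poll intervals, which matches the simulator filling up at 10 connections.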

I’m not very TCP knowledgeable at all, but that sounds like correct behaviour from the binding?
If the slave doesn’t respond, there’s not much else we can do but try to connect again.
The slave lets failed connections pile up, up to its limit.
The question is whether the binding can do anything about that. Is there a missing disconnect, before connect? Are some TCP targets smart enough to recognize reconnect attempts, and tidy up?

I’ve done some homework and debugging with Wireshark. It seems that the modbus simulator (the problematic one, mod_RSsim) fails to complete the four-way handshake for terminating a TCP connection.

When I terminate/close the modbus connection from the master (Modscan), which runs locally on the same machine, Modscan.exe goes into FIN_WAIT2 and mod_RSsim.exe goes into CLOSE_WAIT.

As per this link and this, it seems it’s the server/slave’s fault for not terminating the connection properly.

Given that many modbus devices might be using in-house programming, it could be a case of the slave failing to terminate the connection.

The diagram from the second link explains that the server fails to go from CLOSE_WAIT to LAST_ACK and send a final FIN to the client/master, hence the server is stuck at CLOSE_WAIT and the client is stuck at FIN_WAIT2.

Monitoring the connections, they do eventually terminate after a while, but that takes longer than the polling period.

A workaround, or an enhancement to the binding, would be to allow the connection to be kept open.
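For comparison, this is roughly what a well-behaved server side looks like. It is a minimal loopback sketch, not taken from any of the simulators: the function names are hypothetical and the echo stands in for real request handling. The point is only that the server must close its own socket once it sees the client's FIN.

```python
import socket
import threading


def serve_one_connection(listener, events):
    """Accept one client and close our side as soon as the client closes."""
    conn, _addr = listener.accept()
    try:
        while True:
            data = conn.recv(1024)
            if not data:
                # Empty read: the client sent FIN, we are now in CLOSE_WAIT.
                events.append("peer closed")
                break
            conn.sendall(data)  # placeholder for real request handling
    finally:
        # close() sends our own FIN (CLOSE_WAIT -> LAST_ACK). A server that
        # never gets here keeps the connection slot occupied.
        conn.close()
        events.append("server closed")


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    events = []
    worker = threading.Thread(target=serve_one_connection,
                              args=(listener, events))
    worker.start()
    client = socket.create_connection(listener.getsockname(), timeout=3.0)
    client.sendall(b"ping")
    reply = client.recv(1024)
    client.close()  # the master closes first, as it does after a poll
    worker.join(timeout=3.0)
    listener.close()
    return reply, events
```

A slave that skips the `conn.close()` step would show exactly the CLOSE_WAIT / FIN_WAIT2 pair described above.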

It means what it says, i.e. it will reconnect after that time, not after that time idle. You’d generally want it many times longer than poll rate for effect, certainly longer than twice. Minutes :smiley:
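A toy model makes the effect of the value visible. It assumes the connection age is checked at each poll and a new connection is opened once the age reaches the limit; the binding may do the check at a different moment, so treat the function below as an illustration, not as the binding's logic.

```python
def connections_opened(poll_interval_ms, reconnect_after_ms, duration_ms):
    """Count connections opened when reconnecting purely on connection age."""
    opened = 0
    connected_at = None
    for now in range(0, duration_ms, poll_interval_ms):
        # Reconnect on age since connect, regardless of how busy the link was.
        if connected_at is None or now - connected_at >= reconnect_after_ms:
            opened += 1
            connected_at = now
    return opened


HOUR_MS = 3_600_000
print(connections_opened(30_000, 0, HOUR_MS))        # 120: one per poll
print(connections_opened(30_000, 60_000, HOUR_MS))   # 60: every second poll
print(connections_opened(30_000, 600_000, HOUR_MS))  # 6: every 10 minutes
```

With a 30 s poll, twice the poll interval still means a fresh connection every minute, while 10 minutes brings it down to six per hour.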

Okay, are you happy the binding is doing the right thing to (try to) tidily close a failed connection? I think the author has been all around this before, but it doesn’t hurt to look closely at error handling when you can reproduce a problem.

It wasn’t a failed connection to begin with. The binding closes the connection after every poll, and that isn’t common behavior in the modbus world, or at least the industrial one.

Sorry, I wasn’t paying attention. Is there a post that you were referring to about the author?

Reproducing the problem is easy on the simulator that I’ve used (mod_RSsim). I have an old boiler that talks modbus and behaves the same. It would be better to have a steady TCP connection by default, with terminate-after-each-poll as an optional parameter.

@ssalonen @davidgraeff any ideas?

reconnectAfterMillis

It’s not completely unlimited, I don’t know what the upper limit is - days?

Reconnect after each poll is the binding’s standard behaviour because most TCP devices ‘should’ deal with it nicely. This is thought desirable because some devices only allow one connection, and OH should not hog it and block other users. Pros and cons either way, but it has to be one way or the other, with winners and losers.

Since you’ve mentioned that some devices allow only one connection: if OH2 is the only master polling that device, and that device has the CLOSE_WAIT behavior, then whether reconnectAfterMillis is set to 0 or 86400000 (a day), there will be a moment where the max connections (in our case 1) is reached and comms will be lost. That is not ideal, since we don’t know exactly when the misbehaving slave device will eventually drop the CLOSE_WAIT session.

Yes indeed, if the slave has a problem then that one-connection limitation makes it a bigger problem. One way to mitigate that is to use the reconnectAfterMillis parameter.
There is no way to remotely fix that slave’s problem from openHAB. It will experience the same problem in any other environment. Not sure where we’re going with this.

add an option to keep the connection open?
reconnectAfterMillis = -1 lol or add a new parameter for the tcp bridge in the binding.

I am personally using the binding with a PLC slave that accepts only one connection. There are times when the connection fails (I presume due to the PLC interrupt cycle, but it could be what you mentioned as well), but that is no issue at all. You can retry the connection a couple of times.
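The "retry a couple of times" approach can be sketched generically. The helper below is hypothetical and not part of the binding; `operation` stands for whatever connect-and-read call is being retried.

```python
import time


def retry(operation, attempts=3, pause_s=0.5):
    """Call operation() until it succeeds or the attempts are used up."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except OSError as exc:  # covers refused connections and timeouts
            last_error = exc
            if attempt < attempts:
                time.sleep(pause_s)  # short pause before the next try
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

A short pause between attempts gives a single-connection slave a chance to release the previous session before the next try.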

If you would really like to have the connection open for a long time, you can set it to 2,147,483,647 milliseconds, which is roughly 24.9 days (about three and a half weeks).

It’s currently not possible to have it open forever.
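As a quick check of how long that maximum actually is: 2,147,483,647 equals 2**31 - 1, the largest signed 32-bit integer. That the limit comes from a 32-bit integer setting is an assumption here; the conversion itself is plain arithmetic.

```python
from datetime import timedelta

MAX_INT32_MS = 2**31 - 1  # 2,147,483,647 milliseconds
duration = timedelta(milliseconds=MAX_INT32_MS)

print(duration)                                    # 24 days, 20:31:23.647000
print(round(duration.total_seconds() / 86400, 2))  # 24.86 (days)
```

So the connection would still be recycled a little more often than once every 25 days.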

Good point. That can be made configurable if it is an issue. So far I have not heard of problems with this.

There is actually a default of 3 seconds, which is plenty of time.