Modbus connection problem

Hi, I have installed the new 1.8.0 release; before that I was using a 1.8.0 snapshot.

In my environment I have several bindings, including modbus TCP. I’m reading some devices from openHAB, and also from another machine with a script that reads some values and stores them in a database for energy data.

Using the modbus binding after @paolo_denti’s change in PR #3228, everything was fine: openHAB could read, and my external reader too, both at the same time without problems.

With the latest version openHAB works and I can read the modbus devices, but there is no way to read them from the external program like before. I think it’s a problem with the connections to the modbus devices: I think OH does not close the connections anymore.
@ssalonen I see your PR #3512. Did it maybe change something?

Can you give me some feedback about it?

BR
Walter

@ssalonen
I see that you removed my previous fix, deleting the 3 lines I added

Why?

```java
if (m_Connection != null && m_Connection.isConnected()) {
    m_Connection.close();
}
```

@paolo_denti Darn, you are correct! Obviously that was a mistake - sorry for that!

I think I deleted those lines with the following logic:

  • there is already a reconnecting parameter in jamod serving that purpose (although, looking at it, the openHAB binding has a hardcoded default of reconnecting=false :frowning: ) – see the sketch after this list
  • various other fixes were done to close connections: for example, ModbusTCPSlave did not close the socket on resetConnection, ModbusXTransport did not close the underlying socket/serial port, connections were not closed on update, etc. – I concluded that these bugs might have been the reason for the original statement
  • connections were not closed for the other connection types either (serial ports, for example)
  • the Honeywell Modbus TCP implementation guide explicitly recommends not opening & closing the TCP connection on every modbus transaction (see p. 15 of the linked guide)
  • perhaps most importantly, the binding still establishes as many connections as there are slaves defined in openhab.cfg (at startup, when the binding is being configured)
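For reference, here is a minimal sketch of that jamod flag in plain jamod code (not the binding’s code; the host, port and register addresses are placeholders):

```java
import java.net.InetAddress;

import net.wimpi.modbus.io.ModbusTCPTransaction;
import net.wimpi.modbus.msg.ReadMultipleRegistersRequest;
import net.wimpi.modbus.msg.ReadMultipleRegistersResponse;
import net.wimpi.modbus.net.TCPMasterConnection;

public class ReconnectingExample {
    public static void main(String[] args) throws Exception {
        // placeholder endpoint
        TCPMasterConnection connection = new TCPMasterConnection(InetAddress.getByName("192.168.1.50"));
        connection.setPort(502);
        connection.connect();

        ModbusTCPTransaction transaction = new ModbusTCPTransaction(connection);
        // with reconnecting=true the transaction is supposed to close the
        // connection after each execute(), i.e. close after every transaction
        transaction.setReconnecting(true);
        transaction.setRequest(new ReadMultipleRegistersRequest(0, 10)); // 10 holding registers from address 0
        transaction.execute();

        ReadMultipleRegistersResponse response = (ReadMultipleRegistersResponse) transaction.getResponse();
        System.out.println("register 0 = " + response.getRegisterValue(0));
    }
}
```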

It’s a shame: I commented some of these points in the PR when it still had several commits, but apparently those comments are now lost after the commits were squashed into a single commit.

Obviously the new behaviour is not good for every use case.

  • @walter, can you paste your modbus configuration (from openhab.cfg)? An example of the format I mean is sketched after this list.
    • if I understood correctly, you are also reading the modbus with the external script? If that is the case, I believe the number of connections is limited in the Modbus TCP server you use?
  • Please note the connection pooling I’m working on: if you have many modbus slave definitions in openHAB pointing to the same endpoint, the number of connections required will be much lower with connection pooling.
  • we might need to re-introduce the old behaviour where connections are closed after every transaction – if we do, I propose we make it configurable
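Just to illustrate what kind of configuration I mean (the host, port, slave name and register ranges are placeholders, not your setup), a modbus TCP slave definition in openhab.cfg looks roughly like this:

```
# polling interval in milliseconds
modbus:poll=1000

# example TCP slave reading 10 holding registers starting from address 0
modbus:tcp.slave1.connection=192.168.1.50:502
modbus:tcp.slave1.id=1
modbus:tcp.slave1.type=holding
modbus:tcp.slave1.start=0
modbus:tcp.slave1.length=10
```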

@ssalonen

In a perfect world connections are always left open.
But modbus is used where the world is not perfect and where speed is not the primary target.
And remember that a normal polling cycle (typically with hundreds of addresses to poll) would probably cost hundreds of times the cost of opening and closing a TCP connection (a very few milliseconds anyway).

I guess Honeywell lives just inside their own offices, because the modbus world is very different; I also would like to avoid opening and closing modbus connections …
but I always do it, because after years of experience with modbus I have found it is the only reliable way to work with it.

I also think that connection pooling in a modbus environment is just overkill in the best scenario, and harmful in a more realistic scenario.

The only way I will ever use modbus, if I want reliable behavior over the modbus transport, is to “open, communicate, close”, nothing more. Does it cost a few msec on top of maybe 200/300 msec of full communication for the data transfer? I can live with it … if the communication stays reliable.

Years ago my connections were left open; after a while I made it configurable, keeping the default as on-demand connections; later I just removed the configuration option because keeping connections open is harmful.
modbus/tcp is not tcp/ip: there is no reliable way to detect and manage a “falling socket” on the physical device; usually the device just relies on timeouts to detect a dropped socket, as these devices do not have a full and reliable tcp stack; and within that time you just get a bunch of errors … because of keeping those connections open.

And moreover, modbus tcp is very often implemented over a tcp/serial converter (like a Moxa) with limited capabilities. So any connection pooling should also be weighed against your converter’s capabilities.

And you cannot modify the modbus binding thinking that openHAB is the only software accessing a modbus device. Maybe the device accepts only one connection, used alternately by openHAB or by other systems. You have to use the resource and free it as soon as possible. Modbus slaves are usually very poor devices and often also badly implemented. The only way to work with them reliably is the simple one: “open, talk, close” … and keeping every communication well queued, FIFO style, for each modbus slave target (see the sketch below).
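To make the pattern concrete, here is a minimal sketch of what I mean, built directly on jamod (a hypothetical helper, not the binding’s code; host, port and request are placeholders): a fair (FIFO) lock per slave target, and “open, talk, close” inside it.

```java
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

import net.wimpi.modbus.io.ModbusTCPTransaction;
import net.wimpi.modbus.msg.ModbusRequest;
import net.wimpi.modbus.msg.ModbusResponse;
import net.wimpi.modbus.net.TCPMasterConnection;

public class OpenTalkClose {

    // one fair lock per slave target: callers are served in FIFO order
    private static final ConcurrentHashMap<String, ReentrantLock> LOCKS = new ConcurrentHashMap<>();

    public static ModbusResponse execute(String host, int port, ModbusRequest request) throws Exception {
        ReentrantLock lock = LOCKS.computeIfAbsent(host + ":" + port, key -> new ReentrantLock(true));
        lock.lock();
        TCPMasterConnection connection = null;
        try {
            // open
            connection = new TCPMasterConnection(InetAddress.getByName(host));
            connection.setPort(port);
            connection.connect();
            // talk
            ModbusTCPTransaction transaction = new ModbusTCPTransaction(connection);
            transaction.setRequest(request);
            transaction.execute();
            return transaction.getResponse();
        } finally {
            // close, whatever happened
            if (connection != null && connection.isConnected()) {
                connection.close();
            }
            lock.unlock();
        }
    }
}
```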

@ssalonen let me explain my environment:

I have several modbus serial devices (energy counters, water counters, heating counters, a meteo station, an HVAC controller) connected to a modbus RTU/TCP gateway. I have about 30 modbus devices and I need to read some data from each device.
So I use openHAB to read the meteo, energy and HVAC data, and another app to read the counters and store the data in a database.

Each modbus TCP server can’t accept many concurrent connections, so @paolo_denti’s fix is important!
I’m using a Moxa 3180 TCP/RTU gateway.

Also, I don’t understand why we need to make this solution configurable. If you use it, it works in all situations; I don’t see any advantage.

I agree with @paolo_denti

Walter

> And remember that a normal polling cycle (typically with hundreds of addresses to poll) would probably cost hundreds of times the cost of opening and closing a TCP connection (a very few milliseconds anyway).

I see your point, but closing and opening has issues as well. Please also note that we have already got feedback from users that connection/request attempts are discarded by some PLCs due to their internal clock cycle. Thus I propose that connection closing should be configurable (perhaps defaulting to closing the connection at every poll).

> I also think that connection pooling in a modbus environment is just overkill in the best scenario, and harmful in a more realistic scenario.

I think my term “connection pooling” might be misleading, as the plan is (see the discussion in the thread) to limit the maximum number of open connections to one per modbus endpoint. The idea is not to flood the end server with multiple connections – I believe this is how you interpreted the connection pooling. Furthermore, the connection pooling fixes the long-standing issues we have had with modbus serial slaves in openHAB (currently there is no proper synchronization).

I certainly am aware of the practical limitations; for example, the at-most-one-connection issue is one of the key factors motivating the connection pooling (perhaps the name is misleading, I should rather talk about connection sharing). Currently the binding is/was unusable since many connections are established. For me this is really solving a practical problem, not over-engineering.
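To make the idea concrete, here is a minimal sketch of “at most one shared connection per endpoint” built directly on jamod (a hypothetical helper, not the actual binding or pooling code; error handling and host resolution are simplified):

```java
import java.net.InetAddress;
import java.util.concurrent.ConcurrentHashMap;

import net.wimpi.modbus.io.ModbusTCPTransaction;
import net.wimpi.modbus.msg.ModbusRequest;
import net.wimpi.modbus.msg.ModbusResponse;
import net.wimpi.modbus.net.TCPMasterConnection;

public class SharedModbusConnections {

    // at most one connection object per endpoint (host:port)
    private final ConcurrentHashMap<String, TCPMasterConnection> connections = new ConcurrentHashMap<>();

    public ModbusResponse execute(String host, int port, ModbusRequest request) throws Exception {
        TCPMasterConnection connection = connections.computeIfAbsent(host + ":" + port, key -> {
            TCPMasterConnection c = new TCPMasterConnection(resolve(host));
            c.setPort(port);
            return c;
        });
        // one transaction at a time per endpoint; the connection stays open between polls
        synchronized (connection) {
            if (!connection.isConnected()) {
                connection.connect();
            }
            ModbusTCPTransaction transaction = new ModbusTCPTransaction(connection);
            transaction.setReconnecting(false); // keep the connection open
            transaction.setRequest(request);
            transaction.execute();
            return transaction.getResponse();
        }
    }

    private static InetAddress resolve(String host) {
        try {
            return InetAddress.getByName(host);
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot resolve " + host, e);
        }
    }
}
```

The synchronized block is also what would give the per-endpoint serialization that is currently missing for serial slaves.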

> The only way I will ever use modbus, if I want reliable behavior over the modbus transport, is to “open, communicate, close”, nothing more. Does it cost a few msec on top of maybe 200/300 msec of full communication for the data transfer? I can live with it … if the communication stays reliable.

> Years ago my connections were left open; after a while I made it configurable, keeping the default as on-demand connections; later I just removed the configuration option because keeping connections open is harmful.
> modbus/tcp is not tcp/ip: there is no reliable way to detect and manage a “falling socket” on the physical device; usually the device just relies on timeouts to detect a dropped socket, as these devices do not have a full and reliable tcp stack; and within that time you just get a bunch of errors … because of keeping those connections open.

I really appreciate the practical, experience-based feedback. I’m aware of the falling-socket issues; I actually incorporated some protection against them using the SO_KEEPALIVE flag (but nothing’s perfect, I know).
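Just to be clear about what I mean by that (plain java.net API, not the binding’s code; the endpoint is a placeholder):

```java
import java.net.Socket;

public class KeepAliveExample {
    public static void main(String[] args) throws Exception {
        try (Socket socket = new Socket("192.168.1.50", 502)) { // placeholder endpoint
            // SO_KEEPALIVE makes the OS probe an otherwise idle connection,
            // so at least some dead sockets are eventually detected
            socket.setKeepAlive(true);
            // ... the modbus traffic would go over this socket ...
        }
    }
}
```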

> And moreover, modbus tcp is very often implemented over a tcp/serial converter (like a Moxa) with limited capabilities. So any connection pooling should also be weighed against your converter’s capabilities.

Could you elaborate on this? I’m not sure I fully understand. In the connection pooling thread we discussed the tcp/serial converters, and the general conclusion was that the approach should fit quite well with those.

> And you cannot modify the modbus binding thinking that openHAB is the only software accessing a modbus device. Maybe the device accepts only one connection, used alternately by openHAB or by other systems. You have to use the resource and free it as soon as possible. Modbus slaves are usually very poor devices and often also badly implemented. The only way to work with them reliably is the simple one: “open, talk, close” … and keeping every communication well queued, FIFO style, for each modbus slave target.

I have exactly that case here with my modbus server; it is one of the reasons for going with the connection pooling/sharing in the first place. You are quite right that we cannot assume openHAB is the only client, and I understand that we need to make the reconnecting functionality configurable… No need to get personal though, I believe we are both frustrated with the issues in the binding…

Related to the assumption that “you cannot modify the modbus binding thinking that openHAB is the only software accessing a modbus device”, I would like to invite you to review pull request #3684. It’s about holding state in the binding (basically assuming that state is not changed by an external party between polls).

I propose that we proceed as follows:

  • implement a slave-specific boolean configuration key, reconnecting, to close the connection after every transaction (please see above why it would make sense to make it configurable). The default would be to close the connection after every transaction. The idea would be to move the reconnection lines to the finally block (see the sketch after this list). Would you see this as a useful feature for serial ports as well?
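This is roughly the shape I have in mind (a sketch only; the method and parameter names are made up, not the binding’s actual code):

```java
import net.wimpi.modbus.io.ModbusTCPTransaction;
import net.wimpi.modbus.msg.ModbusRequest;
import net.wimpi.modbus.msg.ModbusResponse;
import net.wimpi.modbus.net.TCPMasterConnection;

public final class ReconnectingProposal {

    // "reconnecting" would come from the slave-specific key in openhab.cfg;
    // the proposed default is true, i.e. close after every transaction
    public static ModbusResponse executeTransaction(TCPMasterConnection connection, ModbusRequest request,
            boolean reconnecting) throws Exception {
        try {
            if (!connection.isConnected()) {
                connection.connect();
            }
            ModbusTCPTransaction transaction = new ModbusTCPTransaction(connection);
            transaction.setRequest(request);
            transaction.execute();
            return transaction.getResponse();
        } finally {
            // the close moves into the finally block, guarded by the configuration flag
            if (reconnecting && connection.isConnected()) {
                connection.close();
            }
        }
    }
}
```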

These are my thoughts, and after this I will stop, because the matter should involve the @maintainer group.

Facts:

  1. the last modification to the binding broke existing installations (as @walter stated)
  2. before PR #3512 everything was running smoothly, apart from your statement “The following closes the connections when configuration changes (updated method)”, which is a minor problem
  3. your PR #3512 deleted 3 lines that were a necessary part of a previous fix for a problem that afflicted almost every user of the modbus binding

My proposal:

  1. let’s roll back to the previous PR and thereby revert the binding to a usable condition; just after that
  2. let’s fix only the “close connection when configuration changes” issue, which is a very minor problem
  3. any other “optimization” should be discussed from scratch.

But I would wait for the @maintainer group’s opinion before writing anything else.

I agree with @paolo_denti. For me it’s impossible to use the openHAB release in installations with the modbus binding.

Doesn’t this invalidate pull request #3684 straight away?

What is the issue number in github that describes precisely the code regression in 1.8.0 that needs to be fixed in 1.8.1?

@watou Argh – too quick a response from me. I meant that PR 3684 is about fixing that assumption: the current implementation assumes that state is not changed by an external party between polls. But perhaps this discussion is better continued in the actual pull request.

@watou @walter @paolo_denti New issue #3818 created for this. A corresponding pull request, #3819, has been made to fix the issue.

I would appreciate it if you could take the time to review the solution. @walter: can you test the new version? I uploaded the PR 3819 version online.

Hi @ssalonen, I’m testing it and everything is now working again.
Thanks a lot for the fix.

@watou Can you take care of approving the fix?
Walter

I have marked both the issue and PR for the 1.8.1 milestone. Does it need to be merged sooner than that, or would it instead make sense for @ssalonen to provide a link on the forum to a “hotfix” JAR until then?

A “hotfix” JAR is, I think, a very nice idea!

Yes, I actually pasted the hotfix JAR above already :slight_smile:

I agree with the trend here. Version 1.7 worked great. In version 1.8 modbus started throwing errors saying “ModbusSlave not connected”. It acts very much like modbus isn’t closing connections behind itself, which is causing all sorts of issues in my system.

I tried the PR 3819 version, but I am still getting the errors. Has anyone else been successful with this new version of the jar?

After some more testing I do not think this is my issue at all. It sounds like everyone else here is able to use their modbus device just fine with openHAB, just not together with other software at the same time. In my case openHAB starts out reading the values from the modbus device, but quickly begins failing with the mentioned error; occasionally it will recover briefly and read values again before returning to the errors.

It’s not the modbus device, as the old 1.7 configuration still works great.

Hi Jereme, I have a lot of modbus devices that I read from openHAB and also from other software at the same time.
When I started to use OH I was using a 1.8 snapshot and modbus didn’t work well; connections not being closed was a big problem for me. Now I’m using the PR 3819 version from @ssalonen and all is fine, no problems.

P.S.
I’m using only modbus TCP, with a Moxa gateway to convert modbus RTU to TCP. In the new modbus version I see some changes to the serial connection, but I don’t use it and I can’t test it because I use a VM.

@Jereme_Guenther can you provide more details? I think you are using modbus TCP?

If 1.7 worked great, then I’m afraid the connection closing originally introduced in PR 3228 by paolodenti, and later re-introduced (to fix up my other PR) in my PR 3819, might be causing the issue. Looking at the issue milestones, I believe that in 1.7 there was no connection closing after a modbus read/write transaction.

If you can give more information on your configuration we can try to troubleshoot from there…

thanks

Yes, I am using modbus TCP, the device is an Adam 6052 module.

I currently have two configurations of openHAB as I try to make the transition from 1.7 to 1.8. Fortunately I am pretty new to this software, so I do not have much in the way of configuration to copy over. The 1.7 configuration continues to function perfectly.
--------The Error-----------------
2016-02-09 20:14:12.879 [INFO ] [.b.modbus.internal.ModbusSlave] - ModbusSlave not connected

I did discover something very interesting. Both the 1.8 modbus jar file and the PR 3819 jar file individually don’t work (the 1.7 version results in nothing, while both 1.8 versions produce that error), but if I use both the 1.7 and PR 3819 jar files together then things work perfectly. This was very unexpected and an unintentional discovery; I would have expected some method conflict error or something similar.

Unfortunately, when I tried to verify these findings by recreating each set of errors in turn it did not work a second time. Now when I include both jar files I get the original error and this:
-----------Error----------------
2016-02-09 20:20:45.130 [WARN ] [i.internal.GenericItemProvider] - Attempted to register a second BindingConfigReader of type ‘modbus’. The primaraly reader will remain active!

Edit: (but then a few minutes later with the new files it did indeed start working correctly again. So I have been able to re-create it.)