Connection pooling in modbus binding

Hi @rossko57 and @ssalonen

I have now tried with an extra USB<>RS485 device.
And it is working :slight_smile:

Thanks :slight_smile:

This is the configuration:

poll=10000
writemultipleregisters=true

# Read kWh & kW from EM23 El Meter
serial.em23.connection=/dev/ttyUSB1:9600:8:none:1:rtu:1000
serial.em23.id=1
serial.em23.start=12
serial.em23.length=3
serial.em23.type=input

# Read Nilan Temp
serial.nilan.connection=/dev/ttyUSB0:19200:8:even:1:rtu:1000
serial.nilan.id=30
serial.nilan.valuetype=int16
serial.nilan.start=203
serial.nilan.length=6
serial.nilan.type=input

# Read Nilan RH
serial.nilan2.connection=/dev/ttyUSB0:19200:8:even:1:rtu:1000
serial.nilan2.id=30
serial.nilan2.valuetype=int16
serial.nilan2.start=221
serial.nilan2.length=1
serial.nilan2.type=input

# Read Nilan AlarmState
serial.nilan3.connection=/dev/ttyUSB0:19200:8:even:1:rtu:1000
serial.nilan3.id=30
serial.nilan3.valuetype=int16
serial.nilan3.start=400
serial.nilan3.length=10
serial.nilan3.type=input

# Read/Write Nilan ControlSet
serial.nilan4.connection=/dev/ttyUSB0:19200:8:even:1:rtu:1000
serial.nilan4.id=30
serial.nilan4.start=1001
serial.nilan4.length=4
serial.nilan4.type=holding

# Read/Write Nilan Speed
serial.nilan5.connection=/dev/ttyUSB0:19200:8:even:1:rtu:1000
serial.nilan5.id=30
serial.nilan5.start=200
serial.nilan5.length=2
serial.nilan5.type=holding

Thank you for reporting back. Great to hear it is working!

Clarified this bit in the wiki

Is it possible to restrict modbus.net.TCPMasterConnection to open only one connection per IP:port?
With many slaves added behind one IP, the number of connections grows, and most gateways support only one connection.
Also, there is no debug information about the address of the failed slave. (ModbusSlave: Error connecting to master: Connection timed out)

Hi @igeorgiev

What version of the modbus binding are you using? The latest development version includes this feature; in fact, it is the only mode of operation.

Best
Sami

Hi Sami,
I'm using the latest stable openHAB runtime (v1.8.3).
When I start the OH, I see:
2016-08-01 19:54:14.055 [DEBUG] [o.b.m.internal.ModbusActivator] - Modbus binding has been started.
2016-08-01 19:54:14.129 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘FF_Service_PWM’ instanciated
2016-08-01 19:54:14.137 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Garden_TRh’ instanciated
2016-08-01 19:54:14.139 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Garden_TRh_Relay’ instanciated
2016-08-01 19:54:14.141 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘GF_Office_PWM1’ instanciated
2016-08-01 19:54:14.143 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘GF_Office_PWM2’ instanciated
2016-08-01 19:54:14.144 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘GF_Showroom’ instanciated
2016-08-01 19:54:14.146 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘GF_Warehouse’ instanciated
2016-08-01 19:54:14.148 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Office_TRh’ instanciated
2016-08-01 19:54:14.149 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Office_TRh_Relay’ instanciated
2016-08-01 19:54:14.151 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Printer_TRh’ instanciated
2016-08-01 19:54:14.153 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Printer_TRh_Relay’ instanciated
2016-08-01 19:54:14.154 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘RackRoom_TRh’ instanciated
2016-08-01 19:54:14.156 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘RackRoom_TRh_Relay’ instanciated
2016-08-01 19:54:14.157 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Service_TRh’ instanciated
2016-08-01 19:54:14.158 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Showroom_TRh’ instanciated
2016-08-01 19:54:14.159 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Showroom_TRh_Relay’ instanciated
2016-08-01 19:54:14.161 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Warehouse_TRh’ instanciated
2016-08-01 19:54:14.162 [DEBUG] [.modbus.internal.ModbusBinding] - modbusSlave ‘Warehouse_TRh_Relay’ instanciated
2016-08-01 19:54:14.163 [DEBUG] [.modbus.internal.ModbusBinding] - config looked good, proceeding with slave-connections
2016-08-01 19:54:14.168 [DEBUG] [modbus.net.TCPMasterConnection] - connect()
2016-08-01 19:54:14.192 [DEBUG] [modbus.net.TCPMasterConnection] - connect()
2016-08-01 19:56:21.497 [DEBUG] [modbus.internal.ModbusTcpSlave] - ModbusSlave: Error connecting to master: Connection timed out
2016-08-01 19:56:21.498 [DEBUG] [modbus.net.TCPMasterConnection] - connect()
2016-08-01 19:58:28.856 [DEBUG] [modbus.internal.ModbusTcpSlave] - ModbusSlave: Error connecting to master: Connection timed out
2016-08-01 19:58:28.857 [DEBUG] [modbus.net.TCPMasterConnection] - connect()

and from the Linux console I see the following connections:
netstat | grep 237
tcp6 0 0 192.168.74.235:49395 192.168.74.237:502 ESTABLISHED
tcp6 0 1 192.168.74.235:49396 192.168.74.237:502 SYN_SENT

All slaves are on one IP address (…74.237) with different IDs.

Hi

Unfortunately you cannot use the latest stable version to get the new behavior. You need to download the latest development version from here. Please remove the older version before installing the new one.

Please report how it works with the new version. I would recommend using pastebin.com for the logs; it makes the thread more readable.

Hi Sami,

I have two slaves (tcp) configured. If one of them is off, it takes about 20 seconds to poll the one that is working.
Is there any way to avoid this situation?

I'm running OH 1.8.3 with Modbus binding 1.9.0 on Windows 10.

OH log:
http://pastebin.com/nivQsdgW

I don't have enough knowledge, but I think "modbus.net.TCPMasterConnection" is taking too long to return a connection error…

http://pastebin.com/UDtKgcht

Hi @Botura, can you please paste your modbus configuration? Which of the slaves returns the connection timeout?

It seems that the connection timeout is 3 seconds by default. Further wait might be introduced by the connection string parameters (see the wiki for details).

Best
Sami

Here is my configuration; the slave that is returning the timeout is 192.168.0.12.

The timeout seems to be 21 seconds… I've tried with the default parameters, i.e. just setting the host ("modbus:tcp.IOR12.connection=192.168.0.12"), but the behavior was the same, except for the retry times.

As far as I could understand from reading the wiki, there is no timeout parameter for modbus tcp slaves; the timeout is used only with serial slaves.

http://pastebin.com/rxZRwh9u

Hi,

you are correct that there is no way to modify the timeout – in fact, there never has been :frowning: Nevertheless, I'm still a bit confused about why the default 3 s timeout is not respected and you are observing a much larger timeout period.

Before jumping to any fixes etc, can you please do these steps

  1. Make sure you are using the modbus binding version linked above (just to be sure)
  2. Use a simplified config with one modbus slave only, e.g. IOR11. Use 0 for reconnectAfterMillis and interConnectDelayMillis, and 1 for connectMaxTries (this you have already). Keep the poll period of 500 ms and the interTransactionDelayMillis of 60 ms. A sketch of such a config is shown after this list.
  3. Enable debug logging; instructions are in the wiki.
  4. Use pastebin to paste 1) the configuration and 2) the logs.
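
For reference, a single-slave test config along those lines could look like the one below. The slave name, unit id and register addresses are just placeholders, and I am writing the connection string parameter order (host:port:interTransactionDelayMillis:reconnectAfterMillis:interConnectDelayMillis:connectMaxTries) from memory, so please verify it against the wiki:

modbus:poll=500
modbus:tcp.IOR11.connection=192.168.0.11:502:60:0:0:1
modbus:tcp.IOR11.id=1
modbus:tcp.IOR11.start=0
modbus:tcp.IOR11.length=2
modbus:tcp.IOR11.type=holding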

Best,
Sami

Hi Sami,

I copied the binding version from the link you sent and used a simplified config with only one modbus slave with the parameters you suggested.

What I could see is that the same thing happens: a long delay before the timeout (about 20 s).
But if, instead of "192.168.0.11", I use "localhost" or my own IP, then the timeout is about 1 s.

If I use two slaves, the "localhost" one takes 1 s to time out, and "192.168.0.11" takes 20 s to time out.

And if I disable my Wi-Fi, then the timeout for "192.168.0.11" is less than a second.

It seems to be related to the network, is that correct? But I have no idea how to solve this…

Regards,
Matheus

Log
Modbus config


Thanks for the steps. Well, this one got more obscure indeed. I really have no explanation for this behavior. My best guess is that it is actually related to routing somehow…

It certainly sounds like you said: with some networks the timeout takes more time to occur. I would also expect the time to be the same… One more thing, can you double-check that you actually get a timeout error when the host is localhost or your own IP? I would expect a "connection refused" error.

Can you also please confirm what OS you are running? Windows?

You are right, with localhost I get “connection refused” error.

I'm using Windows 10. This weekend I'll test with a Raspberry Pi and post the results.

Thanks for your attention!

Hi Sami,

I found some material about this timeout; it is related to the specification and implementation of TCP/IP in the OS kernel.

I followed the instructions in the second link and changed my Windows "Initial RTO" from 3000 to 1000, and now I have a 7 second timeout instead of 21. But I think it is still too high for a modbus tcp timeout (and besides that, it will affect my entire system and not only my OH application; I don't know what the consequences are…).
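If I understand the SYN retransmission backoff correctly, the numbers add up: with the default of two retransmissions and a doubling wait it is 3 s + 6 s + 12 s = 21 s, and after lowering the initial RTO it becomes 1 s + 2 s + 4 s = 7 s.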

If what I have found is right, I think the best solution would be to implement a timeout option for Modbus TCP. Without it, the system will become very slow if someone uses two or more slaves and some of them are down.

http://stackoverflow.com/questions/26896414/where-does-the-socket-timeout-of-21000-ms-come-from

https://datafull.co/p/cual-es-el-valor-predeterminado-tcp-connect-timeout-en-windows

Yeah, I think that's it. Also: the three seconds I talked about is a slightly different thing; it is associated with reads from the socket stream (SO_TIMEOUT), and I think there's really no need for the user to override that.

Using an alternative way of creating the socket and specifying the connection timeout parameter (instead of the OS default) should resolve the issue.
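
Roughly, I mean something like this (just a minimal sketch in plain Java, not the actual binding code; the host, port and the 3000 ms value are arbitrary examples):

import java.net.InetSocketAddress;
import java.net.Socket;

public class ConnectTimeoutSketch {
    public static void main(String[] args) throws Exception {
        // new Socket(host, port) connects immediately and is bound by the
        // OS connect timeout (the ~21 s you are seeing on Windows).
        // Creating an unconnected socket first lets us pass our own limit to connect().
        Socket socket = new Socket();
        socket.connect(new InetSocketAddress("192.168.0.12", 502), 3000); // 3 s connect timeout
        socket.setSoTimeout(3000); // the SO_TIMEOUT mentioned above: max wait when reading from the stream
        System.out.println("Connected: " + socket.isConnected());
        socket.close();
    }
}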

Would you like to create a feature request for this on GitHub?

Best
Sami

OH and GitHub newbie here… I don't know how to create a feature request; would you mind opening it and sending me the link, so I can learn by seeing what you have done?

@Botura: Posted the feature request here: https://github.com/openhab/openhab/issues/4595

Thank you Sami!

While we're on the topic… some thoughts about retry behaviour? This also applies in the "missing slave" situation.
When I last checked, if retries were enabled, the binding would wait through all the try + retry timeout periods before polling other slaves.
After one poll, we do it all again, which affects performance badly.
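To put rough numbers on it (taking the 21 s Windows connect timeout discussed above as an example): with three tries per poll, one dead slave could block the polling of every other slave for around a minute in every cycle.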

“Nicer” behaviour would be to poll other slaves before retrying the failed one.
Basically, that would change the concept of retry to mean ‘number of failed polls before we declare that device broken’.

What exactly we do about a broken slave is not at all clear to me. Can we detect a broken slave via an OH rule? How could we take action to reduce the impact on the survivors?
I fear there probably isn't much that can be done easily without restructuring - which should be an OH2 target. Just make sure that retries=0 and use the poll as the 'retry' mechanism?

My application has modbus slaves scattered around a large site with local power supplies, any of which could be switched off for maintenance or, of course, due to some fault. I haven't really figured out how to deal with off-line slaves at all.