Connection pooling in modbus binding

Ok now things are getting weird:
When I change this line in logback_debug.xml

<logger name="org.openhab.binding.modbus" level="DEBUG" />

to

<logger name="org.openhab.binding.modbus" level="TRACE" />

performance doubles (measured as the number of successful master-slave transactions). The configuration remained the same ( http://pastebin.com/2Hh1bbpn ).


I have broken my cheap Chinese TCP gateway, but I am able to try the updated binding with an RS-485 serial setup.
Two slaves configured, both part of one real device:
modbus:serial.slave4R.connection=COM5:9600:8:none:1:rtu:1000
modbus:serial.slave4R.id=40
modbus:serial.slave4R.type=coil
modbus:serial.slave4R.start=16
modbus:serial.slave4R.length=4
modbus:serial.slave4I.connection=COM5:9600:8:none:1:rtu:1000
modbus:serial.slave4I.id=40
modbus:serial.slave4I.type=discrete
modbus:serial.slave4I.start=32
modbus:serial.slave4I.length=4

All seems to work fine, no errors reported. Randomly commanding coil items on and off (so as to intersperse traffic with polls) doesn’t seem to cause issues.

Glad to hear it! If you don’t mind, can you please test it with the officially released modbus binding as well? Just to confirm that the new version actually fixes something :slightly_smiling:

Thanks!

@mbs38 @Ferryb4 Now I get why some log statements are missing! We have to set up a logger for net.wimpi.modbus (the modbus library inside the binding) as well:

<logger name="org.openhab.binding.modbus" level="TRACE" />
<logger name="net.wimpi.modbus" level="TRACE" />

If you enable both of those loggers, you should be able to see useful summaries in logs.

Hi, I have the same problems with Modbus. Since 1.6 there has been no solution for this problem. The main issue is that the polling process does not work correctly. I have tried with TCP and RTU; same problem. There is also a problem with the connection component, but that is not the main issue. Is there any idea for solving the polling problem? I have ~15 slaves and so far no solution. In general the Modbus binding is not working and can only be used for testing a single slave on the bench. It offers no production options.

@lgeorgiev exactly, it has been an issue with the binding for a long time. That’s the reason the implementation discussed in this thread was created.

Have you tried the version linked in this thread? If not, here are the instructions I gave in the other thread: test with modbus tcp. I would recommend using modbus tcp; my feeling is that you avoid a lot of issues related to serial communication that way. Please don’t hesitate to ask if something is unclear.

Please also note the extended logging configuration instructions mentioned above by me.

I would be very grateful if you have time to test this with your setup and report back! Thanks.

@mbs38 did the errors reduce with TRACE level logging? I’m wondering whether the few milliseconds of overhead caused by the extensive logging could reduce errors, since the serial line is then under less heavy use… Specifically, the device issues mentioned in issue 3537 might be related. As mentioned in the issue, the implementation supports an arbitrary sleep before executing the modbus serial transaction; it’s just not exposed as a configuration parameter yet (the default is no sleep).

Also, could you paste logs with net.wimpi.modbus set to DEBUG level? Those should include log statements about successful executions (if some transaction initially failed).

Thanks!

@ssalonen - I guess my two-slave serial setup is not a good test. It seems to work just as well with the binding from the 1.8.0 distribution package. The host is an elderly (slow) WinXP laptop with a USB-485 adaptor, so timing may be relevant.

Hi all! As @ssalonen asked, putting all my data here.
Simplified items:
Group gEnergy "Power supply parameters"

Number GridVoltageRaw "ComBox Grid voltage (modbus)" {modbus="combox:4"}
Number GridVoltage "ComBox Grid voltage [%.1f V]" (gEnergy)
Number GridInputPower "ComBox Grid input power [%d W]" (gEnergy) {modbus="combox:7"}
Number GridOutputPower "ComBox Power sold to grid [%d W]" (gEnergy) {modbus="combox:10"}
Number ACGeneratorPower "ComBox Generator power [%d W]" (gEnergy) {modbus="combox:12"}
Number XWLoadFrequencyRaw "ComBox Load frequency (modbus)" {modbus="combox:27"}
Number XWLoadFrequency "ComBox Load frequency [%.2f Hz]" (gEnergy)
Number LoadVoltageRaw "ComBox Load voltage (modbus)" {modbus="combox:16"}
Number LoadVoltage "ComBox Load voltage [%.1f V]" (gEnergy)
Number LoadPower "ComBox Load power [%d W]" (gEnergy) {modbus="combox:15"}
Number LoadPowerAPP "ComBox Load power (apparent) [%d VA]" (gEnergy) {modbus="combox:14"}
Number AveragePVVoltageRaw "ComBox Average PV panel voltage (modbus)" {modbus="mppt:0"} // unable to read 143 values
Number AveragePVVoltage "ComBox Average PV panel voltage [%.1f V]" (gEnergy)
Number MPPTPVPower "ComBox PV panel power [%d W]" (gEnergy) {modbus="combox:36"}
Number MPPTBatteryCurrentRaw "ComBox Battery charge current (modbus)" {modbus="combox:37"}
Number MPPTBatteryCurrent "ComBox Battery charge current [%.1f A]" (gEnergy)
Number MPPTBatteryPower "ComBox Battery charge power [%d W]" (gEnergy) {modbus="combox:38"}
Number BatteryVoltageRaw "ComBox Battery voltage (modbus)" {modbus="combox:42"}
Number BatteryVoltage "ComBox Battery voltage [%.1f V]" (gEnergy)
Number BatteryTemperatureRaw "ComBox Battery temperature (modbus)" {modbus="combox:43"}
Number BatteryTemperature "ComBox Battery temperature [%.1f °C]" (gEnergy)
Number BatteryCurrentNetRaw "ComBox Net battery current (modbus)" {modbus="combox:44"}
Number BatteryCurrentNet "ComBox Net battery current [%.1f A]" (gEnergy)

There are also very simple rules for transforming some raw data.

Here is modbus cfg:
modbus:tcp.combox.connection=192.168.1.50:502
modbus:tcp.combox.type=holding
modbus:tcp.combox.id=201
modbus:tcp.combox.start=68
modbus:tcp.combox.length=90
modbus:tcp.combox.valuetype=int32

modbus:tcp.mppt.connection=192.168.1.50:502
modbus:tcp.mppt.type=holding
modbus:tcp.mppt.id=30
modbus:tcp.mppt.start=76
modbus:tcp.mppt.length=4
modbus:tcp.mppt.valuetype=int32
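As a side note, one way to sanity-check a register block like the "combox" one independently of openHAB is to build the Modbus TCP request frame by hand. The following is a minimal sketch (not part of the binding; the socket call is commented out because it needs the real device on the network) that constructs the MBAP header plus a read-holding-registers PDU matching id=201, start=68, length=90:

```python
import socket
import struct

def build_read_holding_registers(transaction_id, unit_id, start, count):
    """Build a Modbus TCP 'read holding registers' (function 0x03) frame."""
    pdu = struct.pack(">BHH", 0x03, start, count)
    # MBAP header: transaction id, protocol id (always 0),
    # remaining length (unit id + PDU), unit id
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu

# Frame matching the 'combox' slave definition above: id=201, start=68, length=90
frame = build_read_holding_registers(1, 201, 68, 90)
print(frame.hex())  # 000100000006c9030044005a

# To actually query the gateway (requires the device to be reachable):
# with socket.create_connection(("192.168.1.50", 502), timeout=3) as s:
#     s.sendall(frame)
#     response = s.recv(1024)
```

If the handcrafted request also triggers errors, the problem is on the device/network side rather than in the binding.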

openHAB is started with sudo start_debug.sh. Here is openhab.log from the moment of startup: http://pastebin.com/vx48ui6d

With this config I got both sections, "combox" and "mppt", filled with data from the TCP gateway.
But there is slightly strange behaviour of the item AveragePVVoltageRaw. Look here: http://pastebin.com/dK7TW8px (this is events.log).
And there are errors in the log and in the console.

Device is TCP-Modbus gate http://solar.schneider-electric.com/product/conext-combox/
I am only reading.

One more observation: the single item from the "mppt" section is refreshed every poll (3000 ms), while the items from the "combox" section are now refreshed two to three times less often.

@YMedox can you please test again with the logging configuration pasted above? I can’t see all the log statements now.

EDIT: it looks like the server actively refuses some connection attempts (Connection refused, i.e. "В соединении отказано" ["Connection refused"] in the logs, I believe). I’m wondering whether we have issues because the connection is closed after every transaction, so the server might refuse to accept a new connection right away (see e.g. issue 3537, which documents similar behaviour for modbus tcp requests). If this is indeed the case, we should be able to make connection establishment more robust by a) waiting a small (configurable, default 0 ms) duration before executing the read request, as discussed in the issue, or b) retrying in case of errors.
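To illustrate option b), here is a hedged sketch (not binding code; all names are made up for illustration) of a generic retry-with-delay wrapper that could be put around the TCP connection attempt:

```python
import socket
import time

def with_retries(action, max_tries=3, delay_seconds=0.05):
    """Run action(); on an OSError (e.g. connection refused), wait briefly
    and retry, up to max_tries attempts in total."""
    for attempt in range(1, max_tries + 1):
        try:
            return action()
        except OSError:
            if attempt == max_tries:
                raise  # give up after the last attempt
            time.sleep(delay_seconds)

# Hypothetical usage against a Modbus TCP gateway that sometimes refuses
# a new connection right after the previous one was closed:
# conn = with_retries(
#     lambda: socket.create_connection(("192.168.1.50", 502), timeout=1))
```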

I wonder if you can reproduce the issue with an external script, e.g. using netcat (nc); I noticed that you have a *nix system, so it should usually be readily available. For example, one can test that a connection can be established to port 80 (-v verbose, -z just connect and then close immediately, -w timeout of 1 second):

$ nc -v -z -w 1 www.google.fi 80
Connection to www.google.fi 80 port [tcp/http] succeeded!

In your case you could try connecting to the modbus server repeatedly (note that this will connect to the server 100 times, hope that's OK for you!):

$ for i in $(seq 1 100); do nc -v -z -w 1 192.168.1.50 502; done

You should not get any errors. You can try even more connections by changing the number `100` to `10000` for example.

Thanks

@ssalonen I’ve changed the logging configuration as suggested above.

Some logs with the new settings: aprettybigpaste - Pastebin.com

The errors did reduce with trace level logging. However, there are still some errors, as you can see in the paste.

@mbs38 Thank you very much! I like what I see now:

21:28:03.524 [DEBUG] [i.modbus.io.ModbusRTUTransport:77 ] - Sent: 09 01 00 00 00 18 3d 48
21:28:03.564 [ERROR] [i.modbus.io.ModbusRTUTransport:170 ] - Last request: 09 01 00 00 00 18 3d 48
21:28:03.564 [ERROR] [i.modbus.io.ModbusRTUTransport:171 ] - failed to read: Error reading response (EOF)
21:28:03.565 [ERROR] [w.m.io.ModbusSerialTransaction:198 ] - execute try 1/3 error: I/O exception - failed to read. Request: net.wimpi.modbus.msg.ReadCoilsRequest@6806bc7d. Serial parameters: SerialParameters@5d7b8a17[portName=/dev/ttyS0,baudRate=38400,flowControlIn=none,flowControlOut=none,databits=8,stopbits=1,parity=none,encoding=rtu,echo=false,receiveTimeout=40]
21:28:03.569 [DEBUG] [i.modbus.io.ModbusRTUTransport:77 ] - Sent: 09 01 00 00 00 18 3d 48
21:28:03.581 [DEBUG] [i.modbus.io.ModbusRTUTransport:144 ] - Response: 09 01 03 00 1f 00 35 36
21:28:03.581 [DEBUG] [w.m.io.ModbusSerialTransaction:209 ] - execute eventually succeeded with 1 re-tries. Request: net.wimpi.modbus.msg.ReadCoilsRequest@6806bc7d. Serial parameters: …

So it eventually succeeds; it just has to retry the transaction. Now that the timeout is configurable, I think this is pretty good, since overall polling is not slowed down too much.

Can you please tell me if any transaction fails completely, i.e. 3 retries are not enough? (You should see something like "execute reached max tries 3, throwing last error" in the logs.) Let it run for some time to make sure…

I have a hunch that we could lower the error rate by making the transaction delay configurable, i.e. we would wait for some milliseconds before executing the read/write transaction… I believe that’s why more logging helps: a delay is introduced between the transactions.

EDIT: actually, can you try increasing the timeout from 40 ms? I believe that might reduce errors.

EDIT 2: I will make that particular log statement INFO level in the future; I think it would be nice to see that a transaction eventually succeeded if there are errors.

For evaluation I’ve created a microcontroller-based device that counts the number of successful modbus transactions on my RS-485 network.

This is my openhab.cfg and logging configuration:
openhab.cfg: http://pastebin.com/KUmKxsfe
logback.xml: http://pastebin.com/k8BUPRBb
logback_debug.xml: http://pastebin.com/kH4X8kFf

With openhab started in debug mode I get about 1700 successful transactions per minute, in normal mode I only get about 190.

Here are the corresponding logs:

normal: http://pastebin.com/uDQP8g0k
debug: http://pastebin.com/YEHZZbNg

What do you think?

EDIT: Btw, because of this I’m currently running my production system in debug mode, piping the output to /dev/null because it generates 400 MB of logs per hour… :dizzy_face:


This is excellent stuff! I will have to look at the logs and will get back to you…

@mbs38 do you realize that you have a 5 ms polling period?

I think that might be too low… My understanding of low-level serial communication is a bit limited, but here goes: modbus.org’s MODBUS over Serial Line - Specification and Implementation guide (p. 13, remark) says that an inter-frame delay of 1.75 ms is recommended with baud rates > 19200. See also this compact answer on Stack Overflow on the timings.

Since you have 10 slaves configured, each will produce a single modbus request-response message (frame) pair, i.e. we should expect a poll to take over 1.75 ms × 10 × 2 = 35 ms. This is under the assumption that there are no delays between the characters of a message (the spec recommends 750 µs = 0.75 ms for the inter-character time-out). Let’s say each message has 8 bytes of payload + 3 bytes of RTU framing = 11 characters. With maximum inter-character delays, that would total 0.75 ms × 10 = 7.5 ms per message. Since each poll means 20 messages, that amounts to at most 150 ms of inter-character delays. Summing this up with the inter-frame delays we get 185 ms (150 + 35).

So using this logic, a poll period of around 35 ms is as low as you should go, and in the worst case (maximum inter-character delays) 185 ms could be considered the minimum limit.
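The arithmetic above can be written out explicitly as a back-of-the-envelope check, using the same assumptions (10 slaves, 2 frames per slave per poll, 1.75 ms inter-frame delay, 11 characters per message, 0.75 ms maximum inter-character delay):

```python
SLAVES = 10
FRAMES_PER_SLAVE = 2       # one request + one response per poll
INTER_FRAME_MS = 1.75      # spec recommendation for baud rates > 19200
CHARS_PER_MESSAGE = 11     # assumed message size, as in the text above
INTER_CHAR_MS = 0.75       # recommended inter-character timeout

inter_frame_total = SLAVES * FRAMES_PER_SLAVE * INTER_FRAME_MS     # 35.0 ms
inter_char_per_msg = (CHARS_PER_MESSAGE - 1) * INTER_CHAR_MS       # 7.5 ms
inter_char_total = SLAVES * FRAMES_PER_SLAVE * inter_char_per_msg  # 150.0 ms

print(inter_frame_total, inter_char_total,
      inter_frame_total + inter_char_total)
# 35.0 150.0 185.0
```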

I might be completely off here (specs are specs, the real world is something different, and I’m not sure I read it correctly), and to be honest I have no insight into how serial communication is implemented in the Java library openHAB uses. I think it uses OS-specific bits, so this might depend on the platform.

  • Would the errors disappear if you set the poll period to something more conservative, such as 500 ms? I do understand that there may be use cases for very frequent polling, but at least we can rule out what causes these issues… If that works, perhaps you could try lowering it from there.
  • You could also try increasing the baud rate (e.g. 115200, which is the maximum the binding supports), if possible with your setup.
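To put the baud-rate suggestion in perspective, the wire time of a single character shrinks proportionally with the baud rate (assuming 8N1 framing, i.e. 1 start + 8 data + 1 stop = 10 bits per character):

```python
def char_time_ms(baud, bits_per_char=10):
    """Wire time of one serial character in milliseconds (8N1 assumed)."""
    return bits_per_char / baud * 1000

for baud in (9600, 38400, 115200):
    print(baud, round(char_time_ms(baud), 3))
# 9600 1.042
# 38400 0.26
# 115200 0.087
```

So going from 9600 to 115200 baud cuts the per-character time by a factor of twelve, which leaves correspondingly more slack inside a given poll period.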

Thanks for testing these out…

EDIT: I noticed that the jamod library used for the modbus binding (actually openHAB has its own forked version) has this disclaimer under the Serial Modbus Implementations section:

The RTU implementation does only support the Master side. It is working by the best effort principle, which means it might not work in a reliable way in a low-latency real-time context.

A 5 ms poll period requirement might be on the edge, I believe :frowning:

EDIT2: if you check your debug logs closely,
$ grep Sent /tmp/YEHZZbNg.txt

00:59:11.548 [DEBUG] [i.modbus.io.ModbusRTUTransport:77 ] - Sent: 03 02 00 00 00 01 b8 28
00:59:11.571 [DEBUG] [i.modbus.io.ModbusRTUTransport:77 ] - Sent: 0a 01 00 00 00 18 3d 7b
00:59:11.601 [DEBUG] [i.modbus.io.ModbusRTUTransport:77 ] - Sent: 10 02 00 00 00 01 ba 8b
00:59:11.621 [DEBUG] [i.modbus.io.ModbusRTUTransport:77 ] - Sent: 08 01 00 00 00 18 3c 99

you can see that the actual intervals between requests are 23 ms, 30 ms, and 20 ms. Perhaps this suggests a more realistic poll period to try, say 50 ms?
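Those gaps can be computed directly from the log timestamps (a small parsing sketch; the timestamps are copied from the grep output above):

```python
from datetime import datetime

# Timestamps of the "Sent:" lines from the grep output above
timestamps = [
    "00:59:11.548",
    "00:59:11.571",
    "00:59:11.601",
    "00:59:11.621",
]

parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in timestamps]
gaps_ms = [
    round((b - a).total_seconds() * 1000)
    for a, b in zip(parsed, parsed[1:])
]
print(gaps_ms)  # [23, 30, 20]
```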

EDIT3: In any case, I think that even with a long polling period, at every poll many modbus requests are written back to back without waiting in between. I will make a version with a configurable inter-request wait period, which might help with these issues. In fact, I saw a similar approach taken in another modbus library (modbus4j).
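A sketch of the inter-request wait idea (all names here are hypothetical; the real implementation lives in the binding's Java code): instead of firing all requests back to back, sleep a configurable amount between them:

```python
import time

def poll_slaves(transactions, inter_request_delay_s=0.0):
    """Execute each transaction in turn, pausing between requests
    so the serial bus gets a breather between frames."""
    results = []
    for i, transaction in enumerate(transactions):
        if i > 0 and inter_request_delay_s > 0:
            time.sleep(inter_request_delay_s)  # let the bus settle
        results.append(transaction())
    return results

# Hypothetical usage: three dummy "transactions" with a 10 ms gap in between
results = poll_slaves([lambda: "ok"] * 3, inter_request_delay_s=0.010)
print(results)  # ['ok', 'ok', 'ok']
```

With a 10 ms gap and 10 slaves this adds roughly 90 ms per poll cycle, so the delay would need to be weighed against the configured poll period.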

@ssalonen, here is part of the output of the "for i in $(seq 1 100); do nc -v -z -w 1 192.168.1.50 502; done" command: http://pastebin.com/jDq5Xr2q openHAB was not running at that moment.

I added

<logger name="org.openhab.binding.modbus" level="TRACE" />
<logger name="net.wimpi.modbus" level="TRACE" />

to logback.xml and started openHAB with start_debug.sh. Here is openhab.log: http://pastebin.com/GLN5XG3r

Both slaves pass values to items at close, but not synchronized, intervals.

@YMedox thanks! Please note that when you use start_debug it does not use logback.xml. Instead, it uses logback_debug.xml.

Because of this you need to apply the logging changes there instead.

But don’t worry, I don’t think it would have revealed anything important… Let me get back to you later when I have a possible fix.

@ssalonen, I’ve changed modbus:poll=5 to modbus:poll=50. The rest of the configuration is unchanged.

Results:
debug mode: ca 1660 transactions / minute
normal mode: ca 120 transactions / minute

logs:
debug mode: http://pastebin.com/zhTwH1mZ
normal mode: http://pastebin.com/EqaZbyWB