Connection pooling in modbus binding

@ssalonen, here is part of the output of the "for i in $(seq 1 100); do nc -v -z -w 1 192.168.1.50 502; done" command: http://pastebin.com/jDq5Xr2q openHAB was not working at that moment.

I added
&lt;logger name="org.openhab.binding.modbus" level="TRACE" /&gt;
&lt;logger name="net.wimpi.modbus" level="TRACE" /&gt;
to logback.xml and started openHAB with start_debug.sh. Here is openhab.log: http://pastebin.com/GLN5XG3r

Both slaves pass values to items at close, but not synchronous, intervals.

@YMedox thanks! Please note that when you use start_debug it does not use logback.xml. Instead, it uses logback_debug.xml.

Because of this you need to apply the logging changes there instead.

But don't worry, I don't think it would reveal anything important… Let me get back to you later when I have a possible fix.

@ssalonen, I've changed modbus:poll=5 to modbus:poll=50. The rest of the configuration is unchanged.

Results:
debug mode: ca. 1660 transactions / minute
normal mode: ca. 120 transactions / minute

logs:
debug mode: http://pastebin.com/zhTwH1mZ
normal mode: http://pastebin.com/EqaZbyWB

@mbs38 thank you again very much. That sounds promising! I will come back to you with a possible fix…

@ssalonen, thank you! Will wait for news from you.

@YMedox @mbs38 @Ferryb4 @rossko57 New version (v3) out!

Download it from my GitHub releases.

Now you can configure the delay between transactions. With my Siemens PLC, 100ms was OK while 50ms was too little (I got connection refused all the time with 5ms polling).

Testing really appreciated as always :slight_smile:

Hi Ssalonen,

Thanks for the great work you're doing.

My logs are here: http://pastebin.com/xTCA6Nqt and my config: http://pastebin.com/n5emH7be

I still get errors…
Do you have any suggestions to try?

Thanks, that helps. The delay logic still does not work as expected (you can see "waited 0ms" in the logs).

Will look into it

edit: here's a fixed version: link. Hope it works! @Ferryb4

edit2: I noted later that this fixes it only for TCP connections, not serial. Try v5 for the serial fix.

Dear @ssalonen, thank you for the great work!
Now I have 5 slaves, 150 ms delay and poll = 4000 ms. It works!
Thank you again.


@ssalonen still no improvement, and there's still this strange performance discrepancy between normal and debug mode:

openhab.cfg: http://pastebin.com/YvjU58j2
logback confs are unchanged.
logs, normal mode: http://pastebin.com/2hj6i6np
logs, debug mode: http://pastebin.com/xWWTzsbP

measured performance, normal mode: 182 transactions per minute
measured performance, debug mode: 1666 transactions per minute

Best regards,
M

@mbs38

EDIT 2: There was still a bug with the delays for serial connections, which basically meant that the delay (50ms) was not exercised in practice (in the logs you can see "Waited 0ms (interBorrowDelayMillis 50ms) before giving returning connection SerialConnection"). Please try the updated version (v5).

If that does not work, can you please try a larger interTransactionDelay (the last number, currently 50ms).

50ms might be too fast for the device; at least it was too fast for my Siemens PLC (which should be industrial grade), and that one is using Modbus TCP (not serial!).

Edit: Small thing, but can you please enable net.wimpi.modbus INFO level in logback.xml? It basically just makes it clear whether transactions eventually succeeded…

Hi Ssalonen,

It works with v5.
I set the delay to 500ms.
Now I have no more errors.

Thank you very much for your time and energy.

:muscle:

It didn't work. With version 5 and this configuration for all devices:

modbus:poll=50
connection=/dev/ttyS0:38400:8:none:1:rtu:280:80

I get a stable performance of 745 transactions per minute, which sort of makes sense: 60*1000/80 = 750.
I've added the INFO level to logback.xml. Not a single [ERROR] in the logs.
Although 745 is about four times better than the version openHAB currently ships with, it's still very slow. I'm convinced we can do better than that.

With 70ms transaction delay the performance goes below 300 transactions per minute.
Btw the difference between debug mode and normal mode is gone.
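The arithmetic above (60*1000/80 = 750) generalizes into a simple throughput model. A hedged back-of-the-envelope sketch (my own illustration, not binding code; the function name is made up):

```python
# Toy model: each poll cycle costs roughly the transaction time itself
# plus the configured inter-transaction delay, so the per-minute rate is:
def transactions_per_minute(transaction_ms: float, delay_ms: float) -> float:
    return 60_000 / (transaction_ms + delay_ms)

# With a near-instant transaction and an 80ms delay the model predicts 750/min,
# close to the measured 745 transactions per minute.
print(transactions_per_minute(0, 80))  # 750.0
```

Under this model the inter-transaction delay dominates throughput whenever the transactions themselves are fast.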

[quote="ssalonen, post:65, topic:5246"]
50ms might be too fast for the device, at least it was too much for my siemens plc (which should be industrial grade) which is using modbus tcp (not serial!).
[/quote]
That can't be the problem here. The devices I've got here have achieved >100 transactions per second (!) when they were connected to a PLC. Of course it also depends on the amount of registers/bits requested by the PLC, but this still rules out that the problem is with the devices.

In order to show what is possible with a single device, I've changed my configuration as follows:
modbus:poll=5
modbus:serial.agsBusMasterHolding.connection=/dev/ttyS0:38400:8:none:1:rtu:280:10
modbus:serial.agsBusMasterHolding.id=247
modbus:serial.agsBusMasterHolding.start=42
modbus:serial.agsBusMasterHolding.length=14
modbus:serial.agsBusMasterHolding.type=holding

I get a stable 1864 transactions per minute. Again, no [ERROR] in the logs.
Just to clarify: 14 registers is by far the biggest chunk of data I'm requesting among all the devices I have configured. The bottleneck must therefore be somewhere else.
We should make multiple devices work just as well as a single device. ~1900 transactions per minute is what we should be aiming for (at least :slightly_smiling:).

If you need any further debug logs with multiple and single devices, please say so. I'm willing to help in any way I can.

Thanks for your work!
M


Thank you for the cold hard facts. I'm really satisfied that the binding now at least "works", even though the performance is not yet maximised…

With 70ms transaction delay the performance goes below 300 transactions per minute.

If I understand correctly, you start to get the EOF error with a delay of 70ms? Or any other idea why the performance decreases?

Now that the basic functionality works, could you try decreasing the read timeout to a lower value, say 50ms? I'm thinking it could improve the performance: even though the first read might sometimes time out, perhaps it helps to just retry aggressively. Currently the binding retries the transactions 3 times.

You could also try to lower the transaction delay towards the maximum performance you have witnessed, i.e. 10ms (100 transactions per second).

Looking forward to hearing how these tips work!

Now I've properly understood your single-device benchmark. Really illuminating.

Can you share trace level logs for both the single slave and the multiple slave setups? How many slaves do you have exactly in the multi-slave configuration? I would like to investigate whether the extra loss is due to overhead caused by the frameworks etc. I'm using… We are talking about quite high performance already…

It might also help if I remove the trace level logging statements; they do incur a small overhead. I'll have a look at this.

single device conf: just the six lines I've posted in my previous post (1 slave)
single device logs, debug: trace - Pastebin.com (1871 transactions / minute)

multiple device conf: conf multiple - Pastebin.com (13 slaves)
multiple device logs, debug: multipeltrace - Pastebin.com (730 transactions / minute)

The trace level logging doesn't seem to hurt performance in any way.

I'll try lowering the timeouts later.

M


Makes things worse. Long transactions, like reading 14 registers, never succeed.

M

@mbs38

OK, to summarize, the situation we have now is the following (documenting for future generations as well… perhaps we can snip something out for the wiki in the future, too):

Summary

1. single slave in modbus.cfg, low inter transaction delay (10ms), with generous read timeout (280ms), poll period practically minimal (5ms)

/dev/ttyS0:38400:8:none:1:rtu:280:10 (config) (as reported here and here)

  • performance good (logs), 1864 transactions per minute. The log snippet you pasted shows that on average we execute 1 transaction in 32ms (median 24ms), i.e. ~1875 transactions/minute [1]
  • performance is slower than the 5ms poll period would suggest, but that is quite expected and perfectly all right; the system reads as fast as it can, in the sense that there are no "extra" waits between transactions

2. Many slaves (13) in modbus.cfg, high inter transaction delay (80ms), with generous read timeout (280ms). Poll slower, 50ms

(config)
modbus:serial.agsBusMasterHolding.connection=/dev/ttyS0:38400:8:none:1:rtu:280:80 (as reported here)

  • performance poor, on average 1 transaction in 80ms, i.e. ~750 transactions/minute (logs)
  • performance is poor because we artificially ask the system to wait between transactions; on average 43ms is waited after each transaction [2]
  • no errors, since the delays are really conservative

3. Many slaves (13) in modbus.cfg, high inter transaction delay (80ms), with tight read timeout (50ms). Poll slower, 50ms

(same as this config but with different read timeout)
(as reported here)

  • poor performance, EOF errors
  • since the same timeout applies both to short reads and to long reads [3] (e.g. 14 registers), we get timeouts

[1] grepped timestamps with: grep "to allow delay between transactions" single_device.log|cut -d' ' -f1
[2] grepped average wait time using: grep "to allow delay between transactions" multi-device.txt|cut -d' ' -f8
[3] Here's the corresponding place in the code, which makes the read block until 1) the timeout occurs or 2) all message bytes are read
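As a cross-check of the grep pipelines in [1] and [2], the same wait-time average can be computed with a short script. A hedged sketch that assumes the TRACE lines contain "Waited <N>ms", as in the quoted log snippets:

```python
import re

# Average the "Waited <N>ms" figures from binding TRACE output.
def average_wait_ms(log_text: str) -> float:
    waits = [int(m.group(1)) for m in re.finditer(r"Waited (\d+)ms", log_text)]
    return sum(waits) / len(waits) if waits else 0.0

sample = (
    "12:00:00 TRACE ... Waited 40ms ... to allow delay between transactions\n"
    "12:00:01 TRACE ... Waited 46ms ... to allow delay between transactions\n"
)
print(average_wait_ms(sample))  # 43.0
```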

Steps forward

Decreasing the read timeout was a bit poor advice, and that is definitely not the issue, since the single device configuration works extremely fast with a read timeout of 280ms.

Based on the observations, I think this might really work in the multi device scenario (it is good general configuration advice as well):

  • short poll period, e.g. 5ms. The binding will simply block until the previous read is done, and immediately start the new poll round (checked this from the openHAB core code as well). If there is no requirement to update item status fast, this can of course be increased.

  • generous read timeout, e.g. 280ms. When the line works, the read returns much faster. In case of errors (e.g. a bit lost in transmission for whatever reason), the system waits 280ms before giving up. The default of 500ms is actually a pretty good value. With a lot of registers, the read timeout likely needs to be increased. If we get EOF errors, the value might be too low.

  • shorter inter transaction delay, e.g. 10ms. As you have proven in the single device scenario, even a delay of ~30ms or so seems to work just fine with your device. For high performance requirements, this should be set as low as possible. If we get EOF errors, the value might be too low. This is likely the hardest of all the parameters to tune, in the sense that it depends on device performance as well.

Summarizing the above advice, can you try this kind of connection string with the multi-slave setup:

# - Short poll period (5ms). The binding will simply block until the previous read is done, and immediately start the new poll round
modbus:poll=5
# - Read timeout (280ms) needs to be enough for long queries to work out. When the line works, the read returns much faster.
# - Short inter transaction delay (10ms). For high performance requirements, this should be set as low as possible. If we get EOF errors, the value is too low.
modbus:serial.X.connection=/dev/ttyS0:38400:8:none:1:rtu:280:10
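For reference, here is how I read the fields of that connection string based on this thread. The parser below is my own illustrative sketch, not binding code; the field names are inferred from the discussion (e.g. 280 = read timeout in ms, 10 = inter transaction delay in ms):

```python
# Split /dev/ttyS0:38400:8:none:1:rtu:280:10 into its parts.
# Field meanings inferred from the thread, not an authoritative spec.
def parse_connection(conn: str) -> dict:
    (device, baud, data_bits, parity,
     stop_bits, encoding, timeout_ms, delay_ms) = conn.split(":")
    return {
        "device": device,                             # serial port
        "baud": int(baud),
        "data_bits": int(data_bits),
        "parity": parity,
        "stop_bits": int(stop_bits),
        "encoding": encoding,                         # e.g. rtu
        "read_timeout_ms": int(timeout_ms),           # generous, e.g. 280
        "inter_transaction_delay_ms": int(delay_ms),  # as low as the device tolerates
    }

print(parse_connection("/dev/ttyS0:38400:8:none:1:rtu:280:10"))
```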


Best,
Sami

@ssalonen the summary is a good idea.

I have just tried: http://pastebin.com/5qfiMYG0

There is again a huge performance difference between start.sh and start_debug.sh:
normal: 195 transactions/minute, logs: http://pastebin.com/uPniRYmC
debug: 530 transactions/minute, logs: http://pastebin.com/6kquYwjj

However, there is obviously no improvement - in fact the opposite has happened.

Best regards,
M

Hi @mbs38

Can you try increasing the inter transaction delay? Try first with 25ms (the binding ensures that there is at least that much time between transactions; 25ms worked with the single slave, as explained in the summary), and if you still get errors try an even higher value, e.g. 35ms.

I would expect it to work eventually; we need to find the smallest value that works. If the delay is too small, it seems some bits might be lost and eventually we get an EOF error. Performance naturally decreases, since each EOF error means that we have waited the full read timeout…
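The cost of those EOF errors can be sketched with a toy model (my own simplification, not the binding's actual retry logic): every failed attempt burns the full read timeout before the retry, so even a modest error rate hurts throughput badly.

```python
# Expected time per successful transaction, assuming up to `retries` retries
# and that each failed attempt costs the full read timeout (simplification).
def expected_txn_ms(txn_ms: float, delay_ms: float, timeout_ms: float,
                    p_error: float, retries: int = 3) -> float:
    # Expected number of failed attempts before success (truncated geometric).
    expected_failures = sum(p_error ** k for k in range(1, retries + 1))
    return txn_ms + delay_ms + expected_failures * timeout_ms

# Error-free: 24ms transaction + 10ms delay = 34ms per transaction.
print(expected_txn_ms(24, 10, 280, 0.0))  # 34.0
# A 10% error rate already adds ~31ms of timeout waiting on average.
print(expected_txn_ms(24, 10, 280, 0.1))
```

This is why finding the smallest delay that avoids EOF errors beats simply minimizing the delay.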

Best
Ssalonen