Connection pooling in modbus binding

@ssalonen, here is part of the output of the "for i in $(seq 1 100); do nc -v -z -w 1 192.168.1.50 502; done" command: http://pastebin.com/jDq5Xr2q openHAB was not working at that moment.

I added
&lt;logger name="org.openhab.binding.modbus" level="TRACE" /&gt;
&lt;logger name="net.wimpi.modbus" level="TRACE" /&gt;
to logback.xml and started openHAB with start_debug.sh. Here is openhab.log: http://pastebin.com/GLN5XG3r

Both slaves pass values to items at close, but not synchronous, intervals.

@YMedox thanks! Please note that when you use start_debug it does not use logback.xml. Instead, it uses logback_debug.xml.

Because of this you need to apply the logging changes there instead.

But don't worry, I don't think it would reveal anything important… Let me get back to you later when I have a possible fix.

@ssalonen, I've changed modbus:poll=5 to modbus:poll=50. The rest of the configuration is unchanged.

Results:
debug mode: ca. 1660 transactions / minute
normal mode: ca. 120 transactions / minute

logs:
debug mode: http://pastebin.com/zhTwH1mZ
normal mode: http://pastebin.com/EqaZbyWB

@mbs38 thank you again very much. That sounds promising! I will come back to you with a possible fix…

@ssalonen, thank you! Will wait for news from you.

@YMedox @mbs38 @Ferryb4 @rossko57 New version (v3) out!

Download it from my GitHub releases.

Now you can configure the delay between transactions. With my Siemens PLC, 100ms was OK while 50ms was too little (I got connection refused all the time with 5ms polling).

Testing really appreciated as always :slight_smile:

Hi Ssalonen,

Thanks for the great work you're doing.

My logs are here: http://pastebin.com/xTCA6Nqt and my config: http://pastebin.com/n5emH7be

I still get errors…
Do you have any suggestions to try?

Thanks, that helps. The delay logic still does not work as expected (you can see "waited 0ms" in the logs).

Will look into it

edit: here's a fixed version: link. Hope it works! @Ferryb4

edit2: I noted later that this fixes it only for TCP connections, not serial. Try v5 for the serial fix.

Dear @ssalonen, thank you for the great work!
Now I have 5 slaves, 150 ms delay and poll = 4000 ms. It works!
Thank you again.


@ssalonen still no improvement, and there's still this strange performance discrepancy between normal and debug mode:

openhab.cfg: http://pastebin.com/YvjU58j2
logback confs are unchanged.
logs, normal mode: http://pastebin.com/2hj6i6np
logs, debug mode: http://pastebin.com/xWWTzsbP

measured performance, normal mode: 182 transactions per minute
measured performance, debug mode: 1666 transactions per minute

Best regards,
M

@mbs38

EDIT 2: There was still a bug with the delays for serial connections, which basically meant that the delay (50ms) was not exercised in practice (in the logs you can see "Waited 0ms (interBorrowDelayMillis 50ms) before giving returning connection SerialConnection"). Please try the updated version (v5).

If that does not work, can you please try a larger interTransactionDelay (the last number, currently 50ms).

50ms might be too fast for the device; at least it was too fast for my Siemens PLC (which should be industrial grade), and that one is using Modbus TCP (not serial!).

Edit: Small thing, but can you please enable net.wimpi.modbus INFO level in logback.xml? It basically just makes it clear whether transactions eventually succeeded…

Hi Ssalonen,

It works with v5.
I set the delay to 500ms.
Now I have no more errors.

Thank you very much for your time and energy.

:muscle:

It didn't work. With version 5 and this configuration for all devices:

modbus:poll=50
connection=/dev/ttyS0:38400:8:none:1:rtu:280:80

I get a stable performance of 745 transactions per minute, which sort of makes sense: 60*1000/80 = 750.
I've added the INFO level to logback.xml. Not a single [ERROR] in the logs.
Although 745 is about four times better than the version openHAB currently ships with, it's still very slow. I'm convinced we can do better than that.

With 70ms transaction delay the performance goes below 300 transactions per minute.
Btw the difference between debug mode and normal mode is gone.
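The arithmetic above (60*1000/80 = 750) generalizes into a simple throughput model. A hedged back-of-the-envelope sketch (my own illustration, not binding code; the function name is made up):

```python
# Toy model: each poll cycle costs roughly the transaction time itself
# plus the configured inter-transaction delay, so the per-minute rate is:
def transactions_per_minute(transaction_ms: float, delay_ms: float) -> float:
    return 60_000 / (transaction_ms + delay_ms)

# With a near-instant transaction and an 80ms delay the model predicts 750/min,
# close to the measured 745 transactions per minute.
print(transactions_per_minute(0, 80))  # 750.0
```

Under this model the inter-transaction delay dominates throughput whenever the transactions themselves are fast.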

[quote="ssalonen, post:65, topic:5246"]
50ms might be too fast for the device, at least it was too much for my siemens plc (which should be industrial grade) which is using modbus tcp (not serial!).
[/quote]
That can't be the problem here. The devices I've got here have achieved >100 transactions per second (!) when they were connected to a PLC. Of course it also depends on the amount of registers/bits requested by the PLC, but this still rules out that the problem is with the devices.

In order to show what is possible with a single device, I've changed my configuration as follows:
modbus:poll=5
modbus:serial.agsBusMasterHolding.connection=/dev/ttyS0:38400:8:none:1:rtu:280:10
modbus:serial.agsBusMasterHolding.id=247
modbus:serial.agsBusMasterHolding.start=42
modbus:serial.agsBusMasterHolding.length=14
modbus:serial.agsBusMasterHolding.type=holding

I get a stable 1864 transactions per minute. Again, no [ERROR] in the logs.
Just to clarify: 14 registers is by far the biggest chunk of data I'm requesting among all the devices I have configured. The bottleneck must therefore be somewhere else.
We should make multiple devices work just as well as a single device. ~1900 transactions per minute is what we should be aiming for (at least :slightly_smiling:).

If you need any further debug logs with multiple and single devices, please say so. I'm willing to help in any way I can.

Thanks for your work!
M


Thank you for the cold hard facts. I'm really satisfied that the binding now at least "works", even though the performance is not yet maximised…

With 70ms transaction delay the performance goes below 300 transactions per minute.

If I understand correctly, you start to get the EOF error with a delay of 70ms? Or any other idea why the performance decreases?

Now that the basic functionality works, could you try decreasing the read timeout to a lower value, say 50ms? I'm thinking it could improve the performance: even though the first read might sometimes time out, perhaps it helps to just retry aggressively. Currently the binding retries the transactions 3 times.

You could also try to lower the transaction delay towards the maximum performance you have witnessed, i.e. 10ms (100 transactions per second).

Looking forward to hearing how these tips work!

Now I've properly understood your single-device benchmark. Really illuminating.

Can you share trace level logs for both the single slave and the multiple slave setups? How many slaves do you have exactly in the multi-slave configuration? I would like to investigate whether the extra loss is due to overhead caused by the frameworks etc. I'm using… We are talking about quite high performance already…

It might also help if I remove the trace level logging statements; they do incur a small overhead. I'll have a look at this.

single device conf: just the six lines I've posted in my previous post (1 slave)
single device logs, debug: trace - Pastebin.com (1871 transactions / minute)

multiple device conf: conf multiple - Pastebin.com (13 slaves)
multiple device logs, debug: multipeltrace - Pastebin.com (730 transactions / minute)

The trace level logging doesn't seem to hurt performance in any way.

I'll try lowering the timeouts later.

M


Makes things worse. Long transactions, like reading 14 registers, never succeed.

M

@mbs38

OK, to summarize, the situation we have now is the following (documenting for future generations as well… perhaps we can snip something out for the wiki in the future, too):

Summary

1. single slave in modbus.cfg, low inter transaction delay (10ms), with generous read timeout (280ms), poll period practically minimal (5ms)

/dev/ttyS0:38400:8:none:1:rtu:280:10 (config) (as reported here and here)

  • performance good (logs), 1864 transactions per minute. The log snippet you pasted shows that on average we execute 1 transaction in 32ms (median 24ms), i.e. ~1875 transactions/minute [1]
  • performance is slower than the 5ms poll period would suggest, but that is quite expected and perfectly all right; the system reads as fast as it can, in the sense that there are no "extra" waits between transactions

2. Many slaves (13) in modbus.cfg, high inter transaction delay (80ms), with generous read timeout (280ms). Poll slower, 50ms

(config)
modbus:serial.agsBusMasterHolding.connection=/dev/ttyS0:38400:8:none:1:rtu:280:80 (as reported here)

  • performance poor, on average 1 transaction in 80ms, i.e. ~750 transactions/minute (logs)
  • performance is poor because we artificially ask the system to wait between transactions; on average 43ms is waited after each transaction [2]
  • no errors, since the delays are really conservative

3. Many slaves (13) in modbus.cfg, high inter transaction delay (80ms), with tight read timeout (50ms). Poll slower, 50ms

(same as this config but with different read timeout)
(as reported here)

  • poor performance, EOF errors
  • since the same timeout applies both to short reads and to long reads [3] (e.g. 14 registers), we get timeouts

[1] grepped timestamps with: grep "to allow delay between transactions" single_device.log|cut -d' ' -f1
[2] grepped average wait time using: grep "to allow delay between transactions" multi-device.txt|cut -d' ' -f8
[3] Here's the corresponding place in the code, which makes the read block until 1) the timeout occurs or 2) all message bytes are read
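As a cross-check of the grep pipelines in [1] and [2], the same wait-time average can be computed with a short script. A hedged sketch that assumes the TRACE lines contain "Waited <N>ms", as in the quoted log snippets:

```python
import re

# Average the "Waited <N>ms" figures from binding TRACE output.
def average_wait_ms(log_text: str) -> float:
    waits = [int(m.group(1)) for m in re.finditer(r"Waited (\d+)ms", log_text)]
    return sum(waits) / len(waits) if waits else 0.0

sample = (
    "12:00:00 TRACE ... Waited 40ms ... to allow delay between transactions\n"
    "12:00:01 TRACE ... Waited 46ms ... to allow delay between transactions\n"
)
print(average_wait_ms(sample))  # 43.0
```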

Steps forward

Decreasing the read timeout was a bit poor advice, and that is definitely not the issue, since the single device configuration works extremely fast with a read timeout of 280ms.

Based on the observations, I think this might really work in the multi device scenario (it is good general configuration advice as well):

  • short poll period, e.g. 5ms. The binding will simply block until the previous read is done, and immediately start the new poll round (checked this from the openHAB core code as well). If there is no requirement to update item status fast, this can of course be increased.

  • generous read timeout, e.g. 280ms. When the line works, the read returns much faster. In case of errors (e.g. a bit lost in transmission for whatever reason), the system waits 280ms before giving up. The default of 500ms is actually a pretty good value. With a lot of registers, the read timeout likely needs to be increased. If we get EOF errors, the value might be too low.

  • shorter inter transaction delay, e.g. 10ms. As you have proven in the single device scenario, even a delay of ~30ms or so seems to work just fine with your device. For high performance requirements, this should be set as low as possible. If we get EOF errors, the value might be too low. This is likely the hardest of all the parameters to tune, in the sense that it depends on device performance as well.

Summarizing the above advice, can you try this kind of connection string with the multi-slave setup:

# - Short poll period (5ms). The binding will simply block until the previous read is done, and immediately start the new poll round
modbus:poll=5
# - Read timeout (280ms) needs to be enough for long queries to work out. When the line works, the read returns much faster.
# - Short inter transaction delay (10ms). For high performance requirements, this should be set as low as possible. If we get EOF errors, the value is too low.
modbus:serial.X.connection=/dev/ttyS0:38400:8:none:1:rtu:280:10
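For reference, here is how I read the fields of that connection string based on this thread. The parser below is my own illustrative sketch, not binding code; the field names are inferred from the discussion (e.g. 280 = read timeout in ms, 10 = inter transaction delay in ms):

```python
# Split /dev/ttyS0:38400:8:none:1:rtu:280:10 into its parts.
# Field meanings inferred from the thread, not an authoritative spec.
def parse_connection(conn: str) -> dict:
    (device, baud, data_bits, parity,
     stop_bits, encoding, timeout_ms, delay_ms) = conn.split(":")
    return {
        "device": device,                             # serial port
        "baud": int(baud),
        "data_bits": int(data_bits),
        "parity": parity,
        "stop_bits": int(stop_bits),
        "encoding": encoding,                         # e.g. rtu
        "read_timeout_ms": int(timeout_ms),           # generous, e.g. 280
        "inter_transaction_delay_ms": int(delay_ms),  # as low as the device tolerates
    }

print(parse_connection("/dev/ttyS0:38400:8:none:1:rtu:280:10"))
```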


Best,
Sami

@ssalonen the summary is a good idea.

I have just tried: http://pastebin.com/5qfiMYG0

There is again a huge performance difference between start.sh and start_debug.sh:
normal: 195 transactions/minute, logs: http://pastebin.com/uPniRYmC
debug: 530 transactions/minute, logs: http://pastebin.com/6kquYwjj

However, there is obviously no improvement - in fact the opposite has happened.

Best regards,
M

Hi @mbs38

Can you try increasing the inter transaction delay? Try first with 25ms (the binding ensures that there is at least that much time between transactions; 25ms worked with the single slave, as explained in the summary), and if you still get errors try an even higher value, e.g. 35ms.

I would expect it to work eventually; we need to find the smallest value that works. If the delay is too small, it seems some bits might be lost and eventually we get an EOF error. Performance naturally decreases, since each EOF error means that we have waited the full read timeout…
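The cost of those EOF errors can be sketched with a toy model (my own simplification, not the binding's actual retry logic): every failed attempt burns the full read timeout before the retry, so even a modest error rate hurts throughput badly.

```python
# Expected time per successful transaction, assuming up to `retries` retries
# and that each failed attempt costs the full read timeout (simplification).
def expected_txn_ms(txn_ms: float, delay_ms: float, timeout_ms: float,
                    p_error: float, retries: int = 3) -> float:
    # Expected number of failed attempts before success (truncated geometric).
    expected_failures = sum(p_error ** k for k in range(1, retries + 1))
    return txn_ms + delay_ms + expected_failures * timeout_ms

# Error-free: 24ms transaction + 10ms delay = 34ms per transaction.
print(expected_txn_ms(24, 10, 280, 0.0))  # 34.0
# A 10% error rate already adds ~31ms of timeout waiting on average.
print(expected_txn_ms(24, 10, 280, 0.1))
```

This is why finding the smallest delay that avoids EOF errors beats simply minimizing the delay.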

Best
Ssalonen