@sipvoip
Now with improved logging I can comment more.
With the last log, there is only single “transaction id match” error, so that should not be the root cause after all.
The queries to to the slaves seem to be bottlenecked. For example, coils are read from 10.88.64.45
every ~10s.
$ grep -B5 "Sending request.*ReadCoils" ~/Downloads/modbus.log|grep 64.45|grep borrowing
2017-02-18 16:23:02 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=55674]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 1 ms
2017-02-18 16:23:11 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=55715]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 1 ms
2017-02-18 16:23:22 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=55758]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:23:33 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=55804]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 1 ms
2017-02-18 16:23:44 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=55860]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 1 ms
2017-02-18 16:23:54 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=55929]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:24:09 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=55990]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 1 ms
2017-02-18 16:24:21 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56042]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:24:32 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56099]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:24:42 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56145]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:24:55 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56197]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:25:08 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56253]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:25:21 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56320]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:25:31 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56367]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:25:43 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56412]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:25:55 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56467]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:26:07 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56531]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:26:18 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56585]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:26:30 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56621]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:26:41 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56679]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
I believe this matches the lag you observe.
We can analyze how long does the binding wait in order to get a connection (using command like: grep borrowing ~/Downloads/modbus.log|rev|cut -d' ' -f1-5|rev|grep 10.88.64.45
)
- 10.88.64.45: 0-1 ms
- 10.88.64.46: 1000ms or 2000ms often, just couple of times 1ms
- 10.88.64.47: 1000ms or 2000ms often, just couple of times 1ms
- 10.88.64.48: 1000ms or 2000ms often, just couple of times 1ms
- 10.88.64.49: almost always 1-3ms, once 1000ms
This means that the slaves with ip ending with 46
, 47
and 48
slow down polling in general. This is due to the fact that each “slave” (water, basement, etc.) are polled in turn, something like this:
- poll water (45)
- poll basement (46)
- poll basement_shared (46)
- poll basement_ponet (46)
- poll first (47)
- poll first_ponet (47)
- poll first_shared (47)
- poll second (48)
- poll second_ponet (48)
- poll second_shared (48)
- poll shed (49)
By default, connection is closed after each read and write request. By default, the binding waits 60ms before proceeding with request to the same endpoint (same ip and port).
So in reality the basic polling goes like this:
- poll water (45)
- poll basement (46)
- wait 60ms, poll basement_shared (46)
- wait 60ms, poll basement_ponet (46)
- poll first (47)
- wait 60ms, poll first_ponet (47)
- wait 60ms, poll first_shared (47)
- poll second (48)
- wait 60ms, poll second_ponet (48)
- wait 60ms, poll second_shared (48)
- poll shed (49)
I noticed that you had lot of writes happening at the same time, which means polling is “interrupted” due to the writes, something like this
- poll water (45)
- poll basement (46)
- wait 60ms, poll basement_shared (46)
- wait 60ms, poll basement_ponet (46)
- poll first (47)
- wait 60ms, write to ip .47
- wait 60ms, poll first_ponet (47)
- wait 60ms, write to ip .47
- wait 60ms, poll first_shared (47)
- poll second (48)
- wait 60ms, poll second_ponet (48)
- wait 60ms, poll second_shared (48)
- poll shed (49)
This makes the poll take longer time.
With this hypothesis I have been investigating what happens between two polls to the same ip. I’m looking the following time interval:
2017-02-18 16:26:30 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56621]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
2017-02-18 16:26:41 TRACE o.o.b.m.internal.ModbusSlave[:366]- ModbusSlave (water): borrowing connection (got TCPMasterConnection@66c735a8[socket=Socket[addr=/10.88.64.45,port=502,localport=56679]]) for endpoint ModbusTCPSlaveEndpoint@34524c60[address=10.88.64.45,port=502] took 0 ms
Let’s check out all connection attemps in this interval:
grep " connecting" ~/Downloads/modbus_log_2017_02_18_16_26_from_30_to_41s.txt|cut -d' ' -f1,2,15
2017-02-18 16:26:30 TCPMasterConnection@2297e31[socket=Socket[addr=/10.88.64.47,port=502,localport=58742]]
2017-02-18 16:26:31 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37163]]
2017-02-18 16:26:31 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34289]]
2017-02-18 16:26:32 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37170]]
2017-02-18 16:26:32 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34295]]
2017-02-18 16:26:33 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37176]]
2017-02-18 16:26:33 TCPMasterConnection@2297e31[socket=Socket[addr=/10.88.64.47,port=502,localport=58760]]
2017-02-18 16:26:33 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34305]]
2017-02-18 16:26:34 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37184]]
2017-02-18 16:26:34 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34313]]
2017-02-18 16:26:35 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37193]]
2017-02-18 16:26:35 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34320]]
2017-02-18 16:26:36 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37199]]
2017-02-18 16:26:36 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34323]]
2017-02-18 16:26:37 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37203]]
2017-02-18 16:26:37 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34328]]
2017-02-18 16:26:38 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37207]]
2017-02-18 16:26:39 TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34333]]
2017-02-18 16:26:39 TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37212]]
2017-02-18 16:26:40 TCPMasterConnection@2297e31[socket=Socket[addr=/10.88.64.47,port=502,localport=58795]]
grep " connecting" ~/Downloads/modbus_log_2017_02_18_16_26_from_30_to_41s.txt|cut -d' ' -f1,2,15|cut -d'=' -f3|sort|uniq -c
8 /10.88.64.46,port
3 /10.88.64.47,port
9 /10.88.64.48,port
During the ~10 second interval, 20 connections are made to other slaves.
We can confirm that all these connections need to wait the ~60ms period between transactions:
grep "delay between connections" ~/Downloads/modbus_log_2017_02_18_16_26_from_30_to_41s.txt|cut -d' ' -f1-2,5-6,15
2017-02-18 16:26:30 Waited 59ms TCPMasterConnection@2297e31[socket=Socket[addr=/10.88.64.47,port=502,localport=58742]]
2017-02-18 16:26:31 Waited 58ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37163]]
2017-02-18 16:26:31 Waited 58ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34289]]
2017-02-18 16:26:32 Waited 58ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37170]]
2017-02-18 16:26:32 Waited 59ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34295]]
2017-02-18 16:26:33 Waited 58ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37176]]
2017-02-18 16:26:33 Waited 57ms TCPMasterConnection@2297e31[socket=Socket[addr=/10.88.64.47,port=502,localport=58760]]
2017-02-18 16:26:33 Waited 57ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34305]]
2017-02-18 16:26:34 Waited 57ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37184]]
2017-02-18 16:26:34 Waited 57ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34313]]
2017-02-18 16:26:35 Waited 58ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37193]]
2017-02-18 16:26:35 Waited 57ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34320]]
2017-02-18 16:26:36 Waited 59ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37199]]
2017-02-18 16:26:36 Waited 57ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34323]]
2017-02-18 16:26:37 Waited 58ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37203]]
2017-02-18 16:26:37 Waited 58ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34328]]
2017-02-18 16:26:38 Waited 59ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37207]]
2017-02-18 16:26:39 Waited 58ms TCPMasterConnection@2a326a36[socket=Socket[addr=/10.88.64.46,port=502,localport=34333]]
2017-02-18 16:26:39 Waited 58ms TCPMasterConnection@17f95457[socket=Socket[addr=/10.88.64.48,port=502,localport=37212]]
2017-02-18 16:26:40 Waited 58ms TCPMasterConnection@2297e31[socket=Socket[addr=/10.88.64.47,port=502,localport=58795]]
Summary and next steps:
- the performance could be improved by having separate poll threads for different endpoints (ip+port). Something that has been discussed in the context of error handling before here in community threads… Something I can consider for the openhab2 refactoring I’m working on…
- currently it is hardcoded that at most single connection is allowed to the endpoint. Lifting this restriction or making it configurable should improve performance (assuming slave can deal with multiple clients – not all can). Something I can consider for openhab2.
- Assuming your slave can handle the requests, you should be able to improve performance by reducing the wait time between transactions by configuring
interTransactionDelayMillis
connection string parameter. Consult Advanced connection parameters (since 1.9.0) section in wiki for more details.
- If possible/feasible, reducing amouont of write commands certainly will improve the performance.
Let me know how it goes with the interTransactionDelayMillis
connection parameter. Remember to configure the same connection settings to all openhab “slaves” sharing the same endpoint (in your case, ip).
Best,
Sami