Possible performance issue with bindings?

Wait … do you mean you are hitting the slaves from other masters at the same time as openHAB?

Yes and no, but it makes no difference: the Modbus slave opens a new thread for each new socket connection.
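
To illustrate what that means (a minimal Java sketch of the thread-per-connection pattern, not our actual slave implementation; the class name and handler are made up):

import java.net.ServerSocket;
import java.net.Socket;

// Sketch only: every accepted socket gets its own handler thread,
// so requests from several masters are served concurrently.
public class ThreadPerConnectionSlave {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(502)) { // standard Modbus TCP port
            while (true) {
                Socket connection = server.accept();
                new Thread(() -> handle(connection)).start();
            }
        }
    }

    private static void handle(Socket connection) {
        // read Modbus request frames from the socket and write responses here
    }
}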

My colleague Felix is doing most of the tests, and he is also reading here but might not be registered yet. I will not be available until Monday.

Sounds like you might be suffering from the issue explained in this post

What do you think? Do you get warnings logged because of large queue size?

Best,
Sami

Yes, I think this could be the problem. My colleague will provide more details later. On the Raspberry Pi there were errors logged about lost events, and a queue size of about 560 (events or threads, I cannot remember exactly). Sometimes the system can catch up, and from that point on there is no delay anymore, at least on the faster PC.
It seems that no counter values are lost, but the values arrive at the item in random order.

Please try it out with the new version; for more details, see this post: Modbus openHAB2 binding available for alpha testing

Sami

Hey, so I can now provide some additional information.

The following applies to the Raspberry Pi setup.
I upgraded to the newest version of the binding (2.4.0.201806291551, I hope).
I reduced the number of items to 1 per thing, just for now.
Started the server with 120 things and items and let it run for about 30 min.
Here is the log file for these 30 min: openhab.log (JSON, Google Drive)

The first 20 items were initialized, but the other 100 were not. The number of tasks in the callbackThreadPool was at 2200.

CPU usage is at about 10-15%, memory at about 20%.
That is very different from what we experienced before, when the CPU usage would always go up to 350% to 400% but the callbackThreadPool would only consist of 500-600 tasks after 30 min.

I will do additional testing with the tips @rlkoshak provided and keep you updated.
We are also going to test whether the number of items per poll affects the behaviour.

Felix

@awiflx, I can confirm that you are using the latest binding.

The caching is mostly not applied, since the REFRESH commands come more than 50 ms (the default) apart:

$ grep 'received refresh() but the cache is not applicable. Polling new data' ~/Downloads/openhab.txt|wc -l
201
$ grep 'cache was recent enough (age at most 50 ms). Reusing old response' ~/Downloads/openhab.txt|wc -l
37

Also, the overall number of REFRESH commands does not explain the large queue – only 201 poll tasks are queued this way:

$ grep 'Scheduling one-off poll' ~/Downloads/openhab.txt|wc -l
201
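
(For reference, the decision behind those first two log lines boils down to an age check, roughly like this – a simplified sketch with hypothetical names, not the binding’s actual code:)

// Simplified sketch of the REFRESH cache decision (hypothetical names):
// the cached response is reused only if it is at most 50 ms old.
public class RefreshCache {
    private static final long CACHE_MAX_AGE_MILLIS = 50; // the default mentioned above
    private long lastPollTime = Long.MIN_VALUE;

    /** Called whenever a poll returns fresh data. */
    public void onNewData() {
        lastPollTime = System.currentTimeMillis();
    }

    /** Decides whether a REFRESH can reuse the cached response. */
    public boolean cacheApplicable() {
        long age = System.currentTimeMillis() - lastPollTime;
        // true  -> "cache was recent enough ... Reusing old response"
        // false -> "received refresh() but the cache is not applicable. Polling new data"
        return age <= CACHE_MAX_AGE_MILLIS;
    }
}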

So it must be something else :thinking:

I don’t see any log messages from the binding. Are you sure you have configured the logging as explained here: Enable verbose logging?

Please

  • ensure the log configuration
  • also enable TRACE logging for org.eclipse.smarthome.core.common.QueueingThreadPoolExecutor
  • share item and thing definitions (I can try to reproduce)

Best,
Sami

@ssalonen, alright, I will try to go through that step by step.
First off, thank you for taking the time and being so patient about this.

I got the logging configured like this in org.ops4j.pax.logging.cfg
(I did this through the Karaf console with log:set TRACE org.openhab.binding.modbus, and the same for the others):

#Modbus logger
log4j2.logger.org_openhab_binding_modbus.level = TRACE
log4j2.logger.org_openhab_binding_modbus.name = org.openhab.binding.modbus
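#Modbus transport logger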
log4j2.logger.org_openhab_io_transport_modbus.level = TRACE
log4j2.logger.org_openhab_io_transport_modbus.name = org.openhab.io.transport.modbus
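#jamod Modbus library logger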
log4j2.logger.net_wimpi_modbus.level = TRACE
log4j2.logger.net_wimpi_modbus.name = net.wimpi.modbus
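#ESH QueueingThreadPoolExecutor logger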
log4j2.logger.org_eclipse_smarthome_core_common_queueingthreadpoolexecutor.level = TRACE
log4j2.logger.org_eclipse_smarthome_core_common_queueingthreadpoolexecutor.name = org.eclipse.smarthome.core.common.QueueingThreadPoolExecutor

The definitions are like this:
dThing.json (10.0 KB)
dItem.json (14.1 KB)
and are created through a bash script

I will now run the test again and post the resulting log file.
I guess 10 to 15 min should be enough time?

Felix

Edit:
Tasks at ~900
CPU at 10-15%, memory at 15-20%, active threads at 5
Here is the log:
openhab.log
And this is what it looked like in Paper UI’s Control

Thanks!

A few minutes should be enough!

I edited the post above so everything is packed together.

Thanks… still a mystery though.

Can you paste here the output of the threads Karaf console command?

Like this?
threads.json (65.5 KB)

Thank you! I think I found the root cause – it’s a good old deadlock situation which halts all handling of the data, eventually resulting in an ever-growing queue.
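
For readers unfamiliar with the pattern: in a classic deadlock, two threads each hold one lock while waiting forever for the other’s, so nothing behind them makes progress. A toy Java illustration (not the binding’s actual code; the lock names are made up):

// Two threads take the same two locks in opposite order and block forever;
// everything queued behind them piles up, just like the growing task queue.
public class DeadlockDemo {
    private static final Object pollLock = new Object();     // hypothetical
    private static final Object callbackLock = new Object(); // hypothetical

    public static void main(String[] args) {
        new Thread(() -> {
            synchronized (pollLock) {
                sleep(100); // give the other thread time to grab callbackLock
                synchronized (callbackLock) { } // waits forever
            }
        }).start();

        new Thread(() -> {
            synchronized (callbackLock) {
                sleep(100);
                synchronized (pollLock) { } // waits forever
            }
        }).start();
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}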

I actually managed to reproduce the issue, although not consistently (it must be a timing thing).

I now pushed a fix, and it should be brewing here

Ok, thank you big time!

I will get on testing the fix tomorrow and hit you up with anything I come up with.

Keep up the amazing work!

So I finally did some testing on your fix. Performance is much better now: all items get initialized, and the CPU is actually put under load (up to 70-80%, which still runs really smoothly on the Pi compared to what happened before).

What I did notice, though, is that some values still arrive “late”. There is a counter incrementing once per second, and the poll fetches the data once per second, so the values should arrive in the same steadily increasing order. But it seems the commands in the queue get slightly mixed up, so that some “old” values arrive after the “new” ones. Which is, I guess, not intended?

I will check if I can work around it with a more intelligent setup design next week; I just wanted to point it out.
I am very grateful for what you did there and it really helps us out right now.

Wishing you a fantastic Friday,
Felix

Thanks, awesome to hear we are heading in the right direction!

Where is this counter? In case it is in openHAB, perhaps you need to adjust the autoupdate parameter in openHAB?

For more info, consult these posts

From the earlier conversation: @awiflx has set up test counters incrementing in his Modbus TCP slaves.

As I understand the Modbus binding, it runs a strictly one-at-a-time query-response process with the slave(s).
Even if the outgoing query queue got jumbled, the responses would still appear in the correct sequence as far as the test counter goes.

So any jumbling of the order of arrival ought to be on the openHAB event bus side?
Speculation: a multithreaded task processing incoming data is interrupted by the next data arrival, so the second piece of data may be put on the event bus before the first.

It’s possible the event bus queueing is not FIFO; nobody would normally notice or care much.
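
That kind of reordering is easy to demonstrate in general (a toy sketch, not openHAB’s actual event bus code): with more than one worker thread, tasks submitted in order can publish their results out of order.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy demonstration: the first submitted task is delayed slightly,
// so the second task publishes its value first.
public class OutOfOrderDemo {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);

        pool.submit(() -> {
            sleep(50); // the "older" value is held up in processing...
            System.out.println("older value published second");
        });
        pool.submit(() -> System.out.println("newer value published first"));

        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.SECONDS);
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}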

Thanks @rossko57!

In that case it would help me to see verbose logs and the full item and thing configurations.

Best
Sami

The counter is just a number which is incremented every second and transferred to the Modbus client by external software. I would consider such behaviour a bug: imagine a switch-off and a switch-on command arriving in the wrong order.
As Felix has shown, the old counter value that arrived was approx. 100 seconds old, e.g. 1233, 1234, 1235, 1144, 1246, …