Possible performance of bindings?

modbus

(Ssalonen) #21

Thanks!

A few minutes should be enough!


(Felix) #22

Edited the post above, so everything is packed together


(Ssalonen) #23

Thanks… still a mystery though.

Can you paste here the output of the `threads` Karaf console command?


(Felix) #24

like this?
threads.json (65.5 KB)


(Ssalonen) #25

Thank you! I think I found the root cause – it’s a good old deadlock situation which halts all handling of the data, eventually resulting in an ever-growing queue.

I actually managed to reproduce the issue, although not consistently (must be a timing thing)

I now pushed a fix, and it should be brewing here


Modbus openHAB2 binding available for alpha testing
(Felix) #26

Ok, thank you big time!

I will get on testing the fix tomorrow and hit you up with anything I come up with.

Keep up the amazing work!


(Felix) #27

So I finally did some testing on the fix you made. Performance is much better now: all Items get initialized and the CPU is under load (up to 70-80%, which runs really smoothly on the Pi compared to what happened before).

What I did notice, though, is that some values still arrive “late”. There is a counter incrementing once per second, and the poll fetches the data once per second, so the value should count up steadily and be polled that way. But it seems the commands in the queue get a little mixed up, so that some “old” values arrive after the “new” ones. Which is, I guess, not intended?

I will check if I can work around it with a more intelligent setup design next week; I just wanted to point it out.
I am very grateful for what you did there and it really helps us out right now.

Wishing you a fantastic Friday,
Felix


(Ssalonen) #28

Thanks, awesome to hear we are heading in the right direction!

Where is this counter? In case it is generated within openHAB, perhaps you need to adjust the autoupdate parameter in openHAB?

For more info, consult these posts


(Rossko57) #29

From earlier convo, @awiflx has set up test counters incrementing in his Modbus-TCP slaves.

As I understand the Modbus binding, it uses a strictly one-at-a-time process to query and receive responses from the slave(s).
Even if the outgoing query queue got jumbled, the responses would still appear to be in the correct sequence so far as the test counter goes.

So any jumbling of the order of arrival ought to be on the openHAB event bus side?
Speculation: a multithreaded task processing incoming data gets interrupted by the next data arrival, so the second datum may be put on the event bus before the first?

It’s possible the event bus queueing is not FIFO, nobody would normally notice or care much.
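The speculated reordering is easy to reproduce in miniature. The sketch below is a hypothetical illustration only (not the binding's actual code): two poll results are handed to separate worker threads, and the later value reaches the shared "event bus" list first because its worker finishes sooner.

```python
import threading
import time

# Illustration of the speculation above: poll results are processed on
# separate threads, so a value polled later can be published earlier.
published = []
publish_lock = threading.Lock()

def process_and_publish(counter_value, processing_delay):
    time.sleep(processing_delay)          # simulated variable processing time
    with publish_lock:
        published.append(counter_value)   # arrival order on the "event bus"

# Counter value 1 is polled first but takes longer to process.
t1 = threading.Thread(target=process_and_publish, args=(1, 0.2))
t2 = threading.Thread(target=process_and_publish, args=(2, 0.01))
t1.start(); t2.start()
t1.join(); t2.join()

print(published)  # [2, 1] - the newer value overtakes the older one
```

Nothing here is specific to Modbus; any pipeline that fans incoming updates out to a thread pool without sequencing can show the same symptom.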


(Ssalonen) #30

Thanks @rossko57!

In that case it would help me to see verbose logs, and full item and thing configurations

Best
Sami


(Ernst Murnleitner) #31

The counter is just a number which is incremented every second and transferred to the Modbus client by external software. I would consider such behaviour a bug. Imagine a switch-off and a switch-on command arriving in the wrong order.
As Felix has shown, the old counter value that arrived was approx. 100 seconds old, e.g. 1233, 1234, 1235, 1144, 1246, …


(Ernst Murnleitner) #32

I think it could cause big problems if the order of events is not defined.
It should not be allowed that an earlier event from the same source arrives later than the most recent one.


(Rossko57) #33

There are quite a few areas in openHAB where the order of events is NOT defined, by design. Suppose you have two rules triggered by event X. Which will start first? Which will end first?

That seems like the important part. In particular, with your one-second polling intervals, it certainly ought to be that way.

Ah, that is interesting. Where has it been all that time?

Could you clarify the counter a bit? I thought it was generated in the slave PLC(?) that OpenHAB was polling. But now it sounds like it is written to the slave across Modbus-TCP by a third-party master. There would be scope for that process to get disordered too. What could happen if an update write failed, and got retried later?


(Ernst Murnleitner) #34

The slave is an embedded Linux device (ARM Cortex-A9) with a self-programmed Modbus server (which uses the rllib from pvbrowser.org). The counter is transferred from an application by one thread directly via shared memory into the shared memory of the Modbus server. Access is locked via a mutex for both applications. We could also have a counter generated by the Modbus server itself. At the time the jumps occurred, there was also a connection from the Modbus Poll application on a Windows PC, where we could not see such jumps.
I also do not know if the counter was just late or if it arrived twice. We will run some checks.
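The described handoff can be sketched roughly as follows (a Python stand-in for the C/shared-memory setup; all names are illustrative). If both sides really take the same mutex, the reader can never observe a torn value from that handoff alone, which supports the suspicion that the reordering happens elsewhere:

```python
import threading

# Rough stand-in for the described design: one thread (the "application")
# increments a shared counter under a mutex, another (the "Modbus server")
# snapshots it under the same mutex. Names here are illustrative only.
class SharedCounter:
    def __init__(self):
        self._value = 0
        self._mutex = threading.Lock()

    def tick(self):       # called once per second by the application
        with self._mutex:
            self._value += 1

    def snapshot(self):   # called by the Modbus server on each register read
        with self._mutex:
            return self._value

counter = SharedCounter()
for _ in range(5):
    counter.tick()
print(counter.snapshot())  # 5
```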


(Ernst Murnleitner) #35

Felix saw this issue on the Raspberry Pi with openHABian. Now I tried on my virtual Linux machine (openHAB 2.3.0), and at the beginning there was no problem, but after I did a system:shutdown, old counter values arrived. After it caught up, everything was OK. Note: the interval of both the Modbus counter and openHAB was not exactly one second, so it is normal that some numbers are skipped:

2018-07-09 10:08:07.242 [vent.ItemStateChangedEvent] - dItem_0119_0001 changed from NULL to 0
2018-07-09 10:08:07.906 [vent.ItemStateChangedEvent] - dItem_0120_0001 changed from NULL to 0
2018-07-09 10:08:08.605 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6788 to 6787
2018-07-09 10:08:09.991 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6787 to 6789
2018-07-09 10:08:11.560 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6789 to 6787
2018-07-09 10:08:11.997 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6787 to 6786
2018-07-09 10:08:11.997 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6786 to 6787
2018-07-09 10:08:11.998 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6787 to 6789
2018-07-09 10:08:11.999 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6789 to 6790
2018-07-09 10:08:12.080 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6790 to 6788
2018-07-09 10:08:12.141 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6788 to 6789
2018-07-09 10:08:12.142 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6789 to 6790
2018-07-09 10:08:12.327 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6790 to 6791
2018-07-09 10:08:12.332 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6791 to 6787
2018-07-09 10:08:12.333 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6787 to 6792
2018-07-09 10:08:12.508 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6792 to 6790

After this I changed the poll frequency from 1000 ms to 100 ms and did a shutdown.
After the restart, the counter was initially 10 seconds behind the real value. Look what happened:

2018-07-09 10:24:08.081 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7737 to 7738
2018-07-09 10:24:11.694 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7738 to 7737
2018-07-09 10:24:12.206 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7737 to 7738
2018-07-09 10:24:12.954 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7738 to 7739
2018-07-09 10:24:13.143 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7739 to 7738
2018-07-09 10:24:15.925 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7738 to 7739
2018-07-09 10:24:16.317 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7739 to 7737
2018-07-09 10:24:16.321 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7737 to 7739
2018-07-09 10:24:17.002 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7739 to 7738
2018-07-09 10:24:17.280 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7738 to 7739
2018-07-09 10:24:17.628 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7739 to 7740
2018-07-09 10:24:17.681 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7740 to 7738
2018-07-09 10:24:17.773 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7738 to 7740
2018-07-09 10:24:18.554 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7740 to 7739
2018-07-09 10:24:18.834 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7739 to 7740
2018-07-09 10:24:19.722 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 7740 to 7739

The same value arrived more than once in a random order.
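For anyone who wants to scan longer logs for this symptom, a quick script like the following flags every backwards or repeated transition. It is a sketch assuming the exact `changed from X to Y` line format shown above; the item name is hard-coded for this thread's logs.

```python
import re

# Matches the "changed from X to Y" lines shown in the log excerpts above.
CHANGE = re.compile(r"dItem_0057_0001 changed from (\S+) to (\S+)")

def regressions(lines):
    """Return (old, new) pairs where the counter went backwards or repeated."""
    found = []
    for line in lines:
        m = CHANGE.search(line)
        if not m or m.group(1) == "NULL":
            continue  # skip non-matching lines and the initial NULL update
        old, new = int(m.group(1)), int(m.group(2))
        if new <= old:
            found.append((old, new))
    return found

sample = [
    "2018-07-09 10:08:09.991 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6787 to 6789",
    "2018-07-09 10:08:11.560 [vent.ItemStateChangedEvent] - dItem_0057_0001 changed from 6789 to 6787",
]
print(regressions(sample))  # [(6789, 6787)]
```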

Felix is not here at the moment. I have uninstalled and reinstalled the Modbus binding and Modbus transport. I am not sure if I have the latest version. Paper UI shows for the Modbus binding: market:binding-3528471 - 1.0 and for the Modbus transport: market:binding-3528475 - 1.0

But anyway, it seems to be the same problem as Felix had, and he was sure he was running the bug-fixed version.


(Felix) #36

@rossko57 @ssalonen
So I checked on that idea of the update write failing and being retried later by setting maxTries in the poller Bridge's definition to 1, so there would be no retries when writing or polling values after the first try.
But values still arrived “late”
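For reference, a poller with retries disabled would look roughly like this in a `.things` file. This is a sketch based on the binding's documented parameters; the host, thing IDs, and register layout are made up for illustration:

```
Bridge modbus:tcp:slave1 [ host="192.168.1.10", port=502 ] {
    // maxTries=1: a failed transaction is not retried within the poll cycle
    Bridge poller counters [ start=0, length=2, refresh=1000, type="holding", maxTries=1 ] {
        Thing data counter [ readStart="0", readValueType="uint16" ]
    }
}
```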

Binding Version is 2.4.0.201807041154 for the Modbus binding and 2.4.0.201806291551 for the Transport Bundle


(Ssalonen) #37

Hi!

I can have a look into these issues, but I really just need 1) verbose logs, 2) items, 3) things, with the latest versions of the binding and transport. I assume there are no rules in openHAB.

Best
Sami


(Felix) #38

Hey

I'll get to it.
I guess the same procedure as last time is fine.


(Felix) #39

Time span is around 8 min

  1. openhab.log.1
    openhab.log.2
    openhab.log.3
  2. dItem.json (14.1 KB)
  3. dThing.json (10.1 KB)

CPU was at around 300 to 370% (so roughly 80% per core), MEM was at around 20%

You're right, no rules to this point.


(Ssalonen) #40

Hi,

finally had a moment to look into the logs. I don’t see any counter increments with dItem_0057_0001 (modbus:data:gasanalyse:pull:19954124:number)?

Something wrong with the setup?

Could you please repeat the test so that the counter increases? Please also attach events.log.

Best,
Sami