Modbus TCP Binding Version 2 Crashes in OH2.4 RPi3A Setup

Yea … I was wondering whether Grafana/Influx might be an issue and that was going to be one of my next steps … to take it out of the equation.

But I do like the pretty graphs. :slightly_smiling_face:

You can still have your graphs… Just move Grafana to another server…
I just noticed you're using an RPi3A (I didn't even know there was an A model). I'm using an RPi3B+ on my main system. That's the same RPi which had the major issue running OH 2.4 with Grafana rendering images using PhantomJS.
So if you're using an RPi3A, I would say for sure you are pushing it way above its limits… But give it a try by stopping Grafana first (no need to stop InfluxDB… it hardly uses any resources).
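For what it's worth, assuming the usual systemd service name (check yours with `systemctl list-units | grep grafana`), stopping just Grafana for the test would be:

```
sudo systemctl stop grafana-server
sudo systemctl disable grafana-server   # optional: keep it off across reboots while testing
```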

There may be another option regarding image rendering in Grafana… They dumped PhantomJS and made something new. There should be a tutorial here somewhere… I have not tried it myself, because I chose to move Grafana instead, already having a Windows server running doing nothing :smiley:

Now there's a bunch of reasons why your system crashes. The RPi is getting a bit lightweight.

What does mbmain.py do?

MBLogic software … we have adapted it a bit.

OK … one of our setups is as follows …

  1. rpi3A running OH2.4 … c/w Grafana/Influx … ip xxx.25 Modbus 2.x TCP Master wifi to …
  2. rpi0 running our modbus software server slave ip xxx.120 (no OH2) …

This setup has been running with no errors for 40+hrs …

htop as follows …

Note swap mem ~50% …

Methinks the problem lies with running OH2.4 and our modbus system together on one RPi3A using the 127.0.0.1 IP.

Perhaps it's just too much for the RPi3A setup …

Will remove Grafana as Kim advises and see how that goes :thinking:

This won’t make it better, possibly worse. Properly set up Modbus error recovery adds delay to any remaining working parts of your modbus system, extending transaction queues.
If you want to think about this properly, please see the guide -

But in truth, with one Modbus slave as you have shown us, I would leave the retry settings at default.
Messing with this will not fix your problem.


Get rid of modbus.cfg. It’s neither use nor ornament. Make very sure that you do not have v1 binding installed alongside v2.
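In case it's useful, one way to check for a lingering v1 binding is from the openHAB (Karaf) console; this assumes the default console setup (port 8101, user `openhab`), so adjust for your install:

```
# connect to the openHAB console
ssh -p 8101 openhab@localhost
# list installed modbus bundles; a 1.x entry sitting next to a 2.x one
# means both bindings are installed
openhab> bundle:list | grep -i modbus
```

(And modbus.cfg normally lives in `conf/services/`, if you're hunting for it.)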

Okay, so you have one slave, a half dozen pollers at fairly high poll rates, and many tens of Items? I am guessing all those data Things are linked to Items?
This is asking a lot of an RPi3. It’s not too much at all - but there are big performance traps set for you. (Especially if you are running your slave on the same RPi). There is not enough performance to allow you to be careless here.

Modbus binding defaults are suited for unfamiliar users to get a simple project going; you've moved beyond that and should take steps to tune your system… All those Items updating every half second is stressing out the rest of your system - are some/all persisted?
I recommend reading and implementing the guide here -
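As an illustration of the sort of tuning meant (thing names and register addresses here are invented; syntax per the v2 binding docs, so check against your version), slowing a poller down from half a second to what the data actually needs:

```
Bridge modbus:tcp:plc [ host="127.0.0.1", port=502, id=1 ] {
    // was refresh=500; 5 s is plenty for slowly-moving process values
    Bridge poller sensors [ start=0, length=10, refresh=5000, type="holding" ] {
        Thing data temp [ readStart="0", readValueType="int16" ]
    }
}
```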


Reviewing the performance guide, I feel I should add an additional section about TCP behaviour. Again, the defaults get you started but need proper consideration in a limited-resource environment - which can include the target Modbus slaves, not just the openHAB host.
This is probably the cause of your logged TCP errors, but likely has nothing at all to do with your system issues.
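Pending that section, here is a hedged sketch of the sort of TCP knobs involved (parameter names as I recall them from the v2 binding docs - verify against your binding version, and the values are illustrative, not recommendations):

```
Bridge modbus:tcp:plc [ host="127.0.0.1", port=502, id=1,
    timeBetweenTransactionsMillis=100,  // breathing space between queued transactions
    reconnectAfterMillis=60000,         // recycle the TCP connection periodically
    connectTimeoutMillis=10000 ]        // give a slow slave time to accept the connection
```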

I’ll add this when I’ve thought about it, but you’ve got plenty to get on with and I think that is more relevant.

And yes, v2 binding has different potential stress points and different settings to deal with them … because it’s different.

OK … many thanks for this … loads of homework to catch up on.

Will start afresh and let you all know how it goes. :crossed_fingers:

laters

Nooo … it's a case of looking over what you have, thinking about what you're doing and why, and fine-tuning by adjustment.

Example: are you persisting on every update or every change? Why? (There might be good reasons for some, but not others)
If you are persisting everything, you've almost certainly not yet thought it through.
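To make that concrete, a sketch of a .persist file (the group name and strategy choices are invented for illustration) that persists on change plus a slow heartbeat, rather than on every update:

```
Strategies {
    everyMinute : "0 * * * * ?"
    default = everyChange
}
Items {
    // persist the Modbus-linked group only when a value actually changes,
    // with a once-a-minute heartbeat for charting
    gModbus* : strategy = everyChange, everyMinute
    // avoid: * : strategy = everyUpdate  (rewrites every Item on every poll)
}
```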

“But it's a Modbus problem!” - well, Modbus is an industrial-scale protocol that will happily stress all the other parts of the system that you've linked its data to.

I believe the whole system is stressed. It's an RPi3A running way above its capabilities. Fine-tuning modbus may help a little, but I still believe his system is highly stressed.
The next thing may be that it's running on a bad (slow and old) SD card. That would be my top suspect :smiley:

More than you might expect. This is about adjusting the water tap to control the flow into the leaky bucket. There’s nothing wrong with the tap, but it is capable of making the bucket brim over. By default, the tap is turned fully on.

You may be right. I've only got two modbus devices, so I'm not suffering from the same issue, yet.

Sage point and going through it all now.

‘With great persistence comes great responsibility.’ Spider-man. :spider:

thx

ps. new updated oh2.5 system and setup now running no errors 60+ hrs :crossed_fingers:

Well spotted … I think this is our main issue … as OH doesn't seem to support the RPi3A?

It works from our RPi3B burn image, and the RPi3A has tons of CPU resources etc. available, but it seems like there is a disconnect eventually causing a crash?

If this is true … any plans for OH to do an image for the RPi3A range? It has a smaller form factor, lower cost and a similar CPU spec to the RPi3B range.

cheers

I really wonder how many have thought about the RPi3A. As mentioned, I didn't even know it existed. But it would be interesting to know how come it crashes and the RPi3B doesn't.

Here’s the baby …

https://www.raspberrypi.org/products/raspberry-pi-3-model-a-plus/

Okay, I doubt that has anything to do with 2.4 → 2.5

Did you address update rates, persistence, etc.? I am convinced most issues like this are about pouring data into an openHAB design not optimized for this kind of business, resulting in I/O bottlenecks on sluggish hardware. Persistence, logging, Grafana, all queuing and competing for the same resources.
Of course it shouldn’t crash, but at least one of these moving parts (including the OS and Java) can be expected to have bugs under stress.

I’m reminded I owe a blurb about Modbus-TCP tweaking, separate business but closely related. I started, but must finish that :crazy_face:

I have added some “TCP tuning” info to the Modbus Performance tutorial thread referenced earlier.
This will likely make little difference at the openHAB end, but every little helps.

Thanks for this TCP update.

The system has been running constantly for 100+ hrs with not one single error log :slightly_smiling_face:

As previous, pretty sure the issue looks to have been an incompatibility of the OH/ionware system with the new rpi A range SBC.

The new combined OH2.5 and ionware Plug&Play SD card image, c/w the iHome Smart Home softlogic control algorithm, is now available as a free download here

Free sample(s) of the ionC1 control board to test for any educators out there. :man_teacher: :man_student: :grinning:

enjoy!

From the original version of our ionware rpi0 system c/w Grafana/Influx … the CPU was running at 100%+ … it worked well but was overburdened.

Our latest version uses rrd4j and the Chart item …
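For anyone else going this route, a sitemap Chart line looks roughly like this (the item name is invented; check the sitemap docs for your version):

```
Chart item=ionTemperature service="rrd4j" period=D refresh=30000
```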

Not as pretty as Grafana … but it’ll do. :slight_smile:

And CPU load <90% … so all good.

laters

I really don't fancy rrd4j and the Chart item, to be honest. But if it works for you, then all is fine, and great you got it sorted :slight_smile: