Modbus errors when upgrading to OH3. Weird

Probably it’s the end of the transmission messing up. Most serial implementations time out after a period of no data and declare “reception complete”. Noise on the line ruins that.
This might be the area that differs a little from OH2 to OH3.
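
For a sense of scale: Modbus RTU marks the end of a frame by a silent gap of at least 3.5 character times. A rough Python sketch of that arithmetic follows. The function names are made up for illustration, and how the OH2 and OH3 serial stacks actually detect end of frame is not established in this thread.

```python
def char_time_ms(baud: int, bits_per_char: int) -> float:
    """Time to put one character on the wire, in milliseconds."""
    return 1000.0 * bits_per_char / baud


def frame_gap_ms(baud: int, bits_per_char: int = 10) -> float:
    """Silent interval (3.5 character times) that ends a Modbus RTU frame."""
    if baud > 19200:
        # Above 19200 baud the Modbus serial line guide recommends a fixed 1.75 ms.
        return 1.75
    return 3.5 * char_time_ms(baud, bits_per_char)


# 10 bits = start + 8 data + stop (8N1); 11 bits is the framing the spec assumes.
for bits in (10, 11):
    print(f"9600 baud, {bits} bits/char: gap = {frame_gap_ms(9600, bits):.2f} ms")
```

At 9600 baud the gap is only a few milliseconds, so a glitch right after the last byte would land in the window where the receiver decides the frame is over.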

Yes, it’s the one specified by the Modbus specification. Hopefully the same as your Arduinos?

I don’t think you have visibility of the CRC bytes via DEBUG.

I did a simple test and bypassed the star
(not connected to all the other lines/Arduinos).
So the Raspberry is connected, through one single line, to one Arduino.

As I suspected, same result: 100% CRC errors.

Hence it cannot be a HW problem!

The Arduino software is built on the following libraries:

#include <ModbusRtu.h>
#include <Wire.h>       //DHT:
#include <dht.h>
#include <OneWire.h>    //OneWire, temp:
#include <DallasTemperature.h>

You’ve also proved that OH3 works just fine by your standards with one of the Arduinos.

If you really want to nail this, you will need a line analyzer.

I will bet a beer you would find a glitch at the end of every Arduino transmission, probably as the Tx is disabled.

No, actually I proved that the OH3 binding doesn’t work,
since it can’t even handle the simplest, correct HW setup:
a single line between Raspberry and Arduino.

Okay.

RS485 problems are surprisingly slippery things, confounding logic.

Anyway, now that you’ve proved that OH3 is totally broken (but only for you), what are you going to do now?

Researched again. I am not the only one who has problems with the OH3 binding:

https://community.openhab.org/t/modbus-binding-not-working-on-oh3/111619
https://github.com/openhab/openhab-core/pull/2272

I have a zillion small ideas/tweaks, but no smoking gun.
(Thanks for your very interesting, educational points about Modbus.)
But for now, it is sooooo much easier to turn to plan B:

  • use OH2, which works like a charm (at least for Modbus)
  • wait for an updated Modbus binding in the OH3.1 stable release
    I see they just put a fix in the snapshot version (not in 3.1.M3, which I tried)

The fix only works around a deadlock that might happen when the serial port is closed and reopened. I would not expect it to help with CRC and other serial errors.

“I’ve noticed a number of owners of car model XYZ have fixed their flat tyre problem by changing the wheel. My XYZ is on fire, so I’ve ordered a spare wheel. Hope it arrives soon!”

I tried the snapshot yesterday, and yes: no success (still CRC errors).

ssalonen - perhaps you have an idea, about where the problem is?

1. Raspberry Pi 4
2. CP2102 6 in 1 Multi-functional Serial Module Adapter
3. SINGLE Cat5e cable, A+B in a twisted pair (no star!)
4. MAX485 module (terminated with 120 ohm)
5. Arduino

And this setup works in OH2

I have no idea.

What I can say is that the serial library (nrjavaserial) used by all OH3 bindings has been updated. OH3 now uses the unmodified upstream version of nrjavaserial.

This seemed to introduce the deadlock bug I mentioned above. Perhaps there are other regressions as well.

You could check if there are reports in the nrjavaserial GitHub repo.

Problem (somehow) solved! (yeah :slight_smile: )

spoiler: It WAS a software problem.

Symptoms:
In my Arduino (Mini) code, I read a lot of temperature sensors, on all digital pins (2-12).
Temperature is written (x10) as a signed int16 (27.1 °C as 271).
My Arduino code writes error values as negative numbers (-10001 to -10005) to the same int16 registers.
(The OH3 data thing registers are also defined as int16; see above.)

Somehow, OH3 (but not OH2!) gives CRC errors for SOME negative values(!)

  • like -10002, -10004 (for no sensors attached)
  • but not -1002, -10020

My guess:
There is a bug somewhere in OH3. Maybe in the CRC calculation.

Personal strategy:
A) Stay on OH2 for a while, and wait for a fix in OH3
B) Change the error code numbers (to e.g. -10021 to -10025) on all 10 Arduinos in the house, to satisfy OH3

PS.
This also explains why just 1 Arduino worked on OH3. That particular Arduino had a sensor on every pin, hence no -10004.

That’s a really good step forward, good find.

There are no negative numbers in Modbus transactions, just 16-bit registers. The “user” at either end decides what interpretation to put on any 16-bit pattern (or indeed bundles up two into a 32-bit pattern etc.)

The CRC works purely on the 16-bit register level, has no idea what the contents represent.
Now it might be that the CRC calculation muffs up handling the carry from the top bit; that would show up as apparently connected to stuff that you interpret as ‘negative’. But it’s just a pattern with a particular bit set.
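
To make the “just a bit pattern” point concrete, here is a small Python sketch that packs the values mentioned in this thread the way they travel in a 16-bit register, high byte first. The helper names are illustrative only.

```python
import struct


def int16_to_register_bytes(value: int) -> bytes:
    """Pack a signed 16-bit value as the two bytes of one Modbus register."""
    return struct.pack(">h", value)  # ">h" = big-endian signed 16-bit


def register_bytes_to_int16(raw: bytes) -> int:
    """Interpret two register bytes as a signed 16-bit value."""
    return struct.unpack(">h", raw)[0]


# 271 is the 27.1 C example; the rest are the error codes from this thread.
for value in (271, -10002, -10004, -1002, -10020, -10024):
    raw = int16_to_register_bytes(value)
    unsigned = int.from_bytes(raw, "big")
    print(f"{value:>7} -> {raw.hex(' ')} (same bits read as unsigned: {unsigned})")
```

The two values reported as failing, -10002 and -10004, come out as d8 ee and d8 ec; the two reported as working, -1002 and -10020, come out as fc 16 and d8 dc. All four have the top bit set, so the sign by itself does not separate the failing values from the working ones.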

Bear in mind there are two CRC calculations involved, one at each end of a transaction.
Have you considered the Arduino might get it wrong, and OH2 has the deficiency in not checking it properly?

To sort all this out is going to take detailed work looking at hex code.
Got an old laptop you can rig as a serial monitor and see what is really on the wire? Going to save a lot of finger pointing by direct observation.

@ssalonen - I have an idea CRC bytes don’t get shown in TRACE level messages? If true, would it be possible to show these in case of CRC error detection so that openhab’s view of received data is complete?

Agreeing with what @rossko57 is saying, the code should work with bytes only. There should not be any change to the CRC calculation in OH3, so it’s quite odd.

I checked the code and it actually logs the whole message as well, including the CRC

It would be nice to see what the Arduino is sending and compare the CRCs to the expected ones. There are online tools you should be able to use, e.g. the Online Checksum Calculator from SCADACore. @pete3 can you enable DEBUG level logging for net.wimpi.modbus.io.ModbusRTUTransport and then we know more.
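
As an offline alternative to an online calculator, the comparison can be sketched in a few lines of Python. This is the standard CRC-16/MODBUS algorithm (initial value 0xFFFF, reflected polynomial 0xA001, CRC sent low byte first), not code taken from jamod or from the Arduino library, and `check_frame` is a hypothetical helper name. The example frame is the read request that openHAB logs later in this thread.

```python
def crc16_modbus(data: bytes) -> int:
    """Standard CRC-16/MODBUS over the given bytes."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def check_frame(hex_bytes: str):
    """Split a logged frame into payload and CRC, then recompute the CRC.

    Returns (received_crc, expected_crc, matches); the CRC is sent low byte first.
    """
    frame = bytes.fromhex(hex_bytes)
    payload, received = frame[:-2], frame[-2:]
    expected = crc16_modbus(payload).to_bytes(2, "little")
    return received, expected, received == expected


received, expected, ok = check_frame("07 03 00 00 00 04 44 6f")
print("received CRC:", received.hex(" "), "expected CRC:", expected.hex(" "), "match:", ok)
```

Pasting the bytes from a Response: DEBUG line into `check_frame` shows the CRC bytes that arrived next to the CRC computed from the payload.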

@rossko57 But it is true that when a CRC error occurs we are not logging the expected CRC vs the actual CRC.

Good suggestion, I can have a look at it. (EDIT: the error message will include the invalid CRC in the future; see openhab/jamod pull request #15 by ssalonen, “Serial/RTU to error immediately when all bytes are not received and more explicit logging of invalid CRC bytes”.)

It’s the library that handles all the Arduino Modbus traffic (I am on v1.0.3).

I made a small test program:

#include <ModbusRtu.h>
#define TXEN 3  // pin that enables the RS-485 transmitter
// int16_t au16data[4] = { 1, 2, 3, -10024 };  // no errors
int16_t au16data[4] = { 1, 2, 3, -10004 };     // CRC errors
Modbus slave(7, Serial, TXEN);  // slave ID 7, RS-485 on Serial

void setup() {
  Serial.begin(2 * 9600);  // Arduino Mini
  slave.start();
}

void loop() {
  slave.poll(au16data, 4);  // serve the 4 registers to the master
}

=> CRC-error:

2021-04-21 15:51:33.538 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 1 out of 5 failed when executing request (ModbusReadRequestBlueprint [slaveId=7, functionCode=READ_MULTIPLE_REGISTERS, start=0, length=4, maxTries=5]). Will try again soon. Error was I/O error, so reseting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: 
IOException CRC Error in received frame: 9 bytes: 
07 03 08 00 01 00 02 00 03  
[operation ID ef235475-fe3d-44f0-bce3-8d6e6eb2ab10]

And the things in OH3:

Bridge modbus:serial:Arduino1 [port="/dev/ttyUSBarduino",baud=9600,id=7, dataBits=8,parity="none",stopBits="1.0",encoding="rtu",echo=false,receiveTimeoutMillis=5000, connectMaxTries=3] {
    Bridge poller ArdWC01 [ start=0, length=4, refresh=16011, maxTries=5, type="holding"] {
        Thing data n0 [ readStart="0", readValueType="int16" ] //pin4   purple_red
        Thing data n1 [ readStart="1", readValueType="int16" ] //pin4   purple_red
        Thing data n2 [ readStart="2", readValueType="int16" ] //pin4   purple_red
        Thing data n3 [ readStart="3", readValueType="int16" ] //pin4   purple_red
    }
}

Will see if I get time to look more into it tomorrow.
(The code inside the library and bindings is way over my head, though.)

@pete3 please repeat with the suggested openHAB logging. Then we can see which CRC bytes are sent over the wire.

I linked the Java library code so you can see what it will look like in the logs.

It should help us to establish whether the issue is in the java modbus library or in C arduino modbus library :sweat_smile:

with { 1, 2, 3, -10004 }:

2021-04-21 19:27:04.908 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 07 03 00 00 00 04 44 6f 
2021-04-21 19:27:09.975 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - awaited 10 bytes, but received 8
2021-04-21 19:27:09.979 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response: 07 03 08 00 01 00 02 00 03 d8 d2 
2021-04-21 19:27:09.981 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 07 03 00 00 00 04 44 6f
2021-04-21 19:27:09.983 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: CRC Error in received frame: 9 bytes: 07 03 08 00 01 00 02 00 03

with { 1, 2, 3, -10024 }:

2021-04-21 19:32:27.110 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 07 03 00 00 00 04 44 6f 
2021-04-21 19:32:27.131 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response: 07 03 08 00 01 00 02 00 03 d8 d8 48 05

Seems like the bug is on the Arduino side?

Actually, you are not receiving all the bytes expected with the problematic data set.

  • with { 1, 2, 3, -10004 }:

The logs say “awaited 10 bytes, but received 8”, which means that 2 bytes were not received. This probably also explains why it took several seconds between the Sent: ... and awaited ... log lines: the openHAB side finally gave up waiting for the bytes.

This also shows the logging is quite bad; it should bail out more violently when it does not receive all the bytes. (EDIT: made a correction for this so it’s clearer for the user; see openhab/jamod pull request #15 by ssalonen, “Serial/RTU to error immediately when all bytes are not received and more explicit logging of invalid CRC bytes”.)

  • with { 1, 2, 3, -10024 }:

all bytes are received very fast (20ms) and CRC 48 05 matches up with the payload.
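
The byte counting above can be reproduced from the logged Response: lines. A minimal Python sketch, assuming a normal function 03 response laid out as address, function code, byte count, data, then 2 CRC bytes, and assuming the transport counts its “awaited” bytes after that 3-byte header (this matches the numbers in the log but was not checked against the jamod source). `describe_read_response` is a made-up helper name.

```python
HEADER_LEN = 3  # slave address, function code, byte count
CRC_LEN = 2


def describe_read_response(hex_bytes: str) -> dict:
    """Compare the length of a logged read response with what its header announces."""
    frame = bytes.fromhex(hex_bytes)
    byte_count = frame[2]  # number of data bytes the slave says will follow
    expected_total = HEADER_LEN + byte_count + CRC_LEN
    return {
        "awaited_after_header": byte_count + CRC_LEN,
        "received_after_header": len(frame) - HEADER_LEN,
        "expected_total": expected_total,
        "received_total": len(frame),
        "missing": expected_total - len(frame),
    }


# The two Response: lines from the DEBUG log above.
print(describe_read_response("07 03 08 00 01 00 02 00 03 d8 d2"))        # -10004 case
print(describe_read_response("07 03 08 00 01 00 02 00 03 d8 d8 48 05"))  # -10024 case
```

For the -10004 response this gives 10 bytes awaited and 8 received after the header, so 2 bytes missing; for the -10024 response nothing is missing.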


I’m quite stunned. And you used the simple Arduino program to test all this? Do the CRC error and the awaited ... log message happen on each and every request/response, or only randomly?

-10004 should show up as d8ec, not d8d2. If you are confident about expecting -10004, it looks like it has crapped out part way through that byte, not just forgetting CRC or getting it wrong.
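
One way to check this is to build the frame a well-behaved slave would send for { 1, 2, 3, -10004 } and line it up against the logged bytes. The Python sketch below does that under stated assumptions: a standard function 03 response with a CRC-16/MODBUS checksum, slave ID 7 as in the test program, and hypothetical helper names. It does not reproduce what the Arduino library does internally.

```python
import struct


def crc16_modbus(data: bytes) -> int:
    """Standard CRC-16/MODBUS (init 0xFFFF, reflected polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def build_read_response(slave_id: int, registers) -> bytes:
    """Frame for a 'read holding registers' (0x03) response with signed int16 data."""
    data = b"".join(struct.pack(">h", value) for value in registers)
    body = bytes([slave_id, 0x03, len(data)]) + data
    return body + crc16_modbus(body).to_bytes(2, "little")  # CRC low byte first


def common_prefix_len(a: bytes, b: bytes) -> int:
    """Number of leading bytes that are identical in both frames."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


expected = build_read_response(7, [1, 2, 3, -10004])
received = bytes.fromhex("07 03 08 00 01 00 02 00 03 d8 d2")  # from the DEBUG log
print("expected:", expected.hex(" "))
print("received:", received.hex(" "))
print("identical leading bytes:", common_prefix_len(expected, received))
```

With these assumptions the expected frame is 07 03 08 00 01 00 02 00 03 d8 ec 49 d2. The logged response agrees with it for the first ten bytes and again in its final d2, which would be consistent with two bytes (ec 49) going missing in the middle rather than the CRC being miscalculated. That is an inference from a single log line, not a diagnosis.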

Well … that must be the long setting of my thing: “receiveTimeoutMillis=5000”

Yes, these are the responses to the small program in post 24.

The behavior is 100% consistent (though I only checked for about 1 min).

  • Either only CRC errors (-10004)
  • or flawless readings (-10024)

So … bad Arduino library?

Try a different poll; same suspect register, length=1 or something.

I stand by “RS485 does weird stuff”, so it is worth a try to see if the same bit pattern gets through in a different packet structure. It’s really unlikely a genuine wire problem would be this consistent though.
