Modbus errors when upgrading to OH3. Wierd

I have no idea.

What I can say is that serial library (nrjavaserial) used by all OH3 bindings has been updated. OH3 uses now unmodified upstream version of nrjavaserial.

This seemed to introduce the deadlock bug I mentioned above. Perhaps there are other regressions as well.

You could try to check id there are reports in nrjavaserial github repo.

Problem (somehow) solved! (yeah :slight_smile: )

spoiler: It WAS a software problem.

Symptoms:
In my arduino (mini)code, i read a lot of temp-sensors, on all digital pins (2-12).
tempereture is written (x10) as a signed int16 (27.1C as 271)
My arduino code writes error values as negative numbers -10001 to -10005). To same int16 registers.
(OH3-data-thing registers are also defined as int16 (se above)

Somehow, OH3 (but not OH2!) gives CRCerrors for SOME negative values(!)

  • like -10002 , -10004 (for no sensors attached.)
  • but not -1002, -10020

My guess:
There is a bug somewhere in OH3. Maybee in the CRC-calculation

Personal strategy:
A) Stay on OH2 for a while, and wait for a fix in OH3
B) Change errorcodes numbers (to like -10021 to -10025) on all 10 arduinos in house, to satisfy OH3

PS.
This also explains why just 1 arduino worked on OH3. That particular arduino had sensor in all pins, hence no -10004

1 Like

That’s a really good step forward, good find.

There are no negative numbers in Modbus transactions,just 16-bit registers. The “user” at either end decides what interpretation to put on any 16-bit pattern (or indeed bundle up two into a 32-bit pattern etc.)

The CRC works purely on the 16-bit register level, has no idea what the contents represent.
Now it might be that CRC calculation muffs up handling carry from top bit; that would show up as apparently connected to stuff that you interpret as ‘negative’. But its just a pattern with a particular bit set.

Bear in mind there are two CRC calculations involved, one at each end of a transaction.
Have you considered the Arduino might get it wrong, and OH2 has the deficiency in not checking it properly?

To sort all this out is going to take detailed work looking at hex code.
Got an old laptop you can rig as a serial monitor and see what is really on the wire? Going to save a lot of finger pointing by direct observation.

@ssalonen - I have an idea CRC bytes don’t get shown in TRACE level messages? If true, would it be possible to show these in case of CRC error detection so that openhab’s view of received data is complete?

Agreeing what @rossko57 is saying, the code should work with bytes only. There should not be any change to CRC calculation in OH3, so it’s quite odd.

I checked the code and it actually logs the whole message as well, including the CRC

Would be nice to see what is the Arduino sending and compare the CRCs to expected. There’s online tools you should be able to use, found e.g. this one Online Checksum Calculator - SCADACore . @pete3 can you enable DEBUG level logging for net.wimpi.modbus.io. ModbusRTUTransport and then we know more.

@rossko57 But it is true that when CRC occurs we are not logging expected CRC vs actual CRC:

Good suggestion, I can have a look at it (EDIT: error message will include invalid CRC in the future Serial/RTU to error immediately when all bytes are not received and more explicit logging of invalid CRC bytes by ssalonen · Pull Request #15 · openhab/jamod · GitHub)

1 Like

It’s the library, that handles all Arduino modbus. (i am on v1.0.3)

made a small test program:

#include <ModbusRtu.h>
#define TXEN	3
//int16_t au16data[4] = { 1, 2, 3, -10024 };    //no errors
int16_t au16data[4] = { 1, 2, 3, -10004 };    //CRC errors
Modbus slave(7, Serial,TXEN); // this is slave @1 and RS-485

void setup() {
  Serial.begin( 2* 9600 ); //arduino mini
  slave.start();
}
void loop() {
  slave.poll( au16data, 4 );
}

=> CRC-error:

2021-04-21 15:51:33.538 [WARN ] [rt.modbus.internal.ModbusManagerImpl] - Try 1 out of 5 failed when executing request (ModbusReadRequestBlueprint [slaveId=7, functionCode=READ_MULTIPLE_REGISTERS, start=0, length=4, maxTries=5]). Will try again soon. Error was I/O error, so reseting the connection. Error details: net.wimpi.modbus.ModbusIOException I/O exception: 
IOException CRC Error in received frame: 9 bytes: 
07 03 08 00 01 00 02 00 03  
[operation ID ef235475-fe3d-44f0-bce3-8d6e6eb2ab10]

And the things in OH3:

Bridge modbus:serial:Arduino1 [port="/dev/ttyUSBarduino",baud=9600,id=7, dataBits=8,parity="none",stopBits="1.0",encoding="rtu",echo=false,receiveTimeoutMillis=5000, connectMaxTries=3] {
    Bridge poller ArdWC01 [ start=0, length=4, refresh=16011, maxTries=5, type="holding"] {
        Thing data n0 [ readStart="0", readValueType="int16" ] //pin4   lilla_rød 
        Thing data n1 [ readStart="1", readValueType="int16" ] //pin4   lilla_rød 
        Thing data n2 [ readStart="2", readValueType="int16" ] //pin4   lilla_rød 
        Thing data n3 [ readStart="3", readValueType="int16" ] //pin4   lilla_rød 
    }
}

Will see if i get time to look more into it tomorrow.
(The code inside library and bindings is way beyond my head though)

@pete3 please repeat with the suggested openhab logging. Then we can see what crc bytes are Sent over the wire.

I linked the java library code so you see how it will look like in the logs.

It should help us to establish whether the issue is in the java modbus library or in C arduino modbus library :sweat_smile:

with { 1, 2, 3, -10004 }:

2021-04-21 19:27:04.908 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 07 03 00 00 00 04 44 6f 
2021-04-21 19:27:09.975 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - awaited 10 bytes, but received 8
2021-04-21 19:27:09.979 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response: 07 03 08 00 01 00 02 00 03 d8 d2 
2021-04-21 19:27:09.981 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 07 03 00 00 00 04 44 6f
2021-04-21 19:27:09.983 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: CRC Error in received frame: 9 bytes: 07 03 08 00 01 00 02 00 03

with { 1, 2, 3, -10024 }:

2021-04-21 19:32:27.110 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 07 03 00 00 00 04 44 6f 
2021-04-21 19:32:27.131 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response: 07 03 08 00 01 00 02 00 03 d8 d8 48 05

seems like bug is on Arduino side?

Actually, you are not receiving all the bytes expected with the problematic data set.

  • with { 1, 2, 3, -10004 }:

logs say awaited 10 bytes, but received 8 which means that 2 bytes were not received. This also probably explains why it took several seconds between Sent: ... and awaited ... log lines – openHAB side finally gave up waiting for the bytes.

This also shows the logging is quite bad, it should just bail out more violently when it does not receive all the bytes (EDIT: made correction for this so it’s more clear for the user Serial/RTU to error immediately when all bytes are not received and more explicit logging of invalid CRC bytes by ssalonen · Pull Request #15 · openhab/jamod · GitHub)

  • with { 1, 2, 3, -10024 }:

all bytes are received very fast (20ms) and CRC 48 05 matches up with the payload.


I’m quite stunned. And you used the simple arduino program to test all this? Does the CRC error and awaited ... log message happen each and every request/response, or only randomly?

1 Like

-10004 should show up as d8ec, not d8d2. If you are confident about expecting -10004, it looks like it has crapped out part way through that byte, not just forgetting CRC or getting it wrong.

1 Like

Well … that must be the long setting of my thing: “receiveTimeoutMillis=5000”

Yes, these are the responses, to the small code on post24

The behavior is 100% consistent (though i only check for about 1 min).

  • Either only CRCerror. (-10004)
  • or flawless readings (-10024)

so … bad Arduino-library?

1 Like

Try a different poll; same suspect register, length=1 or something.

I stand by “RS485 does weird stuff” so it is worth a try to see if thesame bit pattern gets through in a different packet structure. It’s really unlikely a genuine wire problem would be this consistent though.

1 Like

yup…

We would be expecting (useful tool1 tool2)

Expected:    07 03 08 00 01 00 02 00 03 d8 EC 49 D2
Actual:      07 03 08 00 01 00 02 00 03 d8 d2   (note the wrong length)

where D8 EC corresponds to register -10004 and 49 D2 is the correct CRC with the correct payload

Aha - it is possible then that two bytes ec49 were dropped along the way

Reading only 1:
-10004:

2021-04-21 21:53:48.042 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 07 03 00 03 00 01 74 6c 
2021-04-21 21:53:50.071 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - awaited 4 bytes, but received 2
2021-04-21 21:53:50.073 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response: 07 03 02 d8 c9 
2021-04-21 21:53:50.074 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Last request: 07 03 00 03 00 01 74 6c
2021-04-21 21:53:50.075 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - failed to read: CRC Error in received frame: 3 bytes: 07 03 02

-10024:

2021-04-21 21:49:50.444 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Sent: 07 03 00 03 00 01 74 6c 
2021-04-21 21:49:50.459 [DEBUG] [t.wimpi.modbus.io.ModbusRTUTransport] - Response: 07 03 02 d8 d8 6a 1e

Hmm… we could try couple of things to diagnose further

  1. Shut down openHAB and query the data using modpoll. If it works there, the likely suspect is openHAB
  2. Create ‘sniffing’ virtual serial port using socat and point openHAB to use that one. All in/out traffic would be recorded. This would give further indication if the root issue is in openHAB

Got an old laptop you can rig as a serial monitor and see what is really on the wire? Going to save a lot of finger pointing by direct observation.

Be sure to use a decent USB-485 converter (i.e. FT232RQ). I had a lot of trouble in the past with cheap chinese converters.

I tried 1., and it worked.
(Sounds like OH-problem?)

modpoll -b 9600 -a 7 -p none -c 4 -t 3 /dev/ttyUSBarduino

Protocol configuration: Modbus RTU, FC4
Slave configuration...: address = 7, start reference = 1, count = 4
Communication.........: /dev/ttyUSBarduino, 9600, 8, 1, none, t/o 1.00 s, poll rate 1000 ms
Data type.............: 16-bit register, input register table

-- Polling slave... (Ctrl-C to stop)
[1]: 1
[2]: 2
[3]: 3
[4]: -10024

Protocol configuration: Modbus RTU, FC4
Slave configuration...: address = 7, start reference = 1, count = 4
Communication.........: /dev/ttyUSBarduino, 9600, 8, 1, none, t/o 1.00 s, poll rate 1000 ms
Data type.............: 16-bit register, input register table

-- Polling slave... (Ctrl-C to stop)
[1]: 1
[2]: 2
[3]: 3
[4]: -10004
-- Polling slave... (Ctrl-C to stop)
[1]: 1
[2]: 2
[3]: 3
[4]: -10004

I also did an easy test, apart from linux/OH3. Attached to windows USB using cp2102 interface.
Software: modbusMAT1.1 (ataytugal.com)

-10024
[07][03][00][00][00][04][44][6F]
[07][03][08][00][01][00][02][00][03][D8][D8][48][05]

-10004
[07][03][00][00][00][04][44][6F]
[07][03][08][00][01][00][02][00][03][D8][EC][49][D2]

so windows reads as expected from arduino

1 Like

Modbus-RTU has strict rules about maximum gaps between characters. Any implementation may be sloppy about implementing those, for send or receive.
Again, detailed analysis at the line level is necessary to finger point about that sort of thing.

It’s not unreasonable to ask for openHAB to be more forgiving, but we still don’t know what needs forgiving yet.

Any suggestions @rossko57 for troubleshooting?

Darn serial, we always seem to go very low-level with this one :slight_smile: Line level understanding would be probably the best insight into the topic. Is that what you meant with “old laptop as serial monitor”?

Based on javacomm serial library documentation (which is actually not used by OH but I guess the API is similar enough), we have “Note, however, that framing errors may cause the Timeout and Threshold values to complete prematurely without raising an exception.” – This is happening when we get “awaited 4 bytes, but received 2” log message. Does this give more hints?

I guess the socat suggestion might reveal something but then again it might hide the issues as socat would sit “between” the real serial port and openHAB.

I could try to make a special version of binding which tries to read bytes again if we timeout and not get all bytes at first time. It seems unlikely this would fix the issue though…

I don’t have a good enough understanding of what goes on between serial device and binding.
Something (the USB adaptor that I assume is in use) gets a byte at a time, something decides the bytes have stopped coming so “here’s a packet” and in our case it’s short.

A framing error pretty much means the stop bit was missed at the expected time, very low level error.
Poor bit timing by the transmitter or receiver can cause framing errors, and that could be sensitive to last data bit sent. That’d be the most significant bit, used as sign bit in signed integers…

I would expect bytes to be discarded at low level, but maybe not. That would leave the library to decide to pass along garbage or not, if it bothers checking for the framing error flag.

Can we tell if there really is a framing error at the hardware or do intervening libraries hide that?

Maybe @pete3 is able to try some read polls that include a larger proportion of bytes over hex80 value?