Screenshot 6 - “extra poll”, we might call that.
How many poller things have you got at this time, just to be sure? The picture suggests only one, of course.
The double poll looks like things are working as they should, from openHAB's viewpoint.
The first poll of the pair is measurably on the same schedule as the preceding polls.
Assuming it’s a poll for the same data, the response does indeed look either short or as if it started very early.
Assuming also that response is truncated at one end or the other, it’ll give a CRC error, as evidenced in your log I think.
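To show why a truncated response ends up as a CRC error, here is a minimal sketch of the Modbus RTU CRC-16 check. The function names are my own, and the frame bytes are just an illustrative request, not anything taken from your captures.

```python
def modbus_crc16(data: bytes) -> int:
    """CRC-16 as used by Modbus RTU (init 0xFFFF, reflected polynomial 0xA001)."""
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def build_frame(body: bytes) -> bytes:
    """Append the CRC, low byte first, as it goes on the wire."""
    return body + modbus_crc16(body).to_bytes(2, "little")


def frame_is_valid(frame: bytes) -> bool:
    """True if the last two bytes match the CRC of everything before them."""
    if len(frame) < 4:
        return False
    body, received = frame[:-2], frame[-2:]
    return modbus_crc16(body).to_bytes(2, "little") == received


# Illustrative frame only: slave 1, function 03, address 0, one register.
good = build_frame(bytes.fromhex("010300000001"))
print(good.hex(" "), frame_is_valid(good))
# Chop the last byte off, as if the transmitter had dropped out early.
print(good[:-1].hex(" "), frame_is_valid(good[:-1]))
```

The receiver has no idea the frame is short. It just treats whatever the last two bytes happen to be as the CRC, and that almost never matches.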
openHAB will not run the timeout - it’s got a response of some kind - so it will retry quickly, at soonest after the minimum gap configured. That would be the second poll of the doubled pair.
We can further assume the response was good this time around, because openHAB resumes its regular schedule i.e. no further retries.
I don’t believe ‘slave started responding early’ because it can’t respond until it’s finished receiving and CRC checking and decoding the poll query.
Also evident in this picture is the variable time between query and response. That's no biggy of itself; slaves are allowed to be about other business and answer when convenient.
Screenshot 7 - “short response”
It was rubbish I gave before about a slave ‘nack’ response on detecting a CRC error. It is supposed to remain silent for that (after all, if the message is corrupt, the slave cannot be sure it was the addressed target).
It can give a brief 5-byte exception response for “too busy” or “invalid command” type responses.
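For reference, an exception response is address, function code with the top bit set, one exception code byte, then the two CRC bytes. A rough sketch of telling it apart from a normal response follows; `classify_response` is a made-up helper, and it does not verify the CRC.

```python
# Standard Modbus exception codes (a partial list).
EXCEPTION_CODES = {
    0x01: "illegal function",
    0x02: "illegal data address",
    0x03: "illegal data value",
    0x04: "slave device failure",
    0x05: "acknowledge",
    0x06: "slave device busy",
}


def classify_response(frame: bytes, requested_function: int) -> str:
    """Rough classification of a received RTU frame; the CRC is NOT checked here."""
    if len(frame) < 5:
        return "too short to be any Modbus RTU response"
    function = frame[1]
    if function == (requested_function | 0x80):
        if len(frame) != 5:
            return "exception function code but wrong length"
        code = frame[2]
        return "exception: " + EXCEPTION_CODES.get(code, f"code {code:#04x}")
    if function == requested_function:
        return "normal response"
    return "unexpected function code"


# Hypothetical bytes: slave 1 answering a function 03 poll with 'busy'.
# The last two bytes stand in for the CRC and are placeholders.
print(classify_response(bytes([0x01, 0x83, 0x06, 0x00, 0x00]), 0x03))
```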
The question here then is what openHAB logged as going on, either CRC or exception, but I’m guessing you have seen no exceptions.
OH has definitely seen something bad, because we soon retry. That retry is successful and normal business resumed.
Screenshot 9 - detailed short response
hmm, is that possibly a correctly constructed 5-byte exception? Maybe, but looks too long to me.
What really catches my attention is the very beginning of that response. A nasty little spike, not a full swing. We cannot tell if that is too narrow timewise to register properly, or if it really didn’t swing the full voltage.
Screenshot 10 - a longer but still short response.
Definitely not an exception response. I think we can take that one as good solid evidence of truncation, because you know it should be consistent length with the good responses.
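If it helps when measuring the scope traces, the expected on-the-wire lengths can be worked out. This is only a sketch: the 9600 baud, the 11 bits per character and the register count of 10 are my assumptions, so substitute your real settings.

```python
EXCEPTION_RESPONSE_BYTES = 5  # address, function | 0x80, exception code, 2 CRC bytes


def read_registers_response_bytes(register_count: int) -> int:
    """Length of a normal function 03/04 response for that many registers."""
    # address + function + byte count + 2 bytes per register + 2 CRC bytes
    return 1 + 1 + 1 + 2 * register_count + 2


def frame_duration_ms(byte_count: int, baud: int = 9600, bits_per_char: int = 11) -> float:
    """Time a frame occupies the wire, ignoring any gaps between characters."""
    return byte_count * bits_per_char * 1000.0 / baud


# Assumed values, not taken from the thread: 10 registers at 9600 baud.
full = read_registers_response_bytes(10)
print("full response:", full, "bytes,", round(frame_duration_ms(full), 2), "ms")
print("exception:", EXCEPTION_RESPONSE_BYTES, "bytes,",
      round(frame_duration_ms(EXCEPTION_RESPONSE_BYTES), 2), "ms")
```

Anything on the scope that is clearly longer than the exception figure but shorter than the full-response figure can only be a truncated frame.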
Screenshot 11 - detail of that truncated response.
Ooh look there’s a nasty little spike at the end of this truncated response, like it’s been chopped off midway through a bit.
Now, looking at those detail pictures again. I’m not sure of polarity here, but I think the query transmission always starts and ends with a “high”? Have a look and check.
I’d expect that: it would usually enable the TX at “stop” level, pause briefly (maybe just one bit time) and then begin the real serial data with a “start” bit. Of course it’d always end the same way, with a “stop” level. That level might also be held for a time before the TX is disabled.
The point here is that the begin/end levels will always be the same for that device.
Conceivably some devices could open with a “start”, but again it would always be the same begin/end levels.
I’m expecting that comparing good with bad slave responses, you will find that first and last levels are sometimes inconsistent.
You should also check whether I am overreacting to the begin/end spikes, i.e. whether you see anything like that in good responses too.
I’ll make a wild leap - the slave RS485 transmitter chip enable is wonky, turning on or off part way through the data stream. Might be sinister causes like software control glitches or power supply spikes, but I would bet it’s just chip failure.
Weighing against that, why doesn’t it stop and restart part way through a data package? I wouldn’t be that surprised if it simply doesn’t work that way, but you might find eventually evidence of this if you look.
Why don’t I think it’s the master dongle TX randomly turning on/off out of turn? Because if it did, it would drive the line high or low. Because we’ve got no bias (which I’ve now accepted), we would see that as different from the idling no-volts condition that we can see.
Our little poll queries don’t offer much opportunity to get corrupted if that did happen, but surely it would happen once in a while. The result would be no response from the slave, and a timeout log in openHAB, which we have not seen so far.
There is one more test to do before condemning it though - in the really unlikely event the slave takes around 500 ms to respond to a poll, it could transmit a response just in time to cross paths with the next poll query, sometimes working, sometimes trampling. I don’t believe that for a second!
But let’s rule that out and change your polling period to 1500 or so and look to see if anything changes.
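The arithmetic behind that test, as a simple sketch. All the timings here are invented for illustration (I'm assuming your current poll period is about 500 ms), and it ignores timeouts and retries on the master side.

```python
def response_collides(poll_period_ms: float, response_delay_ms: float,
                      response_len_ms: float, query_len_ms: float) -> bool:
    """True if a late response overlaps the NEXT poll query on the wire.

    Times are measured from the start of the current poll query.
    """
    resp_start = query_len_ms + response_delay_ms
    resp_end = resp_start + response_len_ms
    next_query_start = poll_period_ms
    next_query_end = poll_period_ms + query_len_ms
    # Two intervals overlap when each one starts before the other ends.
    return resp_start < next_query_end and resp_end > next_query_start


# Hypothetical numbers: 8 ms query, 30 ms response, slave dawdling for 495 ms.
print(response_collides(500, 495, 30, 8))   # assumed current period
print(response_collides(1500, 495, 30, 8))  # after lengthening the period
```

With the longer period a slow response of that kind has plenty of clear air, so if the truncations carry on unchanged we can drop this theory.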