OH 3.2.0.M3 / ZigBee binding (Ember M35x Coordinator) / RFC2217 / hacked Silvercrest ZigBee gateway (EFR32MG1B-256k @ EmberZNet 6.7.8.0)

System infomation:

Both devices are connected by wired Ethernet.

Please see GitHub - Ordspilleren/lidl-gateway-freedom: Freeing the Silvercrest (Lidl/Tuya) Smart Home Gateway from the cloud. for more information on the hack.

RFC2217 didn’t work at all on OH 3.1.0, so I upgraded to OH 3.2.0.M3. Now ist works - kind of …

Symptoms are:

Obviously, the ZigBee binding is able to communicate with the serial gateway running on the Silvercrest Zigbee gateway - otherwise the Coordinator wouldn’t know that the EFR32 is @ EmberZNet 6.7.8.0:

The coordinator connects to the serial port,

16:07:20.480 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:coordinator_ember:9afb8fb0b3' changed from UNKNOWN to ONLINE

but the connection fails a few seconds later:

16:07:27.920 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Serial port 'rfc2217://192.168.1.167:8888' closed.

16:07:01.974 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:coordinator_ember:9afb8fb0b3' changed from UNINITIALIZED to INITIALIZING
16:07:02.095 [DEBUG] [gbee.handler.ZigBeeCoordinatorHandler] - Initializing ZigBee network [zigbee:coordinator_ember:9afb8fb0b3].
16:07:02.106 [DEBUG] [gbee.handler.ZigBeeCoordinatorHandler] - Config: zigbee_initialise found, initializeNetwork=false
16:07:02.156 [DEBUG] [ing.zigbee.ember.handler.EmberHandler] - Initializing ZigBee Ember serial bridge handler.
16:07:03.167 [DEBUG] [ing.zigbee.ember.handler.EmberHandler] - ZigBee Ember Coordinator opening Port:'rfc2217://192.168.1.167:8888' PAN:c65d, EPAN:AF5FABA8491616DB, Channel:11
16:07:03.169 [DEBUG] [ing.zigbee.ember.handler.EmberHandler] - Ember end device poll timeout set to (169 * 2^9) = 86528 seconds
16:07:03.199 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:coordinator_ember:9afb8fb0b3' changed from INITIALIZING to UNKNOWN
16:07:03.199 [INFO ] [openhab.event.FirmwareStatusInfoEvent] - Firmware status of thing zigbee:coordinator_ember:9afb8fb0b3 changed to UNKNOWN.
16:07:04.314 [DEBUG] [nding.zigbee.internal.ZigBeeDataStore] - Creating ZigBee persistence folder C:\Users\JK\DOWNLO~1\OPENHA~1.0\userdata\zigbee\zigbee_coordinator_ember_9afb8fb0b3\
16:07:04.395 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Connecting to serial port [rfc2217://192.168.1.167:8888] at 115200 baud, flow control FLOWCONTROL_OUT_RTSCTS.
16:07:04.466 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Enabling receive timeout is unsupported on [rfc2217://192.168.1.167:8888]
16:07:04.466 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Serial port [rfc2217://192.168.1.167:8888] is initialized.
16:07:15.825 [DEBUG] [gbee.handler.ZigBeeCoordinatorHandler] - Config: zigbee_trustcentremode=TC_JOIN_SECURE
16:07:20.480 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:coordinator_ember:9afb8fb0b3' changed from UNKNOWN to ONLINE
16:07:20.791 [DEBUG] [nding.zigbee.internal.ZigBeeDataStore] - 5C0272FFFE4BC7B4: ZigBee saving network state complete.
16:07:21.098 [DEBUG] [nding.zigbee.internal.ZigBeeDataStore] - 5C0272FFFE4BC7B4: ZigBee saving network state complete.
16:07:26.885 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:coordinator_ember:9afb8fb0b3' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR)
16:07:27.920 [DEBUG] [nding.zigbee.internal.ZigBeeDataStore] - 5C0272FFFE4BC7B4: ZigBee saving network state complete.
16:07:27.920 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Serial port 'rfc2217://192.168.1.167:8888' closed.
16:07:27.935 [DEBUG] [ing.zigbee.ember.handler.EmberHandler] - Initializing ZigBee Ember serial bridge handler.
16:07:27.935 [DEBUG] [ing.zigbee.ember.handler.EmberHandler] - ZigBee Ember Coordinator opening Port:'rfc2217://192.168.1.167:8888' PAN:c65d, EPAN:AF5FABA8491616DB, Channel:11
16:07:27.935 [DEBUG] [ing.zigbee.ember.handler.EmberHandler] - Ember end device poll timeout set to (169 * 2^9) = 86528 seconds
16:07:27.935 [INFO ] [hab.event.ThingStatusInfoChangedEvent] - Thing 'zigbee:coordinator_ember:9afb8fb0b3' changed from OFFLINE (COMMUNICATION_ERROR) to UNKNOWN
16:07:28.971 [DEBUG] [nding.zigbee.internal.ZigBeeDataStore] - 5C0272FFFE4BC7B4: ZigBee reading network state complete.
16:07:28.971 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Connecting to serial port [rfc2217://192.168.1.167:8888] at 115200 baud, flow control FLOWCONTROL_OUT_RTSCTS.
16:07:28.986 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Enabling receive timeout is unsupported on [rfc2217://192.168.1.167:8888]
16:07:28.986 [DEBUG] [inding.zigbee.serial.ZigBeeSerialPort] - Serial port [rfc2217://192.168.1.167:8888] is initialized.

@chris
What can I do/can be done to find the root cause?

Sorry - I’m really not familiar at all with RFC2217 and I have no way to test this myself at the moment. You can try enabling full debug logging (take a look at the docs) and see what that shows - presumably it’s going to show that communications stopped working, and then I’m not sure where to go next.

@chris
Thanks for your quick reply.

Just to be sure that my hack didn’t brick my Silvercrest gateway I tried to use it with Home Assistant: HA has no problems to communicate with my Silvercrest gateway (via ‘socket://192.168.1.167:8888’) and inclusion of ZigBee devices works flawlessly.

If there is any interest in getting RFC2217 to work in OH/ZigBee binding, I’m open to suggestions what to do to find the root cause. I’ll try to get a full debug log.

I’m happy to have it added - so long as it’s not written in a way that introduces a load of code specifically for this that can otherwise cause problems to the standard serial port. IMHO, RFC2217 support should largely be implemented outside the bindings so that it is generally available to all serial bindings and I’d like to avoid too many hacks in the binding to make it work.

I don’t have any way to test this though so it would need someone to look at it.

You might want to look at using the latest snapshot as there was an RFC2217 “Fix” added a month or so ago. I don’t know if it works or not though.

@chris

Full ZigBee debug log (for OH 3.2.0.M3):

rfc_problem.log (190.2 KB)

Of note:

A.

2021-11-02 02:06:29.670 [DEBUG] [nding.zigbee.serial.ZigBeeSerialPort] - Connecting to serial port [rfc2217://192.168.1.167:8888] at 57600 baud, flow control FLOWCONTROL_OUT_XONOFF.
2021-11-02 02:06:29.672 [DEBUG] [bee.handler.ZigBeeCoordinatorHandler] - null: networkStateUpdated called with state=INITIALISING
2021-11-02 02:06:29.707 [DEBUG] [nding.zigbee.serial.ZigBeeSerialPort] - Enabling receive timeout is unsupported on [rfc2217://192.168.1.167:8888]
2021-11-02 02:06:29.710 [DEBUG] [nding.zigbee.serial.ZigBeeSerialPort] - Serial port [rfc2217://192.168.1.167:8888] is initialized.
2021-11-02 02:06:29.857 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Connect
2021-11-02 02:06:29.859 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Reconnect
2021-11-02 02:06:29.863 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - AshReceiveParserThread thread started
2021-11-02 02:06:29.876 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameRst []
2021-11-02 02:06:29.882 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH TX EZSP: EzspVersionRequest [networkId=0, desiredProtocolVersion=4]
2021-11-02 02:06:29.886 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: TX EZSP queue size: 1
2021-11-02 02:06:30.969 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - <-- RX ASH frame: AshFrameRstAck [version=2, resetCode=11, Reset: Software]
2021-11-02 02:06:32.224 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Connected

B.

2021-11-02 02:06:34.699 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=true, data=15 00 01 52 00 1D]
2021-11-02 02:06:34.707 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - <-- RX ASH error: BAD PACKET 57 A1 A9 06 2A 15 B2 59 61 4D 57 A1 A9 06 2A 15 B2 59

C.

2021-11-02 02:06:46.661 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspGetNetworkParametersResponse [networkId=0, status=EMBER_NOT_JOINED, nodeType=EMBER_UNKNOWN_DEVICE, parameters=EmberNetworkParameters [extendedPanId=AF5FABA8491616DB, panId=FFFF, radioTxPower=13, radioChannel=11, joinMethod=EMBER_USE_MAC_ASSOCIATION, nwkManagerId=0000, nwkUpdateId=0, channels=07FFF800]]
2021-11-02 02:06:46.661 [DEBUG] [.internal.EmberNetworkInitialisation] - EzspGetNetworkParametersResponse [networkId=0, status=EMBER_NOT_JOINED, nodeType=EMBER_UNKNOWN_DEVICE, parameters=EmberNetworkParameters [extendedPanId=AF5FABA8491616DB, panId=FFFF, radioTxPower=13, radioChannel=11, joinMethod=EMBER_USE_MAC_ASSOCIATION, nwkManagerId=0000, nwkUpdateId=0, channels=07FFF800]]
2021-11-02 02:06:46.662 [DEBUG] [.internal.EmberNetworkInitialisation] - Error during retrieval of network parameters: EzspGetNetworkParametersResponse [networkId=0, status=EMBER_NOT_JOINED, nodeType=EMBER_UNKNOWN_DEVICE, parameters=EmberNetworkParameters [extendedPanId=AF5FABA8491616DB, panId=FFFF, radioTxPower=13, radioChannel=11, joinMethod=EMBER_USE_MAC_ASSOCIATION, nwkManagerId=0000, nwkUpdateId=0, channels=07FFF800]]

So I guess from a binding perspective, the RFC2217 is working “fine”. Ie it is communicating via the IP link to the remote port - the problem seems to be somewhere in the transport and data is getting missed / corrupted. The protocol should recover from a corrupted frame or two, although the NCP can get upset about this, and if problems persist it can throw an exception that requires a complete reinitialisation.

Again, the binding should handle this by closing down the serial port and restarting everything. This all works fine with a direct serial port, and I would assume should also work with an IP connection, but I don’t really know for sure how the remote end would handle this to allow a reset to occur.

@chris
Now running OH 3.2.0 #2542

For what it’s worth a new debug log (rename to *.zip):

3.2.0_2542.zip.txt (82.0 KB)

I would tend to agree that we are seeing communication problems. I give up. :slight_smile:

I’ll take a look at the log shortly, but is this using TCP or UDP? If it’s TCP, then we shouldn’t see an issue with the network link - it’s “guaranteed” that what goes in at one end comes out in the same order at the other end. If it’s UDP, then it’s another story, but I guess it’s not using UDP?

So, assuming TCP, I would be surprised to see corrupted frames - due to network problems. I would therefore propose that the issue could be that the computer with the dongle may not be keeping up with receiving the data and sending it down the network connection, and possibly data is dropped at the serial connection?

Most communications is “governed” - ie it’s a poll/response system, so the binding doesn’t send more than one ASH frame (ie the serial protocol) at a time while waiting for a response. This has a maximum size in the order of 120 bytes (mostly a lot less) so it’s really a bit strange that the hub shouldn’t be able to keep up.

So… Firstly, the initial problem is again something in the communications with the remote over the IP link, but it is a little strange…

Here’s the initial failure -:

2021-11-02 20:41:02.266 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH took EZSP frame from receive queue. Queue length 0
2021-11-02 20:41:02.266 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspSetConfigurationValueResponse [networkId=0, status=EZSP_SUCCESS]
2021-11-02 20:41:02.266 [DEBUG] [systems.zigbee.dongle.ember.EmberNcp] - EzspSetConfigurationValueResponse [networkId=0, status=EZSP_SUCCESS]
2021-11-02 20:41:02.266 [DEBUG] [systems.zigbee.dongle.ember.EmberNcp] - EzspSetConfigurationValueRequest [networkId=0, configId=EZSP_CONFIG_MAX_END_DEVICE_CHILDREN, value=16]
2021-11-02 20:41:02.266 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH TX EZSP: EzspSetConfigurationValueRequest [networkId=0, configId=EZSP_CONFIG_MAX_END_DEVICE_CHILDREN, value=16]
2021-11-02 20:41:02.266 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: TX EZSP queue size: 1
2021-11-02 20:41:02.266 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=7, ackNum=7, reTx=false, data=BD 00 01 53 00 11 10 00]
2021-11-02 20:41:02.297 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - <-- RX ASH frame: AshFrameNak [ackNum=7, notRdy=false]
2021-11-02 20:41:02.297 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-02 20:41:02.297 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=7, ackNum=7, reTx=true, data=BD 00 01 53 00 11 10 00]
2021-11-02 20:41:02.714 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-02 20:41:02.714 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=7, ackNum=7, reTx=true, data=BD 00 01 53 00 11 10 00]
2021-11-02 20:41:03.520 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-02 20:41:03.520 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=7, ackNum=7, reTx=true, data=BD 00 01 53 00 11 10 00]
2021-11-02 20:41:03.921 [ERROR] [internal.handler.RFXComBridgeHandler] - Connection to RFXCOM transceiver failed
org.openhab.binding.rfxcom.internal.exceptions.RFXComException: No serial port COM8
	at org.openhab.binding.rfxcom.internal.connector.RFXComSerialConnector.connect(RFXComSerialConnector.java:61) ~[?:?]
	at org.openhab.binding.rfxcom.internal.handler.RFXComBridgeHandler.connect(RFXComBridgeHandler.java:211) ~[?:?]
	at org.openhab.binding.rfxcom.internal.handler.RFXComBridgeHandler.lambda$0(RFXComBridgeHandler.java:184) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
2021-11-02 20:41:05.141 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-02 20:41:05.141 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=7, ackNum=7, reTx=true, data=BD 00 01 53 00 11 10 00]
2021-11-02 20:41:08.347 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-02 20:41:08.347 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=7, ackNum=7, reTx=true, data=BD 00 01 53 00 11 10 00]
2021-11-02 20:41:11.554 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-02 20:41:11.554 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=7, ackNum=7, reTx=true, data=BD 00 01 53 00 11 10 00]
2021-11-02 20:41:12.271 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - Sending EZSP transaction timed out after 10 seconds
2021-11-02 20:41:12.271 [DEBUG] [systems.zigbee.dongle.ember.EmberNcp] - EzspSetConfigurationValueRequest [networkId=0, configId=EZSP_CONFIG_APS_UNICAST_MESSAGE_COUNT, value=10]
2021-11-02 20:41:12.271 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH TX EZSP: EzspSetConfigurationValueRequest [networkId=0, configId=EZSP_CONFIG_APS_UNICAST_MESSAGE_COUNT, value=10]
2021-11-02 20:41:12.271 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: TX EZSP queue size: 1
2021-11-02 20:41:14.758 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Error number of retries exceeded [6].
2021-11-02 20:41:14.758 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Disconnected!
2021-11-02 20:41:14.758 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - AshFrameHandler close.

Note that the exception here is NOT from the Zigbee binding - it’s from the RFXComm binding. I’m going to guess that this also uses RFC2217 (??) and in this case it seems there might be a systematic failure somewhere as clearly the zigbee communications brokedown at the same time as the RFXComm link did.

Note that the log also contains a similar event around 13 minutes earlier, but it’s the same exception. In both cases the system is running fine for about 1.5 minutes (very roughly!), and in both cases it seems to fail during the configuration (but not at the same point each time). The configuration is a busy period, but as described above, each frame is effectively throttled as it always wait for the ACK from the previous frame before moving on to the next frame in the queue - this is how the ASH protocol works, and if there’s a timeout, or a corruption, then frames are retried.

In the log above we see exactly this happening - we see that the frame is retried a few times, then there’s the exception, then the retry continues for another few seconds before the timeout. The binding then goes OFFLINE, and it should try to connect to the serial port again.

Here it’s a bit strange - or stranger - as the system seems to shut down at this time-:

2021-11-02 20:41:16.979 [INFO ] [ab.ui.habpanel.internal.HABPanelTile] - Stopped HABPanel
2021-11-02 20:41:17.186 [INFO ] [basic.internal.servlet.WebAppServlet] - Stopped Basic UI
2021-11-02 20:41:19.218 [INFO ] [org.openhab.ui.internal.UIService   ] - Stopped UI

I don’t really understand why all this UI stuff shuts down - there should be no link, and yet it’s at almost exactly the same time. It might be unrelated, but it’s a bit suspicious!

So… I don’t think there’s a problem directly with the Zigbee binding, but possibly a wider issue that’s impacting the network, or RFC2217 specifically, and this is causing the Zigbee binding, as well as the RFXCom binding, to loose the “serial” connection.

@chris

Please ignore all stuff related to the RFXCom binding. The RFXCom binding isn’t configured to use RFC2217. It uses a physical serial port, but I’ve disconnected the RFXCom device from the computer.

RFC 2217 is a telnet extension and telnet is based on TCP.

Well - I can’t ignore what I’ve already seen :slight_smile: .

I don’t think that you should ignore the RFXCom issue in the context of the Zigbee issue as they are occurring at “exactly” the same time (within 1.5 seconds). Maybe it’s unrelated, and I always like to keep an open mind as one exception does not a trend make - it might just be a coincidence…

However, as mentioned above, the exception here is not the only occurrence in the log -:

2021-11-02 20:28:53.830 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Error number of retries exceeded [6].
2021-11-02 20:28:53.830 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Disconnected!
...
2021-11-02 20:28:56.937 [ERROR] [internal.handler.RFXComBridgeHandler] - Connection to RFXCOM transceiver failed
org.openhab.binding.rfxcom.internal.exceptions.RFXComException: No serial port COM8
	at org.openhab.binding.rfxcom.internal.connector.RFXComSerialConnector.connect(RFXComSerialConnector.java:61) ~[?:?]
	at org.openhab.binding.rfxcom.internal.handler.RFXComBridgeHandler.connect(RFXComBridgeHandler.java:211) ~[?:?]
	at org.openhab.binding.rfxcom.internal.handler.RFXComBridgeHandler.lambda$0(RFXComBridgeHandler.java:184) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]

So here we have another exception from the RFXCom binding, and again within 3 seconds or so of the issue with the Zigbee comms problem. Now… Maybe it’s unrelated, and I always like to keep an open mind as one exception does not a trend make - it might just be a coincidence… Two exceptions - well - still quite possibly a coincidence, but I’m a little more suspicious they could be related!

The RFC2217 system, I believe, is handled in the nrjavaserial driver - so probably the RFXCom is also using this. We know there are problems with this driver, although I have not seen a link between the serial and virtual ports. So, it does seem to me that there is a common element here - I’m not sure that it helps understand the issue at this point, but I don’t think you should ignore the possible link.

OK, AFAICT the RFXCom binding is rock solid, and you’re right, there might be a correlation. I’ll give the RFXCom device back to the RFXCom binding and provide a new logfile.

Steps planned:

  1. Disable all bindings connected to local physical serial ports (Z-Wave and RFXCom).
  2. Restart OH
  3. If 1. doesn’t persist: Disable all bindings connected to physical serial ports (Z-Wave and RFXCom).
  4. Create an Ember-Concentrator and configure it to use RFC 2217.

Anything else I could do to get meaningful log entries?

Well, it’s crashing from what I can see in the log, so it’s apparently got the “same” issue (at top level - ie serial port failure) as the zigbee binding :wink:

I think this is fine, but we still should not exclude the fact that there might be a common element. Still, let’s go with this as trying different things will potentially help understand what’s happening (or maybe just confuse the situation more :slight_smile: ).

All other bindings disabled,

1st_2nd_try.zip.txt (79.4 KB)
3rd_4th_try.zip.txt (55.9 KB)

Timeline:
GUI: Add Ember Coordinator
00:20 1st try: OH — WiFi — Hub — wired — ZigBee gateway
GUI: Delete Ember Coordinator
GUI: Add Ember Coordinator
00:26 2nd try: OH — WiFi — Hub — wired — ZigBee gateway
GUI: Delete Ember Coordinator
GUI: Add Ember Coordinator
00:40 3rd try: OH — wired — Hub — wired — ZigBee gateway
GUI: Delete Ember Coordinator
GUI: Add Ember Coordinator
00:45 4th try: OH — wired — Hub — wired — ZigBee gateway

Skimming through the files, I see:

  • RX ASH error: BAD PACKET
  • status=EMBER_NOT_JOINED
  • Error creating instance of EzspFrame / Caused by: java.lang.ArrayIndexOutOfBoundsException: Index 6 out of bounds for length 6

From a quickish look, I don’t really see any problem - obviously other than the fact that it stops working :wink:

Yes, there are a number of these and the binding seems to recover from them - as it should. The packets get retried, and life seems to go on. This sort of thing does happen on a serial / USB link as well, but it’s reasonably rare. I would suggest that this is indicative of lost data on the remote end.

That’s “OK” - the system has not got that far - it keeps falling over during the coordinator initialisation.

In themselves, these are not the cause of a problem - the frames just get dropped, but communications continues.

Ultimately, this always results in the NCP ceasing to communicate -:

2021-11-03 00:28:54.921 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH TX EZSP: EzspSetConfigurationValueRequest [networkId=0, configId=EZSP_CONFIG_SOURCE_ROUTE_TABLE_SIZE, value=16]
2021-11-03 00:28:54.921 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: TX EZSP queue size: 1
2021-11-03 00:28:54.921 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=false, data=BD 00 01 53 00 1A 10 00]
2021-11-03 00:28:54.961 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - <-- RX ASH frame: AshFrameNak [ackNum=4, notRdy=false]
2021-11-03 00:28:54.961 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-03 00:28:54.961 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=true, data=BD 00 01 53 00 1A 10 00]
2021-11-03 00:28:55.396 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-03 00:28:55.396 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=true, data=BD 00 01 53 00 1A 10 00]
2021-11-03 00:28:56.244 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-03 00:28:56.244 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=true, data=BD 00 01 53 00 1A 10 00]
2021-11-03 00:28:57.920 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-03 00:28:57.920 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=true, data=BD 00 01 53 00 1A 10 00]
2021-11-03 00:29:01.126 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-03 00:29:01.126 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=true, data=BD 00 01 53 00 1A 10 00]
2021-11-03 00:29:04.343 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - ASH: Retry Sent Queue Length 1
2021-11-03 00:29:04.343 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - --> TX ASH frame: AshFrameData [frmNum=4, ackNum=4, reTx=true, data=BD 00 01 53 00 1A 10 00]
2021-11-03 00:29:04.933 [DEBUG] [e.ember.internal.ash.AshFrameHandler] - Sending EZSP transaction timed out after 10 seconds

The binding tries to resend the data, but ultimately gives up.

Do you have Wireshark? It might be interesting to see what is happening on your TCP connection. IF you want to post Wireshark logs I can take a quick look at what happens when this all falls over.

There are two malformed packets from the ZigBee gateway:

I would have to correlate this to the OH log … - I will try tomorrow.

I would like to say that this shouldn’t matter. These are invalid TCP packets, so TCP should take care of this (ie by resending corrupted frames) such that what pops out in the binding is correct.

However, it will be interesting to see if the timing correlates with problems in the binding…

@chris

Serial gateway running on the ZigBee gateway is:

1st try: serial gateway without any command line option, i.e. using hardware flow control
2nd try: serial gateway with option -f, i.e. software flow control

OH logs and Wireshark packet captures:
sniffed.zip.txt (60.7 KB)

I don’t think that flow control type set in the binding will make any difference here since I don’t believe that this is passed from the client to the server. As I understand the server is started separately, and flow control might make a difference there.

From the binding log it looks pretty much the same in both logs. What is interesting, and I’d not spotted earlier, is that there is an “Error 81” in some of the logs (I’ve not checked if this is always the case). This error actually comes from the Silabs chipset and it means that the device has gotten out of sync with data, and it has asserted. The only recovery from this is to reset the device and start again. I don’t really know what would cause this here as I’m not familiar with the server application - I could guess that the ASH protocol might be getting out of sync, although delays through the network are pretty minimal compared to the relatively slow speed on the serial port.

Looking at the Wireshark log I don’t really see any issue as such - it just stops passing data which is consistent with the fact that the NCP is asserting.

@chris

And now for something completely different: :slight_smile:

Sonoff ZB Bridge (EFR32MG21@EmberZNet 6.7.9.0) running Tasmota 10.0.0 via RFC 2217

OH log:
Tasmota_Sonoff_ZB_Bridge_OH.txt (109.4 KB)

Tasmota console:
Tasmota_Console.txt (34.8 KB)

Looks familiar …

It works flawlessly with Home Assistant Operating System.