Ember Zigbee binding never populates routes or neigbors

Hello,

I’m using openHAB 3.2.0 and with the zigbee binding communicating with a Bitronvideo BV 2010/10 usb stick running firmware 6.7.8.0.
Everything is working, the door contacts are updating properly, I can control the remote sockets as well just fine.
Out of curiosity, I wanted to graph the network and I used the script detailed here: ZigBee network map when using only the ZigBee Binding - #21 by StefanH

But sadly, I only get the boxes, there are no links in between them. At first, I thought there was an issue in the script, but looking at the JSON it retrieves, I see that the zigbee_neighbors and zigbee_routes arrays are empty for all devices, even the coordinator.

Looking in the main UI, it’s the same, those properties are empty. I set the Mesh update period to 5 minutes and waited for two hours, but it did not change anything, despite actuating many of the sensors in the network.

Now, I agree that this is not a big issue because the zigbee network is responsive, but I’m curious as to what I have wrongly configured to prevent the routes and neighbors from updating.

Do you guys have any idea?

There is an option called “Mesh update period” - that sets how often this data is updated. Of course if your devices are battery devices (as I assume some are if they are door contacts) then they probably won’t get updated as these devices probably sleep and will not communicate with the binding. Depending on the exact devices you have, they also may not support the commands to read this data back - not all devices provide this function.

Thanks, I have set that option to 5 minutes and still no luck after a few hours of waiting.
Here is the graph as I generated it a few moments ago:

The “Ampoule” is a light bulb that I have joined to the network but that I have unplugged since then, so it’s expected for it to be offline. The “fenêtre” contact has not been activated for a while, so I believe it has gone to sleep as you indicated.

But I would have expected to have at least a link between the coordinator and the “Prise” (power socket) as both of those are mains powered.

From your answer, I sense that this is an expected situation anyway, and because it’s working, maybe I should leave it be.

It/s hard to comment much more as I’m not familiar with any of the devices so I don’t know if they support the commands required to retrieve this data. I would agree that in general what you’ve said is also as I would expect.

Note that you don’t need to hide the IEEE addresses - there is absolutely no reason to do this and it doesn’t improve security or anything if that’s what you think…

I see the same, routes and neigbors are missing in UI.
It has been like this for a while.
however in openhab-console it works:

openhab> zigbee neighbors 26490
Total neighbors: 14
Extended PAN ID       Extended Address      Network Address       Device Type           RX On When Idle  Relationship  Permit Joining  Depth         LQI         
AEF4E912E8CE2CD9      90FD9FFFFE647A13                     59630  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            62
AEF4E912E8CE2CD9      680AE2FFFE876B81                     44320  ROUTER                          RX_ON    SIBLING       UNKNOWN            15           107
AEF4E912E8CE2CD9      0017880103FE0276                     46392  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            59
AEF4E912E8CE2CD9      5C0272FFFE53C1E4                     47355  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            68
AEF4E912E8CE2CD9      D0CF5EFFFE44B835                     28231  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            84
AEF4E912E8CE2CD9      001788010633DBD1                     13530  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            73
AEF4E912E8CE2CD9      EC1BBDFFFE8EAE4B                     52352  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            73
AEF4E912E8CE2CD9      CCCCCCFFFEA4AFBC                     47281  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            88
AEF4E912E8CE2CD9      680AE2FFFEA4BF70                     24942  ROUTER                          RX_ON     PARENT       UNKNOWN            15           148
AEF4E912E8CE2CD9      EC1BBDFFFE003744                     21129  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            69
AEF4E912E8CE2CD9      001788010B5F73C4                     17742  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            72
AEF4E912E8CE2CD9      00178801062BF764                     30717  ROUTER                          RX_ON    SIBLING       UNKNOWN            15           101
AEF4E912E8CE2CD9      EC1BBDFFFE1806E0                     57304  ROUTER                          RX_ON    SIBLING       UNKNOWN            15            63
AEF4E912E8CE2CD9      847127FFFEC9A4E1                         0  COORDINATOR                     RX_ON    SIBLING       UNKNOWN             0            83

openhab> 

I do see a lot of these in the log

2022-04-06 17:09:01.597 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspSendUnicastResponse [networkId=0, status=EMBER_SUCCESS, sequence=82]
2022-04-06 17:09:01.817 [DEBUG] [s.zigbee.dongle.ember.ezsp.EzspFrame] - Error creating instance of EzspFrame
java.lang.reflect.InvocationTargetException: null
        at jdk.internal.reflect.GeneratedConstructorAccessor411.newInstance(Unknown Source) ~[?:?]
        at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:490) ~[?:?]
        at com.zsmartsystems.zigbee.dongle.ember.ezsp.EzspFrame.createHandler(EzspFrame.java:491) [bundleFile:?]
        at com.zsmartsystems.zigbee.dongle.ember.internal.ash.AshFrameHandler$AshReceiveParserThread.run(AshFrameHandler.java:309) [bundleFile:?]
Caused by: java.lang.ArrayIndexOutOfBoundsException
2022-04-06 17:09:01.818 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspIncomingRouteErrorHandler [networkId=0, status=EMBER_SOURCE_ROUTE_FAILURE, target=36C7]
2022-04-06 17:09:01.857 [DEBUG] [zigbee.dongle.ember.ZigBeeDongleEzsp] - RX EZSP: EzspMessageSentHandler [networkId=0, type=EMBER_OUTGOING_DIRECT, indexOrDestination=36C7, apsFrame=EmberApsFrame [profileId=0104, clusterId=0001, sourceEndpoint=1, destinationEndpoint=1, options=[EMBER_APS_OPTION_RETRY, EMBER_APS_OPTION_ENABLE_ROUTE_DISCOVERY, EMBER_APS_OPTION_ENABLE_ADDRESS_DISCOVERY], groupId=0, sequence=76], messageTag=75, status=EMBER_DELIVERY_FAILED, messageContents=]
2022-04-06 17:09:01.857 [DEBUG] [tsystems.zigbee.ZigBeeNetworkManager] - RX STA: msgTag=75 state=RX_NAK
2022-04-06 17:09:01.857 [DEBUG] [transaction.ZigBeeTransactionManager] - notifyTransactionProgress: TID=75, state=RX_NAK, outstanding=5
2022-04-06 17:09:01.857 [DEBUG] [transaction.ZigBeeTransactionManager] - Transaction complete: ZigBeeTransaction [ieeeAddress=BC33ACFFFE5D4C93 queueTime=52430, state=FAILED, sendCnt=1, command=DefaultResponse [Power Configuration: 0000/0 -> 36C7/1, cluster=0001, TID=75, commandIdentifier=1, statusCode=SUCCESS]]
2022-04-06 17:09:01.857 [DEBUG] [transaction.ZigBeeTransactionManager] - transactionListenerRemoved: 4 outstanding
2022-04-06 17:09:01.857 [DEBUG] [e.transaction.ZigBeeTransactionQueue] - BC33ACFFFE5D4C93: transactionComplete, state=FAILED, outstanding=0

I do not know if these are related, as these exceptions seems to have something to do about battery devices.

@chris Do you want a log, and witch timespan do you want?
Would be great if I somehow can PM or Mail the log to you.

My coordinator is the ITead Zigbee 3.0 USB dongle/stick/adapter based on Silicon Labs EFR32MG2.
I have sometimes seen this stick been reset because of no response, so I have blamed all things that are strange on it :slight_smile:

These are likely not related - this has to do with an EZSP packet that is incorrectly defined. I raised this with Silabs recently as there is an issue with the definitions in their documentation. All of these frames that have caused this are nothing that is used by the system.

I’d need to see the mesh update transactions to see what was (or wasn’t) happening there.

Ah, I did not think of checking in the console, and just like you, the neighbors are available there.

I noticed something very peculiar in the JSON returned by the /things REST API that is used by my script:

"properties": {
  "zigbee_logicaltype": "END_DEVICE",
  "zigbee_powerlevel": "MEDIUM",
  "zigbee_manufacturercode": "0x1141",
  "modelId": "TS0203",
  "zigbee_networkaddress": "28469",
  "zigbee_powersource": "RECHARGABLE_BATTERY",
  "zigbee_datecode": "",
  "zigbee_stkversion": "0",
  "zigbee_zclversion": "3",
  "zigbee_routes": "[]",
  "zigbee_lastupdate": "",
  "zigbee_stkcompliance": "21",
  "vendor": "_TZ3000_2mbfxlzr",
  "zigbee_powermode": "RECEIVER_ON_STIMULATED",
  "zigbee_powersources": "[RECHARGABLE_BATTERY]",
  "hardwareVersion": "1",
  "firmwareVersion": "0x00000045",
  "zigbee_neighbors": "[]",
  "zigbee_applicationVersion": "69",
  "zigbee_device_initialised": "true",
  "zigbee_devices": "[]"
},

All the properties values are strings when clearly some should directly be literals, like 1, 69, or even true.
What’s even more surprising are the array properties that have a value of "[]", that is a string containing an opening and a closing square bracket. But it should really be a direct array value, which would be [] (without the quotes) if the array was empty.

Looking at the very few other properties instances in the JSON that are not empty and not zigbee, they all exhibit the same behavior of being all strings when some could really be literals.

Could it be that a change was made to force the string type on all properties but overlooked the fact that some should be arrays?

Why should they be literals?

Properties is a String, String key/value pair, so Strings are expected aren’t they?

From a REST consumer point of view, it’s much nicer when items in the JSON have a proper type has it convenes semantic meaning. With all strings, it’s much harder to be confident about what’s inside, especially when it comes to number format (ie, decimal separator) or list representation (comma, semicolon, tab…)
That being said, from a developer point of view, I totally understand that it’s much easier if everything is of the same kind, strongly typed languages don’t like the notion of a thing that varies in type.

But that is not likely the culprit here anyway. I went looking in the source code for the Zigbee binding and found that THING_PROPERTY_ROUTES is updated in only three places, one in the coordinator and two in the node.
Now, the nodeRemoved method is setting its value to "[]" which is logical given the goal of the method, but I’m wondering if it could happen that nodeUpdated is called and then quickly followed by a nodeRemoved call that annihilates every work that was done previously.
Another possibility is that nodeUpdated is never called at all, but that would mean all properties would remain empty which is not the case here.

So the question remains: where is the console getting its values from, and why aren’t they available in the associated things?

Sure - and this is done for configuration, but not for properties. Properties are just a simple set of Strings, so it is fine,

It requests the information from the device.

Ok, so that gives a bit of credit to my theory that the nodeUpdated and/or nodeRemoved methods are somehow not called properly.
There a call to logger.debug inside nodeUpdated but I don’t see any in nodeRemoved
Or maybe there is log somewhere else that I could see.
Do you want me to do something in this area?

We can speculate on all sorts of things, but it’s just that - speculation. What’s needed is the logs during the mesh update as I asked for above.

Please ensure that logging is enabled as per the binding documentation.

Ok, here is a log extract that I believe is appropriate: openhab_extract.log (119.3 KB)

At 23:10, I typed those commands in the console:

openhab> zigbee neighbors 0
Total neighbors: 2
Extended PAN ID       Extended Address      Network Address       Device Type           RX On When Idle  Relationship  Permit Joining  Depth         LQI        
E5344F6F349B000F      50325FFFFE2CBEEF                     64711  ROUTER                          RX_ON    SIBLING       UNKNOWN            15           254
E5344F6F349B000F      A4C13816386EDEAD                     28469  END_DEVICE                     RX_OFF      CHILD      DISABLED             1           254

openhab> zigbee neighbors 64711
Total neighbors: 3
Extended PAN ID       Extended Address      Network Address       Device Type           RX On When Idle  Relationship  Permit Joining  Depth         LQI        
E5344F6F349B000F      A4C138968575EF00                     24021  END_DEVICE                     RX_OFF      CHILD      DISABLED             2           155
E5344F6F349B000F      000D6F000DBACAD0                         0  COORDINATOR                     RX_ON     PARENT       UNKNOWN             0            49
E5344F6F349B000F      A4C138FF87AFABBA                     21990  END_DEVICE                     RX_OFF      CHILD      DISABLED             2           101

And the log shows a longer period of time during which I believe there is the mesh update as I set its period to 5 minutes.
I see quite a few Node XXX is not updated messages but they don’t seem to come from the binding itself and I’m not sure they are the source of the issues I’m seeing.

Thanks for your time.

I was not sure what I was looking for,
so I fired up Wireshark and started sniffing the zigbee mesh.

Filtering in Wireshark for “zbee_aps.zdp_cluster==0x031” (“Link quality request”)
showed only result when I requested the neigbors for a node manually in the console.

However, after restarting the zigbee-binding I now see Link quality requests from the coordinator to the nodes every five minutes (=Mesh update period).
And now zigbee_neighbors and zigbee_routes show usable content in the UI.

So, somehow the mesh update thingies did not run at the times it should run.

Restarting the whole openHAB now to see what happens…

Something is clearing zigbee_neighbors and zigbee_routes as seen in the UI after
two, three, four or five minutes…

How this is related to the mesh update routines that was not running…

Ok, I think I got it:
When I change the color temperature or the dimmer on one of my bulbs, zigbee_neighbors and zigbee_routes strings disappear from the UI.
They remain “[]” until the next mesh update run.

Thanks, but I still need to see the logs to find out what is happening. It seems like the mesh update is not running, so probably this is a configuration issue. Possibly the configuration is now wrong for some reason. From the logs from @obones it doesn’t run, but I can’t see what the setting is.

Part of log file that contains two mesh updates:
openhab-meshupdate.log.zip.txt (214.7 KB)

I had to zip the log file and then give the .zip a extra .txt to upload it here.

No, the mesh update did not run for me either, but it started to run as sceduled when restarting the zigbee-binding (in openhab-console “bundle:restart org.openhab.binding.zigbee”).

Not sure what configuration you refer to.
Mine is probably OK now, as the scheduled mesh update started as it should do after a full openHAB restart.
Also, the mesh update survived the night with 5 minute intervals.

Here is a zip (renamed to txt for upload) containing the log:
openhab_extract_restart.zip.txt (138.8 KB)

What I have done is this:

  • In the ssh console, activate log
  • In PaperUI, click the “pause” button for the zigbee dongle
  • Wait 10 seconds
  • Click the “start” button for the zigbee dongle
  • Wait for the dongle to be online, as well as the zigbee socket
  • Open each door in turn to “wake up” the contact sensor attached to each
  • Wait for 5 minutes

There, I noticed that the routes and neighbors are available and represent the real network.

After waiting for about 10 more minutes, the routes and neighbors are gone on everything but the controller. This is when I created the log file and turned off debug mode.

Just before posting this message, I went back to look at the controller, and its routes and neighbors are now gone and it doesn’t look like they’ll come back.

I was referring to the mesh update configuration.

One thing I observed is that this config can’t be updated when the binding is running - it will only be changed on startup. So if the update period is changed to 5 minutes, it will only actually change when the binding restarts. This might explain why there was nothing in the initial logs from @obones, and might be adding to the confusion.

In general there should be no issue with running the mesh update at a high rate - it “just” creates a lot of traffic, so it can impact your “real” commands.

This was originally written for a large commercial customer for commercial building use, and they ran the mesh update at 1 minute rate - the only real downside is the traffic it generates.

What you see in the console, and what you see in the UI, are not linked. The console and binding requests the information separately.

Is your network actually working ok? From the log, it doesn’t look like it is - most of the commands are timing out -:

In your log there are 29 LQI requests (the command used to generate this data), and only 6 responses. From a quick search, it looks like all these come from the coordinator and one other device (FCC7).

So from this, I only expect the coordinator and FCC7 to show the data, and as far as I can see, this looks ok since I see messages in the log saying that the node is being updated in the binding handler. It doesn’t log the neighbour information though, but I see no reason that it should be there for FCC7, and I see no reason it would be there for other devices.

Did you check FCC7? This and the coordinator are the only devices I would expect to see as at least in this short log, only these two devices are responding, and updating the binding correctly.

I can’t comment on this without seeing the logs, but given the state of your network, I’m not surprised by this. At some point the coordinator will remove devices that aren’t responding from the neighbour tables - which seems to be most of them from what I can see in the log.

I don’t think there’s a bug here - as far as I can see from the logs, and the fact that it is working for @NilsOF I think it’s working ok. There is the confusing matter that the update rate is not changed until the binding restarts which is likely adding to the feeling that this isn’t working and I will take a look to see if this can be changed (I think it is ok to update this on the fly).