Zigbee devices offline/don't work after reboot on pi running OH2

Can you provide the log from before this so I can see when the messages stop? It looks like there is no communications at all (I’m assuming that you have logging for com.zsmartsystems.zigbee enabled?). This log shows nothing from the ASH layer, which may indicate something has stopped at that layer.

Hi,

Sorry I’m just now getting back to you on this. I actually had the sensors go down a few hours ago and I have all the logs. Here is the event log that shows the last communication with the sensors was at 2018-09-28 22:45:17.124

2018-09-28 22:43:42.372 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:42:42.342-0700 to 2018-09-28T22:43:42.349-0700
2018-09-28 22:43:42.377 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:42:42 PDT to 2018-09-28 22:43:42 PDT
2018-09-28 22:44:04.767 [vent.ItemStateChangedEvent] - KitchenTemp changed from 77.36 °F to 77.342 °F
2018-09-28 22:44:04.769 [vent.ItemStateChangedEvent] - KitchenTempC changed from 25.20 to 25.19
2018-09-28 22:44:04.788 [ome.event.ItemCommandEvent] - Item 'KitchenOutlet' received command OFF
2018-09-28 22:44:06.427 [vent.ItemStateChangedEvent] - zigbee_device_7d24e207_24fd5b00010ac3fc_24FD5B00010AC3FC_1_measurement_temperature changed from 76.334 °F to 76.352 °F
2018-09-28 22:44:29.305 [vent.ItemStateChangedEvent] - zigbee_device_7d24e207_24fd5b00010c75f5_24FD5B00010C75F5_1_measurement_temperature changed from 81.536 °F to 81.554 °F
2018-09-28 22:44:29.313 [vent.ItemStateChangedEvent] - OfficeTemp changed from 27.52 to 27.53
2018-09-28 22:44:29.328 [ome.event.ItemCommandEvent] - Item 'OfficeOutlet_Switch' received command ON
2018-09-28 22:44:42.381 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:43:42.349-0700 to 2018-09-28T22:44:42.359-0700
2018-09-28 22:44:42.385 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:43:42 PDT to 2018-09-28 22:44:42 PDT
2018-09-28 22:44:52.762 [me.event.ThingUpdatedEvent] - Thing 'zigbee:device:7d24e207:000d6f000d8b89c4' has been updated.
2018-09-28 22:44:55.169 [me.event.ThingUpdatedEvent] - Thing 'zigbee:device:7d24e207:000d6f000d8cdbe0' has been updated.
2018-09-28 22:45:15.290 [vent.ItemStateChangedEvent] - zigbee_device_7d24e207_24fd5b00010c75f5_24FD5B00010C75F5_1_ias_standard_system changed from OFF to ON
2018-09-28 22:45:15.304 [vent.ItemStateChangedEvent] - MotionOffice changed from OFF to ON
2018-09-28 22:45:15.308 [GroupItemStateChangedEvent] - gMotion changed from OFF to ON through MotionOffice
2018-09-28 22:45:17.124 [vent.ItemStateChangedEvent] - zigbee_device_7d24e207_24fd5b00010c7852_24FD5B00010C7852_1_measurement_temperature changed from 76.676 °F to 76.694 °F
2018-09-28 22:45:42.390 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:44:42.359-0700 to 2018-09-28T22:45:42.365-0700
2018-09-28 22:45:42.395 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:44:42 PDT to 2018-09-28 22:45:42 PDT
2018-09-28 22:46:42.398 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:45:42.365-0700 to 2018-09-28T22:46:42.372-0700
2018-09-28 22:46:42.404 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:45:42 PDT to 2018-09-28 22:46:42 PDT
2018-09-28 22:47:42.402 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:46:42.372-0700 to 2018-09-28T22:47:42.377-0700
2018-09-28 22:47:42.410 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:46:42 PDT to 2018-09-28 22:47:42 PDT
2018-09-28 22:48:42.419 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:47:42.377-0700 to 2018-09-28T22:48:42.392-0700
2018-09-28 22:48:42.432 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:47:42 PDT to 2018-09-28 22:48:42 PDT
2018-09-28 22:49:42.468 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:48:42.392-0700 to 2018-09-28T22:49:42.445-0700
2018-09-28 22:49:42.473 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:48:42 PDT to 2018-09-28 22:49:42 PDT
2018-09-28 22:50:42.478 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:49:42.445-0700 to 2018-09-28T22:50:42.455-0700
2018-09-28 22:50:42.482 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:49:42 PDT to 2018-09-28 22:50:42 PDT
2018-09-28 22:51:42.488 [vent.ItemStateChangedEvent] - ntp_ntp_local_dateTime changed from 2018-09-28T22:50:42.455-0700 to 2018-09-28T22:51:42.462-0700
2018-09-28 22:51:42.496 [vent.ItemStateChangedEvent] - ntp_ntp_local_string changed from 2018-09-28 22:50:42 PDT to 2018-09-28 22:51:42 PDT

Here is a link to the debug log for the time period before and after.

Hope this helps!

I have the exact same issue
using a ember dongle and a variety of bulbs and one motion sensor
ill setup the sniffer and run some packet capture

If you can get sniffer logs, that would be great. Feel free to open a ticket on my website rather than post here as you will need to provide me with the network key which you should definitely not post here.

thanks i’ve sent you what I have in debug logs, and a comment about the sniffer, Its bed time here so if you need more let me know ill get it to you in about 12 hours

1 Like

Has there been any resolution to this issue? I’m running into the same problem with a Nortek zigbee/zwave controller and Centralite motion sensor.

I’m running 2.3.0 release so I’m thinking if I upload the latest zigbee binding then that could clear things up. I haven’t gotten around to it but will post once it’s done.

I rarely reboot my pi any more but every few days the zigbee devices go offline. I’m using the restart command below and then the devices come back online.

sudo systemctl restart openhab2.service

Please provide debug logs, and preferably sniffer logs if you can provide them (this requires a separate sniffer) and I’ll be happy to take a look at the issue.

I’m very new to openHab. But I encountered the exact same problem. although I might have some things to add here.
I’m using my phone kind of as a bluetooth beacon.
The same thing happened to my bluetooth thing (the phone) when is restarted openHab. Not only the Zigbee things became offline, but also my phone is now shown as “offline” whereas prior to the reboot all my devices (Zigbee as well as the phone) were showing “online”

SO I don’t think we will have any luck by sniffing the Zigbee traffic, seems like a problem deeper down.

As an A/B test, i also added the Astro Moon phase “thing” which does not talk to something like the bluetooth/zigbee controllers do.
Apparently the Moon Phase thing was still shown as “ONLINE” even after the reboot.

If someone can tell me what logs exactly to provide (including a hint to the article explaining how to put it into debug mode) I’d be happy to copy’n’paste some log lines

Most of my devices are zigbee but I do have 2 wemo wifi devices and those have never gone offline.

Here are the notes I used to turn on karaf debugging. I use putty to ssh into the pi and at the openhabian prompt type

$OPENHAB_RUNTIME/bin/client

the password is habopen

to turn on debugging type

log:set debug org.openhab.binding.zigbee
log:set debug com.zsmartsystems.zigbee

BTW, if you ever want to turn off debugging then use

log:set info org.openhab.binding.zigbee
log:set info com.zsmartsystems.zigbee

Hope that helps!

What do you mean?

I’m not sure that I agree that sniffing the traffic won’t help - in fact I believe it will help, and I would request that if people are having problems, that this is something that should be looked at.

Please see the binding documentation - right at the end there is a section on trouble shooting. If the issue is the same as seen here, the log will likely not help much as we don’t see what is really happening between the device and the coordinator.

I can provide a debug log regarding this issue, where zigbee communitcation suddenly stops, also after rebooting… My zigbee devices (none Traadfri though) stopped responding the other day, (again). After the first reboot, some devices started responding again, but only for a few hours. After next reboot, none repsponded. But I havn´t had time to look at it.
I do however run debug logging 24/7, and right now I have 10 debug log files at size 10MB in my log folder covering the last few days :slight_smile: I can try find the time periode where the problems happened, then I´ll update the link to my dropbox I provided earlier. Or I could just copy all the files to the dropbox folder, if you want to have all of them?
Hopefully I´ll have some time to look at it tonight.

But I agree, sniffing may be required to spot this issue.

Thanks. I really welcome people to provide logs, and not just say “I have the same problem”. This I really do not like as often, what may look like the “same problem” may not actually be the same cause. It’s easy to jump to such conclusions, and also understandable even, but I would prefer people provided logs all the same :wink:

This would be really appreciated. If you can spot anything that looks abnormal, then I’m really happy to take a look at this in detail.

Well. I see several problems, (but no obvious errors).
It seems like, I have no Zigbee logging update since the 14th of October time 23:39. I guess either the coordinator no longer respond or the binding stopped working. I cant tell from the log.

Then I started digging for the devices…
My Trust motion sensor, who very often goes offline, seems to have stopped (polling) at:
13-Oct-2018 07:42:47 after that, nothing. Again, no obvoius reason, it simply stopped polling.
Sometime between 13-Oct-2018 07:42:47 and 13-Oct-2018 19:43:26 I have made a reboot, cause the motion sensor is polling again. The sensor responded until 14-Oct-2018 05:42:05, then stops. Next time it respond is 14-Oct-2018 07:25:54 . It seems like my Rpi for god knows what reason have rebooted sometime during the night at the 14-Oct.03:17.xx cause suddenly the binding is reloading everything. This wasn´t me rebooting for sure.
The very last message I´ve got from this sensor is: 14-Oct-2018 12:48:35 shortly after an reboot.

The Philips Hue motion sensor have triggered until 14-Oct-2018 21:59:42. Since then it havn´t triggered.
According to the openhab log file. The zigbee logfile have a few lines after this time, and then it ends at 23:39, and have not updated since, no errors or anything in either zigbee log or openhab log file.

It´s very hard to tell whats going on with my knowledge. Perhaps you have better luck, @Chris. I´ve updated the file in my dropbox to the last zigbee debug log. I guess this may be the best place to start or to tell, if it´s useable at all.

https://www.dropbox.com/s/b7w9ug3u6xvpvb4/zigbee.log?dl=0

Thanks @Kim_Andersen. Are you able to provide the log that matches up with your comments around 07:00 and 12:00 and 19:00 please? The log you linked only has the last time in it (around 22:00). If not, no problem, but I can’t really comment on these points otherwise.

This is certainly very interesting. As you say, not a lot to go on, but I will take a look at a few things - I suspect maybe there is a thread lock issue here so I will take a look for this. If you see this sort of thing again, then please provide the logs as I’d like to establish a pattern here (unless I find something obvious based on what I see in this log, but I will look at that tomorrow…).

(p.s. the sniffer dongles arrived tonight :slight_smile: - I will try and get them posted on Friday).

1 Like

All the mentiones problems with the Trust sensor is splitted into several log files…
These are the rest: I know its lots of logging, but now you have everything, and perhaps you can spot something I cant :slight_smile:

https://www.dropbox.com/s/oszncjy254gh9pm/zigbee.log.5?dl=0
https://www.dropbox.com/s/gb8hcxeshgna5ia/zigbee.log.6?dl=0
https://www.dropbox.com/s/tatqna64qx4a71v/zigbee.log.7?dl=0
https://www.dropbox.com/s/8dpo7yzms402rfh/zigbee.log.8?dl=0
https://www.dropbox.com/s/d4vnur1fpwsf97m/zigbee.log.9?dl=0
https://www.dropbox.com/s/qidmbmhfxhbp1bt/zigbee.log.10?dl=0

Since I just started debug logging a few days ago, I havn´t seen it like this. But I´m pretty sure it´s the same that happens each time. And I´ll keep an eye on it. Tomorrow I´ll try get the devices to respond again. And then hopefully it starts logging again as well.

This is a screendump just a few seconds old of my devices and their status in Openhab. Note specific the coordinator is ONLINE. That makes me wonder why the heck the debug logging just stopped on the 14th October without any notice.

Great… I really hope we can get to the bottom of this quite annoying issue.

1 Like

Thanks - I’ll take a look through these…

I think that this is a good example of why I like to see logs rather than having people say “I have the same problem”, or “the same thing happened again”… In all the logs I’ve been sent, I’ve not seen this specific issue, and as there are a lot of possible reasons for the binding to stop, or devices to be offline, it’s always good to deal with as many facts as possible :slight_smile:

@Kim_Andersen if you see this issue again, then I would like to get a thread dump if possible.

The specific attributes of this issue that you should look for are that the ASH layer messages are stopping - or maybe other things may also stop, but the ASH frames are the most obvious thing that I think will get stuck.

I think that there is a resource issue, and your system is running out of resources - the framework tries to create a new thread to process the incoming message, but this doesn’t happen and it seems that it blocks. Possibly some threads are not completing and are consuming resources, or maybe something else in the system is consuming resources and preventing the ZigBee system to operate.

I don’t agree with this. The polling is set to 2 hours - it polls at 05:42, then 07:42, then 09:42, 11:42, 13:42 - I then stopped checking, but this looks fine - right?

Hmm - I’m not sure what you are looking at. Why do you think the sensor stops polling?

image

image

image

image

image

There is a difference between polling, and responding. Clearly the binding is polling - we can see that as I’ve posted above. However, battery devices often sleep most of the time. In theory, they should wake up every 7.6 seconds, but often this is not the case. It is normal therefore that the device will not respond if it is not respecting the wakeup period, but this doesn’t mean there is a problem.

I wasn´t aware polling is set for two hours. But…

Part of my (probably failed) conclution is a mix between symptoms and respondings. When a symptom shows, that a device is no longer responding, I would somehow assume it´s not polling or communicating anymore. Ofcouse this could be wrong, but if it´s making a sucessfull polling, but dont respond, I would assume something far more serious is wrong somewhere? Remind you, these are motion sensors. Their job is to respond when motion is detected. If the binding can communicate with the device (polling) but the device suddenly stop responding, and after an reboot of the Rpi, it respond fine again, I would assume the problem is somewhere outside the device itself.