Zigbee2MQTT Requires Restart after OH3 Restart

  • Platform information:
    • Hardware: Raspberry Pi 4
    • OS: Openhabian 3.3.0
    • Java Runtime Environment: which java platform is used and what version
    • openHAB version: 3.3.0

I’ve recently started using Zigbee around my house, with the Mosquitto broker and the Zigbee2MQTT (Z2M) service to integrate it into openHAB. I’m using the Tubes CC2652P2-based Zigbee to Ethernet/USB Serial Coordinator, and everything seems to be fine until OH needs a restart. At that point, the LED indicator on the coordinator goes dark and the logs in Z2M just say:

2022-10-25 15:28:19 Saving state to file /opt/zigbee2mqtt/data/state.json

over and over again; commands sent to Zigbee devices from the Z2M interface return failure errors. It took some luck to figure out what was triggering this, but now I can’t figure out why it’s happening. Running sudo service zigbee2mqtt status and sudo service mosquitto status shows that both are “active (running)”, but Z2M reports the following errors:

● zigbee2mqtt.service - Zigbee2MQTT
     Loaded: loaded (/etc/systemd/system/zigbee2mqtt.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2022-10-25 14:55:49 CDT; 21min ago
   Main PID: 7246 (npm)
      Tasks: 23 (limit: 4915)
        CPU: 10.331s
     CGroup: /system.slice/zigbee2mqtt.service
             ├─7246 npm
             ├─7257 sh -c node index.js
             └─7258 node index.js

Oct 25 15:12:51 openhabian npm[7258]:     at Timeout._onTimeout (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
Oct 25 15:12:51 openhabian npm[7258]:     at listOnTimeout (internal/timers.js:557:17)
Oct 25 15:12:51 openhabian npm[7258]:     at processTimers (internal/timers.js:500:7)
Oct 25 15:12:51 openhabian npm[7258]: (node:7258) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch b>
Oct 25 15:15:53 openhabian npm[7258]: Zigbee2MQTT:debug 2022-10-25 15:15:53: Saving state to file /opt/zigbee2mqtt/data/state.json
Oct 25 15:16:11 openhabian npm[7258]: (node:7258) UnhandledPromiseRejectionWarning: Error: SRSP - ZDO - mgmtPermitJoinReq after 6000ms
Oct 25 15:16:11 openhabian npm[7258]:     at Timeout._onTimeout (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
Oct 25 15:16:11 openhabian npm[7258]:     at listOnTimeout (internal/timers.js:557:17)
Oct 25 15:16:11 openhabian npm[7258]:     at processTimers (internal/timers.js:500:7)
Oct 25 15:16:11 openhabian npm[7258]: (node:7258) UnhandledPromiseRejectionWarning: Unhandled promise rejection. This error originated either by throwing inside of an async function without a catch b>

The status errors begin to appear during the startup of OH as the MQTT devices begin to initialize. Thankfully, I can clear the issue with a sudo service zigbee2mqtt restart but it would be nice to have this handled automatically. Any ideas how to troubleshoot?

I don’t use zigbee2mqtt so this is more an answer from theory than from practice.
I would assume that zigbee2mqtt should be independent of OH but dependent on mosquitto.
What if you shut down zigbee2mqtt before OH and start it again after OH has started?
It should be possible to integrate this into the service file.
As they should be independent of each other, this is more a (dirty) workaround than a solution…
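
An untested sketch of that idea, assuming the openHAB unit is called openhab.service and that a systemd drop-in for zigbee2mqtt.service is acceptable. The function name z2m_dropin and the file name z2m-dropin.sh are made up for illustration. After= orders zigbee2mqtt behind openHAB (started later, stopped earlier), and PartOf= propagates a stop or restart of openHAB to zigbee2mqtt:

```shell
#!/bin/sh
# Sketch only: prints a systemd drop-in that ties zigbee2mqtt to openHAB.
# Assumption: the openHAB unit is named openhab.service.
#
# Usage (z2m-dropin.sh is a hypothetical file name):
#   sudo mkdir -p /etc/systemd/system/zigbee2mqtt.service.d
#   sh z2m-dropin.sh | sudo tee /etc/systemd/system/zigbee2mqtt.service.d/override.conf
#   sudo systemctl daemon-reload

z2m_dropin() {
    cat <<'EOF'
[Unit]
# Order zigbee2mqtt after openHAB: started later, stopped earlier.
After=openhab.service
# Propagate a stop or restart of openHAB to zigbee2mqtt.
PartOf=openhab.service
EOF
}

z2m_dropin
```

Note that systemd only knows when the openHAB process has been launched, not when its MQTT things have finished initializing, so the restart of zigbee2mqtt may still come too early. If so, a delay before zigbee2mqtt starts might be needed on top of this.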

All three services (OH, MQTT, z2m) should run independently.

Just to be 100% sure: you are talking about restarting your OH service and not your entire Raspberry Pi?

Any chance that you still have the ZigBee binding in OH active and both OH and z2m try to access the same device?

Also found this on the z2m faq:

If after some uptime Zigbee2MQTT crashes with errors like: SRSP - AF - dataRequest after 6000ms or SRSP - ZDO - mgmtPermitJoinReq after 6000ms it means the adapter has crashed.

  • Normally this can be fixed by replugging the adapter and restarting Zigbee2MQTT
  • If you are using a CC2530 or CC2531 adapter consider upgrading to one of the recommended adapters. The CC2530/CC2531 is considered legacy hardware and runs into memory corruption easily.
  • Make sure you are using the latest firmware on your adapter, see the adapter page for a link to the latest firmware.
  • If using a Raspberry Pi, this problem can occur if you are using a bad power supply or when other USB devices are connected directly to the Pi (especially occurs with external SSD); try connecting other USB devices through a powered USB hub.
  • Disable the USB autosuspend feature, if cat /sys/module/usbcore/parameters/autosuspend returns 1 or 2 it is enabled; to disable execute:
sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT="/&usbcore.autosuspend=-1 /' /etc/default/grub
update-grub
systemctl reboot
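
The GRUB commands quoted above probably do not apply as written to a Raspberry Pi, which normally boots without GRUB; as far as I know the kernel command line there lives in /boot/cmdline.txt (an assumption about this install, so check before editing anything). The check itself is harmless, though. A small sketch, with autosuspend_state being a made-up helper name; a negative value means autosuspend is disabled, anything else is the idle delay in seconds:

```shell
#!/bin/sh
# Classify a usbcore autosuspend value: negative means disabled,
# anything else is the idle delay in seconds before a device is suspended.
autosuspend_state() {
    case "$1" in
        -*) echo "disabled" ;;
        *)  echo "enabled (delay ${1}s)" ;;
    esac
}

param=/sys/module/usbcore/parameters/autosuspend
if [ -r "$param" ]; then
    echo "USB autosuspend: $(autosuspend_state "$(cat "$param")")"
else
    echo "USB autosuspend: $param not readable on this system"
fi
```

This only matters if the coordinator is attached over USB; if it is reached over Ethernet, USB autosuspend should not be involved.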

Thank you all for the attention to this issue, I greatly appreciate it.

I can consistently create the issue by restarting the openhab service, either using sudo service openhab restart or when updating the OH core (which restarts the service as well). A full system restart has no ill effect and everything works as it should.

Those look like some promising things to try; I’ll give them some consideration. However, I can say with certainty that the coordinator becomes unresponsive during the MQTT initialization phase of the OH startup, regardless of coordinator uptime.