Raspberry Pi 4, openHABian
openHAB 3.4.0 Build #3029
Using the Ember EM35x Coordinator on a Dual USB Nortek stick
First, sorry this will be vague because I don’t really know how or why this is happening. Some background: my setup has been running very well for many months. I believe this installation was created in August 2022. No new devices have been added recently.
The issue: two days ago, some Zigbee devices started to go OFFLINE. They could be brought back ONLINE (by disable / re-enable cycles) but would soon after return to OFFLINE.
So when I saw that (which is not very normal, especially with many devices simultaneously), I decided I would reboot the system. I rarely reboot as it can lead to a long process to bring Zwave back up (I have a long thread about that if interested).
Yes I need to move up to 4.X and plan to, but I am heading out of the country so it’s not an option right now.
So continuing, after the reboot, Zigbee would come up (i.e. 99% of devices became ONLINE) but soon after they would start to go OFFLINE again. I retried this process several times and it is repeatable. No, I don’t have logs for Zigbee at this point because I suspect something else is happening.
So my theory, and this is where I would like some help to confirm it … I believe I have had some sort of SD card failure. My plan is to rebuild a 3.4.0 setup from scratch on a new micro SD and then restore the backup.
One question arises: is it possible that the backup was corrupted (assuming the SD failure theory), so that restoring the backup might not solve it? (I have backups from many days before the issue started.)
Thanks for any thoughts you might have or more questions that you need answered before you can comment.
It could be the SD card (which can cause weird effects), but if only Zigbee is affected I doubt it.
My guess would be that one device (e.g. a Zigbee router or even the coordinator) is causing the trouble. I had this with a plug; after removing the plug the problem was gone. Removing AC-powered Zigbee devices would be one of the first things I’d try.
Why not check the logs? They’re always the first thing to check and often tell you what’s going wrong.
And if you’re about to reinstall then why not OH4?
It’s not clear. Your post title indicates that both Zigbee and Zwave are failing but your post only talks about Zigbee.
If it’s both, I’m not sure that an SD card failure would cause the problem. If it did, I wouldn’t expect it to impact both. Instead I’d expect the cause is some failure of the USB port, USB hub, or perhaps the USB dongle itself.
By the time the operating system sees them, the Zwave and Zigbee radios are separate devices. For both to be impacted the same way, the root problem would have to be something further up the path.
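To see how the OS presents the two radios on a dual stick, a small script along these lines can list the serial devices. This is only a sketch: it assumes a Linux system that populates `/dev/serial/by-id` (as Raspberry Pi OS normally does), and the actual names you see will depend on your stick.

```python
import os

def serial_ports(by_id_dir="/dev/serial/by-id"):
    """Map each stable by-id name to the device node it currently points at."""
    if not os.path.isdir(by_id_dir):
        return {}  # no USB serial devices present, or not a Linux system
    return {
        name: os.path.realpath(os.path.join(by_id_dir, name))
        for name in sorted(os.listdir(by_id_dir))
    }

if __name__ == "__main__":
    for name, node in serial_ports().items():
        print(f"{name} -> {node}")
```

If both radios are healthy at the USB level you would expect two entries. If one or both are missing, that points at the port, hub, or stick rather than at openHAB.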
But who knows. SD card failures can cause lots of weird stuff to happen.
Sure. It depends on when the SD card wore out compared to when the backup was taken. If the problem started after the backup you are probably fine. If it occurred before, any corruption to the file system likely got backed up too.
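As a rough illustration of that rule of thumb, the sketch below picks the newest backup dated safely before the day the problem was first noticed. The file names, the date-in-filename pattern, and the two-day margin are all hypothetical, not anything openHAB prescribes.

```python
import re
from datetime import date, timedelta

def pick_backup(names, problem_day, margin_days=2):
    """Return the newest backup dated at least margin_days before problem_day."""
    cutoff = problem_day - timedelta(days=margin_days)
    candidates = []
    for name in names:
        # Assumption: each backup file name contains a YYYY-MM-DD date.
        m = re.search(r"(\d{4})-(\d{2})-(\d{2})", name)
        if m:
            taken = date(*map(int, m.groups()))
            if taken <= cutoff:
                candidates.append((taken, name))
    return max(candidates)[1] if candidates else None
```

Keep in mind this only picks by date. If the card was already silently failing before the symptoms showed up, even an "early enough" backup could carry some corruption.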
Yes both Zigbee and Zwave are having trouble, Zwave bridge will go ONLINE but none of the devices connect. But as I said, I have had a long term problem with Zwave as is detailed here: Zwave bridge wont go online. Zwave will come up after a few incantations (rebooting or unplugging all HW from Pi and letting it sit) … But Zwave never came active in all my attempts.
I have another spare dual dongle and did try it but saw similar results. I assume I can swap the dongle for an identical one and it won’t cause any relearning etc. Is that correct?
I have backups from every day so I can pick a day or two before the problem occurred.
Because I don’t have time right now, as I am leaving the country in a few weeks.
So it sounds like “sure, anything could happen” with a failed SD card, so my next step is to replace it and see how it goes. Will let you know.
In fact, to rule hardware out, I moved the SD card to another identical Pi 4 with a different dongle, and the same issues (along with some others) were occurring, i.e. it didn’t just “come right up”, which I would have expected (I assume).
Why I didn’t look at the logs … yes, I should have saved them, but I thought it just needed a reboot, and then I ended up trying more than 7 times and lost the log history. Lesson learned, although the log doesn’t say much unless DEBUG is on for Zwave and Zigbee … I can of course try it again, but I won’t bother if the new SD card solves it.
I do not know the Nortek stick,
but it is one of the common things left.
You could try a USB extension cable to lift the stick a bit out of the noise from the Pi.
And avoid using the blue USB ports on the Pi.
Those blue ones are USB3 ports and create more HF noise.
Thanks Bob. Yes, restarting has been the routine. I’ve had an ongoing issue with Zwave not coming up, but after rebooting it would eventually come up and be totally fine for many months. This time it would not come up after more than 10 tries, so I started to suspect something else. This is a new USB stick and a different micro SD card.
It might be easiest to factory reset all 8 devices, then include with the new controller. They are currently paired with the old controller. The other option is to try to delete them with the new controller in exclude mode. What kind of new stick do you have?
Just to clear this up, that means you’ve switched to the version of openHABian that was configured to run OH3. That does not mean the version of OH that is configured to be installed is OH 3.
Yes, and the binding is deleted and reinstalled every time you clear the cache. I’m sure you’ve done this already and not noticed, because the information is stored in the Controller, the Things, and the contents of the $OH_USERDATA/zwave folder. The binding is just the code that runs, not the data about your config.
Note this is true of all bindings. The information is stored outside of the binding. However, that information is not always compatible if the Things were created with a version of OH earlier than the binding that is running. But the upgrade process usually handles this for you now.
The key thing here is that the nodes are stored on the old stick. You’ll never get those nodes to come online or work on the new stick until you pair the nodes with the new stick.
If you are not going to move all your devices to the new stick, it’s pointless to test anything with the new stick. The new stick doesn’t know anything about your nodes.
Yes, I understand that and it makes perfect sense (although it seems different from Zigbee, as I didn’t have to reset those devices to get them to pair with the new stick).
At this point I am trying to figure out what happened to Zwave on the old stick (i.e. why it wouldn’t connect after many attempts) … I am experimenting with one Zwave device and easily paired it on the new stick. Now I am going to try to re-pair it on the old stick (after exclusion etc.) … that should tell me whether the old stick is in fact working. Then I can see if I can reliably bring up the old stick.
Meanwhile I am learning and solving issues related to the Migration … so far only a few things related to Transformations and BigMath which I’ll dive into and can probably resolve.
Hopefully I can make some progress … my wife is frantic that I won’t solve it all before we leave in a few weeks.
Well, I’m getting discouraged … I am now running on a different micro SD (32GB), using a new USB stick, running 4.1.1, with the stick on a USB extension away from the Pi (noise concern) … and the exact same behavior is happening: Zigbee devices all come up (faster with 4.x, by the way) but overnight many have gone OFFLINE. So essentially the same behavior I was having in the first place. I have found a way to get Zwave up reliably: disable Zigbee, reboot, and Zwave comes up quickly. Then enable Zigbee and it comes up normally. BTW, yes, I re-paired all the Zwave devices to the new USB stick.
Looking for suggestions for other things to try. Will turn on zigbee debug log so I can see what happens when a zigbee is going OFFLINE.
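Once there is a log to look at, a small script can show which Things drop first and how often. This is only a sketch: it assumes event lines of the form `<date> <time> ... Thing '<uid>' changed from ONLINE to OFFLINE ...`, which is roughly how openHAB reports status changes in events.log, so adjust the pattern if your lines look different.

```python
import re
from collections import Counter

# Assumed line shape; the leading two tokens are taken as the timestamp.
PATTERN = re.compile(r"^(\S+ \S+) .*Thing '([^']+)' changed from ONLINE to OFFLINE")

def offline_events(lines, prefix="zigbee:"):
    """Yield (timestamp, thing_uid) for each ONLINE -> OFFLINE transition."""
    for line in lines:
        m = PATTERN.search(line)
        if m and m.group(2).startswith(prefix):
            yield m.group(1), m.group(2)

def offline_counts(lines, prefix="zigbee:"):
    """Count how many times each matching Thing went OFFLINE."""
    return Counter(uid for _, uid in offline_events(lines, prefix))
```

Feeding it the lines of the log file (for example `offline_counts(open(path))`, with `path` pointing at your events.log) and sorting by timestamp should show whether the same few devices always go first, which would hint at a misbehaving router rather than the coordinator.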
I read the whole chain and have been following along. All the advice you have been given is really good, considering no logs have been provided and only minimal details offered.
I will add my two thoughts as well; consume them as you wish.
Turning on debug logging for your original issue of Zigbee Things dropping off is a good start. Testing on a different USB port will also help eliminate a possible hardware issue with your Pi.
However, also consider whether other environmental influences changed right around the time you started seeing Zigbee issues. Keep in mind that Zigbee shares its RF band with Wi-Fi. Also, Zigbee is a mesh network, and “most” USB sticks (controllers) only support ~32 DIRECTLY attached devices. You can obviously have many more than that, but they will be handled through other Zigbee routers (smart plugs, for example). It would be good to also check those “routers” by rebooting them (unplug them and plug them back in after a few minutes) or, if they are inside an in-wall junction box, tripping the breaker for that circuit for a few minutes and then powering them back on. This helps rule out a router-like device with a corrupted cache and is a worthwhile part of your troubleshooting (think “reboot my network”, if you will).
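To make the Wi-Fi point concrete, here is a small sketch that works out which 2.4 GHz Zigbee channels sit clear of a given set of Wi-Fi channels. The centre-frequency formulas are the standard ones; the 22 MHz and 2 MHz channel widths are nominal assumptions, and real-world interference also depends on power and distance, so treat the result as a first guess only.

```python
def zigbee_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Zigbee channel (11..26)."""
    return 2405 + 5 * (channel - 11)

def wifi_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Wi-Fi channel (1..13)."""
    return 2407 + 5 * channel

def overlaps(zigbee_channel, wifi_channel, wifi_width=22, zigbee_width=2):
    """True if the two channels' nominal bands overlap (widths are assumptions)."""
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (wifi_width + zigbee_width) / 2

def clear_zigbee_channels(wifi_channels):
    """Zigbee channels whose band overlaps none of the given Wi-Fi channels."""
    return [z for z in range(11, 27)
            if not any(overlaps(z, w) for w in wifi_channels)]

if __name__ == "__main__":
    print(clear_zigbee_channels([1, 6, 11]))
```

So if a neighbour’s router or your own access point recently moved channels, comparing it against the Zigbee channel your coordinator is set to is a cheap check.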
I think all the other pain you went through with upgrading as a “quick hope and prayer” approach only added to your injuries and as such you are now licking those wounds as well as the original cuts and bruises.
One final thought: if I were troubleshooting an issue like the one you face, I would shut down my openHAB instance while rebooting the “network”. This prevents a bunch of devices from trying to “relearn” new paths back to the controller.
Thanks Justin - not sure what you mean by 32 devices. Yes I have many more than that and have no other zigbee controllers. I of course have other technologies like WiFi that share the same frequency band. I’m pretty sure the Ember binding supports up to 250, I have mine set for 100.
Good suggestions which I’m looking at. Currently rebooting on a different Pi-4 to see.
The main issue has been the finicky zwave which would take hours of bringing the system up and down to get it to connect. That prevented me from doing any analysis or playing around since once I got the network up I would leave it and would be completely fine for many months. I now have a repeatable method (worked three times in a row) so I can experiment more easily.
If the new Pi-4 has similar issues, I’ll need to do some more thinking. RF interference is always a consideration, but very unlikely to have caused the scenario I am dealing with.