Zwave device going offline

I have an Aeotec zwave controller and several Aeotec switches and bulbs, along with a few GoControl zwave bulbs. I’ve been running this setup for about a year and a half, starting with only a few devices and building up to where I am now, and have never had an issue with devices going offline. I came home today and my garage door controller was not working. I looked in HABmin and saw that several of these devices were offline (for example, "NODE 4: Is currently marked as failed by the controller!"). After a couple of OH service restarts and RPi3 reboots, everything came back online and worked just fine. Just now, one of my Aeotec bulbs went offline and I had to restart the service to get it back online. I’ve read a few posts here about devices randomly going offline, but there doesn’t seem to be a definitive solution. Is there a way to troubleshoot this to get to a solution? Is my controller dying? I did set the zwave binding to debug mode, but I saw nothing more descriptive than the error shown above. All answers greatly appreciated.
Thanks

What version of the binding are you using?

You should use debug logging to find out what is happening - there must be something logged in there. If this is just one or two devices, then it’s likely that the device is at the maximum range, and depending on things like other equipment that is powered on, weather etc, the device may simply be uncontactable. If this happens, the binding will mark the device offline. When it is again contactable, it will be marked online again.

Whichever version comes with OH 2.3. I don’t use the nightly snapshots.

I’ll set debug level and watch the logs the next time it happens. But many of these devices are fairly close to the hub and have been running without issue for as long as I’ve had them, up to a year and a half or so. There is one GoControl bulb that is maybe 5 feet away and a couple of Aeotec wall motes that are within 15 feet with nothing between them and the hub.

I guess something has changed though? The software you are running is pretty old, so it’s not that. Maybe it’s the controller that is on its way out?

That’s what I’m wondering. If it is, I have a Linear zwave/zigbee controller I can replace it with. It’s just a matter of getting all my devices paired with it. Is there a simple way to do this, or will I have to reset everything and then pair them with the new controller?
Thanks for the reply.

I got a new Aeotec controller, backed up the original, and restored it to the new controller. About 3 weeks later, I’ve run into the same problem. Rebooting fixes the issue. For now, I’m going to schedule weekly reboots of my RPi3 until I can find a real solution.

My issue may have to do with file system corruption. I have an RPi3 with a 120GB SSD. The clue came from write errors from the karaf console.

Dec 17 17:32:22 openHABianPi karaf[605]: Write action failed! Input/output error
[the same "Write action failed! Input/output error" message repeats many more times on this log line]
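
To get a rough idea of how often these errors show up, the messages can be counted rather than read. This is only a sketch: the helper name count_io_errors is made up for illustration, and the service name openhab2.service in the usage note is an assumption about an OH 2.x install.

```shell
#!/bin/sh
# Hypothetical helper: count "Input/output error" occurrences in log text
# read from standard input.
count_io_errors() {
    # grep -o prints every match on its own line, so wc -l counts occurrences
    # even when many messages are run together on one log line.
    grep -o 'Input/output error' | wc -l | tr -d ' '
}
```

Possible usage, assuming the service is named openhab2.service: journalctl -u openhab2.service | count_io_errors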

I’ve implemented an automatic file system check after startup, with a weekly reboot. I hope this solves the problem. Things had become rather chronic. Devices would stay online for several days, but lately, they would go offline several times a day. If, after time, things appear to have improved, I’ll post a follow up here.

It’s been a few weeks now and I’m not losing zwave devices anymore, so I’m going to assume that the issue was file system corruption. For others who may be interested, here is what I added to /boot/cmdline.txt:

fsck.mode=force fsck.repair=yes
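
Note that /boot/cmdline.txt must stay a single line, so the parameters are appended to the existing line rather than added on a new one. A minimal sketch of doing that from the shell, assuming GNU sed (as on Raspbian); the function name add_fsck_params is hypothetical, and it is worth backing up the file first:

```shell
#!/bin/sh
# Hypothetical helper: append the fsck parameters to the single line of
# cmdline.txt, skipping any parameter that is already there.
add_fsck_params() {
    file="${1:-/boot/cmdline.txt}"
    for param in fsck.mode=force fsck.repair=yes; do
        # Leave the file alone if this parameter is already on the line.
        grep -qF -- "$param" "$file" && continue
        # Append to the end of line 1 (GNU sed in-place edit).
        sed -i "1 s/\$/ $param/" "$file"
    done
}
```

Run as root with no argument to edit /boot/cmdline.txt, or pass a path to a copy to try it out first.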

I also have a cron task that reboots the system monthly to ensure that a file system check occurs at least once a month.
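
For anyone wanting to set up something similar, here is one way such a cron task could look. It is a sketch, not the exact entry used above: the file name /etc/cron.d/monthly-reboot, the 04:00 schedule on the 1st of the month, and the function name are all assumptions.

```shell
#!/bin/sh
# Hypothetical helper: write a cron.d entry that reboots the machine at
# 04:00 on the 1st of every month. File name and schedule are assumptions.
write_monthly_reboot_entry() {
    target="${1:-/etc/cron.d/monthly-reboot}"
    # Fields: minute hour day-of-month month day-of-week user command
    printf '%s\n' '0 4 1 * * root /sbin/shutdown -r now' > "$target"
}
```

Writing to /etc/cron.d needs root. Passing a different path as the first argument lets you inspect the generated line before installing it.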

Alternatively, one can request a one-time filesystem check and repair, which runs at the next reboot, by typing this at the command line:

sudo touch /forcefsck

Hope this helps someone.