openHABian + RPi4 + Z-Wave Gen5 stick goes offline on restart

I’d suggest opening an issue on the Z-Wave binding’s GitHub with your debug logs. There is only one developer for both the Z-Wave and Zigbee bindings.

Perhaps I don’t understand hardware passthrough, but I use OH on WSL when I want to test Z-Wave code. I attach a Z-Stick with
>usbipd wsl attach --busid 1-1

where bus ID 1-1 is the Z-Stick, identified with the following (in a command prompt):

C:\Users\Robert>usbipd wsl list
BUSID  VID:PID    DEVICE                                                        STATE
1-1    0658:0200  UZB (COM12)                                                   Not attached
1-2    093a:2510  USB Input Device                                              Not attached
1-3    1c4f:0016  USB Input Device                                              Not attached
1-11   0658:0200  UZB (COM3)                                                    Not attached

1-11 is my Zniffer.
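Because the Z-Stick and the Zniffer both report VID:PID 0658:0200, a script that picks the bus ID automatically needs the COM port as well. Below is a minimal Python sketch that parses the `usbipd wsl list` table shown above and looks up the bus ID by VID:PID plus a device-name substring. It assumes the column layout in that output (DEVICE and STATE separated by at least two spaces); other usbipd versions may format the table differently, and the function names are hypothetical.

```python
import re

# Sample copied from the `usbipd wsl list` output above.
SAMPLE = """\
BUSID  VID:PID    DEVICE                                                        STATE
1-1    0658:0200  UZB (COM12)                                                   Not attached
1-2    093a:2510  USB Input Device                                              Not attached
1-3    1c4f:0016  USB Input Device                                              Not attached
1-11   0658:0200  UZB (COM3)                                                    Not attached
"""

# BUSID, VID:PID, DEVICE, STATE. Assumes DEVICE and STATE are separated
# by at least two spaces, as in the sample; the header line never matches.
ROW = re.compile(
    r"^(\d+-\d+)\s+([0-9a-fA-F]{4}:[0-9a-fA-F]{4})\s+(.+?)\s{2,}(\S.*?)\s*$"
)


def parse_usbipd_list(text):
    """Turn the table into a list of dicts, one per device row."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            busid, vidpid, device, state = m.groups()
            rows.append({"busid": busid, "vidpid": vidpid.lower(),
                         "device": device, "state": state})
    return rows


def find_busid(rows, vidpid, device_substring):
    """Return the single bus ID matching both VID:PID and a device-name substring.

    VID:PID alone is ambiguous here (stick and Zniffer share 0658:0200),
    so the COM port in the DEVICE column is used to tell them apart.
    """
    matches = [r["busid"] for r in rows
               if r["vidpid"] == vidpid.lower() and device_substring in r["device"]]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one match, got {matches}")
    return matches[0]


if __name__ == "__main__":
    busid = find_busid(parse_usbipd_list(SAMPLE), "0658:0200", "(COM12)")
    print(f"usbipd wsl attach --busid {busid}")  # prints: usbipd wsl attach --busid 1-1
```

Matching on "(COM12)" with the parentheses, rather than a bare "COM1", avoids a prefix match on the wrong port.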

Thanks will open the issue.

Your usbipd command is connecting the USB device over IP (hence USBIP). You are typing a command to the server end on Windows (usbipd), which forwards a command to Linux to initiate a USBIP connection. I am running the NUC on Win10 and had to rebuild the kernel to support USBIP, but apparently it’s native in the Win11 version.

I found it was great for development, but if I left it running for a couple of days, the connection would fail or lose sync. My second try was to move the usbipd server to a Pi, but I had the same issue on the client side. That’s how I got on this whole track of using the OH remote binding to communicate with the Z-Stick.

The other hairball with WSL is that I am running Docker under WSL. To pass the serial port into Docker, it needs to exist before the docker run command, which I was automating with a Dockerfile. On reboot, I had to get usbipd running on Windows after WSL started and before Docker started. Doable, but I hate those kinds of dependencies in production.
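The ordering problem (the serial device must exist before the container starts) can be handled with a small gate that polls for the device node and only then starts the container. This is a minimal Python sketch, not a tested production script: the device path `/dev/ttyACM0`, the container name `openhab`, and the helper names are assumptions, and the container is assumed to have been created earlier with the device mapped (`docker run --device ...`).

```python
import os
import subprocess
import time


def wait_for_device(path, timeout=120.0, poll_interval=1.0):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)


def start_container_when_ready(device="/dev/ttyACM0", container="openhab"):
    """Hypothetical helper: start an existing container only after `device` appears."""
    if not wait_for_device(device):
        raise RuntimeError(f"{device} never appeared; has the USBIP attach run?")
    # `docker start` reuses the --device mapping chosen when the container was created.
    subprocess.run(["docker", "start", container], check=True)
```

Calling `start_container_when_ready()` from whatever boot hook launches Docker replaces the manual "usbipd first, then Docker" sequencing, though it still depends on something attaching the device within the timeout.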

Since you are going through layer after layer of abstraction anyway, why not run in a VM? VirtualBox is free, runs on Windows, and supports USB device passthrough.

Or you can just run OH natively on Windows. openHAB does not require Linux and can run on almost any OS that supports Java 11 (Java 17 for OH 4).

That would actually be a good test. Does the binding have any problems with the controller on Windows?

I’m testing how to deploy the main server. One important requirement is it restores automatically on a restart.

I really like the configuration-management simplicity of Docker. I need to spend some time learning how to set up an OH Docker container on Windows, and that’s on the list of candidates. I have a testbed NUC to try a few configurations, including Windows Docker running Linux images, Windows Docker running a Windows image, and a Linux VM with Docker. I’m assuming that once this is running I won’t touch it very often, so I want a software stack that’s pretty vanilla.

That doesn’t solve the Z-Wave issue, because I am trying to avoid a line-of-sight dependency between the OH main server and the Z-Wave devices. On the Pi, I will try running from a USB drive and disabling zram to see if that behaves any differently.

Docker on Windows doesn’t support USB passthrough.

That removes Windows Docker from the list.

Yeah, if yanking out the cord is a requirement, I wouldn’t run zram. zram kind of depends on the cord not being yanked out.

Digging a little deeper, I found that the Z-Wave binding leaves behind a file

/var/lock/LCK…ttyACM0

containing some numbers. If I delete the file, I can restart the binding.

Is there a way to script deleting all the lock files on OH startup?
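One way to script this is to delete only the lock files whose owner is gone. This is a minimal Python sketch, assuming the numbers inside the lock file are the PID of the process that created it (a common convention for serial lock files, not confirmed in this thread). It checks every file in /var/lock whose name starts with `LCK` and removes it if that PID no longer exists; the function names are hypothetical.

```python
import os

# Assumed location: /var/lock is where the stale file was found above.
LOCK_DIR = "/var/lock"


def read_lock_pid(path):
    """Return the PID stored in a lock file, or None if it isn't a plain ASCII number."""
    try:
        with open(path, encoding="ascii") as f:
            return int(f.read().strip())
    except (OSError, ValueError):  # unreadable, directory, binary or non-numeric
        return None


def pid_alive(pid):
    """True if a process with this PID currently exists."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but is owned by another user
    return True


def remove_stale_locks(lock_dir=LOCK_DIR, prefix="LCK"):
    """Delete lock files whose recorded PID no longer exists; return their names.

    Files with an unreadable or non-numeric PID are deliberately left alone.
    """
    removed = []
    for name in sorted(os.listdir(lock_dir)):
        if not name.startswith(prefix):
            continue
        path = os.path.join(lock_dir, name)
        pid = read_lock_pid(path)
        if pid is None or pid_alive(pid):
            continue
        try:
            os.remove(path)
        except FileNotFoundError:
            continue  # already removed by someone else
        removed.append(name)
    return removed
```

It needs permission to delete the files, so it would typically run as root or as the user that owns the locks (for example from a cron `@reboot` entry or a systemd `ExecStartPre=` line) before openHAB opens the port. One caveat: PIDs are reused after a reboot, so a stale lock can point at an unrelated live process and be kept; comparing the file’s age with the system uptime catches that case.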

Have a look at this thread: Aeotec Z-Wave Gen5+ Stick stays offline after container restart / Raspi reboot
An example script is posted there.

Good catch, @Wolfgang_S. It’s somewhat amusing that @Andrew_Rowe and I were both involved in that similar conversation last year, but in a slightly different context that made it unrecognizable.

Reading further into that thread, a fix to remove the stale lock file was introduced in 3.4, but maybe it only works for containers?

Thanks. It’s nice to know I’m not the only one seeing this issue and there’s a known workaround!

I also opened an issue on GitHub and Chris responded. I’m curious whether the binding could clear out the lock files when it initializes. Or, to be nice, it could clear them only if they are older than the system uptime (i.e. they predate the last boot). I’m not sure the binding can learn the exact path where the lock files live through the Java libraries.
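The "older than the system uptime" idea can be sketched by comparing each lock file’s modification time with the boot time derived from Linux’s `/proc/uptime`. This is a minimal Python illustration, not binding code (the binding itself would have to do this in Java); the helper names, the `LCK` prefix filter and the lock directory parameter are assumptions.

```python
import os
import time


def boot_time(proc_uptime="/proc/uptime"):
    """Approximate wall-clock time of the last boot, from Linux's /proc/uptime."""
    with open(proc_uptime) as f:
        uptime_seconds = float(f.read().split()[0])  # first field: seconds since boot
    return time.time() - uptime_seconds


def locks_older_than_boot(lock_dir, booted_at, prefix="LCK"):
    """Names of lock files last modified before `booted_at` (epoch seconds)."""
    stale = []
    for name in sorted(os.listdir(lock_dir)):
        if not name.startswith(prefix):
            continue
        path = os.path.join(lock_dir, name)
        try:
            if os.stat(path).st_mtime < booted_at:
                stale.append(name)
        except FileNotFoundError:
            continue  # removed while we were scanning
    return stale
```

This only flags locks that survived a reboot. A lock left behind by an openHAB restart without a reboot is newer than the boot time and is not caught. On a Pi without a real-time clock, files written before the clock is synced can also make the comparison unreliable.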

FWIW, I think zram slightly masks the bug because (and I’m guessing here) the lock files don’t always get committed out of the RAM cache. I tried moving OH to a USB-mounted drive on the Pi and saw consistent restart failures. Then I tried OH on an Ubuntu Server VM in VirtualBox on a NUC and also saw consistent restart failures. Then I went back to the Pi, turned off zram on the SD card file system, and saw consistent failures.

No, that wouldn’t be the problem. The folder where the lock resides, /var/lock, isn’t put into zram. That’s an operating-system-controlled file system, not one that openHABian messes with.

In fact, if the lock files were in zram, they would get lost when pulling the power, because they don’t exist on disk.

When you pull the power, you prevent any cleanup from happening to remove the lock file which happens during an orderly shutdown. That’s why it fails on all those different platforms and configurations too.
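A sketch of the lock lifecycle shows why pulling the power leaves the file behind. This minimal Python version assumes the FHS/HDB UUCP convention (file created exclusively, PID written as ten right-aligned ASCII digits plus a newline); it is not NRJavaSerial’s actual code, and the class names are hypothetical. The file is only removed in `release()`, which runs on an orderly exit.

```python
import os


class LockHeld(Exception):
    """Raised when the lock file already exists."""


class PidLockFile:
    """Hypothetical exclusive lock file holding the owner's PID (HDB UUCP style)."""

    def __init__(self, path):
        self.path = path
        self.acquired = False

    def acquire(self):
        try:
            # O_EXCL makes creation fail if the file is already there.
            fd = os.open(self.path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
        except FileExistsError:
            raise LockHeld(self.path) from None
        with os.fdopen(fd, "w") as f:
            f.write("%10d\n" % os.getpid())  # e.g. "      1230\n"
        self.acquired = True

    def release(self):
        """Remove the lock; only reached on an orderly close."""
        if self.acquired:
            os.remove(self.path)
            self.acquired = False

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False
```

If the process is killed or the power is cut while the lock is held, `release()` never runs, the file stays, and the next `acquire()` raises `LockHeld`, which matches the "stays offline until the file is deleted" symptom in this thread.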

No, the first link in this post goes back to the original NRJavaSerial issue, raised by a contributor to that library. Wouter forked NRJavaSerial and patched it for us, but this is a known bug in the library.

Andrew - is the upshot that deleting the /var/lock/LCK…device files is not a reliable way to restore the connection?

Seems like this issue has been around for years and may merit a fix. How can I help?

No, that works. This bug has been in NRJavaSerial for years. This post is from a (closed) GitHub issue in the NRJavaSerial repository that was opened in November 2020. In this exact post, MrDos (a maintainer of that repository) explains that he knows there is a file leak and really doesn’t know how to fix the problem. I’ve seen this issue manifest itself in many odd ways; sometimes seemingly unrelated problems turn out to be caused by this bug.
I think it also affected the Modbus binding. If you search, there is a thread about using an alternative serial library.

know java?

I can hack my way around in Java.

In the meantime, I could see openHABian adding “rm /var/lock/LCK…*” to the startup script.

I would also add a note to the binding docs (at least Z-Wave, Modbus and Serial) that there’s a known bug with stale lock files on Linux, and suggest removing them manually.

Hi Asher - I am still experiencing a similar issue, but I haven’t rebooted in a while, so it’s on the back burner… here is my thread about it… also posted in the GitHub thread for this issue.

Thanks. From the various threads it sounds like there may be a fix but it’s not clear it’s integrated in OH 3.4.4. Does anyone know how to message Wouter since he seems to own this integration?

FWIW, I just looked at the NRJavaSerial library code, and I think the LCK files are created in a C library that the Java library wraps. So this is an ancient anomalous behavior (I’m trying not to use the word “bug”).

In the one thread (not the git issue, the one in our forum) I believe there is a script someone posted to delete all the stranded lock files.

it’s a bug
