Aeotec Z-Wave Gen5+ Stick stays offline after container restart / Raspi reboot

OK, I found the link to a post on the Home Assistant community forum about Aeotec Gen5 sticks. It has a lot of info about the different versions of this stick, why some work, and a very fiddly-looking hardware fix… check it out.

Wild! From what I can tell looking at the link, this is a different problem (which I also read about), related to the Gen5 and USB detection, which results in the stick not working “at all” (USB) instead of “only sometimes” (like in my case, probably related to nrjavaserial), correct? Discussions of this were the reason I made sure to get the Gen5+ (it even says so on my stick).

Hahahaha! :slight_smile: Same for me when I rebooted my machine earlier on, thinking I could reproduce the failure in order to test another fix idea I had. Not sure if you’re running on Docker, but in my case re-deploying the container always helps; maybe it does for you, too.

Aha, no, unfortunately I run OH as a ‘native’ service on a Gentoo/Asus PN50 box. But for a while I have been thinking about running OH in Docker instead. Maybe the next upgrade to 3.3 is a good time to actually move over to Docker.
Thanks for reminding me! :slight_smile:

That is correct.
I added the link because it has firmware info for the different versions and explains how to figure out which version you have, for future reference.

The nrjavaserial issue is kind of an edge case and so has not been addressed. In your case, I’m guessing it is because you are running in Docker? Since it is a Pi, have you considered not running Docker and maybe just flashing a card with openHABian to see if the problem goes away?

I’m wondering: if the Aeotec Z-Wave Stick (independent of Gen5 or Gen5+) causes so many problems, is there a reliable alternative around? Or will every USB stick have the same problems?
Interestingly, with my Amber Wireless AMB8465 I’m not experiencing these problems, but then it’s also not running in the openHAB container but in a separate wmbusmeters container, with 100% reliability so far.

Apart from this annoying Aeotec Z-Wave nonsense, I’m really happy with my Docker setup. I wrote myself a short list of instructions on how to get from “empty SD card” to “fully restored system based on automatic daily updates” in < 30 minutes, in case my SD card breaks or I break my system beyond repair. I did a fire drill a month ago and it worked pretty well. Let me know if there’s something I can share on that end.

Funny that you say that “OH on docker” is an edge case. I thought it’s the most predominant setup here in this forum. :wink:
Testing things with openHABian could be an option, thanks for the hint. However, I’m running a lot of stuff on Docker (wmbusmeters for heating and water meter readouts, influxdb instead of RRD4J, mosquitto for Nous plug readings, duplicati for automatic backup) and “testing it for a couple of weeks instead” would unfortunately require lots and lots of upfront work.

I’ve had problems with my Aeotec Stick (recent model) in a Docker environment as well. The main problem was a stale lock file that prevented the stick from being accessible.
And the poor man’s solution was adding an init file, /etc/cont-init.d/10_remove_zwave_lock:

#!/bin/bash -ex
# Runs at container start (cont-init.d) and removes a leftover Z-Wave lock file.

ZWAVE_LOCK="/var/run/lock/LCK..zwave"

if [ -f "$ZWAVE_LOCK" ]; then
  echo "Removing stale ZWave lock file $ZWAVE_LOCK..."
  rm -f -- "$ZWAVE_LOCK"
fi

Problem solved for me: the lock file is removed at each start of the container.
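For anyone whose lock file has a different name (later in this thread it shows up as LCK..ttyACM0), the same idea can be sketched as a small function that takes the lock directory and a list of device names. The function name `remove_zwave_locks` and the example device names are my own assumptions, not something the openHAB image provides, so treat this as a starting point rather than a drop-in fix.

```shell
#!/bin/bash
# Sketch: remove leftover serial lock files for several device names.
# The function name and the device names below are illustrative assumptions.

remove_zwave_locks() {
  local lock_dir="$1"; shift
  local name lock
  for name in "$@"; do
    lock="$lock_dir/LCK..$name"
    # Only touch regular files that actually exist.
    if [ -f "$lock" ]; then
      echo "Removing stale lock file $lock"
      rm -f -- "$lock"
    fi
  done
}

# Example call, e.g. from /etc/cont-init.d/10_remove_zwave_lock:
# remove_zwave_locks /var/run/lock zwave ttyACM0
```

Lock files for names that are not listed are left alone, so other serial devices are not affected.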

Correction:
What I meant is more that OH on Docker, running on a Pi with an Aeotec Gen5 Z-Wave stick that has restart issues, is kind of an edge case.

This is definitely the nrjavaserial bug: it leaks lock files.

This is the reference module by Silicon Labs, available at Digi-Key for $40 USD right now. Make sure you get the correct one for your region:
https://www.digikey.com/en/products/detail/silicon-labs/ACC-UZB3-U-STA/6111632
I use a Linear HUSBZB-1, which does both Zigbee and Z-Wave. It was only $30 USD back then, now $50:
https://www.amazon.com/GoControl-CECOMINOD016164-HUSBZB-1-USB-Hub/dp/B01GJ826F8/ref=sr_1_3?crid=2Y2XU0A0I6H51&keywords=HUSBZ-1&qid=1656872598&sprefix=husbz-1%2Caps%2C67&sr=8-3
Purchased in Oct 2018, rock solid.

Thanks! Will try this out. Also a good opportunity to learn more on unit files.

Thanks! Will go down this path if the frequent “offlines” become too annoying or if I don’t manage to get @Ardanedh’s solution to work.

Sorry, I did understand your post correctly in the first place. I was only surprised to read that the Gen5+ stick, which is pretty standard for Z-Wave (itself a pretty standard wireless protocol), running on Docker (which I understood many people also use), is an edge case. But then again, if “standard” means only ~20% per case, the entire chain results in 0.2 * 0.2 * 0.2 = 0.8%, i.e. ~1% of all installations, hence not more noise on that particular issue in the forum. :wink:

Agreed, @Ardanedh’s workaround is very elegant :+1:

What I am ultimately trying to do is draw enough attention to the nrjavaserial issue that hopefully someone with more Java programming ability than myself contributes a fix, perhaps as was done for the Modbus binding. My linked post above is from July 2021, and the original issue was discovered well before that. Obviously it is still tripping up the lucky few.

Keep in mind that you can’t really draw conclusions about “what most people are using” from discussions in this community, because we rarely hear from people who don’t have any issues. It’s perhaps more accurate to say that discussion here is a reflection of “what people are struggling with”.

Anecdotally, I would say that there are more people talking about the Aeotec Gen5 Z-Wave stick (in all of its variations) than any other controller. I wouldn’t be surprised if it was both the most used device, and the one that causes the most problems for users.

Also anecdotally, I would guess that there are more instances of openHABian than any other setup. Both because openHABian is the logical starting point for new users (particularly those with less technical skill) and because a lot of intermediate/advanced users are comfortable dedicating an RPi to openHAB to keep things simple. However, that might have changed in the past two years due to the shortage of RPis.

Anyway, what I really came here to say is that I use a Zooz USB Z-Wave Plus S2 Stick ZST10 controller and have never had any issues with it. This should not be confused with the Zooz USB 700 Series Z-Wave Plus S2 Stick ZST10, since 700-series controllers are not supported by openHAB at this time.

Yes, Zooz gave their new device an even longer name and the exact same “ZST10” model number. No, Zooz doesn’t really understand how marketing works. Solid controller, though.

Just to note that the binding uses whatever serial library openHAB core provides (through org.openhab.core.io.transport.serial) - it’s not something the binding can change unless we move away from using the OH-provided services and directly link a serial library (which was frowned upon in the past).

Personally I’ve stopped using nrjavaserial for other projects as it’s just too much hassle and causes too many problems.

Thanks @rpwong, very true points!

So then there appear to be two good hardware alternatives available (GoControl and Zooz), which have proven to work reliably for @Andrew_Rowe and @rpwong respectively, in case @Ardanedh’s workaround for users of the Gen5+ doesn’t work (which it should) or @Andrew_Rowe doesn’t succeed in raising enough attention for the nrjavaserial fix.

I have to say I love this project and this forum! Thanks a lot to all of you! :slight_smile:

Well, from my side, I can say that before some early version 3.x (I have forgotten exactly when this nrjavaserial stuff changed), I never had issues with my serial ports for zwave. I guess it is some kind of race condition.
I would be surprised if Aeotec Gen5 is part of the problem here, like @rpwong says, it is probably the most common stick, but what do I know - it could of course be a combination of the computer and the stick.

Removing the lock file is needed if it is left behind, but that is (at least in my case) not the full solution, at least not if I just restart oH. Maybe if I restart the full machine. But since I run oH on a server doing lots of other stuff, I prefer to just try restarting oH a gazillion times until the serial port is working again. In the worst case that can literally take hours of trying. Maybe this alone is a reason to start using Docker with oH.

This issue is the only real issue that I have with oH, but since I do not have enough knowledge or skill to fix it, I accept it and will not complain - apart from it, oH and the Z-Wave binding are the best.
If someone would have a go at a direct binding solution, or a change of the provided serial library into something else, I would be happy to put 100% effort into testing the alternative.

I could be wrong, but I don’t think this should be a problem in the Z-Wave binding. The binding doesn’t (directly) use nrjavaserial - as above, it’s managed through the OH core proxy.

I’ve seen some issues where (IIRC) nrjavaserial was opening ports to check if they exist, and depending on the timing, this can then cause the application (ie binding) to fail as the port was not available. I’m not sure if that problem still exists or not though.

I would also agree that it’s unlikely that the Aeotec is the problem - again - always keeping an open mind, but it has worked well for a long time.

Oh, no, I am not suggesting that! I also think (guess) that the issue is in the oH-supplied service. I always thought this was changed somehow in an early version 3: either changed to nrjavaserial from something else, or nrjavaserial was upgraded, etc.

I also only have the zwave stick attached to this box, no other serial sticks etc to compare with.

That is not my understanding.
Full disclosure: I am not a java programmer. I can only give my interpretation of the forum posts I’ve read.

The issue with nrjavaserial is that it leaks lock files. It manifests as the problem that Ardanedh and Cplant are having: when they restart openHAB, their USB Z-Wave dongles cannot use the USB port because the port is blocked by a lock file. When nrjavaserial assigns a device to a particular USB port, it creates a lock file. When it shuts down, or a device is unplugged and no longer using the port, the lock file is supposed to be deleted, freeing up that port for further use. Because of the bug in the software, these lock files are not deleted, and when a new device attempts to use the port, it cannot because of the lock file. Over time, there are often many lock files created.
The script Ardanedh has placed in his init file deletes the Z-Wave lock file each time the container starts. Stopping the host that the container is running on will often delete the files as well. Again, this is just my understanding from reading the forum posts.
The one thread I linked above has posts from one of the developers of nrjavaserial in which he states how the lock files are created, how they are supposed to be destroyed, and which portion of the code does not seem to be working as it is supposed to. He goes on to explain what steps he has taken, unsuccessfully, to fix the issue and what steps he thinks may have to be taken to fix it.
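To see whether a leftover lock file is what is blocking the port, something like the sketch below can list the lock files and check whether the process recorded in each one still exists. It assumes the lock files follow the Filesystem Hierarchy Standard convention of storing the owner’s PID as an ASCII number on the first line; I have not verified that nrjavaserial always writes that format, and the function names `lock_status` and `list_locks` are made up for this example.

```shell
#!/bin/bash
# Sketch: report whether serial lock files look stale.
# Assumption: the first field of each lock file is the owner's PID in ASCII.

lock_status() {
  local lock="$1" pid=""
  [ -r "$lock" ] || { echo "unknown"; return; }
  # Read the first whitespace-separated field (leading padding is skipped).
  read -r pid _ < "$lock" || true
  case "$pid" in
    ''|*[!0-9]*) echo "unknown"; return ;;
  esac
  pid=$((10#$pid))   # drop any leading zeros
  # kill -0 only tests for existence; /proc covers processes of other users.
  if kill -0 "$pid" 2>/dev/null || [ -e "/proc/$pid" ]; then
    echo "held by PID $pid"
  else
    echo "stale (PID $pid not running)"
  fi
}

list_locks() {
  local dir="${1:-/var/run/lock}" lock
  for lock in "$dir"/LCK..*; do
    [ -e "$lock" ] || continue   # glob matched nothing
    echo "$lock: $(lock_status "$lock")"
  done
}

# Usage: list_locks /var/run/lock
```

Keep in mind that PIDs get reused, especially inside a container where numbering starts over after a restart, so “held” does not prove that the original owner is still around.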

Edit:
I dug around on git and found some commits to core concerning nrjavaserial. One by wouter recently here on Apr 8 which was merged. Should be in 3M
This is a fix for this Modbus issue which includes a very long discussion

OK, maybe I should rephrase: the issue I am seeing (I should really only speak about my own issues) is probably some kind of race condition. Yes, in my setup, too, lock files are created and sometimes not deleted as they should be. But that is only part of the issue. On my not-so-lucky days I delete many lock files while trying again and again to start oH. In my case, it is not as easy as making sure there are no lock files before starting oH.
For me, stopping, starting, or unplugging/replugging/changing the port in the UI makes it start after a while. But also, like I said, sometimes it just starts on the first go. I have not been able to see a pattern in when it starts and when it does not.

The things @chris mentioned (if still around in nrjava) (the serial port testing by opening), could be part of the issue.

OK Micael, it actually sounds like the same issue is the possible problem for you as well. I have been digging thru git and found one very recent commit which was by Wouter.

Most importantly this fixes a file descriptor leak when checking lock dir permissions.

Please see my edit in my above post for links. Apparently core is running a patched version of nrjavaserial. As of Apr 8, there should be a fix. What versions is everyone running?

So to summarize, this is not a Z-Wave binding issue. Nor is it an Aeotec Gen5 stick issue. Modbus binding users are having problems as well. Please see this post by ssalonen from Mar 2021, which includes links to other discussions concerning nrjavaserial:

Please be aware of known issues with serial devices with openhab3 (regression in serial library used by openhab) Serial ports getting blocked after some re-connecting · Issue #1842 · openhab/openhab-core · GitHub . See also discussion in Modbus Binding not working on OH3 . I am not aware that this would be any better in 3.0.1 unfortunately

Edit:
OK I finally found the thread about running an alternate serial library. Wouter has written a patch and some users have used it to cure the problems they were having
https://community.openhab.org/t/oh3-x-alternative-java-serial-provider/128462

Thanks for digging into this Andrew!
Any kind of poking in nrjavaserial raises my hopes for the new 3.3 version. :slight_smile: I am still on 3.1, but have decided to upgrade to 3.3 as soon as time permits.

Just for the sake of completeness (or more like: as documentation for myself when I bump into the problem next time, until I get around to implementing @Ardanedh’s init-file fix):

  1. Log into the openHAB docker container:
    docker exec -t -i openhab /bin/bash

  2. Access the respective folder with the lock file that shouldn’t be there
    cd /var/run/lock

  3. Delete the respective file that shouldn’t be there:
    rm -f LCK..ttyACM0

  4. Restart the openHAB container (in my case via Portainer)

  5. Done.

Worked 100% of the time for me, and is at least a bit more elegant than re-creating the entire container.
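The manual steps above could also be run from the host in one go, roughly like the sketch below. The container name openhab and the lock path are taken from my steps; the function name `clear_lock_and_restart` and the `DOCKER` override (handy for a dry run with `DOCKER=echo`) are assumptions for this example. It uses `docker restart` instead of Portainer.

```shell
#!/bin/bash
# Sketch: clear the leftover lock and restart the container from the host.
# Assumptions: container name "openhab", lock at /var/run/lock/LCK..ttyACM0.
# Set DOCKER=echo to print the commands instead of running them.

clear_lock_and_restart() {
  local container="${1:-openhab}"
  local lock="${2:-/var/run/lock/LCK..ttyACM0}"
  local docker="${DOCKER:-docker}"
  # Delete the lock inside the container, then restart it only if that worked.
  "$docker" exec "$container" rm -f "$lock" &&
    "$docker" restart "$container"
}

# Usage: clear_lock_and_restart openhab /var/run/lock/LCK..ttyACM0
```

A restart (unlike re-creating the container) keeps the container’s filesystem, which is why the lock file has to be deleted first.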
