Attached Hardware: Amber Wireless AMB8465 (to read out Wireless M-Bus from water & heat meter), Aeotec Z-Wave Gen5+ (to read out power meter)
Docker / Portainer 2.14.0
openHAB Software: 3.3.0 Release Build
openHAB Bindings: Homematic Binding (logging the heating, controlling lights and blinds), Gardena Binding for Gardena Gateway / smart irrigation control (logging soil humidity), Alexa Binding (controlling the Homematic lights)
Homematic IP Hardware: CCU2 (2.59.7), multiple Homematic IP devices
Problem description:
From day 1 onwards, my Z-Wave Stick Gen5+ sometimes stays offline after a Raspi reboot or container restart, with the error message "Controller is offline":
2022-06-29 15:27:08.643 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:serial_zstick:35e12a8479' changed from UNINITIALIZED (DISABLED) to INITIALIZING
2022-06-29 15:27:08.655 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:serial_zstick:35e12a8479' changed from INITIALIZING to OFFLINE (BRIDGE_OFFLINE): Controller is offline
==> /var/log/openhab/openhab.log <==
2022-06-29 15:27:08.590 [INFO ] [zwave.handler.ZWaveControllerHandler] - Attempting to add listener when controller is null
2022-06-29 15:27:13.659 [DEBUG] [ort.serial.internal.RxTxPortProvider] - No SerialPortIdentifier found for: /dev/ttyACM0
In Portainer, the log contains the following entry (when the stick works), which looks odd to me? RXTX Warning: Removing stale lock file. /var/lock/LCK..ttyACM0
I did read through all the threads on this topic I could find...
The "most common" problem with the Aeotec Z-Stick Gen5 is with the first two revisions in combination with Raspberry Pi 4. Since you're using the Gen5+ stick, it should already have the (hardware) fix, and since you're using an RPi 3 it shouldn't be a problem in the first place. So, I think it's pretty safe to say that you haven't got this particular issue.
I recently had the RPi 4/Gen5 stick problem, and while it wasn't detected most of the time, I did manage to "see" it at least once. So, I know that hardware can cause it to work intermittently. That's all I have to add, unfortunately. I would try to check if it appears and disappears from ls /dev/tty* when it goes offline in openHAB - it could be a clue to whether the problem is with openHAB or somewhere else.
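The check suggested above could be sketched roughly as follows. This is only an illustration: check_serial_device is a hypothetical helper name, and the lock file naming (LCK.. plus the device name) is taken from the RXTX warning quoted earlier in the thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of openHAB): report whether a serial device
# node and an RXTX-style lock file for it are present.
check_serial_device() {
    dev="$1"        # device node, e.g. /dev/ttyACM0
    lock_dir="$2"   # lock directory, e.g. /var/lock
    lock="$lock_dir/LCK..$(basename "$dev")"

    # -c is true only for character devices, which serial ports are.
    if [ -c "$dev" ]; then
        echo "device: present"
    else
        echo "device: missing"
    fi

    if [ -e "$lock" ]; then
        echo "lock: present ($lock)"
    else
        echo "lock: absent"
    fi
}
```

Running check_serial_device /dev/ttyACM0 /var/lock once while the controller is online and once while it is offline should show whether the device node or the lock file differs between the two states.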
The Z-Wave USB stick is at all times (even when not recognized by openHAB) listed as ttyACM0 under ls /dev/. I therefore also believe it's not a matter of "is-the-device-recognized-by-the-OS-on-a-low-level" (which appears to be the case; there are also plenty of good-looking entries in the syslog):
Jul 2 20:14:34 raspberrypi kernel: [ 4206.367128] usb 1-1.3: USB disconnect, device number 5
Jul 2 20:14:39 raspberrypi kernel: [ 4211.784142] usb 1-1.3: new full-speed USB device number 7 using dwc_otg
Jul 2 20:14:39 raspberrypi kernel: [ 4211.917443] usb 1-1.3: New USB device found, idVendor=0658, idProduct=0200, bcdDevice= 0.00
Jul 2 20:14:39 raspberrypi kernel: [ 4211.917485] usb 1-1.3: New USB device strings: Mfr=0, Product=0, SerialNumber=0
Jul 2 20:14:39 raspberrypi kernel: [ 4211.919956] cdc_acm 1-1.3:1.0: ttyACM0: USB ACM device
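Since the stick can disconnect and re-enumerate (as in the excerpt above), it may be worth confirming which tty name the kernel assigned last. A minimal sketch, assuming log lines in the cdc_acm format shown above; last_acm_tty is a made-up helper name:

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and print the tty name
# most recently assigned by the cdc_acm driver (e.g. "ttyACM0").
# Prints nothing if no matching line is found.
last_acm_tty() {
    sed -n 's/.*cdc_acm [^ ]*: \(ttyACM[0-9][0-9]*\): USB ACM device.*/\1/p' | tail -n 1
}
```

For example, dmesg | last_acm_tty prints the latest name. If it ever differs from the port configured in the Z-Wave controller Thing, that would point to a device naming issue rather than a serial library issue.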
When it loads correctly after a container restart / Raspi reboot, the log looks like this:
2022-07-03 13:53:08.359 [INFO ] [ve.internal.protocol.ZWaveController] - Starting ZWave controller
2022-07-03 13:53:08.362 [INFO ] [ve.internal.protocol.ZWaveController] - ZWave timeout is set to 5000ms. Soft reset is false.
==> /var/log/openhab/events.log <==
2022-07-03 13:53:20.498 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:serial_zstick:35e12a8479' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2022-07-03 13:53:20.518 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:35e12a8479:node5' changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
2022-07-03 13:53:21.154 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:35e12a8479:node5' changed from ONLINE to ONLINE: Node initialising: REQUEST_NIF
2022-07-03 13:53:22.439 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'zwave:device:35e12a8479:node5' changed from ONLINE: Node initialising: REQUEST_NIF to ONLINE
I also believe it does not have to do with the USB-problem reported under the threads above. If this would be the case, the syslog-entries would look different.
Hypothesis:
I believe it does have to do with something I'd describe as "routing-the-serial-connection-through to openHAB".
Alternatively it might have to do with what I found in the syslog? dockerd[564]: time="2022-05-27T09:45:13.620500774+02:00" level=warning msg="path in container /dev/ttyACM0 already exists in privileged mode" container=9410337d32adb2f10d4a49180971c7fafaea26cd9864dbb59d201d03919c8744
Btw: If anyone has an idea on how to test the stick (when not being recognized by openHAB in Docker) in Raspberry Pi OS, that'd be great. Because if it is working in Raspberry Pi OS, it would mean it has something to do with the "routing to openHAB".
Update: I found a workaround which always (reproducibly!) solves the problem, though the workaround confirms my thought that there has to be a neat solution to the problem: Every time when I just re-create the entire stack in docker (without changing anything) the Z-Wave stick gets recognized again.
I read of various workarounds (e.g. here and here), but the best option for a semi-experienced user like me is maybe to just wait and deal with it and, in the meantime, re-create the openHAB container every time the problem occurs, correct?
I'm pretty sure this could be the case. It's been a recurring problem I've seen pop up in the forum numerous times.
There is another binding that also used a serial connection and the author of that binding made a change to the serial library used in the binding and it cured the problem. I think it is the Modbus binding maintained by ssalonen ???
Dig a little if you want more info. Search nrjavaserial on this forum's search and you will see a pile of threads about it. The author of the zwave binding has been super busy with other stuff and may not have the time to change out the serial library the binding uses. Maybe someone else can contribute a fix.
The second thread I linked in that post is not to an openHAB issue; it is an issue raised on the NRJavaserial git, and the authors of the library are explaining they can't figure out where the file leak is coming from and, as such, a fix for NRJavaserial may not be imminent.
Oddly, this problem only seems to affect users of the Aeotec stick. I recently saw a thread in the forum stating there are 4 versions of this stick, with a link to another forum with details and even a hardware fix that involved adding a jumper or some such.
Well, I have had this problem for a long time now (whenever the change was), and it is really annoying. But like you say, sometimes it just works. I avoid rebooting the server that hosts oh at all costs because of this. Sometimes it takes me hours of tedious tries to get the controller online again.
Having said that, last time I rebooted, oh zwave started to work at first go, which meant serious happiness and joy for the rest of the day. Wine bottle opened and instant cheering!
OK, I found the link to the forum post on the Home Assistant community forum about Aeotec Gen5 sticks, with a lot of info about versions of this stick, why some work, and a very fiddly-looking hardware fix... check it out
Wild! From what I can tell looking at the link, this is a different problem (which I also read about), related to the Gen5 and USB detection, which results in the stick not working "at all" (USB) instead of "only sometimes" (like in my case, probably related to NRJavaserial), correct? Discussions on this were the reason why I paid attention to getting the Gen5+ (it even says so on my stick).
Hahahaha! Same for me, when I rebooted my machine earlier on, because I thought I could reproduce the problem to test out another fix idea I had. Not sure if you're running on Docker, but in my case re-deploying the container always helps; maybe it does for you too.
Aha, no, unfortunately I run oh as a "native" service on a Gentoo/Asus PN50 box. But I have, for a while, been thinking about running oh on/in Docker instead. Maybe the next upgrade to 3.3 is a good time to actually move over to Docker.
Thanks for reminding me!
That is correct
I added the link because there is firmware version info on the different versions and how to figure out which version you may have for future reference
The nrjavaserial issue is kind of an edge case and so has not been addressed. In your case I'm guessing because of running in Docker??? Since it is a Pi, have you considered not running Docker and maybe just flashing a card with openHABian to see if the problem goes away?
I'm wondering, if the Aeotec Z-Wave stick (independent of Gen5 or Gen5+) causes so many problems: is there a reliable alternative around? Or is it that every USB stick will have the same problems?
Interestingly, with my Amber Wireless AMB8465 I'm not experiencing these problems, but then it's also not running in the openHAB container but in a separate wmbusmeters container, with 100% reliability so far.
Apart from this annoying Aeotec Z-Wave nonsense, I'm really happy with my Docker setup. I wrote myself a short list of instructions on how to come from "empty SD card" to "fully restored system based on automatic daily updates" in < 30 minutes, in case my SD card breaks or I break my system beyond repair. I did a fire drill one month back and it worked pretty well. Let me know if there's something I can share on that end.
Funny that you say that "OH on docker" is an edge case. I thought it's the most predominant setup here in this forum.
Testing things with openHABian could be an option, thanks for the hint. However, I'm running a lot of stuff on Docker (wmbusmeters for heating and water meter readouts, influxdb instead of RRD4J, mosquitto for Nous plug readings, duplicati for automatic backup) and "testing it for a couple of weeks instead" would unfortunately require lots and lots of upfront work.
I've had problems with my Aeotec stick (recent model) under a Docker environment as well. The main problem was a stale lock file, preventing the stick from being accessible.
And the poor man's solution was adding an init file under /etc/cont-init.d/10_remove_zwave_lock:
#!/bin/bash -ex
# Remove a stale Z-Wave lock file left over from a previous container run.
ZWAVE_LOCK="/var/run/lock/LCK..zwave"
if [ -f "$ZWAVE_LOCK" ]; then
    echo "Removing stale ZWave lock file $ZWAVE_LOCK..."
    rm -f "$ZWAVE_LOCK"
fi
Problem solved for me, the lock file is getting removed at each start of the container.
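A slightly more cautious variant of the same idea would remove the lock only when the process that created it is gone. This is a sketch under assumptions, not a tested fix: it assumes the lock file stores the owner's PID as ASCII text (the UUCP-style format described in the Filesystem Hierarchy Standard), and remove_if_stale is a hypothetical function name.

```shell
#!/bin/sh
# Sketch: remove a lock file only if the PID recorded in it is no longer alive.
# Assumption: the lock file holds the owner's PID as (space-padded) ASCII text.
remove_if_stale() {
    lock="$1"
    # Nothing to do if there is no lock file.
    [ -f "$lock" ] || return 0

    # Keep only the digits of the first line (drops padding and newline).
    pid=$(head -n 1 "$lock" | tr -cd '0-9')

    # kill -0 sends no signal; it only checks whether the process exists.
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "Lock $lock is held by running PID $pid, leaving it in place."
        return 1
    fi

    echo "Removing stale lock file $lock..."
    rm -f "$lock"
}
```

Called as remove_if_stale /var/run/lock/LCK..zwave, it behaves like the script above when the recorded process is gone. Inside a freshly started container, PIDs are reused quickly, so the recorded PID may belong to an unrelated live process; in that situation the unconditional removal above is probably the more practical choice.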
Thanks! Will try this out. Also a good opportunity to learn more on unit files.
Thanks! Will go down this path if the frequent "offlines" become too annoying or if I don't manage to get @Ardanedh's solution to work.
Sorry, I did understand your post correctly in the first place. I was only surprised to read that the Gen5+ stick, which is pretty standard for Z-Wave (itself a pretty standard wireless protocol), running on Docker (which I understood many people also use), is an edge case in itself. But then again, if "standard" means only ~20% per case, the entire chain results in 0.2 * 0.2 * 0.2 = ~1% of all installations, hence not more noise on that particular issue in the forum.
What I am ultimately trying to do is draw enough attention to the nrjavaserial issue that hopefully someone with more Java programming ability than myself contributes a fix, perhaps as was done for the Modbus binding. My linked post above is from July 2021 and the original issue was discovered quite a while earlier. Obviously it is still tripping up the lucky few.
Keep in mind that you can't really draw conclusions about "what most people are using" from discussions in this community, because we rarely hear from people who don't have any issues. It's perhaps more accurate to say that discussion here is a reflection of "what people are struggling with".
Anecdotally, I would say that there are more people talking about the Aeotec Gen5 Z-Wave stick (in all of its variations) than any other controller. I wouldn't be surprised if it was both the most used device, and the one that causes the most problems for users.
Also anecdotally, I would guess that there are more instances of openHABian than any other setup. Both because openHABian is the logical starting point for new users (particularly those with less technical skill) and because a lot of intermediate/advanced users are comfortable dedicating an RPi to openHAB to keep things simple. However, that might have changed in the past two years due to the shortage of RPis.
Yes, Zooz gave their new device an even longer name and the exact same "ZST10" model number. No, Zooz doesn't really understand how marketing works. Solid controller, though.
Just to note that the binding uses whatever serial library openHAB core provides (through org.openhab.core.io.transport.serial) - it's not something the binding can change unless we move away from using the OH provided services and directly link a serial library (which was frowned upon in the past).
Personally I've stopped using nrjavaserial for other projects as it's just too much hassle and causes too many problems.
So then there appear to be two good hardware alternatives available (GoControl and Zooz), which have both proven to be working reliably with @Andrew_Rowe and @rpwong respectively, in case @Ardanedh's workaround for users of the Gen5+ doesn't work (which it should) or @Andrew_Rowe is not successful in raising enough attention for the NRJavaserial fix.
I have to say I love this project and this forum! Thanks a lot to all of you!
Well, from my side, I can say that before some early version 3.x (I have forgotten exactly when this nrjavaserial stuff changed), I never had issues with my serial ports for zwave. I guess it is some kind of race condition.
I would be surprised if Aeotec Gen5 is part of the problem here, like @rpwong says, it is probably the most common stick, but what do I know - it could of course be a combination of the computer and the stick.
Removing the lock file is needed, if it is left, but that is (at least in my case) not the full solution, at least not if I just restart oH. Maybe if I restart the full machine. But since I run oH on a server doing lots of other stuff, I prefer to just try to restart oH a gazillion times until the serial port is working again. But it can literally take hours of trying in worst case. Maybe this alone is a reason for start using docker with oH.
This issue is the only real issue that I have with oH, but since I have not enough knowledge or skill to fix it, I accept it and will not complain - apart from it oH and the zwave binding is the best.
If someone would have a go at a direct binding solution, or a change of the provided serial library into something else, I would be happy to put 100% effort into testing the alternative.