Z-Wave devices going offline

I recently migrated from OH2 and a Vera Edge with the 1.x MiOS plugin to OH3 with an Aeotec Gen5 Z-Stick attached directly.

Most things are working well, but I have a couple of devices that keep going offline, and they never had issues on my Vera. Sadly, neither device type is Z-Wave Plus, and both are somewhat unique, so I would rather not replace them.

One is my WaterCop WV-01 valve. If I push a button on the valve, it immediately shows back up as online and responds to commands from OH. The valve is about 15’ from the Z-Stick, in open air in the same room, so I know the signal is pretty good. My Vera was in the same location as the new Z-Stick. I have set OH to poll the valve every 10 minutes, which I feel might have helped, but it still happens.

The other is my Honeywell ZWStats. I have 3 of them, and all 3 seem to randomly lose communication. Pulling them off the wall and putting them back fixes the issue for a few days. They are easily reachable from many other devices.

Any thoughts or suggestions on what I might do to correct this?

Example of one of my ZWStats (42):

The valve (40) is currently offline, which probably explains why all of its connections show as “unidirectional” right now?

Generally, a device is marked as offline (I’m assuming the message under the node on the UI page is “Node is not communicating with the controller”) when it has not responded to a message the controller sent. I would have to say that is rare. Most posts in the forum seem to be about devices that show as online but do not respond as expected (usually battery ones).

The map in OH is a programming marvel, but it is of little use for troubleshooting communication problems. With a network your size, a Zniffer is a good investment for figuring out the actual paths. The only thing that seems odd is that the controller is missing from your map. I have 46 nodes and the following map, but with the Zniffer I know 34 nodes are in direct communication with the controller and no node is more than 2 hops away.

I do not think more frequent polling is a good idea.
Sorry I could not be of more help.

Bob

It’s funny you mention the missing controller in the map. I have thought it odd that it doesn’t show device “1” (the controller), but it has been that way ever since I made this jump.

I’ll have to look into the Zniffer, but other than showing me the real routes (which probably don’t matter that much in the grand scheme of things, since they can change and I shouldn’t be messing with them), I’m not sure what it will do for me.

For all its faults, the Vera was a solid Z-Wave controller, and I rarely had trouble. Perhaps I was spoiled.

I purchased the Aeotec Z-Stick several years ago and never used it. It still seems fairly modern, and I flashed it to the latest firmware. Should I look at something else, or is there no evidence it’s the culprit?

IMHO, I do not think it is the controller or the devices. Do you have any duplicates, or devices that you ignored after scanning? These could affect routing.

It is also worth trying to disable the nightly heal. Once you get a device back online, it sets a new route back to the controller, and the heal will partially scramble the routes again. My guess is that some node in the network is not communicating reliably (it doesn’t have to be one of the ones going offline).

Debug is also an option, but because this is sporadic, the debug log will grow very large before anything shows up as the problem.

Bob

Nothing has been ignored or duplicated. I just pulled the stick and compared the complete list of included devices against the spreadsheet I keep, and everything is correct. I’m using Sigma’s Z-Wave PC Controller software.

I turned on debug last night and sent it to a separate log, but as you mentioned, it’s probably already huge, and I’m not sure where to begin with it. (Yup, I have eight 17 MB logs from the last 10 hours.) I loaded one of the logs into the log viewer, and none of the nodes had any alerts, timeouts, etc.
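One way to get a starting point in logs that size is to count timeout lines per node across all the files before opening any of them. This is a minimal sketch, not a feature of the binding or the log viewer. It assumes each relevant debug line carries a `NODE <id>:` tag and that timeouts mention "timeout" or "timed out"; both patterns are assumptions, so check a few lines of your own log first and adjust the regexes.

```python
import re
import sys
from collections import Counter

# ASSUMPTION: debug lines tag the node as "NODE <id>:" and timeouts
# contain "timeout"/"timed out". Verify against your own log first.
NODE_RE = re.compile(r"NODE (\d+):")
TIMEOUT_RE = re.compile(r"timeout|timed out", re.IGNORECASE)


def count_timeouts(lines):
    """Return a Counter mapping node id -> number of lines mentioning a timeout."""
    counts = Counter()
    for line in lines:
        node = NODE_RE.search(line)
        if node and TIMEOUT_RE.search(line):
            counts[int(node.group(1))] += 1
    return counts


def main(paths):
    total = Counter()
    for path in paths:
        # errors="replace" keeps a stray non-UTF-8 byte from aborting the scan
        with open(path, encoding="utf-8", errors="replace") as fh:
            total.update(count_timeouts(fh))
    # Noisiest nodes first
    for node, n in total.most_common():
        print(f"node {node}: {n} timeout line(s)")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Saved as, say, `zwave_timeouts.py` (a hypothetical name), it can be pointed at every rotated file at once, e.g. `python3 zwave_timeouts.py zwave-debug.log*`. A node that keeps appearing near the top, even one that never shows as offline, would be a candidate for the unreliable node mentioned above.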

I sent on/off to my valve while it was still offline, and the logs definitely showed a timeout. I pushed the mode button on the valve, which caused it to check in with the controller or something, and then it worked as expected immediately. That sure points more at the valve as the issue, but it used to work fine with the Vera.

Another odd thing I haven’t figured out yet: I do see some associations between a few devices in Sigma’s software. I don’t know how those would have gotten set up or whether they would cause any issues.

Sounds like you have your bases covered.

I would have to look more closely at the binding code (or have an expert weigh in), but I think that once the binding determines the node is offline, it stops sending it messages, and if it then receives a message from the node, it marks it “online” again. So I still do not think your test indicates a bad device, especially since it worked before.

Also, I don’t know whether the PC Controller version you have is part of Simplicity Studio; if it is, you can run a network health check on your powered nodes.

Good luck.

Bob

I disabled the nightly heal, but I woke up to one of the thermostats offline, so that doesn’t seem to help.

I pulled down Simplicity Studio and messed with it a bit, but I couldn’t wrap my brain around it in the time I had available. I’ll try to read through the user guide this evening.

It feels like there should be some way to get logs out of the system when things go sideways that doesn’t require turning debug on for days on end and then sifting through it.

I’m not really sure where else to go. All this stuff worked for years on my Vera.

I’d leave the heal off and see if the situation gets better over a few days.

Your diagram shows all mains-powered devices. That’s great because it means they can all participate in routing messages, but it seems unusual given all the battery devices out there. Were any devices classified as mains powered that are really battery powered? (I don’t know how that would happen, but thought I’d ask.) That would mess up communications.

Bob

Thanks for your reply.

I have 1 battery-powered Schlage lock. Everything else is mains powered. I have been underwhelmed by every battery-powered Z-Wave device I have tried, which has driven me to alternatives.

How do I tell whether it thinks a device is battery powered? I don’t see anything in the properties that seems to indicate that. The lock has a battery channel. Is it just the yellow ring (or lack of one) in the map? Item 41 on the right there.


Yes, it is just the yellow halo, so all is good. I knew it was unlikely, but I do not have any other ideas at this point.

What I like about the Zniffer is that it is real time and does not run on my OH machine (an RPi 4). I do agree that after my initial finds of problem devices, I do not use it much, just weekly to check whether the routes are still the same.

Bob

Sadly, it’s not getting better. I lost 2 of the 3 thermostats and the valve today. The network was very sluggish, which is always the giveaway to go look at things.

I’ll buy a Zniffer or something if there is a proper path to identifying a problem, but I don’t see a means to an end here. These devices worked for years on the Vera. They seemingly won’t work reliably on the Z-Stick with OH3. Something’s not right, and I’m just at a loss on how to troubleshoot it. The Vera is no longer an option given the EOL of v1 add-ons, so I’m really stuck here. These devices are all on the supported list.

In my experience, the odds strongly favor a network communication issue that could be uncovered with a Zniffer. However, the network health function of the PC Controller might help. Here is an earlier version of mine (green means good communication): Zooz ZST10 network stable

However, one question I haven’t asked (and I don’t see the answer in your posts) is: what are you running OH3 on? There is a known issue with some early versions of the Aeotec Z-Stick on an RPi 4 that requires a USB hub (it doesn’t have to be powered) between the RPi and the stick.

I don’t have that issue, but I use a powered hub for the Z-Stick anyway, so the Z-Wave radio always gets steady power rather than relying on the RPi’s power. I find that helps communication.

Lastly, you might want to review this post for other ideas.

Bob

That’s a good question. OH3 is a Debian 11 VM running on ESXi 7.0.2 with USB passthrough. The stick is on the end of a 4’ USB extension to bring it outside the server rack for better reception.

I will read through that entire thread this evening, thanks!

So even if I were to get a Zniffer and see bad communication, what do you do? Just manually change the routes and keep the controller from changing them back?

What I used it for was to identify (and replace) noisy devices (a motion sensor reported every 8 seconds), adjust reporting parameters (percentage-based reporting of watts and lumens is very problematic), eliminate command polls, etc., to get traffic down to about 10 frames a minute. Less traffic means less lag and fewer cancelled messages. I did use the PC Controller to set a few routes initially, but that was not very successful. I’m thinking there is a traffic issue behind what you are seeing, but it could be something else.
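To put the budget of about 10 frames a minute in perspective, here is a rough back-of-the-envelope sketch. It assumes one frame per unsolicited report and two per poll (request plus response), and it ignores ACKs, retries, and multi-hop relays, so real counts will be higher. The device list is hypothetical, apart from the 8-second motion sensor and the 10-minute valve poll mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    interval_s: float      # seconds between events
    frames_per_event: int  # simplification: ignores ACKs, retries, and relays


def frames_per_minute(sources):
    """Estimated frames per minute contributed by all sources combined."""
    return sum(60.0 / s.interval_s * s.frames_per_event for s in sources)


# Hypothetical network for illustration only.
network = [
    Source("motion sensor reporting every 8 s", 8, 1),
    Source("valve polled every 10 min", 600, 2),
    Source("3 thermostats, each polled every 10 min", 600 / 3, 2),
]

if __name__ == "__main__":
    for s in network:
        print(f"{s.name}: {60.0 / s.interval_s * s.frames_per_event:.2f} frames/min")
    print(f"total: {frames_per_minute(network):.2f} frames/min")
```

The point of the arithmetic is that a single chatty device can eat most of the budget: the 8-second motion sensor alone accounts for 7.5 frames a minute, while a 10-minute poll adds only a fraction of a frame.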

Bob

Did you buy the $400 development kit, or get a UZB and flash the firmware? The UZB3s I see aren’t US frequency. The UZB7 listing doesn’t state the frequency, and I only see mention of flashing the UZB3 anyway.

I bought a static UZB3-U from Digi-Key for about $40 and flashed it.

Meant to share my spreadsheet in the last post: Zwave nodes12-2.pdf (230.8 KB)

Bob

I opened a ticket with Aeotec, since I saw others on the forum report decent support results. It was a bit hard to get going with them, but after some painful back and forth they finally recommended downgrading the Z-Stick firmware to an older version, which had the same Z-Wave chipset my old Vera had.

I have done so, and (cautiously holding my breath here) everything has been online for a couple of days.

Interesting! I would never have thought of that. Hope it works.

Bob