Zwave handling very, very slow after updating 2.1->2.5.2

Hi all. I’m in the process of updating an old installation (2.1) to the current 2.5.2 stable.

  • Debian 10 on an Intel NUC. AEON ZWave stick.
  • Have tried both a clean apt-based install as well as a manual 2.1->2.5 update.

I see the post by @crazyves ([SOLVED] Zwave Network very slow or not working after Update to 2.5) that looks very similar to what I’m seeing, including the pattern with the CANs described in one of the posts in that thread. I’ll upload my own debug logs momentarily.

In my experience, the commands do eventually go through, but they take 10 minutes (yes, almost exactly 600 seconds, reproducibly) to take effect.
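One way to put numbers on a delay like this is to scan the debug log for long silences between consecutive timestamped lines. The sketch below is only an illustration: it assumes each log line starts with a `YYYY-MM-DD HH:MM:SS.mmm` timestamp, the sample lines are made up, and `find_gaps` is a hypothetical helper, not part of openHAB.

```python
from datetime import datetime

# Assumed timestamp prefix, e.g. "2020-03-01 12:00:00.000"; adjust to your log layout.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LENGTH = 23


def parse_timestamp(line):
    """Return the leading timestamp of a log line, or None if it has none."""
    try:
        return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, min_gap_seconds=60.0):
    """Return (gap_seconds, line_before, line_after) for each long silence."""
    gaps = []
    prev_ts, prev_line = None, None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is None:
            continue  # skip continuation lines such as stack traces
        if prev_ts is not None:
            gap = (ts - prev_ts).total_seconds()
            if gap >= min_gap_seconds:
                gaps.append((gap, prev_line, line))
        prev_ts, prev_line = ts, line
    return gaps


# Made-up sample lines, only to show the shape of the input.
sample_log = [
    "2020-03-01 12:00:00.000 [DEBUG] [example] - command queued",
    "2020-03-01 12:00:01.500 [DEBUG] [example] - frame sent",
    "2020-03-01 12:10:01.500 [DEBUG] [example] - response received",
]
for gap, before, after in find_gaps(sample_log):
    print(f"{gap:.0f}s between:\n  {before}\n  {after}")
```

To run it on a real log, pass the open file object to `find_gaps` instead of `sample_log`. The lines on either side of each gap are the ones worth posting alongside the full debug log.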

When I go back to my previous 2.1 install, the delay goes back to the ‘normal’ 5-10 second average, which seems to be different from what was seen in the other thread… (Note that the 2.1 install is using org.openhab.binding.zwave-2.1.0-SNAPSHOT.jar due to needing the ZWave security support introduced then, and ran stably for several years!)

Also, note that I have 2 ZWave sticks (both Aeon); one is used for the ‘normal’ 50+ node ‘main’ network, while the second is used for only 2 lock devices running security. I found that the security binding could not complete in the ‘busy’ network, but running the two sticks side-by-side worked well for several years. For debug purposes, the second (‘secure’) stick is not in the system (and was never in the system for the ‘clean’ install), but both behave the same on both 2.1 and 2.5, so I don’t think it’s the stick.

Beyond the debug logs, is there anything else I can provide that will help debug the reason for the extreme lag?

Edit to add: Going back in my notes, I see that about a year ago I tried to migrate to 2.4, but had the same problem. There was no compelling reason for me to use 2.4 then, and I was also fighting some other problems, so rather than fight it I just went back to 2.1. The point: I was seeing high ZWave latency back on 2.4 as well.

Have you checked with ‘top’ if the java (OH2) process, or any other, is hogging the CPU?

What addons do you have installed? There have been many changes between 2.1 & 2.5.
Any invalid addons configured in 2.5 cause all addons to reload every minute trying to load the invalid ones.

I have (based on the previous thread), and it’s not… system is basically idle. There is a noticeable spike on openhab boot and zwave enumeration, but after that dies down cpu usage is ~3%. There isn’t any reason I can see that the OS or USB device would be waiting on any CPU resources.

With that much of a version jump you should delete (NOT Exclude) all your device Things and re-discover to get the new binding settings. They will keep the same Thing IDs so your Items and rules will still work.

When I did the migration from 2.1, I had weather, nest, tesla, astro, and a few other bindings.

When I did a clean install, though, only the zwave binding was loaded. It was a very clean install; didn’t touch the files at all, just used paperui to install the zwave binding, and saw the same behavior.

Currently I’m back working with the migrated old install (cuz I have lots and lots of devices, along with rules and other things, and who wants to re-enter that?), but for debug purposes I can easily switch to a ‘clean’ install.

I assume you do not have any zombie or ghost nodes in your network to mess up routing. Let’s see if Chris has any suggestions tomorrow. It is night in the UK now.

I second that.

Thanks for your help, Bruce.

This ghost-node thing is what I’m most worried about, as this hardware is several years old, and has had a large variety of devices run on it over time. There are several devices that have failed or been removed over time… I did remove all the ‘not responding’ and ‘failed’ nodes that the binding discovers (based on threads such as Zwave ghost devices), but honestly I’m not convinced that everything is totally happy and consistent, and I can’t find better details on how to debug much deeper into figuring out the node-level health.

What I really don’t want to do is the obvious step of excluding all the devices, hard resetting the controller (or just buying a new one to be sure…) re-adding everything (including several nodes that are physically very difficult to access)… and then find out that the problem still exists. :frowning:

Will wait to see if Chris has any thoughts later…

Do you have a Windows PC? Search this forum for PC Controller software to check your network’s health.

If you jumped from 2.1 to 2.5 you have to delete all of the z-wave things as so much code was rewritten around 2.4.

I have made a jump from 2.3 to 2.5 recently and it took some time for the zwave network to settle down and response rates to return to their previous timings.

I have a VM I can use on a mac to run Windows stuff… but can you point to a specific zwave health tool? Are you referring to the Zensys tools? I’ve played with those, used them to remove ghost nodes, etc. When using the tools, the zwave devices respond instantly, unlike the openhab 2.5 binding. (Response time using zensys is <1s. Response time with openhab 2.1 is ~5-10 seconds. Response time with openhab 2.5 is 600 seconds.)

I did a completely clean install of 2.5.2, and still saw the problem. So, no old .things/.items/.cfg/etc files hanging around. (Clean install refers to Openhab; I did not exclude/reset controller/include all the devices.)

That is not normal so you are starting with an issue.

That is confirmed by the fact your secure locks would not work and considering the extra packets to negotiate security I am not surprised at that latency.

I consider normal as mostly sub 1 second and majority perceived as instant and that is with 150 nodes.

How many nodes do you have? You say a lot so default config on some of them would not be wise.

Secondly read this person’s experience.

https://forum.fibaro.com/topic/49177-is-manual-routing-possible/#comment-203474

He had an issue with locks and assumed his issue was routing as you have assumed it is the binding.

In the end he had a single node spamming the network.

Your issue is probably similar.

A zniffer trace

this was the outcome from the other person that thought his secure locks had an issue.

Actually it was quite trivial. Turns out one of my FGS-223s went rogue and sent power reports every second effectively spamming the network. And it just happens that it’s situated right in the middle of the apartment and a lot of traffic is supposed to be routed via it. Basically just didn’t let through the wake-up beam, so the lock just couldn’t wake up when it was needed. After I fixed the rogue parameters of the FGS, HC2 almost instantly found a good route to the lock and now it works great! Hail to the king zniffer! :slight_smile:
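Without a Zniffer, a rough first check for a chatty node like the one described above is to count how many debug log lines mention each node. This is a minimal sketch, assuming the binding's debug lines tag traffic with a `NODE <id>:` marker; the sample lines are invented and `count_lines_per_node` is a hypothetical helper. It only sees frames that reach the controller, so it is no substitute for a real trace.

```python
import re
from collections import Counter

# Assumed marker in the debug log; change the pattern if your lines look different.
NODE_PATTERN = re.compile(r"\bNODE (\d+):")


def count_lines_per_node(lines):
    """Count log lines per node id, based on the assumed 'NODE <id>:' marker."""
    counts = Counter()
    for line in lines:
        match = NODE_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


# Made-up sample lines, only to show the shape of the input.
sample_debug_lines = [
    "12:00:00.100 NODE 7: report received",
    "12:00:01.100 NODE 7: report received",
    "12:00:01.900 NODE 12: command sent",
    "12:00:02.100 NODE 7: report received",
    "12:00:02.500 controller idle",
]
for node, count in count_lines_per_node(sample_debug_lines).most_common():
    print(f"node {node}: {count} lines")
```

A node that dominates the counts over a quiet period is a candidate for a closer look at its reporting parameters.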

Hi, that is exactly what I did not want to do as well when I had this problem, but in the end it was the only thing that fixed it.
What you can do, and what I also did, is use the Aeon backup tool to back up your stick, then reset it and exclude/reinclude only a couple of nodes. Then see if that works; if not, you can write the backup back to the stick and only have to reinclude a couple of nodes to the network again.

From my experience with this problem and from what I’ve read so far, I think the only solution is to reset and rebuild the zwave network, unfortunately.
Just make sure it really isn’t the addon problem described by Bruce from the 2.5 update notes.

Sidenote: After my reset/rebuilding of the zwave network, it now works way better than before. It took some days (overnight heals I guess) but my network is now better than ever.

Greetings
Yves

Frankly there’s no reason why a network (ZWave nodes remaining unchanged!) should exhibit such vastly different behavior just because you exchange the binding.
Did you truly delete and manually recreate or autodiscover all of your ZWave things after you upgraded? Did you do it again when you went back to your old software?
Files are not the only source of config data.
Autodiscovered and GUI input data are stored in JSONDB, NOT in .things/.items.
The need to recreate things came along as a breaking change that was introduced in 2.4.
Going 2.1 to 2.5 means a HUGE bunch of changes.
I’m sure you have read all the release notes of every release that you jumped over ?
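To see which ZWave Things are still held in JSONDB rather than in files, one option is to load the Thing store and list the entries whose UID starts with `zwave:`. The sketch below assumes the store is a single JSON object keyed by Thing UID; that layout, the sample data, and the `list_zwave_things` helper are all assumptions for illustration, so check them against your own installation.

```python
import json


def list_zwave_things(jsondb_text):
    """Return the sorted Thing UIDs in a JSONDB dump that belong to the zwave binding."""
    things = json.loads(jsondb_text)
    return sorted(uid for uid in things if uid.startswith("zwave:"))


# Made-up miniature of the assumed layout: a JSON object keyed by Thing UID.
sample_jsondb = """
{
  "zwave:serial_zstick:controller": {"value": {"label": "Z-Stick"}},
  "zwave:device:controller:node5": {"value": {"label": "Hall switch"}},
  "astro:sun:home": {"value": {"label": "Sun"}}
}
"""
for uid in list_zwave_things(sample_jsondb):
    print(uid)
```

For a real check, read the Thing store file from your installation's jsondb folder into a string and pass it in place of `sample_jsondb`. Comparing the output before and after deleting Things in the UI shows whether stale entries survived.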

I also wouldn’t be sure that running multiple sticks is not an issue here.
Did you recreate the controller things, too ?

That is usually not needed, but if I did that I would record the Thing ID so I could set it back to the same value. That ID affects the other devices on the network.

If he upgraded from 2.1 to 2.5 it MUST be done…the 2.4 version of the ZWave binding was a major rebuild and all Zwave items and things had to be deleted and recreated.

Including the controller? That is the first I heard of that.