Very slow ZWave network

Unfortunately, I am not getting the same thing as you:

There’s no Heal option in the stick page.

PS: The heal option appears on other devices, not on the stick’s page. Does it do the same thing as a heal of the whole network? Clicking on one device’s heal option does not seem to help anything.

PPS: Trying heal on the node closest to the NUC/ZWave stick seems to have triggered a burst of activity on the stick, and it is now flashing as expected during normal operation. After seeing some of the debug messages, I unplugged a smart strip (node 55 in the logs), and so far the network is back to normal. Will post further updates.

From what I just tested, healing a specific device heals only that one device. My Z-Wave network had the nightly network heal disabled for a long time (I forgot to re-enable it), so none of my devices had any neighbors. Now I have neighbors for this one healed device - 13.

I had hopes but they’ve been dashed.
Back in the ‘stuck mode’ where the stick light stays on each color for 30s or more.

I said on the device Thing, not on the zwave controller Thing.
You may either heal each individual device or wait for the nightly heal to heal them all at once.

…or change the heal time to something in the near future so it happens when you can have a look at the logs. Don’t forget to change it back to a time when network traffic is low, as the healing process produces a lot of traffic on top of the normal traffic.

I do not know what the “Synchronize network” option does but tried that in my setup just now (I am on OH3). Before that I had one node that was marked in red. It is a node that notoriously has communication errors. After the “synchronisation” this node was green again and the network map had changed a bit: basically the angle of view was different from what I could see, but that seems to change every time when the map is shown as far as I can tell now after checking it several times. Never had a look at it in the testing phase of OH3 :wink:

It synchronises the network state with the SUC.

Thanks sihui. Then the one node coming back online was just coincidence :slight_smile:

The heal at 2AM happened and produced a lot of output in the log, instead of just the one-liner that I had seen before. That is presumably because I had switched ZWave to debug logging.
Whatever the cause, it seems that now my network is behaving normally. No more dead pauses.
So for now I’ll consider the problem solved unless it goes bad again soon. In that case I’ll report back and try to figure out what could be the issue.

Turn off DEBUG logging to avoid huge log files. Set it back to INFO.

Good advice :slight_smile:

On another note, I do now get a network map (very crowded).

Unfortunately, after 36h of flawless work, my ZWave network is back in limbo… :cry:
@chris do you have any pointers on how to address this?

PS: It’s periodic again. It works for a while, then gets super slow for a while. It seems to be working more often than not, but it is still a real nuisance.

In that case I suggest attaching a longer debug log.

@stefan.oh You may have missed the one I posted two days ago in response to your request. Does it not contain enough?
I can redo another one with the latest situation (two nodes were taken out of the network)

The log from 2 days ago would only be a valid starting point if no changes were made to the setup. But I was under the impression that there were changes. That was the reason for me to ask for a more recent log.

I’ll turn on the debugging and will post a log once it is long enough

@stefan.oh Here is as long a log as the 1MB upload limit will permit: oh.log (1004.9 KB)

Splendid :+1: The log covers approx. 9 h of time; that should give us an idea of what is happening.
I’ve uploaded the log into the Z-Wave Log Viewer that Chris provides on his website.
Looking at the Filter option I count only 27 nodes that take part in the communication. Your network map shows way more than these 27 nodes. Also in the network map I see #76 as the node with the highest number, but in the filter view of the log viewer there is a node #255. Filtering for this node does not give any hints, maybe this is just data garbage that cannot be associated with a single node.
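
The same count can be cross-checked outside the log viewer. Below is a minimal Python sketch, assuming the debug lines tag each node as `NODE <number>:` the way the openHAB Z-Wave debug output does. The function names are made up for illustration, and the list of expected nodes would have to be read off the network map by hand.

```python
import re
from collections import Counter

# Assumption: every relevant debug line contains "NODE <number>:".
NODE_RE = re.compile(r"\bNODE (\d+):")

def count_nodes(lines):
    """Map each node number to the number of log lines that mention it."""
    counts = Counter()
    for line in lines:
        match = NODE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts

def missing_nodes(expected, counts):
    """Nodes that are expected (e.g. from the network map) but never logged."""
    return sorted(set(expected) - set(counts))

# Three lines taken from the uploaded log, used as a tiny sample.
SAMPLE = [
    "2020-12-27 08:04:32.099 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: SECURITY not supported",
    "2020-12-27 08:04:32.099 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Command Class COMMAND_CLASS_SWITCH_MULTILEVEL is NOT required to be secured",
    "2020-12-27 08:04:32.102 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 29: SECURITY not supported",
]

sample_counts = count_nodes(SAMPLE)
# Node 61 is listed as expected here only to show how a silent node is flagged.
sample_missing = missing_nodes([23, 29, 61], sample_counts)
```

For a real run, the lines would come from the log file itself (for example by iterating over an opened file), and `len(count_nodes(...))` gives the number of distinct nodes that show up.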

Filtering randomly for one node, #61, shows some traffic. Temperature is reported, but it seems that you took care to limit traffic: only changes of at least 0.5 °F, or multiples of that, are reported. That’s good; avoiding unnecessary traffic is one of the top priorities in a network with limited bandwidth and many participating nodes.
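
To illustrate why such a reporting threshold cuts traffic, here is a minimal Python sketch of report-on-change filtering. It is not the device’s actual firmware logic, just the general idea; the function name and the readings are made up for illustration.

```python
def reports_sent(readings, threshold):
    """Return the readings that would be transmitted if a value is only
    reported when it differs from the last reported value by >= threshold."""
    sent = []
    last = None
    for value in readings:
        if last is None or abs(value - last) >= threshold:
            sent.append(value)
            last = value
    return sent

# Illustrative temperatures in degF (not taken from the log).
readings = [70.0, 70.25, 70.5, 70.75, 70.5, 71.0]
sent = reports_sent(readings, threshold=0.5)
```

With these made-up readings only three of the six values are transmitted; with a threshold of 0 every reading would go out over the network.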

And to the next random node, #28. Only ONE message is shown. Either it is a node that has not much to report or it has no chance to do so. I cannot tell.

Node #8 is similar: it appears in the log at only two points in time.

On to node #75. Now it gets interesting. There are several messages “DeleteSUCReturnRoute”. Is there more than one controller in your setup? What are the settings for your controller(s)?

Node #48: This node communicates at only one point in time. But the log shows that it took over 20 seconds to receive an acknowledgement from the device. Only 3 other nodes were communicating at 16:41:xx. That does not look like congestion of the network.

Node #55: Between 7:54 and 9:39 the node does not report anything; there are only messages containing “DeleteSUCReturnRoute”. But at 16:22 it starts reporting kWh, at first every two minutes, then more erratically.
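
Silent stretches like this can also be searched for directly in the raw log. Below is a minimal Python sketch, assuming each line starts with a timestamp in the form `2020-12-27 08:04:32.099` and tags the node as `NODE <number>:`. The function name and the 30-minute default are arbitrary choices for illustration. Note that it counts any line mentioning the node, so lines such as the “DeleteSUCReturnRoute” ones would also count as activity; filtering for actual reports would need an extra check on the message text, which is not attempted here.

```python
import re
from datetime import datetime, timedelta

# Assumption: "<date> <time>.<millis> ... NODE <number>: ..." on each line.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?\bNODE (\d+):"
)

def silent_periods(lines, node, min_gap=timedelta(minutes=30)):
    """Return (start, end) pairs between consecutive log lines of `node`
    that are at least `min_gap` apart."""
    gaps = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match or int(match.group(2)) != node:
            continue
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if previous is not None and stamp - previous >= min_gap:
            gaps.append((previous, stamp))
        previous = stamp
    return gaps
```

Called as `silent_periods(open_log_file, node=55)` on an opened log file, it returns the start and end of every stretch in which node 55 does not appear at all.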

I cannot pinpoint anything that is obvious (to me). But I’m curious why a lot of the nodes shown in your network view are not seen in the logfile.
Looking at the file without the logviewer at Chris’s website I see messages like

2020-12-27 08:04:32.099 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: SECURITY not supported
2020-12-27 08:04:32.099 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Command Class COMMAND_CLASS_SWITCH_MULTILEVEL is NOT required to be secured
2020-12-27 08:04:32.102 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 29: SECURITY not supported

Are your devices included with security enabled? That puts a strain on all the processing, in the devices as well as on the controller. Maybe that is a hint.
Did you check the load on the system where OH is running?

This morning, I discovered one thing that may have caused the network to relapse.
I had unplugged a ‘smart’ power strip. It’s node #55, which you did spot. When I moved things around before the trouble started, a printer was unplugged from that strip and another one plugged in. So I thought it could have been a contributing factor in the network problem. After a heal everything went well.
Yesterday night, I discovered that my spouse had plugged that strip back in… So that seems an indication that it might be a trigger of the trouble.
I unplugged it again and started a heal 43 minutes ago. The network is not back to normal yet. The heal takes a long time.

node75 is not something I see configured nor is it in the inbox.
node8 is a relay switch.
node28 is a dimmer and is physically the closest to the NUC/Zwave controller stick
node23 and node29 are dimmers. I have no idea why they show up with security.
I do have two door locks that have ZWave but I never paired them, so they should not appear in the log.
I have a memory that node255 represents the controller itself. I got this impression from some tinkering a long time ago and could be wrong.

As a side question: issues of slowness have been dogging me throughout my use of OH/Zwave for about 5 years now. I’ve added Zigbee devices but have far fewer of those than ZWave (4 lightbulbs and 2 switches). Is Zigbee also prone to issues and I am just lucky, or is it a more reliable network/protocol?

The system running OH has very low load. Its only use is to run OH.

@chris is our expert developer for both those bindings so I will let him comment.

The controller is usually #1 as far as I have seen it in the past. In your network map you will notice it is painted a bit bigger than the other nodes and has an orange ring.

I am a bit confused by the mismatch between the network map you posted and the other information: the log does not show traffic for a lot of the nodes shown in the network map. But the log shows entries for a node #75 that is NOT shown in the picture you posted. Is it one and the same installation we are looking at?