Very large ZWave Network (130+ nodes)

Hi All,

Seeing a lot of nodes not talking to the controller, and the map shows nodes that are not yet forming neighbors.
There doesn’t appear to be any major issue with usability, but something isn’t right.

ZWave Binding - Latest
Aeotec Z Stick
OH2 version - 2.4 Stable
Server spec: Intel i3 8100, 16GB Ram, SSD disk, on gigabit ethernet

The console is full of these:


09:40:06.433 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node121' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:16.606 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node63' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:17.742 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node140' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:19.579 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node143' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:24.295 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node68' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller

The map shows this after a few days. All of the polling of the devices has been turned off to save zwave bandwidth.

Thanks

Is the network running the daily network heal?
Are the problem devices battery operated?
@5iver @sihui and @chris may have some ideas, but these answers would be helpful.

Hi Bruce, the entire network is mains powered, and yes, the heal is on.

I know some of my battery powered sensors sometimes hang during the heal.
What version of the binding are you using? There have been a lot of changes moving from 2.4 to 2.5M2

I’m using the latest binding.

Kris
I know you’ve been here for a while, but bloody hell, what platform are we talking about here?!? What version of OH? What version of the zwave binding?? Sounds like things are overloaded. This isn’t a Pi, is it?

Updated my original post. The server isn’t overloaded at all; CPU is at 1%.

OK, so just the zwave controller is overloaded. Which controller is that again???
Sorry for not knowing all this intuitively.

How can the controller be overloaded? As far as I’m aware, it’s designed to accept up to 255 nodes?

Aeotec Z Stick

I’ll put that into the post too :stuck_out_tongue:

Potentially I need additional controllers, but my experience with OH2 and multiple controllers has been that it’s never really worked correctly.

@dastrix80, see if this may be relevant…

I currently have 124 devices…

openhab> smarthome:things list |grep zwave| wc -l
124

… and cannot run a heal without the network becoming completely unusable. I’m using S1665, but the version of OH may not be a factor. A network with bad/outdated routes definitely aggravates this. I suggest disabling the daily heal and restarting OH to clear out the hung threads. If you need to, manually heal your devices. Some debug logs would help to confirm if you are experiencing the same thing that I have been reporting. This only appears to affect large networks.

Hi Scott, that’s very, very useful! Thank you. I will disable the heal tonight and restart OH2 and see what comes back.

It doesn’t ‘appear’ to affect usability, as it all seems to operate with good speed, so it’s a bit of a strange one.

How does one manually heal a powered device like a Fibaro Dimmer?

I’ll look through that article.

Hi Scott,
What would be the critical number of nodes that you would consider “large”, and should not have a daily heal?

Hi Scott,

Don’t think I’m running into this issue. My commands still work; it’s just that nodes are going offline constantly.

This does not sound like you are describing the same issue. When I run the daily heal, commands are delayed for multiple minutes or never get through and I often get queue full errors.

Habmin> Configuration> Things> select your Thing> Tools> Show Advanced Settings> Heal the device

I’m only guessing that there is some correlation with the number of nodes due to the lack of reports. I don’t know the cause and I’ve been busy with other things, so haven’t been bugging our very busy zwave maintainer about it. The workaround is the same as what is needed for this issue, which may be different and affects most (all?) with battery powered devices. Since people are turning off the daily heal to get around the other issue, we won’t really know until that one is resolved. Hopefully, the fix takes care of both.

If your zwave devices stop functioning after a daily heal, turn it off. You can easily test this by setting the heal time (configured in the controller Thing) to the current hour, which appears to start the heal immediately.

The controller can easily get overloaded if the network is busy - I see this quite often.

No - 232 nodes IIRC. But that’s not really the issue - it can probably get overloaded with just a few badly operating, or badly configured devices.

I’ve got to ask - have you actually checked the debug logs to see what is happening?

Hi Chris

No nodes are dead. I will get the debug onto it tonight and see what I can see. But there’s an overwhelming amount of traffic, so it’s like finding a needle in a haystack.

We have approx 121 nodes right now, but we are adding more to about 140 in total.

I’m not sure what you are referring to with this statement? I didn’t mean to insinuate that nodes were dead - my point above was really if they are badly configured, or operating badly they can overload the network. I do see this quite often. Have you checked this?

This might be your problem then? You need to ensure that the system is managed properly, and if you’re flooding the network, then things will definitely fall over.

I have an online log viewer that I’d suggest you use - it makes it easier to find the needles!

https://www.cd-jackson.com/index.php/openhab/zwave-log-viewer

If you’re not looking at the logs, then really you should do this before asking for help as you are in the best position to work out your problems. The logs provide you with a lot of useful information and those trying to help you don’t have access to this so we’re just guessing.

Hi Chris

When you say badly configured, how can you configure a node badly?

Regards

For example:

  • Setting the associations incorrectly - eg setting the controller into more association groups than is necessary. This is a common problem - people add the controller into lots of groups in a random attempt to get things working, and this will cause duplication/triplication/quadlication (:wink:) of messages being sent on the network depending on the number of groups.
  • Setting devices such as metering devices to report more often than is necessary, or with too small a delta. This can cause some devices to send multiple updates per second - the controller can probably handle one or two of these, but dozens, or hundreds will quickly swamp the network.
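
To put rough numbers on the two points above, here is a back-of-the-envelope sketch in Python. The function name and every figure in it are hypothetical, chosen only for illustration, and it ignores routing hops, acknowledgements and retries, which would only add to the load.

```python
def frames_per_second(devices, reports_per_sec, controller_groups=1):
    """Rough count of report frames per second aimed at the controller.

    Assumption for illustration: every association group that contains
    the controller sends its own copy of each report.
    """
    return devices * reports_per_sec * controller_groups


# Hypothetical: 2 meters sending 2 updates/s, controller in 1 group.
print(frames_per_second(2, 2))       # 4

# Hypothetical: 24 meters at the same rate, controller in 3 groups.
print(frames_per_second(24, 2, 3))   # 144
```

The point is only that the load multiplies: device count, report rate and the number of groups containing the controller all scale the traffic together.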

Thanks Chris, understood. I believe the associations are correct, they are set only for Group 1 or for all 3 button presses with a roller shutter.

I’ve attached a log, which on the log viewer shows a lot of messages relating to ‘update neighbor failed’.

zwave.txt (98.2 KB)

Debug seems to indicate that after a neighbor update request fails a few times, it marks the Thing offline, and then the Thing comes back online again.
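
For anyone following along, here is a rough way to see which nodes go offline most often. It is only a sketch in Python: it assumes the event log lines look exactly like the ones quoted in the first post, and the function name `count_offline_transitions` and the `sample` list are made up for illustration.

```python
import re
from collections import Counter

# Matches status-change lines in the format quoted in the first post.
STATUS_RE = re.compile(
    r"'(?P<thing>zwave:device:[^']+)' changed from (?P<old>\w+) to (?P<new>\w+)"
)


def count_offline_transitions(lines):
    """Count ONLINE -> OFFLINE transitions per Thing UID."""
    counts = Counter()
    for line in lines:
        m = STATUS_RE.search(line)
        if m and m.group("old") == "ONLINE" and m.group("new") == "OFFLINE":
            counts[m.group("thing")] += 1
    return counts


# The five lines from the first post, used here as sample input.
sample = [
    "09:40:06.433 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node121' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:16.606 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node63' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:17.742 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node140' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:19.579 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node143' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:24.295 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node68' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
]

flapping = count_offline_transitions(sample)
for thing, n in flapping.most_common():
    print(thing, n)
```

Passing an open event log file instead of `sample` works the same way, since the function just iterates over lines. Nodes that show up with high counts would be the first ones to look at in the debug log.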
