Very large ZWave Network (130+ nodes)

dastrix80 · August 25, 2019, 11:53pm

Hi All,

I'm seeing a lot of nodes not talking to the controller, and the map shows nodes that are not forming neighbors.
There doesn’t appear to be any major issue with usability, but something isn’t right.

ZWave Binding - Latest
Aeotec Z Stick
OH2 version - 2.4 Stable
Server spec: Intel i3 8100, 16GB RAM, SSD disk, on gigabit ethernet

The console is full of these:


09:40:06.433 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node121' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:16.606 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node63' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:17.742 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node140' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:19.579 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node143' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:24.295 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node68' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
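
A quick way to see which nodes drop out most often is to tally these events per node. The following is only a minimal Python sketch and is not part of openHAB: the function name `count_offline_transitions` and the regular expression are assumptions based solely on the log lines shown above, so the pattern may need adjusting for other log formats.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the ThingStatusInfoChangedEvent lines above.
STATUS_RE = re.compile(
    r"'zwave:device:[^:']+:(node\d+)' changed from (\w+) to (\w+)"
)


def count_offline_transitions(lines):
    """Count ONLINE -> OFFLINE transitions per node (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group(2) == "ONLINE" and match.group(3) == "OFFLINE":
            counts[match.group(1)] += 1
    return counts


# The five log lines from the post, used as sample input.
sample = [
    "09:40:06.433 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node121' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:16.606 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node63' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:17.742 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node140' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:19.579 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node143' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
    "09:40:24.295 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node68' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller",
]

# Print nodes sorted by how often they went offline.
for node, count in count_offline_transitions(sample).most_common():
    print(node, count)
```

Run against a full log file (for example by passing an open file object instead of `sample`), this would show whether a handful of nodes account for most of the drop-outs or whether the problem is spread across the whole network.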

The map shows this after a few days. All of the polling of the devices has been turned off to save zwave bandwidth.

Thanks

Bruce_Osborne · August 25, 2019, 11:57pm

Is the network running the daily network heal?
Are the problem devices battery operated?
@5iver @sihui and @chris may have some ideas but these answers would be helpful.

dastrix80 · August 26, 2019, 12:16am

Hi Bruce, the entire network is mains powered, and yes, the heal is on.

Bruce_Osborne · August 26, 2019, 12:55am

I know some of my battery powered sensors sometimes hang during the heal.
What version of the binding are you using? There have been a lot of changes moving from 2.4 to 2.5M2

dastrix80 · August 26, 2019, 12:57am

I’m using the latest binding.

Andrew_Rowe · August 26, 2019, 1:23am

Kris
I know you’ve been here for a while, but bloody hell, what platform are we talking about here?!? What version of OH? What version of the zwave binding?? Sounds like things are overloaded. This isn’t a Pi, is it?

dastrix80 · August 26, 2019, 1:33am

Updated my original post. The server isn’t overloaded at all; CPU is at 1%.

Andrew_Rowe · August 26, 2019, 1:39am

OK, so just the zwave controller is overloaded. Which controller is that again???
Sorry for not knowing all this intuitively.

dastrix80 · August 26, 2019, 1:40am

How can the controller be overloaded? As far as I’m aware, it’s designed to accept up to 255 nodes?

Aeotec Z Stick

I’ll put that into the post too.

Potentially I need additional controllers, but my experience with OH2 and multiple controllers has been that it’s never really worked correctly.

5iver · August 26, 2019, 3:06am

@dastrix80, see if this may be relevant…

I currently have 124 devices…

openhab> smarthome:things list |grep zwave| wc -l
124

… and cannot run a heal without the network becoming completely unusable. I’m using S1665, but the version of OH may not be a factor. A network with bad/outdated routes definitely aggravates this. I suggest disabling the daily heal and restarting OH to clear out the hung threads. If you need to, manually heal your devices. Some debug logs would help to confirm if you are experiencing the same thing that I have been reporting. This only appears to affect large networks.

dastrix80 · August 26, 2019, 4:09am

Hi Scott, that’s very, very useful! Thank you. I will disable the heal tonight, restart OH2, and see what comes back.

It doesn’t ‘appear’ to affect usability, as it all seems to operate with good speed, so it’s a bit of a strange one.

How does one manually heal a powered device like a Fibaro Dimmer?

I’ll look through that article.

epicurean · August 26, 2019, 4:19am

Hi Scott,
What would be the critical number of nodes that you would consider “large”, and should not have a daily heal?

dastrix80 · August 26, 2019, 7:15am

Hi Scott,

I don’t think I’m running into this issue. My commands still work; it’s just that nodes are constantly going offline.

5iver · August 26, 2019, 7:15am

This does not sound like you are describing the same issue. When I run the daily heal, commands are delayed for multiple minutes or never get through and I often get queue full errors.

Habmin> Configuration> Things> select your Thing> Tools> Show Advanced Settings> Heal the device

I’m only guessing that there is some correlation with the number of nodes due to the lack of reports. I don’t know the cause and I’ve been busy with other things, so haven’t been bugging our very busy zwave maintainer about it. The workaround is the same as what is needed for this issue, which may be different and affects most (all?) users with battery powered devices. Since people are turning off the daily heal to get around the other issue, we won’t really know until that one is resolved. Hopefully, the fix takes care of both.

If your zwave devices stop functioning after a daily heal, turn it off. You can easily test this by setting the heal time (configured in the controller Thing) to the current hour, which appears to start the heal immediately.

chris · August 26, 2019, 7:30am

The controller can easily get overloaded if the network is busy - I see this quite often.

No - 232 nodes IIRC. But that’s not really the issue - it can probably get overloaded with just a few badly operating, or badly configured devices.

I’ve got to ask - have you actually checked the debug logs to see what is happening?

dastrix80 · August 26, 2019, 8:02am

Hi Chris

No nodes are dead. I will get the debug onto it tonight and see what I can see. But there’s an overwhelming amount of traffic, so it’s like finding a needle in a haystack.

We have approx 121 nodes right now, but we are adding more to about 140 in total.

chris · August 26, 2019, 8:08am

I’m not sure what you are referring to with this statement? I didn’t mean to insinuate that nodes were dead - my point above was really if they are badly configured, or operating badly they can overload the network. I do see this quite often. Have you checked this?

This might be your problem then? You need to ensure that the system is managed properly, and if you’re flooding the network, then things will definitely fall over.

I have an online log viewer that I’d suggest you use - it makes it easier to find the needles!

https://www.cd-jackson.com/index.php/openhab/zwave-log-viewer

If you’re not looking at the logs, then really you should do this before asking for help as you are in the best position to work out your problems. The logs provide you with a lot of useful information and those trying to help you don’t have access to this so we’re just guessing.

dastrix80 · August 26, 2019, 8:17am

Hi Chris

When you say badly configured, how can you configure a node badly?

Regards

chris · August 26, 2019, 8:21am

For example:

Setting the associations incorrectly - e.g. setting the controller into more association groups than is necessary. This is a common problem - people add the controller into lots of groups in a random attempt to get things working, and this will cause duplication/triplication/quadruplication of messages being sent on the network, depending on the number of groups.
Setting devices such as metering devices to report more often than is necessary, or with too small a delta. This can cause some devices to send multiple updates per second - the controller can probably handle one or two of these, but dozens, or hundreds will quickly swamp the network.
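
To put rough numbers on the two points above, here is a back-of-the-envelope Python sketch. It is an illustration only: the function `estimated_messages_per_second` and the example figures are hypothetical, and it assumes the simplest possible model, in which every association group containing the controller produces one extra copy of each report.

```python
def estimated_messages_per_second(devices, reports_per_second, controller_groups=1):
    """Rough estimate of report traffic reaching the controller.

    Assumption (not taken from the binding): each association group that
    contains the controller yields one copy of every report a device sends.
    """
    if devices < 0 or reports_per_second < 0 or controller_groups < 1:
        raise ValueError("devices and rate must be >= 0, groups must be >= 1")
    return devices * reports_per_second * controller_groups


# Hypothetical figures: 12 metering devices, each sending 2 reports per second.
single_group = estimated_messages_per_second(12, 2)
three_groups = estimated_messages_per_second(12, 2, controller_groups=3)

print("controller in one group:", single_group, "messages/s")
print("controller in three groups:", three_groups, "messages/s")
```

The point of the sketch is that the two misconfigurations multiply: chatty devices raise the base rate, and every unnecessary association group scales that rate again.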

dastrix80 · August 26, 2019, 10:06am

Thanks Chris, understood. I believe the associations are correct, they are set only for Group 1 or for all 3 button presses with a roller shutter.

I’ve attached a log, which in the log viewer shows a lot of messages relating to ‘update neighbor failed’.

zwave.txt (98.2 KB)

Debug seems to indicate a neighbor update request; after a few failures, it marks the Thing offline, then it comes back online again.