Seeing a lot of nodes not talking to the controller, and the map shows nodes not forming neighbors.
There doesn’t appear to be any major usability issue, but something isn’t right.
ZWave Binding - Latest
Aeotec Z Stick
OH2 version - 2.4 Stable
Server spec: Intel i3 8100, 16GB RAM, SSD, on gigabit ethernet
The console is full of these:
09:40:06.433 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node121' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:16.606 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node63' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:17.742 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node140' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:19.579 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node143' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
09:40:24.295 [INFO ] [ome.event.ThingStatusInfoChangedEvent] - 'zwave:device:6dad8bea:node68' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller
The map shows this after a few days. All of the polling of the devices has been turned off to save zwave bandwidth.
Is the network running the daily network heal?
Are the problem devices battery operated? @5iver, @sihui and @chris may have some ideas, but these answers would be helpful.
I know some of my battery powered sensors sometimes hang during the heal.
What version of the binding are you using? There have been a lot of changes moving from 2.4 to 2.5M2.
Kris
I know you’ve been here for a while, but bloody hell, what platform are we talking about here?!? What version of OH? What version of the zwave binding?? Sounds like things are overloaded. This isn’t a Pi, is it?
openhab> smarthome:things list |grep zwave| wc -l
124
… and cannot run a heal without the network becoming completely unusable. I’m using S1665, but the version of OH may not be a factor. A network with bad/outdated routes definitely aggravates this. I suggest disabling the daily heal and restarting OH to clear out the hung threads. If you need to, manually heal your devices. Some debug logs would help to confirm whether you are experiencing the same thing that I have been reporting. This only appears to affect large networks.
This does not sound like you are describing the same issue. When I run the daily heal, commands are delayed for multiple minutes or never get through and I often get queue full errors.
HABmin > Configuration > Things > select your Thing > Tools > Show Advanced Settings > Heal the device
I’m only guessing that there is some correlation with the number of nodes due to the lack of reports. I don’t know the cause and I’ve been busy with other things, so I haven’t been bugging our very busy zwave maintainer about it. The workaround is the same as what is needed for this issue, which may be different and affects most (all?) people with battery-powered devices. Since people are turning off the daily heal to get around the other issue, we won’t really know until that one is resolved. Hopefully, the fix takes care of both.
If your zwave devices stop functioning after a daily heal, turn it off. You can easily test this by setting the heal time (configured in the controller Thing) to the current hour, which appears to start the heal immediately.
No nodes are dead. I will turn on debug logging tonight and see what I can see. But there’s an overwhelming amount of traffic, so it’s like finding a needle in a haystack.
We have approx 121 nodes right now, but we are adding more to about 140 in total.
I’m not sure what you are referring to with this statement? I didn’t mean to insinuate that nodes were dead - my point above was really that if they are badly configured, or operating badly, they can overload the network. I do see this quite often. Have you checked this?
This might be your problem then? You need to ensure that the system is managed properly, and if you’re flooding the network, then things will definitely fall over.
I have an online log viewer that I’d suggest you use - it makes it easier to find the needles!
If you’re not looking at the logs, then really you should do this before asking for help as you are in the best position to work out your problems. The logs provide you with a lot of useful information and those trying to help you don’t have access to this so we’re just guessing.
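If the raw log is too noisy to read by eye, a short script can at least rank which nodes drop offline most often. This is only a rough sketch, assuming the log lines look like the `ThingStatusInfoChangedEvent` entries pasted at the top of this thread; the function name `count_offline_transitions` is made up for illustration and is not part of openHAB or the binding.

```python
import re
from collections import Counter

# Matches status-change lines in the format quoted earlier in this thread, e.g.
# 'zwave:device:6dad8bea:node121' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): ...
# Assumption: the Thing UID always ends in "node<number>".
STATUS_RE = re.compile(
    r"'zwave:device:[^:']+:node(?P<node>\d+)' changed from (?P<old>\w+) to (?P<new>\w+)"
)


def count_offline_transitions(lines):
    """Count ONLINE -> OFFLINE transitions per node number (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = STATUS_RE.search(line)
        if match and match.group("old") == "ONLINE" and match.group("new") == "OFFLINE":
            counts[int(match.group("node"))] += 1
    return counts
```

To use it, open your events log, pass the file object to `count_offline_transitions`, and print `.most_common(20)` on the result. Nodes near the top of that list are the ones worth checking first for bad routes or misconfiguration.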
Setting the associations incorrectly - e.g. adding the controller to more association groups than is necessary. This is a common problem - people add the controller to lots of groups in a random attempt to get things working, and this will cause duplication/triplication/quadruplication of messages being sent on the network, depending on the number of groups.
Setting devices such as metering devices to report more often than is necessary, or with too small a delta. This can cause some devices to send multiple updates per second - the controller can probably handle one or two of these, but dozens or hundreds will quickly swamp the network.
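To put rough numbers on the two points above, here is a back-of-the-envelope sketch. It assumes a deliberately simple model that is not taken from the binding or the Z-Wave specification: every report is sent once for each association group that contains the controller, and all chatty devices report at the same rate. The function and parameter names are hypothetical.

```python
def estimated_reports_per_second(device_count, reports_per_second, groups_with_controller):
    """Rough estimate of report messages per second arriving at the controller.

    Simplifying assumption: each report is repeated once per association
    group that includes the controller, and every device reports at the
    same rate. Real traffic also includes routing, retries and acks.
    """
    return device_count * reports_per_second * groups_with_controller


# Example: 10 metering devices, each sending 2 updates per second,
# with the controller placed in 3 association groups on each device.
load = estimated_reports_per_second(10, 2.0, 3)
print(f"about {load:.0f} report messages per second")
```

Even with these simplifications, the multiplication shows why trimming the controller down to one group per device and raising the reporting delta pays off quickly on a network of this size.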