Wow, many thanks for your great support, Chris!
In my test environment everything is stable now, so the update did fix it!
I will also test it in my production environment tonight.
Thank you again for all your work!
I took the plunge and installed the new binding.
It really made a difference.
Only the NodOn Wall remotes are not included, for some reason.
The log also complains about 3 nodes every 30 s, even though they all work perfectly well.
2017-04-24 10:26:22.369 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 24: Is currently marked as failed by the controller!
2017-04-24 10:26:23.096 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 29: Is currently marked as failed by the controller!
2017-04-24 10:26:23.123 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 22: Is currently marked as failed by the controller!
2017-04-24 10:26:52.379 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 24: Is currently marked as failed by the controller!
2017-04-24 10:26:53.107 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 29: Is currently marked as failed by the controller!
2017-04-24 10:26:53.132 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 22: Is currently marked as failed by the controller!
2017-04-24 10:27:22.391 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 24: Is currently marked as failed by the controller!
2017-04-24 10:27:23.116 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 29: Is currently marked as failed by the controller!
2017-04-24 10:27:23.141 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 22: Is currently marked as failed by the controller!
2017-04-24 10:27:52.401 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 24: Is currently marked as failed by the controller!
2017-04-24 10:27:53.126 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 29: Is currently marked as failed by the controller!
2017-04-24 10:27:53.148 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 22: Is currently marked as failed by the controller!
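If it helps to confirm the pattern, here is a minimal Java sketch that groups these warnings by node and computes the gap between consecutive warnings for each one. The class and method names are my own, not part of openHAB, and it assumes the log line format shown above.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FailedNodeLogScan {
    // Matches lines such as:
    // 2017-04-24 10:26:22.369 [WARN ] [...] - NODE 24: Is currently marked as failed by the controller!
    private static final Pattern LINE = Pattern.compile(
        "^(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2}\\.\\d{3}) .*NODE (\\d+): Is currently marked as failed");
    private static final DateTimeFormatter TS =
        DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");

    /** Groups the warning timestamps by node ID (sorted by node ID). */
    public static Map<Integer, List<LocalDateTime>> scan(List<String> lines) {
        Map<Integer, List<LocalDateTime>> byNode = new TreeMap<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) {
                LocalDateTime ts = LocalDateTime.parse(m.group(1), TS);
                int node = Integer.parseInt(m.group(2));
                byNode.computeIfAbsent(node, k -> new ArrayList<>()).add(ts);
            }
        }
        return byNode;
    }

    /** Milliseconds between consecutive warnings for one node. */
    public static List<Long> intervalsMillis(List<LocalDateTime> times) {
        List<Long> out = new ArrayList<>();
        for (int i = 1; i < times.size(); i++) {
            out.add(Duration.between(times.get(i - 1), times.get(i)).toMillis());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "2017-04-24 10:26:22.369 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 24: Is currently marked as failed by the controller!",
            "2017-04-24 10:26:23.096 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 29: Is currently marked as failed by the controller!",
            "2017-04-24 10:26:52.379 [WARN ] [rialmessage.IsFailedNodeMessageClass] - NODE 24: Is currently marked as failed by the controller!");
        scan(sample).forEach((node, times) ->
            System.out.println("NODE " + node + ": " + times.size()
                + " warning(s), intervals (ms) = " + intervalsMillis(times)));
    }
}
```

Fed the full excerpt above, it should list nodes 22, 24 and 29, each with intervals of about 30,000 ms, which matches the "every 30 s" observation.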
Nodes 24 and 29 are Aeotec Multisensor 6 units on USB power.
Node 22 is a NodOn Wall remote on battery (node 6, also a NodOn remote, is OK).
I have restarted OH2 several times.
Give me a honk if you want a debug log, @chris.
Great stuff!
So, this means that the controller thinks the device has failed. When I wrote this, I assumed that the controller would have the most reliable view of a device's state, so unlike previous bindings, where I had kept an internal state, I decided to use the controller's state. However, it seems that this is less reliable than I'd hoped, so I may need to change this concept…
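For comparison, the "internal state" approach of the older bindings could look roughly like the following. This is a hypothetical illustration, not code from the binding: `NodeHealth`, its methods and the 10-minute timeout are all assumptions. The idea is to report a node as offline only when the controller flags it as FAILED and the binding has also not heard from it recently. It would only change what openHAB displays; it cannot change the controller's own view of the node.

```java
import java.time.Duration;
import java.time.Instant;

/**
 * Hypothetical sketch: combine the controller's FAILED flag with a binding-side
 * "last heard from" timestamp. None of these names exist in the Z-Wave binding.
 */
public class NodeHealth {
    private final Duration timeout;       // assumed silence threshold
    private Instant lastHeard;            // null until the first frame arrives
    private boolean controllerSaysFailed; // last answer from the controller's "is failed" query

    public NodeHealth(Duration timeout) {
        this.timeout = timeout;
    }

    /** Call whenever any frame is received from the node. */
    public void onFrameReceived(Instant now) {
        lastHeard = now;
    }

    /** Call with the result of the controller's "is failed" query. */
    public void onControllerFailedFlag(boolean failed) {
        controllerSaysFailed = failed;
    }

    /** Offline only if the controller flags the node AND nothing was heard within the timeout. */
    public boolean isOffline(Instant now) {
        boolean silent = lastHeard == null
            || Duration.between(lastHeard, now).compareTo(timeout) > 0;
        return controllerSaysFailed && silent;
    }

    public static void main(String[] args) {
        NodeHealth node = new NodeHealth(Duration.ofMinutes(10));
        Instant t0 = Instant.parse("2017-04-24T10:26:00Z");
        node.onFrameReceived(t0);
        node.onControllerFailedFlag(true);
        System.out.println(node.isOffline(t0.plusSeconds(30)));   // false: heard from 30 s ago
        System.out.println(node.isOffline(t0.plusSeconds(3600))); // true: silent for an hour
    }
}
```

The trade-off is a second source of truth that can disagree with the controller, which is presumably why the new binding relies on the controller's state alone.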
Yes. It seems strange that the controller insists they are faulty even though it is relaying messages to and from them.
I have not power-cycled my PC and stick. Do you think that could help?
No - probably not.
From my initial experience with this, I found that the controller would report some battery devices as FAILED until they wake up, but I've seen other situations where this isn't necessarily the case… I hope to get some more information about this from Sigma and will then decide how to proceed.
I too have a Zooz 4in1 sensor on battery power that gets marked as failed. If I manually wake the node up, it turns green for a few moments, and then the controller tries to talk to it again and marks it as failed again. It certainly seems strange, as if the controller (or something) is still trying to ping a sleeping device…
Try healing the device from HABmin, that helped for me.
br,
Raymond
With all these reports, I'll add that I've seen some improper responses from some of my non-battery-powered devices. When restarting, I often find I have to monitor the startup and confirm that no devices get marked as "not responding", because if they are left in this state for too long, they then get marked as Failed. It's not a guaranteed problem every time, but I believe the newer method has proved less reliable than the old one. So I'll add a +1 to the votes to give @chris some more work in reverting to the old method!
What does the "not responding" state mean?
I've always found that the controller sorts things out in the end, but IMHO it's not doing a great job of detecting/deciding when a node has failed. I'd like to understand better how this works (i.e. inside the controller), and it's on my list of things to raise with Sigma when I next speak to them.
Has anyone successfully included a Kwikset 910/914 lock securely with this binding? I have other locks that are included, but I excluded one during testing. After 30 or more attempts, I cannot get this lock to include securely.
Yes, I have a 910 and it's a pain. It won't include securely unless I pull it from the door and put it within inches of the controller. I have a Gen5 stick. After the inclusion, once everything looks good in HABmin ("uses security" shows a green check mark), put it back in the door and potentially wait a while as it figures out your mesh again. My controller is one hop from the lock, so this takes a little while and a couple of openHAB restarts.
@chris I think that's exactly what the state was listed as at times: "node is not responding". Another time it happened because I turned the breaker off, so the device lost power for about 30 minutes while I was installing another switch. When I brought the power back on, the device had already been marked as failed, and I had to exclude it and then re-include it. Just another example to help outline when this happens. It seems to me that when the controller is re-establishing links there is room for error (I'm not sure why), and if a device goes offline for a short while, it can become detached as well. The same thing happened to the garage controller once, which prompted my chat with you about creating a portable device to include it securely again.
Now I've learned that if I'm going to restart OH, I should check on all my devices as it starts up. On occasion I see a "device not responding" message, so I hit the On/Off button for it manually or through OH, and suddenly it's "communicating" again with no issue. But if left untouched, they often stay in that state, then turn to a Failed state, and I can no longer get them back without a full exclude/include process.
OK, I don't know this message. There's one called "Node is not communicating with controller"; is that maybe it?
This definitely should NOT be necessary. Remember, the controller will do this even if I keep my own status in OH, so adding an internal state won't solve this problem if it really is happening as you say (which I very much doubt).
Again, I'd be very surprised if that was really the case (like REALLY surprised). You should not need to exclude the device and re-include it; this is clear and is certainly not my experience. As above, if this is really the case then we cannot solve it (sorry), but I think you're wrong.
Just to reiterate: if what you say is true, and as soon as the controller marks a device as failed you have to exclude and re-include it, then I'm afraid the "old method" will not solve this problem.
My experience here is very different from yours…
Yeah, that sounds familiar. I believe that would be the one.
On the others, I will see if I can observe it next time I need to restart. I don't restart often, but I'll try to be mindful if I do need to and grab some screenshots for you. I don't think I'll be cutting the power to anything, though, so I'm unlikely to have evidence of that again, but I'll keep it in mind if it happens.
So that means that the device is FAILED as far as the controller is concerned…
So what do you mean when you say that after being left in the "not responding" state for "too long" they get marked as FAILED? What does "failed" mean here? I'm just trying to establish what the different states are so I can work out what the binding is doing, but I'm a bit confused, sorry.
Sorry, I know that not using the exact messages is probably making this difficult.
If it gets marked as "Node is not communicating with controller", it's not actually failed. I can then hit the on/off switch manually or tell OH to turn it on/off. Suddenly it communicates again, the node goes green, and all is well.
If, however, I leave it showing "Node is not communicating with controller", it eventually changes to a different message indicating that the node is ACTUALLY failed and has been marked as failed. I forget the exact wording, but it was not the same as "not communicating". Once it reached this state, it truly could not be healed, reset, or made to work again without a manual exclude followed by an include.
Fair enough, point taken.
Glad it isn't just me. I will keep trying and not give up hope. Out of curiosity, when you successfully included it, were you in Low Power Inclusion mode, High Power Inclusion mode, or Network Wide Inclusion mode? I'm currently trying Low Power and High Power.
It's no problem, but I just want to be clear so that when I'm looking for problems, I know what I'm looking for ;).
Yes, the controller says it has failed. See the code here:
switch (event.getState()) {
    case FAILED:
        logger.debug("NODE {}: Setting OFFLINE", nodeId);
        updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.COMMUNICATION_ERROR,
                ZWaveBindingConstants.getI18nConstant(ZWaveBindingConstants.OFFLINE_NODE_DEAD));
        break;
}
and…
OFFLINE_NODE_DEAD = "Node is not communicating with controller"
So this message is set when the controller has marked the device state as FAILED.
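To make the mapping explicit, here is a self-contained model of the snippet above that runs without openHAB. The `NodeState` enum and the `offlineDetail` method are simplified stand-ins I made up, not the binding's real types; only the message text is taken from `OFFLINE_NODE_DEAD`. Within this snippet, the only path to "Node is not communicating with controller" is the controller reporting FAILED.

```java
import java.util.Optional;

public class NodeStatusMapping {
    // Simplified stand-in for the binding's node-state enum (not the real type).
    enum NodeState { ALIVE, FAILED }

    // Text of the OFFLINE_NODE_DEAD constant quoted above.
    static final String OFFLINE_NODE_DEAD = "Node is not communicating with controller";

    /**
     * Returns the OFFLINE detail text the handler would set for this state,
     * or empty if this handler leaves the Thing status untouched.
     */
    static Optional<String> offlineDetail(NodeState state) {
        switch (state) {
            case FAILED:
                // Controller says FAILED -> Thing goes OFFLINE (COMMUNICATION_ERROR)
                return Optional.of(OFFLINE_NODE_DEAD);
            default:
                return Optional.empty();
        }
    }

    public static void main(String[] args) {
        for (NodeState s : NodeState.values()) {
            System.out.println(s + " -> " + offlineDetail(s).orElse("(no change)"));
        }
    }
}
```

So when HABmin shows that text, the question to ask is why the controller answered FAILED, not what the binding decided on its own.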
OK, I'd like to understand what this other message is, as I don't see any other messages along these lines in the source code.
This is really quite major, so I'd appreciate it if you could clarify the above points. I'm not sure there's anything I can do about it, though: changing the way the node is detected as dead/failed will not help in this case. But first, let's understand exactly what the problem is.
Just to chime in in case you need more data points: I also have three battery sensor nodes (2 door sensors and one multisensor) in the same state, where HABmin says "Not Communicating with the Controller", yet when a trigger event happens they send a report to the controller and my related rules fire. I just don't get the regular reports on things like battery status from them, and I see the 30-second warning "NODE 44: Is currently marked as failed by the controller!"
Thanks, this is also what I see, and it is quite normal. My major concern is with devices that need to be excluded to recover from this state.
I would expect the controller to mark it as "not failed" if it received data, but I think this might only happen when the device wakes up, not just when it sends a report, as this is the only time the controller can communicate with the device.
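A toy model of that hypothesis, just to make the sequence concrete. It assumes (unverified, pending an answer from Sigma) that a wake-up notification clears the controller's FAILED flag, an unsolicited report does not, and a poll sent while the node is asleep sets it again. `SleepingNodeModel` and its methods are hypothetical.

```java
/**
 * Hypothetical model of how a controller might treat a sleeping (battery) node.
 * This is an assumption about controller behaviour, not documented firmware logic.
 */
public class SleepingNodeModel {
    private boolean awake;
    private boolean failed;

    /** Unsolicited report (e.g. a door-sensor trigger): delivered, but the FAILED flag is untouched. */
    public void onReport() { }

    /** Wake-up notification: the controller can now talk to the node, so it clears FAILED. */
    public void onWakeUp() { awake = true; failed = false; }

    /** The node goes back to sleep. */
    public void onSleep() { awake = false; }

    /** The controller tries to talk to the node; a sleeping node cannot answer. */
    public void controllerPoll() { if (!awake) failed = true; }

    public boolean isFailed() { return failed; }

    public static void main(String[] args) {
        SleepingNodeModel n = new SleepingNodeModel();
        n.controllerPoll();               // polled while asleep -> FAILED
        n.onReport();                     // report arrives, rules fire, still FAILED
        System.out.println(n.isFailed()); // true
        n.onWakeUp();                     // wake-up clears the flag ("greens up")
        System.out.println(n.isFailed()); // false
        n.onSleep();
        n.controllerPoll();               // polled while asleep again -> FAILED
        System.out.println(n.isFailed()); // true
    }
}
```

If the controller really behaves like this, it would explain both the "green for a few moments, then failed again" pattern and sensors whose reports still arrive while they are flagged.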
I think it was on the default, which seems to be Network Wide. That said, I know for sure the 910 will not include over multiple hops; it never has for me, on many different systems and two USB sticks. I've always had to put it physically close to the controller. Make sure to exclude it from openHAB and from the controller itself, and then start the inclusion from HABmin.
Also make sure the batteries are full (or brand new)! I had a bad battery at one point (one of the 4 AAs it takes) and it made the lock do weird things. I checked the voltage of each battery one at a time and also the voltage of the pack as a whole.