OH2 Z-Wave refactoring and testing... and SECURITY

Just upgraded to 2.2.0-Snapshot and the 2.2.0.201710241752 ZWave binding.

In binding from 20171012 these errors were gone, but they are back in abundance now:

2017-10-31 23:32:48.205 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.221 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.226 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.248 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.251 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.254 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.258 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.261 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.264 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.267 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.270 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.273 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.277 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.285 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.288 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.300 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.306 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.309 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.312 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.315 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:32:48.318 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.300 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.305 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.313 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.322 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.330 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.340 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.376 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.379 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.384 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.387 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.391 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.
2017-10-31 23:33:03.400 [ERROR] [lmessage.RequestNodeInfoMessageClass] - Request node info not placed on stack due to error.

The biggest bursts were 1-2min after service restart. Settled down after some time. (5-10min)

Everything seems to be working OK. Just clutters up the log a bit …

Someone said it was only a one byte LRC or possibly one byte CRC, on the messages aired (pre gen 5). This may all be wrong though.

Anyway, who knows why you ended up with a node 2. There’s no way to know without at least seeing the logs which I’m guessing you don’t have as I’ve mentioned this 3 times and you’ve not responded? Maybe the logs won’t help much either, but at least we can confirm what the binding does was correct.

As far as I can tell, this issue still exists in the binding. Same issue exists today. Logs show no indication of the issue.

Can you be specific about the problem. I tried to follow your link back, but I’m not sure what this is (this thread is a little on the long side). Can you reference the message that describes the problem maybe?

Sorry, yes, I can see how that would be helpful! I am referring to the issue with secure inclusion that seems to occur when you have more than a handful of nodes in the network. Both @shawnmix and I were experiencing similar behavior - you can securely include if you have a freshly reset controller, but somewhere past the neighborhood of 20-30 nodes and it becomes unstable. At my 70 nodes, securely including another node is impossible. I tried another 10 times yesterday, so my total number of attempts now is at about 40 tries.

Here’s the end of the thread on this, maybe it’ll help jog the memory.

I am off work for some time and can commit dedicated time to testing or collecting data on this now. The only thing I don’t want to do is probably exactly what you want me to do - hard reset the controller. I can attest that on the same controller, same build of OH2, same build of this binding, I could not securely include nodes. When I accidentally hard reset the controller, I suddenly had no problem adding my 5 secure nodes. Now that I’ve reincluded the remaining nodes and am back up to 70, I can no longer securely include a 6th node I purchased as it fails on secure inclusion every time.

Ok, yes, I know the issue, and unfortunately I don’t expect this to go away as nothing has changed. I can’t replicate the issue, and I’ve not had other thoughts from ZWave. I checked the source in some Sigmas docs that I have and I’m also reasonably confident that I’m doing things correct, so I’m not sure where to take this.

I am not either. The majority of my network is mains-powered. My OH2 server is on a UPS, so I shut off the mains power for the whole house to try to limit the amount of Z-Wave communication going on during the inclusion process. Same issue. Here’s a snippet from the log viewer. I am not sure I am reading it right, but it looks to me like the controller assigns an ID (101 in this case), advances to the security stage, sends SECURITY_SCHEME_GET, receives an ACK from the node, accepts it, then times out at 0ms saying there is no ACK? This then just repeats until it apparently times out and moves on.

No - this isn’t a timeout - it’s a response from the controller to say that the device did not respond. Also, don’t be fooled when it says 0ms - probably you have changed your log format and it’s now not compatible with the viewer so this is wrong.

Got it. Well, I will keep fiddling with it. Unfortunately not being able to resolve this and not having a workaround is a pretty big problem - I either need to figure this out, find a way to run Z-Wave in my home separate from OH2, or just move away from OH2, which is definitely not what I want to do.

I agree and I’d love to resolve it, but I can’t replicate it here and I’ve tried a couple of things that haven’t helped, and I spent quite a lot of money to get a Schlage lock that everyone had trouble with, and also the devkit, but so far, nothing has helped which is “slightly annoying” :wink:

I’m open to ideas…

1 Like

I have figured out a work around for this, at least when using the Aeon Z-Stick Gen5 (white). Maybe it does or does not give a clue to the root cause of the problem. Workaround for secure inclusion failing when a certain threshold of nodes exists in the network:

  1. Download Zensys Tools (link)
  2. Download Aeon Backup tool (link)
  3. Copy your network security key from Habmin
  4. Shut down OH2 and remove the Z-Stick
  5. Place the Z-Stick in a Windows PC
  6. Perform a backup of your Z-Stick following the directions in step 2
  7. Open the Zensys Tools, connect to your Z-Stick
  8. Highlight your Static Controller node (should be first in the list)
  9. Go to the Command Class tab on the right, select COMMAND_CLASS_SECURITY_V1 for the Command Class, NETWORK_KEY_SET for the Command Name
  10. Enter your key into the Network Key byte field, replacing the “00” with your key you copied from OH2
  11. To test, go find a working secure node and click the Node Info button. You should see all command classes and the log will indicate that the node was “added to the secure network”. So far, so good. If you don’t get “added to secure network”, then the query isn’t working and the following steps won’t do you any good.
  12. Click “Add node”
  13. Start the inclusion on the device you wish to add, ensure the device is within a few feet of the Z-stick. You should see an indication that this device was added to the secure network.
  14. Shut down the Zensys Tools, move the Z-stick back to OH2, move the device you just included to within a few feet of the Z-stick
  15. Start OH2 and go get a snack. Wait 20-30 minutes for the binding to settle down and you should see the new node you added in your Inbox, identified and waiting.
  16. Add the node to OH2 and validate that security is checked. Your device should now be working.

The only issue I am having is that the binding doesn’t seem to want to enumerate all of the channels. For the particular device I am using the only channel I need is exposed by default (Barrier State), and the controls work. I haven’t figured out how to get the binding to discover the remaining channels.

Tagging @shawnmix as this may be useful for your setup!

3 Likes

What is the device?

Linear GD00Z-3 - but keep in mind I had these same issues with Kwikset locks (a handful of models), so I am 99.9% sure this issue isn’t device specific.

So, to be clear, on all your devices you only have 1 channel? That is strange as I’m sure that this is not the case with most people (and not the case here that’s for sure).

If that is the same as the NGD00Z-4 that’s in the database, then there is only 1 channel so what you see is correct based on the database definition. It doesn’t look like the device supports a lot of functions so I’m not sure what you think is missing?

It actually shows up as a NGD00Z-4. I have two though - one I included via OH2 after my accidental hard reset, the other I just did. The one I did early on shows 17 channels:

Again, not a big deal. If there is only one channel, then maybe this solution works for other secure devices as well.

Someone added a lot of random channels to this device a while ago and it looks like they have been removed now. Your old device probably still has the old definition.

Clearly devices can have moe than one channel so this is fine.

Delete the Thing and then rediscover to remove those extra channels, which the device does not support. I went through the same confusion when I added my second one. You can find more details here on the history of that device, item definitions, and some example rules (I’m using lambdas now, so let me know if you’d like me to share them)…

Yep, fixed it! I wish there were a way to know when a Node needs that.

@nolan_garrett - thanks for the info on the ZenSys. I’d seen approaches like this before, and as I think you’ve indicated and realized - it’s a bit annoying to do just to get a secure inclusion. But at least it’s a workaround. The problem for me, I’m a Mac shop at home, so trying to get a portable device running windows is annoying. VM’s off the ESX server to run locally on a laptop, then fiddle with USB passthrough, gets a bit ugly! :wink:

I wish we could identify what is different about our setups though that is causing this issue that isn’t seen by @chris. He’s gone a lot of good work digging into the core of things, I just can’t help but feel like we’re all overlooking something minute. Is there some type of like a “full report” we can output somewhere or some way to see all the nitty gritty settings on the ZSticks or something?

I’d also agree with @nolan_garrett about the device updates. Maybe there is a way to auto-refresh the devices at some type of set interval? Perhaps a component to the binding that can force a refresh of the XML files? Perhaps its just something to be added in HABmin vs the binding? I’ve found this on occasion that I needed to manually delete and re-add to update XMLs in the past, and as you’ve seen Nolan - I had the same issue initially with the Garage controller that showed a ton of channels that were useless.

1 Like