OH2 Z-Wave refactoring and testing... and SECURITY

chris · July 21, 2018, 11:35am

OK, maybe the log you sent from 2.2 just missed the bit before this - something must have been sent from the switch to tell the binding to send the poll. Can you see if you can provide that as well? The only thing I can think of is that the timing is different and the new binding is polling quicker than the old one.

mhilbush · July 21, 2018, 11:36am

I’ve seen this a few times when a device gets into a funky state.

@r27 Try using the air gap switch to cycle power to the device.

r27 · July 21, 2018, 11:38am

I have 3 of these devices, all acting the same with the new Z-Wave binding. With all respect, I don’t think that this is a device issue. But thanks for the suggestion.

5iver · July 21, 2018, 11:39am

When I first dropped it into addons, I got a pile of errors. Still looking at those. After restarting the binding, everything looked OK. Deleting/rediscovering the DZMX1 was successful and the device is working properly. Your workaround appears to have resolved this for now! For someone who’s so busy, you get a ridiculous amount done!

r27 · July 21, 2018, 11:41am

Wow. This makes me a liar. I am leaving … will be back …

chris · July 21, 2018, 11:41am

I just checked the code that handles the NIF and sends the poll, and it’s exactly the same as in the master branch and hasn’t changed for 2 years (last update 31 Jul 2016) - in fact this is the same from the very first OH2 as the commit message is “Initial Commit”. The timing is the same at 75ms.

chris · July 21, 2018, 11:44am

While I do agree it’s very unlikely given it worked when you swapped back to the old version, and also it’s affecting 3 devices, please can you try what @mhilbush suggested - just so we can rule it out?

At the moment, we seem to be in a situation where everything is exactly the same, so “obviously” there can’t be a problem - except there is.

mhilbush · July 21, 2018, 11:47am

No worries. Just offering up a few ideas. If it’s happening with all 3 devices, then it’s probably not the device.

@chris I’m trying to remember other times when I’ve seen application busy. Neighbor updates can take quite a while. When a device is doing a neighbor update, will it report busy? Edit: I’m not necessarily saying that’s the problem here.

chris · July 21, 2018, 11:51am

I’m not sure - I don’t know that I’ve seen this before - at least not in a systematic way. Looking at the log from @r27 there are 2 updates approx 30 seconds apart, and seemingly nothing else happening between them, so I don’t think it’s that.

It may still be a timing issue - maybe 75ms is just too marginal, and maybe on some (fast) systems it is more marginal, and maybe the new binding is just a bit faster… Maybe… I don’t think there’s just one issue, so it’s possibly a buildup of issues like this (maybe!).
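To make the timing hypothesis concrete, here is a minimal sketch of a poll that is deferred by a configurable delay after the device announces itself. It is not the binding's code: `schedule_poll` and its parameters are hypothetical names, and the only figure taken from the thread is the 75ms delay.

```python
import threading


def schedule_poll(send_poll, delay_ms):
    """Call send_poll() once, roughly delay_ms milliseconds from now.

    Hypothetical helper for illustration only; the real binding has its
    own scheduling code.
    """
    timer = threading.Timer(delay_ms / 1000.0, send_poll)
    timer.daemon = True  # do not keep the process alive for a pending poll
    timer.start()
    return timer


if __name__ == "__main__":
    sent = threading.Event()
    # 75 ms is the delay quoted in the thread; a larger value gives the
    # device more time before it has to answer.
    schedule_poll(sent.set, delay_ms=75)
    sent.wait(timeout=1.0)
    print("poll sent:", sent.is_set())
```

If the device needs longer than the delay to become ready, the poll would arrive too early, which is the kind of marginal timing being speculated about here.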

mhilbush · July 21, 2018, 11:57am

Combo of a fast system AND having removed some of the logging (although I can’t remember if you removed it or just moved it to TRACE level).

It could very well be that I saw it because I’m running on an OC’ed core i5.

What is the hex sequence for app busy? I have some old logs lying around that I can grep for it.

mhilbush · July 21, 2018, 12:07pm

Did you conclude whether the issue with manual thing definition has the same root cause as the issue you just opened? Or, possibly related to the issue I opened a while ago.

In the issue I opened, it appeared that the config description simply is not used when defining a thing using the Thing DSL. In any event, I probably should update my issue title to be less narrowly defined.

chris · July 21, 2018, 12:40pm

I’m not sure it’s related (at least not directly). The issue here is that we have the ability to define parameters in multiple places - statically in the XML, and also through a config provider. Additionally, you can have parameters that are only applicable for a single thing, or for the thing-type. These should be merged, but the issue here is that unless there are config parameters defined in the XML, the system is not calling the config provider.

I’m assuming that in the other issue there weren’t multiple sources of config, so it’s probably a DSL issue…
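The behaviour described above can be modelled in a few lines. This is only an illustrative sketch, not the framework's code: the function names, the dictionary representation of parameters, and the choice to let provider entries override XML entries are all assumptions.

```python
def describe_as_reported(xml_params, provider):
    """Model of the reported behaviour: the provider is only consulted
    when the XML already defines at least one parameter."""
    if not xml_params:
        return {}
    return {**xml_params, **provider()}


def describe_merged(xml_params, provider):
    """Model of the expected behaviour: always merge both sources.
    Provider entries win on a name clash (an assumption)."""
    return {**xml_params, **provider()}


if __name__ == "__main__":
    # Hypothetical provider returning one parameter description.
    provider = lambda: {"config_1_1": "integer"}
    print(describe_as_reported({}, provider))
    print(describe_merged({}, provider))
```

With an empty XML definition, the first function drops the provider's parameters entirely, while the second still returns them.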

mhilbush · July 21, 2018, 12:45pm

Correct. In the issue I opened, when processing the Thing DSL, it makes use of no config description when doing the normalization (i.e. it normalizes without using any config description regardless of source).

chris · July 21, 2018, 12:49pm

I’ve updated the binding to change the poll period from 75ms to 175ms - let’s see if that makes any difference…

grepping for 22 01 00 is probably best (22 is command class, 01 is busy, 00 is try again later).
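For anyone who wants to script the search, a small sketch follows. It assumes the log prints frames as space-separated two-digit hex bytes; the sample lines are made up for illustration and are not real captures.

```python
import re

# Match the byte sequence "22 01 00" only on whole-byte boundaries, so that
# e.g. "122 01 00" or "22 01 001" are not counted.
APP_BUSY = re.compile(r"(?<![0-9A-Fa-f])22 01 00(?![0-9A-Fa-f])")


def find_app_busy(lines):
    """Return the lines that contain the APP BUSY byte sequence."""
    return [line.rstrip("\n") for line in lines if APP_BUSY.search(line)]


if __name__ == "__main__":
    # Hypothetical log lines; the real log format may differ.
    sample = [
        "12:00:01 RX 01 0A 00 04 00 38 04 22 01 00 BC",
        "12:00:02 RX 01 09 00 04 00 38 03 25 03 FF",
        "12:00:03 RX 01 0A 00 04 00 32 04 22 01 00 B6",
    ]
    for hit in find_app_busy(sample):
        print(hit)
```

A plain `grep "22 01 00"` over the log file does much the same job. Either way, the three bytes can also turn up by coincidence inside an unrelated frame, so each hit is worth checking by eye.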

mhilbush · July 21, 2018, 12:57pm

Thanks. I have a couple instances of APP BUSY in my logs. I’ll take a look at them and let you know what I see. FTR, in this case, it’s from a system with a much slower CPU than the one with the OC’ed core i5. I haven’t looked at that system yet…

Edit: The device generating the APP BUSY is a Leviton VRS05 Scene Capable Switch.

Edit: I also have a couple instances of APP BUSY in another system. Looking at those now.

Devices:

Leviton VRMX1 Scene Capable Dimmer (node 56)

Leviton VRMX1 Scene Capable Dimmer (node 50)

Notice a pattern here. All the same brand…

r27 · July 21, 2018, 1:17pm

I’ll be away for the next 48 hours. I’ll post test results later. Thank you.

chris · July 21, 2018, 1:27pm

Seems to show up a bit in a google search - eg - http://forum.micasaverde.com/index.php?topic=40103.0

mhilbush · July 21, 2018, 1:28pm

@chris Sorry for the many edits to the above post. It should be complete now with a summary of the APP BUSY messages I see in my current logs. I have some older logs I’ll look at as well.

The APP BUSYs occur over a period of several days and there are very few of them when compared against the total traffic.

I’ll install the latest binding and observe what happens.

mhilbush · July 21, 2018, 1:39pm

Interesting.

What’s odd is that there’s nothing else unusual going on with these nodes.

I still think cycling the power using the air gap switch is a good first step for @r27 – just to rule out the devices being confused in some way. Also, installing the binding with the 175 ms delay.

I think excluding/including also might be an interesting thing to try, although in my case, the APP BUSYs are so sporadic that I’m not going to do that just yet.

I just updated the binding in 3 systems (2 systems have Leviton devices, and 1 system has no Leviton devices). For now, I’ll just monitor over the next several days to see if any more BUSYs are generated.

5iver · July 21, 2018, 2:00pm

I am also seeing these in my logs. Most are from Leviton devices, but most of my devices are Leviton. I do see them from other devices too though. Trying the new version now.