OH2 Z-Wave refactoring and testing... and SECURITY

chris · April 25, 2017, 11:09am

Well, the controller we are using is licensed, right? It’s a standard controller, and it is apparently providing this information. However this is exactly why I decided to use the controller for this - they should have the best view of the network.

Why? So, the controller tells us it’s alive, and now we need to tell the controller it’s alive? My expectation is the controller already knows this as it is the one sending us the information from the device in the first place.

Sigma don’t release information on using the serial API. I talk directly to Sigma regularly so I’ll try and find out more, but given that this information is not publicly released, they might not be forthcoming with information.

chris · April 25, 2017, 11:12am

That’s not correct - at the moment even if the controller thinks the device has failed (based on the information I have) we can still send data to the device. This was my point above - the controller should therefore know that the device is alive…

RayBe · April 25, 2017, 11:26am

@chris, did anything stand out for you in the log? I didn’t see anything strange.

DanielMalmgren · April 25, 2017, 11:28am

Oh. I thought the controller blocked attempts to communicate with FAILED nodes.

Well, anyway, the problem as I see it is that the controller often seems to have erroneous information. For some weird reason it decides that nodes are dead that I know for sure are working. I have many times seen data being sent from a node to the controller, and then seen in the logs that the controller simply drops the data because it has marked the node as failed. How stupid is that? (This might have been in OZW, but I guess the software doesn’t matter in this case.)

chris · April 25, 2017, 11:38am

No - nothing notable.

chris · April 25, 2017, 11:42am

No - not in my experience.

I don’t know what error this is - can you provide a log, as I don’t know what would cause this in the current code.

DanielMalmgren · April 25, 2017, 11:48am

As I said, this wasn’t in the current code, so maybe it isn’t relevant at all.

I’ll have a check this afternoon when I get home. It could be that the “failed” nodes simply haven’t sent anything to the controller since I restarted the binding. Seems unlikely though, as one of them is a power meter which usually fluctuates quite a lot.

chris · April 25, 2017, 12:07pm

Ok - I didn’t see that this was for other code. As I’ve said above, this code is completely different, so we need to be clear to avoid wasting time.

DanielMalmgren · April 25, 2017, 12:31pm

Sorry. Understood.

What I can say about the current code though is that requesting the IsFailedNode status every 30 seconds seems to be kinda meaningless; even if we ask a thousand times, the controller still claims the node is failed. But I guess the controller will never actively tell us when a node comes back to life, so we have to poll the status, right? Is there any way we could differentiate this behaviour between pure sensor devices (we will eventually get to know they’re alive anyway when they send us something) and other devices (where it is good to know whether they’re alive if we want to send them something)?

chris · April 25, 2017, 1:13pm

Why? I could change it to 60 seconds - what does it matter to the user if it’s 5 seconds or 60 seconds? It’s a period I chose to check the status of devices. If you think there’s a better time then I’m happy to discuss but this was chosen to provide timely updates without flooding the system in any way.

That’s not correct. At least in my experience, at some point the controller will decide that the device is online and it will return this.

Correct - this is exactly why I poll every 30 seconds.

No - that’s not correct. For sensors (I guess you mean battery devices) receiving data doesn’t mean they are alive. The only time to really know that the device is functioning properly in the network is during a wakeup period.

DanielMalmgren · April 25, 2017, 1:25pm

Ok… My experience is that nodes can be marked as failed for days and weeks and it’s not until the controller actually gets some communication from the node it may be marked as ok again.

No, I’m not talking about battery devices. I’m talking about devices whose primary function is to send data to the controller, as opposed to those (switches, dimmers, whatever) that also have channels we wish to manipulate from the controller. I mean, do we really care if our temperature sensor is alive or not? When we get data from it we see that it is. But maybe I’m misunderstanding something here, since you’re saying that receiving data from a device doesn’t mean it is alive. I just don’t really understand how a dead node could be sending anything at all.

rgerrans · April 25, 2017, 1:34pm

My issue/question is that for my door sensors I’m only getting trigger event (door alarm) reports. I’m not getting battery level reports from them. In the past with the test binding I have been able to get them in a constant connected state by triggering a manual wakeup when the controller was communicating with them and then get regular battery level updates. With the old OH2 binding I also saw more consistent battery level reports.

Thanks.

chris · April 25, 2017, 1:36pm

Well, yes, of course. If the device is really offline and not communicating, then I would assume that it stays marked as offline. The point is though that the only way to know this is to continue to poll the device and you are saying that this is “meaningless” which I don’t agree with.

A node can send, but not receive - it’s unusable and will not work.

chris · April 25, 2017, 1:39pm

This probably means the device is not waking up, or the binding isn’t receiving the wakeup message. I would suggest checking the wakeup settings, then checking the log to see what is happening.

It is probably not related to the binding. The binding sends the polls every hour, or every time the device wakes up - whichever is longer.
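As a rough illustration of the "whichever is longer" rule above, the effective interval between polls of a battery device can be sketched as the larger of the two periods. This is only a minimal sketch of that one sentence: the function name and the seconds-based parameters are hypothetical and are not taken from the binding's code.

```python
def effective_poll_interval(poll_period_s, wakeup_interval_s):
    """Return the time between polls for a battery device, in seconds.

    Hypothetical helper: a sleeping device can only be polled when it
    wakes up, so the slower of the two periods sets the pace.
    """
    return max(poll_period_s, wakeup_interval_s)
```

With an hourly poll period, a device that wakes every 30 minutes would still be polled hourly, while a device that wakes every 2 hours would be polled every 2 hours.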

rgerrans · April 25, 2017, 1:45pm

It was set by default to once every 60 minutes. As a test I changed it to once every 30 last night, but I just looked and that change is still pending, so I think you’re likely right that the device just isn’t waking up and sending battery reports, though it will still send an alarm report. What is the best way to generate meaningful data to see what might be causing this?

My thought for a data test would be to (turn on debug):

Change a configuration setting to see the communication between the controller / device
Then trigger an alarm event to get that communication

Any other actions that you would suggest as part of the test?

DanielMalmgren · April 25, 2017, 1:47pm

Ok, well in that case I guess I’m wrong then. I just really don’t like things being polled, but if that’s the only way, it feels more like a shortcoming in the serial API and I guess we can’t do much about that.

Ah, I see. When you put it that way it makes sense.

Starting to understand that you’re really doing everything you can from the binding’s point of view. I guess it’s not easy reverse engineering something and then getting it rock stable.

chris · April 25, 2017, 1:48pm

The device will need to wake up before you can send the wakeup configuration ;). So, to get this set if it’s not already, you’ll need to manually wake it.

Triggering an alarm event means nothing - it’s not related to wakeup in any way. You need to manually wake the device, or wait until it wakes up by itself.

rgerrans · April 25, 2017, 1:59pm

Wouldn’t an alarm event make the device wake up, or are they only outbound events that don’t trigger a regular wakeup action (still trying to understand all the nuances here)?

chris · April 25, 2017, 2:12pm

Polling is pretty standard stuff I’m afraid. Even when something is sent unsolicited, polling is still needed, otherwise you don’t know if no message means everything is fine and there’s nothing to send, or everything has gone to hell and nothing is working at all… A low rate poll IMHO is in any case needed (low is a relative term - relative to the capacity of whatever you’re polling) - I poll the controller quite often, and devices much less often…
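To make the low-rate poll idea concrete, here is a minimal sketch of a poller that asks for each node's failed status at most once per interval. It is an illustration only, not the binding's implementation: `FailedNodePoller`, `poll_due`, and the `query_is_failed` callable (a stand-in for whatever asks the controller about a node) are all hypothetical names.

```python
class FailedNodePoller:
    """Tracks when each node's failed status was last checked."""

    def __init__(self, poll_interval_s=30):
        self.poll_interval_s = poll_interval_s
        self.last_checked = {}  # node id -> time of the last status check
        self.failed = {}        # node id -> last reported failed status

    def poll_due(self, nodes, now, query_is_failed):
        """Query every node whose interval has elapsed; return those polled.

        query_is_failed is a hypothetical callable standing in for the
        controller query; it takes a node id and returns True or False.
        """
        polled = []
        for node_id in nodes:
            last = self.last_checked.get(node_id)
            if last is not None and now - last < self.poll_interval_s:
                continue  # checked recently, so do not flood the controller
            self.failed[node_id] = query_is_failed(node_id)
            self.last_checked[node_id] = now
            polled.append(node_id)
        return polled
```

Because the status is re-read on every interval, a node that the controller later reports as alive is picked up on the next poll without relying on any unsolicited message.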

It’s not totally reverse engineered - I have the serial API documents, but they are quite old so have quite likely been updated and potentially improved, but as it’s all backward compatible it still should be applicable. This helps with understanding the frames, but not understanding what is really happening under the hood… I’m also hoping that in future we may be able to improve things through various means, but that’s for another day and another discussion…

chris · April 25, 2017, 2:13pm

No - not at all.

Yes - this is the point. An event is the device sending data, but it won’t listen for any response. A wakeup is the device sending a message to tell the controller it can receive messages for the next few seconds.
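The difference between an event and a wakeup can be sketched as a per-device queue that is only flushed on wakeup. This is a simplified model under stated assumptions, not the binding's code: `BatteryNode` and its methods are hypothetical names, and messages are plain strings.

```python
from collections import deque


class BatteryNode:
    """Hypothetical model of a sleeping device with a pending-message queue."""

    def __init__(self):
        self.pending = deque()  # messages waiting for the device to listen

    def queue(self, message):
        """Hold a message (e.g. a configuration change) until the next wakeup."""
        self.pending.append(message)

    def on_event(self, report):
        """Unsolicited report such as a door alarm.

        The device is not listening for a response, so nothing is
        delivered and the queue is left untouched.
        """
        return []

    def on_wakeup(self):
        """Wakeup notification: the device listens for a short time,
        so everything queued is delivered now."""
        delivered = list(self.pending)
        self.pending.clear()
        return delivered
```

In this model a pending wakeup-interval change stays pending through any number of alarm events and is only delivered once the device wakes up, which matches the behaviour described above.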