Z-Wave sometimes doesn't update state of switch. Nonce failure?

Hello,

So I was thinking I would use the state of one of my Fibaro Walli Switches to also turn on another light.

I.e., when the Fibaro state updates to ON → turn my Zigbee light on. When the Fibaro state updates to OFF → turn my Zigbee light off.

This works, sort of. It fails at random, which is quite frustrating. I’d say that on average, about once every 5-10 presses, the Walli Switch does not get a status update. If I look in the Android app, for example, it might say the switch is off when in fact it is on, and vice versa.

I have tried to look at the logs to figure out why. The only recurring thing that seems suspicious is that something to do with the nonce seems to fail randomly.

E.g.:

2021-06-08 23:00:00.861 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR No valid NONCE! null
2021-06-08 23:00:01.815 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR No valid NONCE! null
2021-06-08 23:00:02.199 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR No valid NONCE! null
2021-06-08 23:00:02.611 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR NONCE ID invalid! 35<>79
2021-06-08 23:00:02.841 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR NONCE ID invalid! 35<>79
2021-06-08 23:00:02.881 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR NONCE ID invalid! 35<>79
2021-06-08 23:00:04.453 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR NONCE ID invalid! 79<>236
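For anyone wanting to tally these lines in a larger log, here is a minimal Python sketch. It assumes the line format is exactly as in the excerpt above; the function name `count_nonce_errors` is made up for illustration and is not part of openHAB or the binding.

```python
import re
from collections import Counter

# The pattern is derived only from the debug lines quoted above (assumption:
# other binding versions may word these messages differently).
LINE_RE = re.compile(
    r"NODE (?P<node>\d+): SECURITY_ERR (?P<reason>No valid NONCE|NONCE ID invalid)"
)


def count_nonce_errors(lines):
    """Return a Counter keyed by (node, reason) for SECURITY_ERR nonce lines."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[(int(match.group("node")), match.group("reason"))] += 1
    return counts


sample = [
    "2021-06-08 23:00:00.861 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR No valid NONCE! null",
    "2021-06-08 23:00:02.611 [DEBUG] [mmandclass.ZWaveSecurityCommandClass] - NODE 15: SECURITY_ERR NONCE ID invalid! 35<>79",
]
for (node, reason), n in count_nonce_errors(sample).items():
    print(f"node {node}: {reason} x{n}")
```

Running it over the whole attached log would show whether the errors cluster on one node or are spread across the network.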

I found some old threads discussing this, but the only conclusion seems to be “bad Z-Wave devices” or “signal issues”. So is it really so bad that a device from one of the largest manufacturers, supposedly with a weak signal (1 meter from the Z-Wave USB stick, with multiple other devices nearby), would cause two-way communication to fail? That surely cannot be the case? If it is, then Z-Wave is a joke and I regret buying any Z-Wave devices.

Thread 1: OH2 Z-Wave refactoring and testing... and SECURITY - #1568 by chris
Thread 2: Danalock not reporting status - #13 by chris

Any ideas? The full log of the command and all items is attached. Node 15 is the Fibaro Walli Switch.

I know our own specialist @chris is a master of those kinds of things. Anyone have suggestions or had similar problems?

failing z-wave.txt (117.2 KB)

From a very quick look at your log, it seems that the controller is receiving the same message multiple times. This could be because the device is sending it multiple times, or it could be because the message is routed and also received directly. In any case, the binding only accepts the first command in secure mode - this is to prevent record-and-replay attacks. What you are effectively seeing is just the binding reporting that it’s ignoring the repeats.

Below is an example from your log:

Here you see the same message (with nonce ID 35) received 3 times. The first time it is decoded, and you see the meter channel is updated. The subsequent reports that are received almost immediately are ignored.
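To illustrate the idea of accepting only the first copy, here is a toy Python model of single-use nonces. It is a sketch under assumptions, not the binding's actual code: the class `NonceTable`, its methods, and the returned strings are all hypothetical, chosen to loosely mirror the two debug messages in the log.

```python
class NonceTable:
    """Toy model: one outstanding nonce at a time, consumed on first use."""

    def __init__(self):
        self._outstanding = None  # ID of the nonce currently valid, if any

    def issue(self, nonce_id):
        """Record that a fresh nonce with this ID was handed to the device."""
        self._outstanding = nonce_id

    def accept(self, frame_nonce_id):
        """Classify an incoming secure frame by the nonce ID it claims to use."""
        if self._outstanding is None:
            return "no valid nonce"      # nonce already consumed: a repeat
        if frame_nonce_id != self._outstanding:
            return "nonce id invalid"    # frame refers to an older nonce
        self._outstanding = None         # consume it so a replay cannot reuse it
        return "accepted"


table = NonceTable()
table.issue(35)
for _ in range(3):                        # same frame arriving three times
    print(table.accept(35))
```

In this model only the first of the three identical frames is decoded; the repeats are rejected, which is the behaviour described above.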

So, this is fine - no joke, and nothing to worry about :slight_smile: . This is only a debug message, so just disable debug and it will be gone :slight_smile: .

Okay, thanks for replying so quickly! So I guess that’s not what causes my issue then. My real concern is that sometimes, seemingly at random about once every 5-15 times I press the switch, the switch’s state is not updated, so my rule is not triggered.

This of course causes my Z-Wave light and the corresponding Zigbee light to get out of sync. Now in order to turn on the Zigbee light I’d have to turn off the Z-wave switch and then turn it on again, hoping for better luck :frowning:

(Also, I tried to find the log tool you use but I seemed to only find the same broken link in every thread. Is it available anywhere?)

Thank you @chris

I don’t seem to find anything strange though, but of course my Z-wave skills are limited.

Yesterday pretty much every physical button press sent the state to OH. Today, most of the time when I turn this light on or off, the state does not change at all in OH. This is kind of a major problem, as the state in OpenHAB rarely matches the actual state of the device…

If you could see nothing else strange in the log, how come my state just seems to disappear into thin air? I mean surely Fibaro can’t have these severe issues in their HomeCenter with the state not updating? Hence it shouldn’t be the device/s. Ideas?

Update:
Attaching a log (test3zwave.txt)
test3zwave.txt (144 KB)

In attached log we can see the following (watch node 15)

  • First the light/switch is off
  • I press the wall switch → The light turns on → Update onoff set to on (as seen in log) → OpenHAB shows switch status on
  • I press the wall switch → The light turns off → Update onoff set to off (as seen in log) → OpenHAB shows switch status off
  • I press the wall switch → The light turns on → No OnOff update in log → OpenHAB shows my light is off, the light is in fact on.

Testing with another identical device in another room (attaching test4zwave.txt). Watch node 5. In this scenario/log:

  • The switch is off
  • I press the wall switch
  • OpenHAB state shows the switch as off, but the light is on
    test4zwave.txt (60 KB)

Testing again with Node 5:
test5zwave.txt (68 KB)

  • First the light/switch is off
  • I press the wall switch → The light turns on → Update onoff set to on (as seen in log) → OpenHAB shows switch status on
  • I press the wall switch → The light turns off → Update onoff set to off (as seen in log) → OpenHAB shows switch status off
  • I press the wall switch → The light turns on → No OnOff update in log → OpenHAB shows my light is off, the light is in fact on.

At least in the last example, the device uses a different command to report the state. It is using the BASIC_SET command to update the state rather than the BINARY_REPORT it uses elsewhere. Probably the database needs to be updated to handle this device differently so that it handles these basic reports.
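As a rough illustration of what handling both commands means, here is a small Python sketch that maps either frame type to an ON/OFF state. The numeric IDs are the command class and command values as I understand them from the Z-Wave command class specification; the function `decode_on_off` and the raw payload layout are assumptions for illustration and say nothing about how the binding or the device database actually does it.

```python
# IDs believed to match the Z-Wave command class specification (assumption).
COMMAND_CLASS_BASIC = 0x20
BASIC_SET = 0x01
COMMAND_CLASS_SWITCH_BINARY = 0x25
SWITCH_BINARY_REPORT = 0x03

STATE_COMMANDS = {
    (COMMAND_CLASS_BASIC, BASIC_SET),
    (COMMAND_CLASS_SWITCH_BINARY, SWITCH_BINARY_REPORT),
}


def decode_on_off(payload: bytes):
    """Return "ON"/"OFF" for a state-carrying frame, or None for anything else.

    Assumed layout: [command class, command, value].
    Simplification: 0x00 is OFF and any other value is treated as ON.
    """
    if len(payload) < 3:
        return None
    command_class, command, value = payload[0], payload[1], payload[2]
    if (command_class, command) not in STATE_COMMANDS:
        return None
    return "OFF" if value == 0x00 else "ON"


print(decode_on_off(bytes([0x25, 0x03, 0xFF])))  # binary switch report
print(decode_on_off(bytes([0x20, 0x01, 0x00])))  # basic set
```

If only the binary report were mapped, a press reported via BASIC_SET would be decoded but never reach the switch channel, which would fit the symptom in the last log.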

I have now been comparing here:
OpenSmartHouse Z-Wave Device Database

With the reference manual here:

I cannot see what could be wrong from my perspective, but then again, I’m not that well versed in all the tricks and details of Z-Wave, so I might just be in over my head. I also tried comparing with Fibaro’s other dual switches etc., but I cannot find anything obviously different.

Now that I know the device I’ve made an update to the database.

Great, I saw that the XML was updated on Github so I will try it out!

Did this ever make it into 3.1? It should have since this was added ~12 days before the tag was created, right?

I never got to trying it out. But now since I updated to 3.1 the other week I thought my issue might improve.

However, it seems like once in a while the state from the switch is still not updated in OpenHAB when I press the physical button. The whole idea of a two way communication protocol gets kind of dumb when messages might not get where they should anyway… :cry:

Edit: Did another test. Here is the Z-wave log for a failure.

Steps

  1. Start logging to a new file
  2. Press physical button on the wall (Walli Switch, Z-wave node 15)
  3. The relay turns off
  4. 10 minutes later OpenHAB still shows the light as ON, but it is off and has been for 10 minutes.
  5. Checking the log I can not see anything strange, can you? What could be the cause?

fail.log (185.1 KB)

The log looks fine. It only shows one set of comms as you can see below and the binding seems to decode all these messages so I don’t think it’s missing anything.

Well, as with any two way communications system, it only works if there really is communications :wink: . If something doesn’t report, or if there is intermittent communications for some reason, then it can’t magic up information that was not transferred. This is the same for anything - if you cut your fibre, the internet won’t work…

Yes, I see your point there, but honestly: if I have the following mesh in a medium-sized apartment and something like 5-10% of the time OpenHAB does not update the state, then something with either Z-Wave itself or OpenHAB must be kind of wonky.

I mean, I can run my WiFi for hours with zero packets lost, since the few that do get lost are resent. I’m not entirely up to date on how Z-Wave handles this, but either the protocol or the binding seems not to try very hard. My entire point is that there’s no broken fiber; it’s a mesh with many good paths, and messages shouldn’t get lost.

It could of course be some issue with my setup as well. I don’t know if a lot of people experience similar issues…

Node 15 has literally almost every node as a neighbor (1,2,3,4,10,16,18,20,21,22,27) as well as the controller. It’s less than 2 meters from my Z-Wave USB stick. The device is from one of the largest most reputable vendors. The device is certified by the z-wave alliance, and sometimes the state doesn’t update…

There must be some explanation for this somewhere either in the specification or the binding or SOMEWHERE.

It’s hard for me to really comment too much - all I can say is that routing, and low level network stuff has nothing to do with the binding or openHAB.

I’m not really sure how you know if packets are being lost? In any case, ZWave is totally different - different frequencies for starters.

As I said earlier, from the log, there is no report, so something fundamentally went wrong and the device did not send a report as far as I can see.

Understandable. I’m not trying to bash on the binding at all, I’m just generally trying to find out where it goes wrong. Because as you mention, there seems to be some fundamental problem and it’s quite big to be honest if a protocol with only certified devices manages to lose messages in this way.

Of course. My point really was that with such a big mesh and devices being so close to the controller the signal really shouldn’t be an issue. For WiFi however I’m referring to e.g TCP.

If I send a packet from e.g. my light switch, I expect the controller to say “sure thing, I got your update, I will pass this on to OpenHAB”. If the light switch does not get that message, it should say “hmm, I sent my state to the controller but it never confirmed it… maybe I should try again”. Whereas Z-Wave (at least for me right now) seems to go:

Lightswitch: “hey here’s my updated state”
Controller:
Lightswitch: “all is fine”

I guess I will have to exclude everything and try with only one device. If that doesn’t do it, I will get a more modern USB stick. By the way, I noted that one button-cell battery-powered device is listed as routing; maybe this could be the issue. But the message should be able to take other paths as well, so that doesn’t make sense either.

According to the information above, you clicked the button on the wall - if the information does not get to the controller due to poor communications (for whatever reason) then the controller simply doesn’t know and can’t acknowledge the update. Otherwise what you describe is what normally happens, but as with any communications, if things don’t work for some reason, then, well, it doesn’t work.

And that is the exact reason why the switch on the wall would resend the message until it receives a confirmation in any kind of smart protocol.

Without having gone through the Z-Wave standard, it sounds as if Z-Wave, compared to the “Internet world”, uses something comparable to UDP, while something like TCP would be a lot better for the purpose. It works perfectly 9 times out of 10 or more, so I do not see why a simple retry wouldn’t work, just like in the Internet’s TCP for example.

E.g: Retransmission (data networks) - Wikipedia

This of course would have nothing to do with the binding or what you have created but still kind of interesting to reason about how that decision came to be, in case that’s how it is. I guess I will have to read the Z-wave protocol specification in order to see.

I have a zsniffer setup to observe zwave communications. The UI picture is nice, but doesn’t show the actual paths. Also there is a series of posts from @robmac on zwave routing basics. Easier to read than the Silab documents.

No protocol will resend forever until it receives a confirmation - none. In the case of ZWave IIRC it will try 3 times - generally trying different routes in the process. But it clearly can’t continue this forever or it will have an adverse impact on the wider network.

It is an ack’d protocol, although not a streaming protocol like TCP since that is simply not required here, but also not a send and forget protocol like UDP.
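A bounded, acknowledged send of the kind described here can be sketched in a few lines of Python. This is only an illustration of the concept: `send_with_retries`, the `transmit` callback, and the route names are hypothetical, and the limit of 3 attempts is the figure recalled above rather than something checked against the specification.

```python
from typing import Callable, Optional, Sequence


def send_with_retries(
    routes: Sequence[str],
    transmit: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[int]:
    """Try one route per attempt, up to max_attempts.

    Returns the 1-based attempt number that was acknowledged, or None if
    every attempt went unacknowledged.
    """
    for attempt, route in enumerate(routes[:max_attempts], start=1):
        if transmit(route):  # True means an acknowledgement came back
            return attempt
    return None  # give up instead of occupying the network indefinitely


# Hypothetical radio stub: only the third route gets through.
routes = ["direct", "via node 2", "via node 10"]
print(send_with_retries(routes, lambda route: route == "via node 10"))
```

The sender stops after the last attempt, so a message can still be lost if all attempts fail; that is the trade-off of not retrying forever.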

I really suggest that if you’re interested, then have a read of the docs or as Bob said, get a sniffer.

However, understanding the protocol might be all very interesting, but it will not resolve your issue.

I agree, then the question is what the issue could be.

As you clarify, it should try different routes and so on. As I showed earlier in the thread, the node I have specifically seen this issue on (it probably affects others as well) has 11 direct neighbours in the mesh. It is also the node closest to the controller (about 1.5 meters). This really should rule out any connection issues, to be honest. It seems very unlikely that the messages wouldn’t get to the controller, but who knows…

I’ll have to get a sniffer at some point I guess…

Have you contacted Fibaro?
Have you swapped the switch with a different one to see if perhaps a hardware issue with this particular switch is the problem?