Slow Z-wave network

@robmac

Ok, thank you!

I have some nodes that don’t really care about my setting of not reporting a change of less than 20% in watts. This is one of these nodes. They also report very small changes even though the text in OH for the device setting says it never reports a change <1W. I didn’t think this could cause the behavior we see though, because I believed a retry that soon was normal when the other end is not responding. When I have seen an ACK actually arrive in the log, it comes after around 20 ms. But I will try setting it to disable these reports altogether.

This is a Qubino Flush 2 Relay (ZMNHBD) with firmware version 1.2. I have been in contact with Qubino support regarding this earlier, because another problem with this specific node is that it reports -214748364,8 kWh in its energy reports (the most negative signed 32-bit integer, shifted one decimal place). They told me it was a known issue in this firmware and that I should try excluding it, resetting it and including it again. I have tried excluding and including but haven’t managed to reset it to factory defaults. Maybe I should try that some more. They told me to send an RMA if problems persist, but I really do not want to send back the ~10 of these devices with old firmware that I have installed in different locations in the house.
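To show where that odd number comes from, here is a minimal sketch. It assumes the meter value arrives as a 4-byte big-endian signed integer with a precision of one decimal; the helper name `decode_meter_value` is made up for illustration and is not part of the binding.

```python
def decode_meter_value(raw: bytes, precision: int) -> float:
    """Decode a big-endian signed integer and apply a decimal precision.

    Hypothetical helper: the 4-byte size and precision of 1 are assumptions
    chosen to match the value seen in the energy reports.
    """
    value = int.from_bytes(raw, byteorder="big", signed=True)
    return value / 10 ** precision


# 0x80000000 is the most negative signed 32-bit integer (-2147483648).
print(decode_meter_value(bytes([0x80, 0x00, 0x00, 0x00]), precision=1))
# prints -214748364.8
```

So the reading looks like a sentinel or uninitialised value from the device rather than a real measurement.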

Thank you for your patience guys!

The binding simply polls all devices during startup to establish their current state - so that the UI reflects the current state of the device, and not the state when OH was last running.

If people want to have a system that correctly reflects the state of the device, then I’m not sure of other options, but I’m open to suggestions. OpenHAB expects the device to be

@robmac @chris
I have now turned off reporting on node 4 and verified in zniffer that no more meter reports get sent. Then I rebooted the whole computer running OH. Nothing changed for the better unfortunately, so I will put my Aeon Z-stick back into service and see how that works with polling disabled. I will also order one more UZB3 to replace the Z-stick later. The one I currently own is in use as the zniffer.

I don’t know if I am just unlucky or if there are loads and loads of Z-wave controllers and devices with buggy firmware out there. Knowing what I know now I would never have invested in Z-wave devices for my electrical installation. Plejd is taking over Scandinavia and it just works. But it is a locked system. If this kind of device is what the open world has to offer then the future belongs to locked systems unfortunately :frowning:

Will send a report with more troubleshooting info when I have tried the Z-stick again.

I doubt that the firmware is buggy. However, currently all ZWave controllers use fundamentally the same firmware from Sigma/Silabs - that’s likely to change in future, but for now it’s still the way it is.

Unfortunately ZWave isn’t (IMHO of course :slight_smile: ) quite as stable as it once was - there’s a lot of evolution in the standards these days and that leads to confused implementations. The way ZWave introduces new features is often quite muddled compared to other protocols as well. It is however still a very widely supported system and in general there is good compatibility across devices.

No criticism, it was a statement of fact. Z-wave does not like lots of traffic, so it is a stress test.

Not many options but as you asked. For a small network it is not a big issue and as long as you do not restart it is never an issue. For large networks with lots of sleeping nodes it is a bit of a pig.

Some other systems…

Openzwave does the same as the binding so anything based on that does the same.

One option I used to like was homeseer which allows you to mark which ones you are bothered about and it collects only those. I have not used since 2011 but I would be surprised if they removed it.

For Fibaro they don’t bother, but it is easy to do something like homeseer. Very easy to write a quick script to get the ones you care about quickly and the rest at a slow pace in waves. This could be done in openHAB, so not a bad option: just leave it to a script.

Both of those options let you skip the many sleeping nodes. Those tend to be the type that report on their own, and regular reporters will probably report before they wake anyway. They are also what causes the worst of the startup in openHAB.
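A rough sketch of the "important ones first, the rest in slow waves" idea, in plain Python. It only builds the timetable; how each poll is actually sent is left out. The function name `poll_schedule`, the node numbers, and all the gap values are assumptions for illustration, not anything homeseer, Fibaro or openHAB provides.

```python
from typing import Iterator, List, Tuple


def poll_schedule(priority: List[int], others: List[int],
                  wave_size: int = 3, fast_gap: float = 0.5,
                  wave_gap: float = 30.0) -> Iterator[Tuple[float, int]]:
    """Yield (seconds_after_start, node_id) pairs.

    Priority nodes are polled back to back; the remaining nodes are
    polled in small waves with a long pause before each wave.
    """
    t = 0.0
    for node in priority:
        yield (t, node)
        t += fast_gap
    for i in range(0, len(others), wave_size):
        t += wave_gap  # let the network settle before the next wave
        for node in others[i:i + wave_size]:
            yield (t, node)
            t += fast_gap


for when, node in poll_schedule(priority=[13, 4], others=[5, 6, 7, 8]):
    print(f"t+{when:5.1f}s poll node {node}")
```

With these made-up numbers the two priority nodes are polled within the first second and the other four trickle in over the following minute.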

Lots of bugs in Qubino, which is why they offered to let you return them.

Lots of people run large zwave networks with no issues but if you get even one bad device it can cause lots of issues.

Do not forget @chris is “handcuffed” to some extent by the restrictions & requirements of openHAB.

From my experience, this binding is worlds better than openzwave. I used them with Home Assistant.

The sleeping nodes should not be an issue, and will not participate in the startup medley since they are sleeping :wink: . The binding will not send anything to them until they wake up, which will all be at random times spread over the next hour or two, so they are not in any way participating in the “stress test”.

There’s no doubt that polling all devices on startup can create a considerable amount of traffic and I fully agree with your original point that people shouldn’t get too worried about this. I don’t really think it’s a problem in any way - it just takes a bit of time, but is necessary to ensure that the system is synchronised with reality. In fact this is not even part of the binding since at the moment openHAB forces this by sending a REFRESH command to all channels on startup.

@chris @robmac

Hi guys,

I have now used the Aeotec Z-stick again for a day with polling and power reports turned off for all devices. I must say that my network is more stable now, but there are still often huge hiccups that I find unacceptable. For example I tried turning on a Fibaro dimmer just now and it took about 30-40 seconds for it to come on. Then I tried turning on another light (Qubino dimmer) in the same room and it also experienced a latency of 10-20 sec. I then went to my PC to take a log when doing the same thing for a third light (also Qubino dimmer).

The logs are here node13_turn_on_off.log (36.5 KB) and here node13_turn_on_off_zlf.txt (13.2 KB) (rename to .zlf).

Unfortunately zniffer experienced a CRC error on one of the packets to node 13, but you can see it getting ACK:ed (packets #27 and #28).

So, you can see in the OH log that I give the ON command at 17:03:15.958, the binding sends a packet to the controller at 17:03:15.975, and the binding sends an abort to the controller at 17:03:20.988. Then at 17:04:16.883 in the zniffer log the stupid controller wakes up when receiving a HAIL from node 11. It sends the message to node 13 at 17:04:16.937, about 61 seconds after the binding requested it. And during the first 60 seconds the network was totally quiet.
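For anyone who wants to check the arithmetic, here is a small sketch that computes the gaps between those timestamps. It assumes the OH log and the zniffer log use the same clock (or clocks that are in sync), and `delay_seconds` is just a helper name picked for this example.

```python
from datetime import datetime


def delay_seconds(start: str, end: str) -> float:
    """Return the time from start to end, both given as HH:MM:SS.mmm."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


command = "17:03:15.958"  # ON command in the OH log
events = {
    "packet to controller": "17:03:15.975",
    "abort to controller": "17:03:20.988",
    "HAIL from node 11": "17:04:16.883",
    "message to node 13": "17:04:16.937",
}
for name, stamp in events.items():
    print(f"{name}: +{delay_seconds(command, stamp):.3f} s")
```

The last line comes out at +60.979 s, which is the "about 61 seconds" above, and the abort is sent roughly 5 seconds after the command.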

I would love to know what the controller does during those 60 seconds when it says “fuck you I won’t do what you tell me”. Words can not express my feelings for that little stick right now :slight_smile:

/Niclas

Btw: Node 11 sends these HAILs regularly. Often with just a few minutes between them. I don’t know why. It is an Aeotec Nano Switch. It makes the controller poll its state even though nothing has changed.

I’ll point out that 60 seconds of jamming may look much the same.

Node 11

Try setting parameter 80 to 2 so it sends a Basic report rather than a Hail. Basic is better supported by most.

Hail is deprecated, and while it should still be supported, who knows whether the Z-stick supports it properly. Worth testing whether it is this Hail that causes the issue on the stick.

@rossko57 Thanks. I will investigate this by turning things off the next time it happens.

@robmac Thanks. I did that after I wrote my last post, so now it just sends basic reports regularly, which means slightly less traffic. However, last evening the network stopped entirely. When I tried controlling lights the controller didn’t send anything out, and when node 11 sent its regular reports, node 1 didn’t ACK them. Also when it tried routing through node 4 (next to the PC with the Z-stick), node 4 didn’t forward it either. I don’t know if node 4 didn’t receive the messages from node 11 or if it couldn’t respond because of jamming, as rossko said. Rebooting the PC running OH didn’t help. It started working again after an hour or so. Next time it happens I will try to turn things off to see if there is some nearby node going crazy and jamming everything around it.

Did you get a zniffer trace?

Hello again!

I am in touch with an Aeotec FAE regarding my horrible Z-wave network with my freezing Z-Stick Gen5. I received this question from him:

I don’t think it is a firmware bug of the Z-Stick Gen5 in this case, but i do notice that NOP commands take a longer time to drop, while the Z-Stick Gen5 tries to process them, it could freeze the Z-Stick Gen5.

Is there anyways that you are able to disable NOP from being sent out? This may help assist with keeping your Z-Stick Gen5 from freezing.

Any time freezing happens, i always notice that NOP commands are the ones being issued.

I don’t think he is on the right track, because freezing occurs without NOPs being sent as well, but to make him happy I’ve got to ask you @chris: Is there any way I can turn off NOPs being sent?

I know nothing about their purpose.

Why would you want to do this? It is required to be implemented in all devices - it’s the first command that is sent - if we disable the first command, then another command will become the first command sent, and if there’s a problem, it will also fail.

I expect that this is not an issue with the NOP, it’s a problem with communication. The NOP failing is just because it’s the first command sent.

As I said I know nothing about their purpose, but thanks for the clarification :slight_smile:
Apparently he has seen something in my zniffer logs (or in some lab testing?) that makes him believe that the NOPs are making the Z-Stick freeze.

I have no visible problems with communication in the zniffer log. Signal strength is good and devices close to the controller get just as much trouble as those far away. It just seems that the controller freezes every time it can find an excuse to do so. For example when a node is turned off and not responding. It is as if it can only hold one single transaction in its queue at a time, and when trying to get something through to a node that is not responding, it just stops responding to anything else for minutes.
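To illustrate what I mean by "one single transaction at a time", here is a toy model, not a description of the actual Z-Stick firmware. It assumes a strictly serial queue, an ACK time of about 20 ms for healthy nodes, and a made-up 60 second timeout for a node that never answers. The name `completion_times` and all numbers are assumptions.

```python
from typing import Iterable, List, Set, Tuple


def completion_times(queue: Iterable[int], dead_nodes: Set[int],
                     ack_time: float = 0.02,
                     timeout: float = 60.0) -> List[Tuple[int, float]]:
    """Return (node, seconds_until_done) for a strictly serial queue.

    Toy model: each transaction must finish (ACK or timeout) before the
    next one starts, so one dead node delays everything queued behind it.
    """
    t = 0.0
    done = []
    for node in queue:
        t += timeout if node in dead_nodes else ack_time
        done.append((node, round(t, 3)))
    return done


# Node 7 is switched off; the command to node 13 is queued right behind it.
print(completion_times([7, 13], dead_nodes={7}))
# Same command with nothing dead in front of it.
print(completion_times([13], dead_nodes=set()))
```

In this model the command to node 13 finishes after about 60 seconds in the first case and after about 20 ms in the second, which is roughly the pattern I see in my logs.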

Anyone know how I can investigate if there is any crap in the air that is jamming everything but does not show up in the zniffer log? I think there may be several issues here. Of course a robust and correctly designed controller should not make the whole network unresponsive for 30 minutes just because a node is not responding. But other times, when the problem is not because of a node being turned off, I suspect there may be noise.

I tried to get the FAE to answer whether Z-wave devices listen for a “quiet channel” before sending to avoid collisions, or if they just send stuff whenever they feel like it and wait for the ACK to tell if it succeeded. I also asked if the zniffer could be made to show when the channel is “busy” instead of just when it detects valid and uncorrupted Z-wave traffic, but he didn’t seem to know much about communications unfortunately.