ZWave binding updates

Mark,

Any corrections to my advice are welcome, as I am only advising from a zwave perspective. I have a decade of scars from that, but not many from openHAB. Luckily for me, openHAB has been the easiest to get working yet.

I like to leave a little polling on my networks as it keeps the latest working route updated and tested. Do you find the same and would you advise it in this case?

I tend to make polling less and less frequent on regularly used devices, and eventually turn it off on those that cause no issues, as it is not required on any of the devices I am migrating.

I still have a dozen devices working on my old network that do require it. They have served me well, but they will be replaced before migration.

Thanks for the hints, both of you. I have already changed polling as you suggested. I have some devices that are included in the network but out of operation. I will remove them as well. So I will clear out any unnecessary things and communications, and I will see.

Yes, I do the same. Polling for battery devices is every 3 hours, and polling for mains devices is every 6 hours.

I probably should disable the command repoll on most devices, as most of my devices will report their status.

It is always a bit of a risk if they were on the route to anything. While the latest working route will update to a path around them, it may cause nasty things while it gets sorted. Also, if a dead node is polled for status or controlled, the controller will retry a few times before giving up, blocking other traffic while it tries.

Might be. But until today the network heal was switched on, running every day at 2 a.m., so network routes were up to date. What is the general experience? Do we need a network heal overnight regularly, or can I keep it switched off?

In my case, I leave it turned off, as it causes problems with my battery-powered devices. See discussion here.

I have never thought it a good idea unless you have added or removed nodes. One of the strengths of zwave is that it learns the best route based on the last route used to successfully communicate with another node.

This is the first method a node uses to try to communicate with a destination. If this fails and the option is set (it always is in the openHAB binding), the node tries other routes based on its neighbours. The last thing it tries is an explorer frame, and again the openHAB binding sets this option. This takes a long time, as each try has to time out.

You can see in the log file that these options are set for calls. This was the standard recommendation before the latest release of zwave. The retries are all made within the zwave protocol, a mysterious piece of code that is still not open to prying eyes. If all of the retries fail, the software (the openHAB binding) can make more retries, and in the case of openHAB it does. The development guides are not clear on this, but Chris has a lot of experience, so I am sure that in general it is the better choice. It should rarely happen anyway, but you did succeed with your many calls.

In a network that has not had any changes, communication settles mainly on the last route used, everything works well, and the protocol will rarely make any retries.

Running a heal clears all of the last working routes. The network then has to start learning them again and does not necessarily pick up a good route the first time, particularly on large networks, where the command causes a vast amount of traffic. Then you are into retries again while the network settles down.

If you have no issues, I would recommend not running a heal. If you do have issues, try to heal just the nodes that cause them.

I also recommend you stop reading now unless you are very interested, as the next bit is not implemented in the openHAB binding.

In the new version of the zwave SDK the recommendations and implementation in the software supplied have changed slightly.

Rather than calling with all options the first time, the first call is made with no options set, so if the call fails the protocol returns to the software with no retries and therefore in a shorter time. The failing message is then put on a lower priority (long) queue, and the next time the call is made with some options set so that the protocol will retry with neighbours. If this fails, it is returned to the long queue, and only then is it tried with all options, including explorer frames. This has the effect of prioritising first-attempt calls and failing fast, so other messages are not blocked.

It is easy to show this is a superior strategy by testing the new SDK with a number of nodes: turn one off and control them all, starting with the one that is switched off. The delay before the other nodes react is considerably shorter than with the older strategy, because the protocol returns control to the software faster, so the messages for the other nodes can be sent: they are in the high priority queue and are not blocked by the failed node's command. After all the other nodes have been controlled, the system tries the turned-off node again a couple of times, obviously failing in the end, even after an explorer frame.
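The queue behaviour described in the last two paragraphs can be sketched as a small simulation. This is only an illustration of the ordering, not the SDK's or any binding's code: the `Retry` flags, the `ESCALATION` ladder, `run_queues` and the `send` callback are all hypothetical names, and the real protocol options and queue handling may differ.

```python
from collections import deque
from enum import Flag, auto


class Retry(Flag):
    """Hypothetical stand-ins for the protocol's transmit options."""
    NONE = 0             # fail fast: the protocol makes no retries
    NEIGHBOURS = auto()  # let the protocol retry via neighbour routes
    EXPLORER = auto()    # let the protocol fall back to an explorer frame


# One entry per attempt, from cheapest to most expensive.
ESCALATION = [
    Retry.NONE,
    Retry.NEIGHBOURS,
    Retry.NEIGHBOURS | Retry.EXPLORER,
]


def run_queues(nodes, send):
    """Send one command per node; send(node, options) returns True on success.

    Returns (log, failed): log lists (node, attempt, ok) in send order,
    failed lists the nodes given up on after the last attempt.
    """
    fast = deque((node, 0) for node in nodes)  # first attempts
    slow = deque()                             # retries wait here (long queue)
    log, failed = [], []
    while fast or slow:
        # The long queue is only served when no first attempt is pending.
        node, attempt = (fast or slow).popleft()
        ok = send(node, ESCALATION[attempt])
        log.append((node, attempt, ok))
        if ok:
            continue
        if attempt + 1 < len(ESCALATION):
            slow.append((node, attempt + 1))
        else:
            failed.append(node)
    return log, failed


# Node 5 is switched off, so every attempt to reach it fails.
log, failed = run_queues([5, 2, 3], lambda node, options: node != 5)
for entry in log:
    print(entry)
```

In this run the dead node is tried once without retries, nodes 2 and 3 are served straight away, and only then does the dead node get its two escalated attempts from the long queue.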

A friend of mine has just implemented this on a new fork of OZW and reports that it works very well.

If you are very interested, you can get a UZB3 and flash it with zniffer firmware. You can then see all of the network traffic and retries. If you keep having issues, this is my next recommendation. It reveals far more detail than the logs, as you see all the commands sent by the stick, not just the commands going across the serial interface.

:beers:

Many thanks for all of your advice! I will analyse my network behaviour more deeply in the next few days and I’ll come back with some updates.

Now my list looks like this:

openhab> threads --list | grep ZWave
342 │ ZWaveReceiveInputThread │ RUNNABLE │ 3421 │ 2468
344 │ ZWaveReceiveProcessorThread │ WAITING │ 2671 │ 2390
480 │ ZWaveNode75Init2019-07-15 06:39:07.957 │ WAITING │ 0 │ 0
492 │ ZWaveNode9Init2019-07-15 06:39:09.587 │ WAITING │ 0 │ 0
511 │ ZWaveNode8Init2019-07-15 06:39:09.603 │ WAITING │ 0 │ 0
523 │ ZWaveNode10Init2019-07-15 06:39:09.618 │ WAITING │ 0 │ 0
525 │ ZWaveNode40Init2019-07-15 06:39:09.618 │ WAITING │ 0 │ 0
535 │ ZWaveNode24Init2019-07-15 06:39:09.634 │ WAITING │ 0 │ 0
161592 │ pipe-grep ZWave │ TIMED_WAITING │ 15 │ 15
openhab>

It was worse before. That does not mean the issue is solved. As it happens randomly, it is hard to judge where I am. But it is surely getting better. I need a few more days and then I will share my experiences.

Thanks for this robmac, very informative. I have to use command poll because some nodes don't report their power state correctly after a change (e.g. a brightness increase). This has always been problematic. As my network is rather static now, I'm going to disable heals.

I'd be keen to know if Chris has plans to implement these new strategies.
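To compare such listings from day to day, the node numbers can be pulled out of the console output with a few lines of Python. This is a rough sketch that assumes the thread names keep the `ZWaveNode<N>Init` shape seen above and that a waiting init thread means the node has not finished initialising; `waiting_init_nodes` is a hypothetical helper, not part of openHAB. It accepts `│`, `|` or `?` as the column separator, since the separator is easily mangled by copy and paste.

```python
import re


def waiting_init_nodes(console_output):
    """Return the node ids of ZWaveNode<N>Init threads in WAITING state."""
    nodes = []
    for line in console_output.splitlines():
        # Columns: id, thread name, state, CPU time, user time.
        cols = [c.strip() for c in re.split(r"\s[?│|]\s", line)]
        if len(cols) < 3:
            continue
        match = re.match(r"ZWaveNode(\d+)Init", cols[1])
        if match and cols[2] == "WAITING":
            nodes.append(int(match.group(1)))
    return nodes


sample = """\
344 ? ZWaveReceiveProcessorThread ? WAITING ? 2671 ? 2390
480 ? ZWaveNode75Init2019-07-15 06:39:07.957 ? WAITING ? 0 ? 0
492 ? ZWaveNode9Init2019-07-15 06:39:09.587 ? WAITING ? 0 ? 0
"""
print(waiting_init_nodes(sample))
```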

It is a pity they do not report. Aeon Labs sensors, by any chance? If they are, check the firmware.

The thing with the new strategy is that it only makes a difference when your network has nodes with issues, so it should not be a big deal if your network is good and well managed. I have no idea whether Chris will make the change, but I am not sure I would invest my time when what he has works very well. I am not an expert, but I think it would be a big change, so it would take a lot of his time when he could be doing other things.

A mixture, Fibaro and Aeotec. Fibaro are the worst.

Which one? I have a lot of Fibaro devices, and they all work properly.

I thought the Neocam contact sensors I have are even less reliable than Fibaro and Aeotec.

I have products from all 3 companies. Fibaro and Aeotec are both very good. Neo Coolcam is good for short distances. The Neo Coolcam motion sensor has problems with the current Zwave binding. Polling is an individual requirement in my opinion. It depends mainly on the network in question, not on whether the device is Fibaro or Aeotec. That’s why I wanted to report back in 2-3 days: I want to gather our experiences first.

I am also surprised you have issues. The devices are generally good. What models and firmware?

Interesting. What scenario would you prefer polling over using association?

In general, polling should be limited, and if associations are available (which is the case with all modern devices), then polling should not be used. This is a common mistake people make, and is a major cause of network performance issues.

I would strongly discourage polling any more often than every 30 or 60 minutes. This sort of low-rate polling is (IMHO) OK, and is used by the binding to ensure that devices are still available.
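To put rough numbers on why the interval matters, here is a back-of-envelope sketch. The figure of 20 devices is a hypothetical example, `polls_per_day` is an illustrative helper, and it counts poll requests only, ignoring the replies and any retries each poll may trigger.

```python
def polls_per_day(devices, interval_minutes):
    """Poll requests per day for `devices` nodes polled every `interval_minutes`."""
    return devices * 24 * 60 / interval_minutes


# Hypothetical network of 20 polled devices at different intervals.
for minutes in (1, 30, 60, 360):
    print(f"every {minutes:>3} min: {polls_per_day(20, minutes):,.0f} polls/day")
```

Going from a 1-minute to a 60-minute interval cuts the number of poll requests by a factor of 60, which is the kind of difference the advice above is aiming at.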

FGD212 Dimmer 2

It still lags in power reporting. Turn the light off and it still shows, for example, 30 watts being reported.