Z-Wave binding queue "stuck" during startup

apica · November 2, 2020, 9:08am

Hi,

I’m having problems with an installation of 105 Things (82 real Z-Wave nodes, 15 of them battery powered). There are no “ghost” nodes in the Z-Wave chip, so the installation should be clean.

The problem is that after a reboot of OH, the installation sometimes works well (after allowing ~15 min for setup), but sometimes the Z-Wave queue fills up and is never flushed, even though I do not send any commands from OH during this period. Here is the zwave.log of different startups.

I found a similar post where the solution seems to be to use fewer Z-Wave devices, so it seems that I have somehow reached “the limit” of the binding.

Is that true? Is the only possible solution to split the installation into separate Z-Wave networks? It seems strange that the queue is not eventually “processed” by the binding.

Thank you

5iver · November 2, 2020, 10:50am

Try disabling the daily network heal (found in the controller Thing configuration). There are several topics discussing this in the forum.

apica · November 2, 2020, 10:54am

Hi!

The healing is already disabled to try to minimize communication. All you see in the log is the “startup” of the binding: pinging the devices when OH starts to check whether they are alive and getting their latest values. This “initial process” cannot be disabled as far as I know.

Thank you

mhilbush · November 2, 2020, 3:22pm

You should probably post a complete debug log showing the entire process from binding startup until it gets stuck. If the log is too big, put it on a sharing service and post a link to it.

apica · November 2, 2020, 4:12pm

Hi Mark,

The log is attached in the original post (“Here is the zwave.log of different startups”). Is that what you mean?

Thank you

mhilbush · November 2, 2020, 5:13pm

Sorry, I missed that. I took a pretty quick look, and I didn’t see anything obvious. So it’s likely @chris will need to look at it.

One thing I did notice, however… The log file doesn’t go all the way back to the binding startup. My guess is that @chris will want to see the log all the way back to the startup of the binding.

I’m not sure if this is relevant, but this timeout message occurred pretty early on and pretty often (1537 occurrences to be exact). So it might be interesting to see what was going on before that first one occurred. My network is not quite as big as yours (74 nodes), but I never see that particular timeout message (i.e. Timeout at state WAIT_RESPONSE) in my logs.

2020-11-01 20:11:04.799 [DEBUG] [ocol.ZWaveTransactionManager$ZWaveTransactionTimer] - NODE 255: TID 16: Timeout at state WAIT_RESPONSE. 3 retries remaining.
2020-11-01 20:11:05.407 [DEBUG] [ocol.ZWaveTransactionManager$ZWaveTransactionTimer] - TID 16: Transaction is current transaction, so clearing!!!!!
2020-11-01 20:11:05.408 [DEBUG] [b.binding.zwave.internal.protocol.ZWaveTransaction] - TID 16: Transaction CANCELLED
2020-11-01 20:11:05.408 [DEBUG] [ng.zwave.internal.protocol.ZWaveTransactionManager] - NODE 255: notifyTransactionResponse TID:16 CANCELLED
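
For anyone who wants to repeat this kind of count on their own log, here is a minimal Python sketch. It assumes the timeout lines look exactly like the one quoted above; the regex and the helper name `summarize_timeouts` are my own illustration and are not part of the binding, and other binding versions may word the message differently.

```python
import re
from collections import Counter

# Pattern modelled only on the timeout line quoted in this thread (an assumption).
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"NODE (?P<node>\d+): TID (?P<tid>\d+): "
    r"Timeout at state (?P<state>\w+)\."
)


def summarize_timeouts(lines):
    """Return (timestamp of first timeout, Counter keyed by (node, state))."""
    first_ts = None
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m is None:
            continue  # not a timeout line
        if first_ts is None:
            first_ts = m.group("ts")
        counts[(int(m.group("node")), m.group("state"))] += 1
    return first_ts, counts


# Demo on two of the lines quoted above; only the first is a timeout line.
sample = [
    "2020-11-01 20:11:04.799 [DEBUG] [ocol.ZWaveTransactionManager$ZWaveTransactionTimer] - NODE 255: TID 16: Timeout at state WAIT_RESPONSE. 3 retries remaining.",
    "2020-11-01 20:11:05.408 [DEBUG] [b.binding.zwave.internal.protocol.ZWaveTransaction] - TID 16: Transaction CANCELLED",
]
first_ts, counts = summarize_timeouts(sample)
print("first timeout:", first_ts)
for (node, state), n in counts.most_common():
    print(f"NODE {node} {state}: {n}")
```

To run it against a real log, pass an open file instead of `sample`, e.g. `with open("zwave.log", encoding="utf-8", errors="replace") as fh: summarize_timeouts(fh)`. The timestamp of the first timeout shows how far back the log needs to go to capture what happened before it.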

Sorry I wasn’t able to offer more help.

5iver · November 3, 2020, 5:06am

If you have had the heal turned off for a while, try manually healing the mains-powered devices. I also suggest power cycling every mains-powered device (flipping the breaker works) and checking the batteries in every battery-powered device. A Z-Wave sniffer would help with further troubleshooting. I had a similar issue that I was able to resolve with a sniffer, which showed that some of my devices needed a heal.