Intermittent zwave issues - how to track down?

OH 4.3.8; docker hosted on Linux openhab4 6.6.51+rpt-rpi-v8 #1 SMP PREEMPT Debian 1:6.6.51-1+rpt3 (2024-10-08) aarch64

I have about two dozen ZWave devices that have been mostly reliable. Recently, it’s become hit or miss whether the “on at sunset” devices come on or not. Generally, the switch item shows in OH as “ON”, and when we notice that the actual devices is not on, switching the OH switch to OFF and then back on a moment later results in the device coming on.

Sunset is not the only time failures happen, but it’s dependably unreliable for the ZWave devices. There are Insteon and Lutron Caseta devices updated by the same rules, and they are for the most part reliable.

This began approximately when I upgraded from 3.4 to 4.3.8 – but there was one new ZWave device (node 23) added, and two others slightly relocated in the same rough timeframe, so I can’t say it was the upgrade.

My ZWave log has been set at Info, and there has only been about 1 message logged per day over the past week. I set it to Debug an hour or two ago, and happened to catch an issue with NODE 23

The following log was captured when an OFF command was sent and executed by the device, but a subsequent ON command was not. An OFF followed by a pause and another ON eventually turned the device back on. The ZWave map briefly showed the device (and one downstream from it) offline during this time.

# tail -f /var/log/openhab/openhab.log | grep -i "NODE 23:"
2025-12-31 21:17:17.297 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Command received zwave:device:f6d34df4:node23:switch_binary --> OFF [OnOffType]
2025-12-31 21:17:17.298 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 23: Creating new message for application command SWITCH_BINARY_SET
2025-12-31 21:17:17.299 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: SECURITY not supported
2025-12-31 21:17:17.299 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Command Class COMMAND_CLASS_SWITCH_BINARY is NOT required to be secured
2025-12-31 21:17:17.300 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Adding to device queue
2025-12-31 21:17:17.300 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Added 3693 to queue - size 4
2025-12-31 21:17:17.302 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 23: Sending REQUEST Message = 01 0A 00 13 17 03 25 01 00 25 D7 24 
2025-12-31 21:17:17.304 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling initialised at 86400 seconds - start in 1500 milliseconds.
2025-12-31 21:17:17.316 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 23: sentData successfully placed on stack.
2025-12-31 21:17:17.317 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: TID 3693: Transaction not completed
2025-12-31 21:17:18.804 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling...
2025-12-31 21:17:18.805 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling deferred until initialisation complete
2025-12-31 21:17:22.317 [DEBUG] [sactionManager$ZWaveTransactionTimer] - NODE 23: TID 3693: Timeout at state WAIT_REQUEST. 3 retries remaining.
2025-12-31 21:17:34.319 [DEBUG] [sactionManager$ZWaveTransactionTimer] - NODE 23: TID 3693: Timeout at state ABORTED. 3 retries remaining.
2025-12-31 21:17:34.320 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got an event from Z-Wave network: ZWaveTransactionCompletedEvent
2025-12-31 21:17:34.321 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: notifyTransactionResponse TID:3693 CANCELLED
2025-12-31 21:17:41.169 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Command received zwave:device:f6d34df4:node23:switch_binary --> OFF [OnOffType]
2025-12-31 21:17:41.170 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 23: Creating new message for application command SWITCH_BINARY_SET
2025-12-31 21:17:41.170 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: SECURITY not supported
2025-12-31 21:17:41.171 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Command Class COMMAND_CLASS_SWITCH_BINARY is NOT required to be secured
2025-12-31 21:17:41.171 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Adding to device queue
2025-12-31 21:17:41.171 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Added 3694 to queue - size 4
2025-12-31 21:17:41.172 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 23: Sending REQUEST Message = 01 0A 00 13 17 03 25 01 00 25 D8 2B 
2025-12-31 21:17:41.173 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling initialised at 86400 seconds - start in 1500 milliseconds.
2025-12-31 21:17:42.673 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling...
2025-12-31 21:17:42.674 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling deferred until initialisation complete
2025-12-31 21:17:43.173 [DEBUG] [sactionManager$ZWaveTransactionTimer] - NODE 23: TID 3694: Timeout at state WAIT_RESPONSE. 3 retries remaining.
2025-12-31 21:17:55.174 [DEBUG] [sactionManager$ZWaveTransactionTimer] - NODE 23: TID 3694: Timeout at state ABORTED. 3 retries remaining.
2025-12-31 21:17:55.175 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got an event from Z-Wave network: ZWaveTransactionCompletedEvent
2025-12-31 21:17:55.175 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: notifyTransactionResponse TID:3694 CANCELLED
2025-12-31 21:18:36.197 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Application Command Request (DEAD:DELETE_SUC_ROUTES)
2025-12-31 21:18:36.198 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got an event from Z-Wave network: ZWaveNodeStatusEvent
2025-12-31 21:18:36.198 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Setting ONLINE
2025-12-31 21:18:36.198 [DEBUG] [ve.internal.protocol.ZWaveController] - NODE 23: Node Status event - Node is ALIVE
2025-12-31 21:18:36.198 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Node is ALIVE. Init stage is DELETE_SUC_ROUTES.
2025-12-31 21:18:36.199 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got an event from Z-Wave network: ZWaveNodeStatusEvent
2025-12-31 21:18:36.199 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Setting ONLINE
2025-12-31 21:18:36.199 [DEBUG] [ve.internal.protocol.ZWaveController] - NODE 23: Node Status event - Node is ALIVE
2025-12-31 21:18:36.200 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Incoming command class COMMAND_CLASS_SWITCH_BINARY, endpoint 0
2025-12-31 21:18:36.200 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: SECURITY not supported
2025-12-31 21:18:36.200 [DEBUG] [tocol.commandclass.ZWaveCommandClass] - NODE 23: Received COMMAND_CLASS_SWITCH_BINARY V1 SWITCH_BINARY_REPORT
2025-12-31 21:18:36.201 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 23: Switch Binary report, value = 0
2025-12-31 21:18:36.201 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got an event from Z-Wave network: ZWaveCommandClassValueEvent
2025-12-31 21:18:36.201 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got a value event from Z-Wave network, endpoint=0, command class=COMMAND_CLASS_SWITCH_BINARY, value=0
2025-12-31 21:18:36.201 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Updating channel state zwave:device:f6d34df4:node23:switch_binary to OFF [OnOffType]
2025-12-31 21:18:36.202 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Commands processed 1.
2025-12-31 21:18:36.202 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Checking command org.openhab.binding.zwave.internal.protocol.ZWaveCommandClassPayload@6a1634cc.
2025-12-31 21:19:21.350 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Command received zwave:device:f6d34df4:node23:switch_binary --> ON [OnOffType]
2025-12-31 21:19:21.351 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 23: Creating new message for application command SWITCH_BINARY_SET
2025-12-31 21:19:21.351 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: SECURITY not supported
2025-12-31 21:19:21.351 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Command Class COMMAND_CLASS_SWITCH_BINARY is NOT required to be secured
2025-12-31 21:19:21.351 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Adding to device queue
2025-12-31 21:19:21.352 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Added 3695 to queue - size 4
2025-12-31 21:19:21.352 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 23: Sending REQUEST Message = 01 0A 00 13 17 03 25 01 FF 25 D9 D5 
2025-12-31 21:19:21.353 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling initialised at 86400 seconds - start in 1500 milliseconds.
2025-12-31 21:19:21.362 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 23: sentData successfully placed on stack.
2025-12-31 21:19:21.362 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: TID 3695: Transaction not completed
2025-12-31 21:19:21.504 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 23: SendData Request. CallBack ID = 217, Status = Transmission complete and ACK received(0)
2025-12-31 21:19:21.504 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Response processed after 151ms
2025-12-31 21:19:21.504 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: TID 3695: Transaction completed
2025-12-31 21:19:21.505 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: notifyTransactionResponse TID:3695 DONE
2025-12-31 21:19:21.505 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got an event from Z-Wave network: ZWaveTransactionCompletedEvent
2025-12-31 21:19:21.632 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Application Command Request (ALIVE:DELETE_SUC_ROUTES)
2025-12-31 21:19:21.632 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: Incoming command class COMMAND_CLASS_SWITCH_BINARY, endpoint 0
2025-12-31 21:19:21.632 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 23: SECURITY not supported
2025-12-31 21:19:21.632 [DEBUG] [tocol.commandclass.ZWaveCommandClass] - NODE 23: Received COMMAND_CLASS_SWITCH_BINARY V1 SWITCH_BINARY_REPORT
2025-12-31 21:19:21.632 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 23: Switch Binary report, value = 255
2025-12-31 21:19:21.632 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got an event from Z-Wave network: ZWaveCommandClassValueEvent
2025-12-31 21:19:21.633 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Got a value event from Z-Wave network, endpoint=0, command class=COMMAND_CLASS_SWITCH_BINARY, value=255
2025-12-31 21:19:21.633 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Updating channel state zwave:device:f6d34df4:node23:switch_binary to ON [OnOffType]
2025-12-31 21:19:21.633 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Commands processed 1.
2025-12-31 21:19:21.633 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Checking command org.openhab.binding.zwave.internal.protocol.ZWaveCommandClassPayload@739434b4.
2025-12-31 21:19:22.853 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling...
2025-12-31 21:19:22.854 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Polling deferred until initialisation complete
^C
root@openhab4:~# 
```

The event log during this window showed this:

# grep "21:1[789]" /var/log/openhab/events.log | grep "Cove"
2025-12-31 18:21:18.168 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'iPhoneCT_AccessPoint' changed from NorthCove to Kitchen
2025-12-31 21:17:17.288 [INFO ] [openhab.event.ItemCommandEvent      ] - Item 'Cove_Yard_Lights_Switch' received command OFF
2025-12-31 21:17:17.289 [INFO ] [penhab.event.ItemStatePredictedEvent] - Item 'Cove_Yard_Lights_Switch' predicted to become ON
2025-12-31 21:17:41.165 [INFO ] [openhab.event.ItemCommandEvent      ] - Item 'Cove_Yard_Lights_Switch' received command OFF
2025-12-31 21:17:41.166 [INFO ] [penhab.event.ItemStatePredictedEvent] - Item 'Cove_Yard_Lights_Switch' predicted to become ON
2025-12-31 21:18:36.203 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Cove_Yard_Lights_Switch' changed from ON to OFF
2025-12-31 21:19:21.344 [INFO ] [openhab.event.ItemCommandEvent      ] - Item 'Cove_Yard_Lights_Switch' received command ON
2025-12-31 21:19:21.346 [INFO ] [penhab.event.ItemStatePredictedEvent] - Item 'Cove_Yard_Lights_Switch' predicted to become ON
2025-12-31 21:19:21.347 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Cove_Yard_Lights_Switch' changed from OFF to ON

```

I plan to reset the log level to Info for the night and set it to debug in advance of tomorrow’s sunset rules to see what else happens during this time.

What other steps can I take to get to the bottom of this? I am willing to upgrade to 5.0 (it feels like 5.1 needs a minute yet) if it seems it would help, although the release notes don’t suggest any ZWave fixes that might apply.

My guess is network congestion and/or problems with getting ZW radio signals to Node23. A couple of random comments;

  1. There should not be any INFO levels from ZW except on startup. If you are getting more, possibly there is a power issue.

  2. A couple of things from the Debug

2025-12-31 21:17:17.297 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 23: Command received zwave:device:f6d34df4:node23:switch_binary --> OFF [OnOffType]
2025-12-31 21:18:36.197 [DEBUG] [nal.protocol.ZWaveTransactionManager] - NODE 23: Application Command Request (DEAD:DELETE_SUC_ROUTES)

a) Note the times. The command was sent (twice) without an answer and then declared dead 1 min 19 seconds later

b) The status notes that the device never completed a heal. Also the stage indicates that the old routings were deleted, but no new routes established.

One thing to try would be to delete the nightly heal. Also possibly a restart to clear out any old commands. If done in debug you could use the debug viewerto look for clues.

Thanks, I’ll turn off the heal, I’ve seen that advice in the past, but never had any issues, so I left it in.

Congestion may be a factor – in writing the original post, I realized that three different rules are creasting ZWave traffic at sunset (each has different “but only if” cases). I also notice that one of my sensors is cresting more traffic then necessary – there’s a report every few seconds, some from aspects I don’t care about.

I’m going to adjust the sunset rules so at least one of them runs on a slightly different trigger after I look at the options astro has.

Edit: heal is now disabled, and one of the sunset rules is now set to end of sunset, the other two are still start. One of them was also triggering at start and end; that should reduct traffic a tad.

Looking at chatty devices is a good start. One issue (that you may or may not have) is with power measurements and triggers using a % trigger. There can be stray watts (0.0 to 0.1 to 0.0) will trigger reports at the microsecond level with even a 100% trigger. Light sensors can also be a problem.

Turns out I have some sensors reporting UV levels, which I’m not interested in for interior spaces, and humidity reports more frequently than needed. Working my way thru the field descriptions to sort out how to reduce/eliminate. I think I’ll try to bash something together to process a log and tabulate what reports are coming from which devices.

A small sample, but by breaking up the sunset rules into START and END of the sunset events, most devices responded tonight. Node 23 is still cranky, and I’m going to try to move it slighly for better reception. It’s line of sight to two other devices, but is not meshed to either of them – it only meshes directly to node 1, which is only about 15 feet away, but the controller is about 18” below ground level in the basement, and this device is at ground level at the far edge of the patio. It’s been there for almost a month, I would hope that it would have noticed the other nearby nodes by now.

It turns out 3 devices were in a scene rather than a typical rule. The ON commands for the three come within 2ms. I’ve reordered the scene to put a non-ZWave device between the first and the second, perhaps that will help a tiny bit as well. There was a dead (no longer connected) device (19) in the scene, which generated some retry traffic. I’ve removed it from the scene and will disable the Thing until I find and reconnect the device.

I still have to get after node 16 to reduce its traffic, and there are a couple other sensors I can dial back a bit as well. Meanwhile, I have indicators in the gui now when these go offline, so that’s a step forward.

These can really mess up traffic, if they were part of a route. Best is to remove from network but maybe disable will work. it is old post, but still worth a review for ideas.