ZWave: Timeout while sending message. Requeueing

I’m getting constant error messages (every ~40 seconds) like so:

 ==> /var/log/openhab2/openhab.log <==    
2017-03-30 20:24:51.088 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Timeout while sending message. Requeueing - 2 attempts left!
2017-03-30 20:24:51.088 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 3: Got an error while sending data. Resending message.
2017-03-30 20:25:09.478 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Timeout while sending message. Requeueing - 1 attempts left!
2017-03-30 20:25:09.539 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 3: Got an error while sending data. Resending message.
2017-03-30 20:25:15.278 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Timeout while sending message. Requeueing - 0 attempts left!
2017-03-30 20:25:15.278 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 3: Got an error while sending data. Resending message.
2017-03-30 20:25:20.361 [WARN ] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Too many retries. Discarding message: Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=3, callback=67, payload=03 05 71 04 00 07 00 

Doesn’t appear to be impacting anything, messages are still coming through ok around the same timestamps:

==> /var/log/openhab2/events.log <==
2017-03-30 20:25:26.262 [ItemStateChangedEvent     ] - GarageSensor1Temperature changed from 22.3 to 22.2
2017-03-30 20:26:56.566 [ItemStateChangedEvent     ] - GarageSensor1Temperature changed from 22.2 to 22.3

Device is an Aeon MultiSensor 6. If I powercycle it, it appears to come right for a couple of hours but then starts again.

I’m running VMWare, USB passthrough, with an Aeon ZWave Stick.

Just wondering where to start trouble-shooting what is going on?

You should enable debug logging - this will log the exact message that is timing out and we can then look at why this is happening.

Could it be related to the problem I had on OH2 2.0.0 Stable?

Problem went away with nightly snapshot.

@chris will do that, I reset everything and it’s come right (as it normally does), need to wait for it to fail again to collect the logs…

@OMR thanks for the info. Will have a look at that as well.

Debug log here:

16:18:44.853 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Timeout while sending message. Requeueing - 2 attempts left!
16:18:44.853 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 3: Got an error while sending data. Resending message.
16:18:44.853 [DEBUG] [ve.internal.protocol.ZWaveController] - Message queued. Queue length = 2. Queue={}
16:18:44.854 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - Took message from queue for sending. Queue length = 1
16:18:44.854 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 0C 00 13 03 05 71 04 00 07 00 25 B0 01 
16:18:44.854 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 3: Sending REQUEST Message = 01 0C 00 13 03 05 71 04 00 07 00 25 B0 01 
16:18:44.954 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 04 01 13 01 E8 
16:18:44.957 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
16:18:44.957 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 04 01 13 01 E8 
16:18:44.957 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 04 01 13 01 E8 
16:18:44.957 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Response[0x01], priority=High, dest=255, callback=0, payload=01 
16:18:44.957 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 3: Sent Data successfully placed on stack.
16:18:44.978 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 07 00 13 B0 00 00 02 59 
16:18:44.981 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
16:18:44.981 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 09 00 13 B0 00 00 02 00 00 57 
16:18:44.981 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 09 00 13 B0 00 00 02 00 00 57 
16:18:44.981 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Request[0x00], priority=High, dest=255, callback=0, payload=B0 00 00 02 
16:18:44.981 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 3: SendData Request. CallBack ID = 176, Status = Transmission complete and ACK received(0)
16:18:44.981 [DEBUG] [alization.ZWaveNodeInitStageAdvancer] - NODE 3: Starting initialisation from DONE
16:18:44.981 [DEBUG] [ve.internal.protocol.ZWaveController] - Event Listener org.openhab.binding.zwave.internal.protocol.initialization.ZWaveNodeInitStageAdvancer@f965029 already registered
16:18:44.981 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: Sent Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=3, callback=176, payload=03 05 71 04 00 07 00 
16:18:44.981 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: Recv Message: class=SendData[0x13], type=Request[0x00], priority=High, dest=255, callback=0, payload=B0 00 00 02 
16:18:44.981 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: class=SendData, callback id=176, expected=ApplicationCommandHandler, cancelled=false      MISMATCH
16:18:49.941 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Timeout while sending message. Requeueing - 1 attempts left!
16:18:49.941 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 3: Got an error while sending data. Resending message.
16:18:49.941 [DEBUG] [ve.internal.protocol.ZWaveController] - Message queued. Queue length = 2. Queue={}
16:18:49.941 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - Took message from queue for sending. Queue length = 1
16:18:49.941 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 0C 00 13 03 05 71 04 00 07 00 25 B1 00 
16:18:49.941 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 3: Sending REQUEST Message = 01 0C 00 13 03 05 71 04 00 07 00 25 B1 00 
16:18:50.034 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 04 01 13 01 E8 
16:18:50.038 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
16:18:50.038 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 04 01 13 01 E8 
16:18:50.038 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 04 01 13 01 E8 
16:18:50.038 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Response[0x01], priority=High, dest=255, callback=0, payload=01 
16:18:50.038 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 3: Sent Data successfully placed on stack.
16:18:50.061 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 07 00 13 B1 00 00 02 58 
16:18:50.063 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
16:18:50.063 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 09 00 13 B1 00 00 02 00 00 56 
16:18:50.064 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 09 00 13 B1 00 00 02 00 00 56 
16:18:50.064 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Request[0x00], priority=High, dest=255, callback=0, payload=B1 00 00 02 
16:18:50.064 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 3: SendData Request. CallBack ID = 177, Status = Transmission complete and ACK received(0)
16:18:50.064 [DEBUG] [alization.ZWaveNodeInitStageAdvancer] - NODE 3: Starting initialisation from DONE
16:18:50.064 [DEBUG] [ve.internal.protocol.ZWaveController] - Event Listener org.openhab.binding.zwave.internal.protocol.initialization.ZWaveNodeInitStageAdvancer@f965029 already registered
16:18:50.064 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: Sent Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=3, callback=177, payload=03 05 71 04 00 07 00 
16:18:50.064 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: Recv Message: class=SendData[0x13], type=Request[0x00], priority=High, dest=255, callback=0, payload=B1 00 00 02 
16:18:50.064 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: class=SendData, callback id=177, expected=ApplicationCommandHandler, cancelled=false      MISMATCH
16:18:55.020 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Timeout while sending message. Requeueing - 0 attempts left!
16:18:55.020 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 3: Got an error while sending data. Resending message.
16:18:55.029 [DEBUG] [ve.internal.protocol.ZWaveController] - Message queued. Queue length = 2. Queue={}
16:18:55.029 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - Took message from queue for sending. Queue length = 1
16:18:55.029 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 0C 00 13 03 05 71 04 00 07 00 25 B2 03 
16:18:55.029 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 3: Sending REQUEST Message = 01 0C 00 13 03 05 71 04 00 07 00 25 B2 03 
16:18:55.121 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 04 01 13 01 E8 
16:18:55.125 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
16:18:55.125 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 04 01 13 01 E8 
16:18:55.125 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 04 01 13 01 E8 
16:18:55.125 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Response[0x01], priority=High, dest=255, callback=0, payload=01 
16:18:55.125 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 3: Sent Data successfully placed on stack.
16:18:55.142 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 07 00 13 B2 00 00 03 5A 
16:18:55.143 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
16:18:55.143 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 09 00 13 B2 00 00 03 00 00 54 
16:18:55.143 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 09 00 13 B2 00 00 03 00 00 54 
16:18:55.143 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Request[0x00], priority=High, dest=255, callback=0, payload=B2 00 00 03 
16:18:55.143 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 3: SendData Request. CallBack ID = 178, Status = Transmission complete and ACK received(0)
16:18:55.143 [DEBUG] [alization.ZWaveNodeInitStageAdvancer] - NODE 3: Starting initialisation from DONE
16:18:55.143 [DEBUG] [ve.internal.protocol.ZWaveController] - Event Listener org.openhab.binding.zwave.internal.protocol.initialization.ZWaveNodeInitStageAdvancer@f965029 already registered
16:18:55.143 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: Sent Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=3, callback=178, payload=03 05 71 04 00 07 00 
16:18:55.143 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: Recv Message: class=SendData[0x13], type=Request[0x00], priority=High, dest=255, callback=0, payload=B2 00 00 03 
16:18:55.144 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: class=SendData, callback id=178, expected=ApplicationCommandHandler, cancelled=false      MISMATCH
16:19:00.106 [WARN ] [ocol.ZWaveController$ZWaveSendThread] - NODE 3: Too many retries. Discarding message: Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=3, callback=178, payload=03 05 71 04 00 07 00 

Anything make sense here?

I have been having very similar problems recently but I have made some changes to my system that have reduced the frequency of the problem considerably.

I have openHAB 2.0 running on Ubuntu 16.10 as a VM on VMware ESXi 6.0u3. I use USB passthrough to passthrough an Aeotec Zwave stick S2, an RFXtrx433e (433MHz), a CUL (can’t remember the exact version) (868Mhz) and a bluetooth stick. The issue would manifest itself by a complete failure of Zwave comms via the Aeotec Zwave USB stick S2. Everything would behave normally for between a couple of hours and half a day but then one of my Zwave devices would timeout with similar messages to yours, then another, and another and within a few minutes the USB stick wouldn’t be able to send any messages, all attempts timed out. I noticed that the frequency of this behaviour increased significantly after I’d upgrade from VMware ESXi 6.0u3 to 6.5. Some googling showed that VMware have changed the USB stack in v6.5 and it had caused problems for lots of users. It is possible to force ESXi to use the old driver by issuing the command listed in the following knowledgebase article:

knowledgebase article 2147650

I tried that and while it helped by extending the working life of the Zwave binding after a reboot of the Ubuntu server I have running openHAB, the problem would still occur. I reverted to ESXi 6.0u3.

Maybe there was too much Zwave traffic on my system and this was swamping the Zstick? Or maybe running openHAB on a system running on ESXi and using USB passthrough reduced the amount of traffic the USB stick could cope with before dying? In an effort to reduce the amount of Zwave traffic I looked at the config parameters of the 4 Aeotec Multisensor6 devices I have and reduced the poll time to 1 hour on each of them. I had reduced it to 10 minutes in the past when I was first setting up my network. This extended the “working life” of the binding to around 24 hours but the problem would still occur and eventually all nodes would be declared dead.

Progress, but not perfect. Further investigation of the logs revealed lots of references to dead nodes, I thought that maybe the system was sending messages to dead nodes and somewhere along the line a backlog was growing for responses that would never arrive. I physically removed a Duwi ceiling light dimmer that no longer worked and I used the Zensys Windows software to remove nonexistant nodes from the Zwave USB stick. These nonexistant nodes were due to devices I’d factory reset without excluding from the Zstick. This made a huge difference! As of writing my openHAB server has been running for over two and a half days without problems, there is still the occasional message in the log about a dead node but this seems to be a temporary issue and resolves itself.

TL;DR

Don’t use ESXi 6.5, use 6.0u3
Ensure that the polling period on all your devices is set to at least 60 minutes
Remove any missing nodes from your Zstick using the Zensys software

Interesting, thanks for the detail.

I am using VMWare 6.5 but I had disabled the new USB drivers. I think I’ll setup a fresh OpenHab install on tin rather than in a VM and see what happens.

That would be a good diagnostic. Alternatively you could install ESXi 6.0 and preserve the datastore. If the openHAB VM is at a hardware version greater than 11 then ESXi 6.0 won’t be able to import it. You might be able to create a new VM in ESXi 6.0 and then copy the VMDK file from the openHAB VM to the new VM though. I’ve not tested that so I can’t be sure.

I thought I was being smart and used the latest VM hardware versions I could when setting up :slight_smile:

I configured everything on an RPI2 last night and didn’t have any timeouts, but now I have the issue that my AU/NZ band versions of the zwave devices aren’t recognised (I took the opportunity to do firmware upgrades and that bumped the identification). Once the binding has been updated with the latest database I’ll take another look.

My long term plan B here might be a dedicated RPI with OpenHab/ZWave/MQTT configured, so I can keep my main OpenHab in ESX.

Do you mean you’ll have the Pi acting like a sort of hardware interface and use it to pass messages on to another OH2 server running on a VM that will contain all the logic for your system? Sounds like a bit of a PITA. :slight_smile: You’d have to create items in two places, one from a Thing on the Pi and one manually on the master server.

What are your reasons for wanting two servers rather than run everything on the Pi? If you’re concerned about performance of running a DB (mysql/influxdb/etc) on the Pi you could run the DB on a VM and have OH on the Pi connect to it remotely.

To follow up on the uptime of my OH system. It’s been up for over 4 days now with six separate sets of transient timeouts:

2017-04-08 01:48:25.762 [ZWaveNode                 ] - NODE 8: Node is DEAD.
2017-04-08 01:48:25.763 [ZWaveController           ] - NODE 8: Node Status event - Node is DEAD
2017-04-08 01:48:25.763 [SendDataMessageClass      ] - NODE 8: Node is DEAD. Dropping message.
2017-04-08 01:48:26.946 [SendDataMessageClass      ] - NODE 8: Node is DEAD. Dropping message.
2017-04-08 01:48:31.954 [SendDataMessageClass      ] - NODE 8: Node is DEAD. Dropping message.
2017-04-08 01:48:37.182 [SendDataMessageClass      ] - NODE 8: Node is DEAD. Dropping message.
2017-04-08 02:22:17.205 [ZWaveNode                 ] - NODE 6: Node is DEAD.
2017-04-08 02:22:17.205 [ZWaveController           ] - NODE 6: Node Status event - Node is DEAD
2017-04-08 02:22:17.205 [SendDataMessageClass      ] - NODE 6: Node is DEAD. Dropping message.
2017-04-08 02:22:17.589 [icationCommandMessageClass] - NODE 6: Application Command Request (DEAD:DONE)
2017-04-08 18:31:05.762 [ZWaveNode                 ] - NODE 36: Node is DEAD.
2017-04-08 18:31:05.763 [ZWaveController           ] - NODE 36: Node Status event - Node is DEAD
2017-04-08 18:31:05.763 [SendDataMessageClass      ] - NODE 36: Node is DEAD. Dropping message.
2017-04-08 18:31:10.778 [SendDataMessageClass      ] - NODE 36: Node is DEAD. Dropping message.
2017-04-09 12:22:41.800 [ZWaveNode                 ] - NODE 26: Node is DEAD.
2017-04-09 12:22:41.800 [ZWaveController           ] - NODE 26: Node Status event - Node is DEAD
2017-04-09 12:22:41.800 [SendDataMessageClass      ] - NODE 26: Node is DEAD. Dropping message.
2017-04-10 17:22:16.931 [ZWaveNode                 ] - NODE 6: Node is DEAD.
2017-04-10 17:22:16.932 [ZWaveController           ] - NODE 6: Node Status event - Node is DEAD
2017-04-10 17:22:16.932 [SendDataMessageClass      ] - NODE 6: Node is DEAD. Dropping message.
2017-04-12 11:22:31.252 [ZWaveNode                 ] - NODE 26: Node is DEAD.
2017-04-12 11:22:31.252 [ZWaveController           ] - NODE 26: Node Status event - Node is DEAD
2017-04-12 11:22:31.252 [SendDataMessageClass      ] - NODE 26: Node is DEAD. Dropping message.
2017-04-12 11:22:31.285 [icationCommandMessageClass] - NODE 26: Application Command Request (DEAD:DONE)

They seem temporary and all nodes are currently working.

Definitely a PITA, but hopefully a temporary one until VMware sort their shit out. Would prefer to maintain my “core” OpenHab in VMWare and the RPI be relatively disposable.

@higgers, I’ve been using the April 15th 2.1.0 snapshot of the ZWave binding (I had to use it to get a database update, details here) and I’ve not had these issues since. Would suggest giving that a go?

Unsure if @chris could comment on any potential changes to the binding that would have resolved this but it’s definitely solved my problems, potentially unintentionally.

There probably haven’t been any code changes in this version - in general the snapshot version is not being updated at the moment. The only changes therefore are likely to be database updates - if there was a problem with the definition of a device, and all the timeouts were always from this device, then this could be the reason. It’s going to be very device dependant though.