ZWave nodes toggling between online and offline

I am trying to move from Homeseer to OpenHab2 so I can run it on a RaspberryPI3. I have my Things/Items all setup (using PaperUI mostly), as well as some basic rules, and they appear to be working fine most of the time. Unfortunately, my nodes will occasionally go offline, then come back on then go off again. My ZWave stick is a HomeSeer SmartStick+. I’m running OpenHAB 2.2.0-1 (release build).

I turned on the DEBUG log for the zwave binding and am including the relevant lines here (I’m new and can’t attach). You can see NODE 9 toggle from offline to online and then back off again.

2017-12-29 08:09:43.529 [vent.ItemStateChangedEvent] - zwave_serial_zstick_6bbcc6ee_serial_sof changed from 12234 to 12235
2017-12-29 08:09:43.913 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 05 00 13 E6 00 0F 
2017-12-29 08:09:43.920 [vent.ItemStateChangedEvent] - zwave_serial_zstick_6bbcc6ee_serial_sof changed from 12235 to 12236
2017-12-29 08:09:43.923 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
2017-12-29 08:09:43.926 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 07 00 13 E6 00 00 00 0D 
2017-12-29 08:09:43.928 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 07 00 13 E6 00 00 00 0D 
2017-12-29 08:09:43.930 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Request[0x00], priority=High, dest=255, callback=0, payload=E6 00 
2017-12-29 08:09:43.932 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 9: SendData Request. CallBack ID = 230, Status = Transmission complete and ACK received(0)
2017-12-29 08:09:43.934 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 9: Node is ALIVE. Init stage is DONE.
2017-12-29 08:09:43.936 [DEBUG] [ve.internal.protocol.ZWaveController] - Notifying event listeners: ZWaveNodeStatusEvent
2017-12-29 08:09:43.938 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 9: Got an event from Z-Wave network: ZWaveNodeStatusEvent
2017-12-29 08:09:43.940 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 9: Setting ONLINE
2017-12-29 08:09:43.944 [DEBUG] [ve.internal.protocol.ZWaveController] - NODE 9: Node Status event - Node is ALIVE
2017-12-29 08:09:43.947 [DEBUG] [.serialmessage.ZWaveCommandProcessor] - Checking transaction complete: Message has Ack Pending: Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=9, callback=230, payload=09 02 25 02 
2017-12-29 08:09:43.950 [hingStatusInfoChangedEvent] - 'zwave:device:6bbcc6ee:node9' changed from OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller to ONLINE
2017-12-29 08:09:45.689 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 03 00 16 EA 
2017-12-29 08:09:45.691 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - NODE 9: Sending ABORT Message = 01 03 00 16 EA 
2017-12-29 08:09:45.694 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 03 00 16 EA 
2017-12-29 08:09:45.698 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 255: Sending REQUEST Message = 01 03 00 16 EA 
2017-12-29 08:09:45.700 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - NODE 9: Timeout while sending message. Requeueing - 2 attempts left!
2017-12-29 08:09:45.701 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 9: Got an error while sending data. Resending message.
2017-12-29 08:09:45.704 [DEBUG] [ve.internal.protocol.ZWaveController] - Message queued. Queue length = 1. Queue={}
2017-12-29 08:09:45.709 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - Took message from queue for sending. Queue length = 0
2017-12-29 08:09:45.713 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 09 00 13 09 02 25 02 25 E7 0B 
2017-12-29 08:09:45.714 [vent.ItemStateChangedEvent] - zwave_serial_zstick_6bbcc6ee_serial_ack changed from 5821 to 5822
2017-12-29 08:09:45.717 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 9: Sending REQUEST Message = 01 09 00 13 09 02 25 02 25 E7 0B 
2017-12-29 08:09:45.733 [DEBUG] [WaveSerialHandler$ZWaveReceiveThread] - Receive Message = 01 04 01 13 01 E8 
2017-12-29 08:09:45.737 [DEBUG] [ve.internal.protocol.ZWaveController] - Receive queue TAKE: Length=0
2017-12-29 08:09:45.737 [vent.ItemStateChangedEvent] - zwave_serial_zstick_6bbcc6ee_serial_ack changed from 5822 to 5823
2017-12-29 08:09:45.739 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 04 01 13 01 E8 
2017-12-29 08:09:45.741 [DEBUG] [ve.internal.protocol.ZWaveController] - Process Message = 01 04 01 13 01 E8 
2017-12-29 08:09:45.743 [DEBUG] [ve.internal.protocol.ZWaveController] - Message: class=SendData[0x13], type=Response[0x01], priority=High, dest=255, callback=0, payload=01 
2017-12-29 08:09:45.749 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 9: Sent Data successfully placed on stack.
2017-12-29 08:09:45.749 [vent.ItemStateChangedEvent] - zwave_serial_zstick_6bbcc6ee_serial_sof changed from 12236 to 12237
2017-12-29 08:09:50.721 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - NODE 9: Timeout while sending message. Requeueing - 1 attempts left!
2017-12-29 08:09:50.722 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 9: Got an error while sending data. Resending message.
2017-12-29 08:09:50.726 [DEBUG] [ve.internal.protocol.ZWaveController] - Message queued. Queue length = 1. Queue={}
2017-12-29 08:09:50.729 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - Took message from queue for sending. Queue length = 0
2017-12-29 08:09:50.733 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 09 00 13 09 02 25 02 25 E8 04 
2017-12-29 08:09:50.736 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 9: Sending REQUEST Message = 01 09 00 13 09 02 25 02 25 E8 04 
2017-12-29 08:09:55.740 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 03 00 16 EA 
2017-12-29 08:09:55.742 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - NODE 9: Sending ABORT Message = 01 03 00 16 EA 
2017-12-29 08:09:55.745 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 03 00 16 EA 
2017-12-29 08:09:55.750 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 255: Sending REQUEST Message = 01 03 00 16 EA 
2017-12-29 08:09:55.754 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - NODE 9: Timeout while sending message. Requeueing - 0 attempts left!
2017-12-29 08:09:55.759 [DEBUG] [ng.zwave.internal.protocol.ZWaveNode] - NODE 9: Node is DEAD.
2017-12-29 08:09:55.763 [DEBUG] [ve.internal.protocol.ZWaveController] - Notifying event listeners: ZWaveNodeStatusEvent
2017-12-29 08:09:55.767 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 9: Got an event from Z-Wave network: ZWaveNodeStatusEvent
2017-12-29 08:09:55.771 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 9: Setting OFFLINE
2017-12-29 08:09:55.781 [DEBUG] [ve.internal.protocol.ZWaveController] - NODE 9: Node Status event - Node is DEAD
2017-12-29 08:09:55.785 [DEBUG] [l.serialmessage.SendDataMessageClass] - NODE 9: Node is DEAD. Dropping message.
2017-12-29 08:09:55.788 [hingStatusInfoChangedEvent] - 'zwave:device:6bbcc6ee:node9' changed from ONLINE to OFFLINE (COMMUNICATION_ERROR): Node is not communicating with controller

Anyone have any pointers? Thanks!

A message is timing out, so the binding considers the device is not responding so takes it offline with a communcation error. It does however also receive some messages, so it puts it online…

Maybe there’s a message the binding is sending that the device doesn’t want to respond to, or maybe the device is not communicating well if it’s a little far away from the controller. It’s hard to tell from this short log and also not knowing what is being sent or what the device is…

(please also format logs using the </> button so they are readable - I’ve edited your post to fix this :wink: ).

I’ll try and catch it again with better log settings. Besides setting the zwave log to DEBUG level, what do you recommend the log settings be to get more/better info?

The nodes that go on and offline are not consistent with distance from the stick (the closest one is 3m with direct line-of-site), nor with particular type. It happens to appliance modules, dimmer switches (Expensive and cheap).

My thought was that it might be an incompatibility between the stick and the USB bus or something like that.

Thanks for your thoughts!

The logging itself is fine - just a bit more so we can see if there’s a pattern (eg it always times out on a specific request). If we can establish this, then we can work around it (most likely!).

What is/are the devices - this is also important as the changes will need to be made to the database if we find a systematic timeout…

Got it. I’ll leave DEBUG on and let it run after that for a few hours or a day and post back.

Okay, I’ve done a lot of logging and some initial investigation of the results. My zwave network has 31 active nodes and is a mashup of various brands and devices of various ages (I’ve been doing HA for a long time). The log runs from 12/29 10:00 AM to 1/1 at 10:00 AM. Here is what I have seen:

  1. EVERY device in my network went offline at least once. There were 224 instances of devices going offline in that time. 9 of the devices seem to go offline significantly more than the others (11-26 times each).
  2. There isn’t a specific brand/type in that 9 that stood out. There were 3 HA03, 1 HA04, 3 DSC27, 1 DSC18 and one unknown device
  3. There wasn’t a specific location in the house that was an issue. 6 on the main floor, 1 outside and 2 in the basement. 2 in the east end, 4 in the middle and 3 in the west end of the house.
  4. The longest gap WITHOUT a device going offline was 14 hours. Other than that they were VERY evenly spread out with just a handful happening each hour.
  5. I DO have some nodes that have died that are still in the controller. I can remove those, if anyone thinks that might be an issue.

Here is a link to the log files for those two days (6MB zipped): https://1drv.ms/u/s!AqjhwhTWp1W9lfo6bbz16ifOebiGOQ

I’m not sure if there is anything in there that would point to a smoking gun. Thanks to anyone that has time to look, or advice on otherways I can troubleshoot!