Broken Z-Wave Network

Chris told somewhere else that the Network viewer is not completely reliable at the moment. You can see this, because according to your screenshot there is NO connection to the controller (“1”). So when you believe in the viewer, nothing should run. :wink:

But as there are other problems, something seems to be wrong.

Are the problems related to zwave only? Or do other bindings also make problems?

Thanks a lot for the info.

I am also seeing other warnings in the log, like:

11:59:35.950 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing sonos:PLAY1:RINCON_XXXXXXX takes more than 5000ms for processing event
11:59:35.950 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber ‘org.eclipse.smarthome.core.thing.internal.ThingManager@641337’ takes more than 5000ms.
11:59:35.950 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber ‘org.eclipse.smarthome.core.thing.internal.ThingManager@641337’ takes more than 5000ms.

This sounds more like a general (maybe network related) problem.

Or a wild guess: If you are running OH on an Raspberry, have you checked that your SD card isn’t corrupted?

Yes, this may be two different problems.

But I am still seeing errors like:

15:39:50.532 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Timeout while sending message. Requeueing - 2 attempts left!
15:39:50.533 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 2: Got an error while sending data. Resending message.
15:39:55.539 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Timeout while sending message. Requeueing - 1 attempts left!
15:39:55.540 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 2: Got an error while sending data. Resending message.
15:40:00.542 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Timeout while sending message. Requeueing - 0 attempts left!
15:40:00.543 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 2: Got an error while sending data. Resending message.
15:40:05.547 [WARN ] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Too many retries. Discarding message: Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=2, callback=238, payload=02 03 71 04 00

15:33:52.052 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 11: Timeout while sending message. Requeueing - 2 attempts left!
15:33:52.054 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 11: Got an error while sending data. Resending message.
15:33:57.058 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 11: Timeout while sending message. Requeueing - 1 attempts left!
15:33:57.062 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 11: Got an error while sending data. Resending message.
15:34:02.068 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 11: Timeout while sending message. Requeueing - 0 attempts left!
15:34:02.069 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 11: Got an error while sending data. Resending message.
15:34:07.074 [WARN ] [ocol.ZWaveController$ZWaveSendThread] - NODE 11: Too many retries. Discarding message: Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=11, callback=219, payload=0B 09 60 0D 01 01 71 04 00 00 00

The devices are working.

SD-card should be good, I just performed a fresh installation of RC1 on a new image.

Sorry, but I’ve no idea why these messages do appear. Maybe the binding developer has a suggestion. But I’m sure he will ask for a DEBUG log. So you can grab this already and then upload it here.

Maybe it’s the same problem like in this thread.

Thanks for the hint. I will keep an eye on this one!

I performed a fresh install of OH2 with RC1.

This is what I get when a Z-Wave device is initialized:

22:22:04.651 [ERROR] [ve.internal.protocol.ZWaveController] - Exception during ZWave thread: Input 2. {}
java.lang.IllegalStateException: Could not update status, because callback is missing
at org.eclipse.smarthome.core.thing.binding.BaseThingHandler.updateStatus(BaseThingHandler.java:388)[105:org.eclipse.smarthome.core.thing:0.9.0.b3]
at org.eclipse.smarthome.core.thing.binding.BaseThingHandler.updateStatus(BaseThingHandler.java:415)[105:org.eclipse.smarthome.core.thing:0.9.0.b3]
at org.openhab.binding.zwave.handler.ZWaveThingHandler.ZWaveIncomingEvent(ZWaveThingHandler.java:1178)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.ZWaveController.notifyEventListeners(ZWaveController.java:567)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.initialization.ZWaveNodeInitStageAdvancer.advanceNodeStage(ZWaveNodeInitStageAdvancer.java:1117)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.initialization.ZWaveNodeInitStageAdvancer.handleNodeQueue(ZWaveNodeInitStageAdvancer.java:230)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.initialization.ZWaveNodeInitStageAdvancer.ZWaveIncomingEvent(ZWaveNodeInitStageAdvancer.java:1306)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.ZWaveController.notifyEventListeners(ZWaveController.java:567)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.ZWaveController.handleIncomingResponseMessage(ZWaveController.java:293)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.ZWaveController.handleIncomingMessage(ZWaveController.java:217)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.ZWaveController.access$7(ZWaveController.java:208)[209:org.openhab.binding.zwave:2.0.0.RC1]
at org.openhab.binding.zwave.internal.protocol.ZWaveController$ZWaveInputThread.run(ZWaveController.java:1324)[209:org.openhab.binding.zwave:2.0.0.RC1]
22:22:04.736 [INFO ] [me.event.ThingStatusInfoChangedEvent] - ‘zwave:device:159ae42df3e:node14’ changed from UNINITIALIZED (HANDLER_MISSING_ERROR) to INITIALIZING
22:22:04.747 [INFO ] [me.event.ThingStatusInfoChangedEvent] - ‘zwave:device:159ae42df3e:node14’ changed from INITIALIZING to OFFLINE (BRIDGE_OFFLINE): Controller is offline
22:22:04.747 [INFO ] [smarthome.event.ThingUpdatedEvent ] - Thing ‘zwave:device:159ae42df3e:node14’ has been updated.
22:22:04.755 [WARN ] [ding.zwave.handler.ZWaveThingHandler] - NODE 14: Initialising Thing Node…
22:22:04.755 [INFO ] [me.event.ThingStatusInfoChangedEvent] - ‘zwave:device:159ae42df3e:node14’ changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
22:22:09.510 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 255: Timeout while sending message. Requeueing - 2 attempts left!
22:22:52.669 [INFO ] [me.event.ThingStatusInfoChangedEvent] - ‘zwave:device:159ae42df3e:node14’ changed from ONLINE to OFFLINE (BRIDGE_OFFLINE): Controller is offline
22:22:52.674 [INFO ] [me.event.ThingStatusInfoChangedEvent] - ‘zwave:device:159ae42df3e:node14’ changed from OFFLINE (BRIDGE_OFFLINE): Controller is offline to ONLINE
22:22:52.698 [WARN ] [ding.zwave.handler.ZWaveThingHandler] - NODE 14: Initialising Thing Node…
22:22:52.699 [INFO ] [marthome.event.ConfigStatusInfoEvent] - ConfigStatusInfo [configStatusMessages=[]]
22:22:52.700 [INFO ] [smarthome.event.ThingUpdatedEvent ] - Thing ‘zwave:device:159ae42df3e:node14’ has been updated.
22:22:52.735 [INFO ] [smarthome.event.ThingUpdatedEvent ] - Thing ‘zwave:device:159ae42df3e:node14’ has been updated.

At OH2 startup I am also seeing:

Controller handler not found. Cannot handle command without ZWave controller.

This is strange. My Z-Wave network used to work well. @chris, do you have any idea? In case you need it, I am having a DEBUG log from OH2 startup (4.5k lines :confounded:)

I will try to take a look at this tonight. I think I will have some time to look at it.

Is this using the latest version of everything (i.e. the latest runtime) or just the current zwave binding with an older runtime? The callback must have been set at some stage since we set the state to offline right at the beginning of the thing initialisation and this call is well after that as here we’re setting up. @kai - any thoughts on why the callback would now be missing? Nothing has changed in the binding for a little while now.

If I can download your debug file I’ll take a look - unfortunately internet is very poor where I am tonight :frowning:

No, there haven’t been any changes in the last weeks as far as I am aware of…

When is the callback removed - just when the thing is destroyed?

@philKrylc the debug log doesn’t seem to have the problem - ie there’s no exception in the log.

Right, the DEBUG log is from a different situation.

I am having the warnings all the time, i.e.:

13:28:38.470 [INFO ] [smarthome.event.ThingUpdatedEvent ] - Thing ‘zwave:device:159b6a25ebb:node2’ has been updated.
13:48:40.981 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Timeout while sending message. Requeueing - 2 attempts left!
13:48:40.987 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 2: Got an error while sending data. Resending message.
13:48:45.998 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Timeout while sending message. Requeueing - 1 attempts left!
13:48:46.013 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 2: Got an error while sending data. Resending message.
13:48:51.016 [ERROR] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Timeout while sending message. Requeueing - 0 attempts left!
13:48:51.022 [ERROR] [l.serialmessage.SendDataMessageClass] - NODE 2: Got an error while sending data. Resending message.
13:48:56.028 [WARN ] [ocol.ZWaveController$ZWaveSendThread] - NODE 2: Too many retries. Discarding message: Message: class=SendData[0x13], type=Request[0x00], priority=Get, dest=2, callback=32, payload=02 03 71 04 00

Edit: @chris, I got an exception while exluding and including a device - I will send you the log.

Only in ThingManager.doUnregisterHandler(). Looking through the code, I notice that this happens (shortly) before dispose() is called on the handler - so there is indeed the risk of a bad timing that the handler does not yet know that it is about to be disposed. But in any case, this situation should not be too critical as it would happen during handler unregistration only and never during normal “operation time”.

This is something completely different - I’d prefer not to confuse things as timeouts are more routine (depending on exactly why they happen).

What happens if we change thing type, or update configuration. Doesn’t this destroy the thing, and then create it again? I would say this is “normal operation” and it shouldn’t cause an exception? Maybe I need to catch this exception better in the binding, but I think there must be something wrong in the framework for it to happen at all?

I’ll see what I can see in the logs…

@kai, so I think that this is what is happening -:

14:43:15.512 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 4: Got an event from Z-Wave network: ZWaveInitializationStateEvent
14:43:15.532 [DEBUG] [l.initialization.ZWaveNodeSerializer] - NODE 4: Serializing to file /var/lib/openhab2/zwave/node4.xml
14:43:15.612 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 0: Updating node properties. Controller not found.
14:43:15.612 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 0: Setting ONLINE
14:43:15.612 [ERROR] [ve.internal.protocol.ZWaveController] - Exception during ZWave thread
java.lang.IllegalStateException: Could not update status, because callback is missing

From the above, I see that there is a single event from the ZWave protocol layers. The “Serializing” message indicates that the thing dispose is being called - as does the following NODE 0: messages since nodeId is only set to 0 in the dispose method.

The code is -:

                case DONE:
                    // Update some properties first...
                    updateNodeNeighbours();
                    updateNodeProperties();

                    // Do we need to change type?
                    if (finalTypeSet == false) {
                        updateThingType();
                    }

                    // Set ourselves online if we have the final thing type set
                    if (finalTypeSet) {
                        logger.debug("NODE {}: Setting ONLINE", nodeId);
                        updateStatus(ThingStatus.ONLINE);

                        // Now that this node is completely initialised, we want to re-process all channels
                        initialiseNode();
                    }
                    break;

The exception occurs at the updateStatus toward the bottom, so from this I assume that the thing type is changing (in the call to updateThingType), dispose is being called at this point and we’re then updating the status.

I could possibly reorder some of this code to (hopefully) solve it, but I wonder if the framework shouldn’t defer the dispose call (somehow)?

FTR -:

1 Like

@sjka already started looking into it. But I think it is correct that the handler is disposed on a thing type change - a different handler might be instantiated afterwards, right?
But the shutdown process of a handler might be optimized as currently the callback is removed before the handler is disposed, so it does not have a clue that this might have happened.

Yes - I don’t disagree - just wondering if there’s a better way that allows the handler to complete what it’s doing…

Hopefully the change I made will at least fix the current issue.

Thanks.

I have one node that I can not initialize (due to this error). The others got the error, but finally initialized.