Unresponsive z-wave unit

MrBov · December 12, 2019, 12:19pm

Running the latest version of OpenHAB on Raspberry Pi for the last couple of years. I have a fairly basic setup with 30+ Z-Wave dimmer and switch units.

Recently one of my Fibaro Z-Wave double switch units has intermittently been behaving strange. Sometimes it works fine and responds in seconds. However, sometimes I tell the switch to turn the lights on/off via the mobile app and nothing happens. I can tell it multiple times and still nothing happens. During that time no other Z-Wave units can be controlled. Then at some random time it will wake up and respond.

The unit is in my garage which is a few units hop from the house and the controller.

Any immediate thoughts? Is this likely to be a hardware issue, a Z-Wave comms issue or something with the setup or scripts/rules? Any quick pointers to start me off in the right direction?

Worth deleting and re-adding the switch? Swap for a different switch? Trawl through the OpenHAB logs?

Bruce_Osborne · December 12, 2019, 12:33pm

What version of openHAB?
You should probably capture some debug logs. You may have one or more “zombie nodes” on your network. Devices sometimes try routing through those non-existent nodes until they are removed from the controller.

MrBov · December 12, 2019, 12:52pm

I’m on the latest stable version. Did an upgrade the other week to make sure. Been living with it doing this for a couple of months now…

Let me go capture some logs.

Cheers Bruce.

Bruce_Osborne · December 12, 2019, 1:03pm

The new stable version should be released on December 15. There have been enough changes since 2.4. that the Z-Wave developer has been recommending to at least update the binding to a more recent version.

MrBov · December 12, 2019, 1:19pm

Cool. Thank for the heads up Bruce. I will certainly update. Tempted to delete/re-add the troublesome Fibaro double switch. Also perhaps do a fresh start, either on my Synology NAS or a new Pi.

Also finding that Alexa is now inconsistently reporting “living room is not responding” and other issues.

Let’s look at those logs first. Guess I know what I’m doing over Christmas

Bruce_Osborne · December 12, 2019, 1:25pm

Sometimes deleting / re-adding the Z-Wave Thing (not excluding / including) helps.
It can be wise to do that after updating anyway so the Things get any new binding settings.

mhilbush · December 12, 2019, 1:34pm

This behavior is usually associated with timeouts. If the current transmission to a device is timing out, other transmissions will block until the binding gives up on the current transaction. As there are multiple 5 second retries, this can take quite a while.

A debug log will show which device is timing out (i.e. it may not be the one to which you’re sending the command).

MrBov · December 12, 2019, 2:07pm

Thanks Bruce, Mark. Anything specific in the log file I should be looking for? Errors, warnings or something else? Events shown at the time I have the issues? Worth sharing and extract of the log on here somehow, in case I’m missing something… ?

Bruce_Osborne · December 12, 2019, 2:09pm

Use the log viewer.

https://www.cd-jackson.com/index.php/openhab/zwave-log-viewer

mhilbush · December 12, 2019, 2:16pm

Also, if you post the log file here, I’l take a look. Don’t filter the log (other than removing any PII or other sensitive information that might be in the log. A guide for producing a log file can be found here. Make sure the log file covers the time period when the problem is occurring.

MrBov · December 12, 2019, 2:20pm

Thanks both… I’ll sort over the weekend and return.

smhaller · December 12, 2019, 2:42pm

Hi Andrew

are you by chance using the latest M6 milestone version currently? After the update from the M5 to M6 I noticed that ZWave acts somehow strangely. I have no clear indication yet to what it points. But if you use the same one it could correlate. I have about 25 nodes in total, and over the last days I saw that some of the previous working ones having ping status not resolved staying in NIF processing.
Sometimes it loos as the controller itself isn’t recognized properly.

So please check your log. if the ZWAVE part is not yet set into debug mode please do this.
Before checking it on Chris log browser.

Cheers
Stefan

Bruce_Osborne · December 12, 2019, 2:45pm

The latest Testing version is Release Candidate 1, released on Monday.

smhaller · December 12, 2019, 2:49pm

Didn’t got that one in time, was sick on Monday
I will most probably update and crosscheck the system afterwards.

Update:
With installing the current 2.5 stable version all issues that I had went away.
The Funny issue with RC1 version was that ZWave devices appeared online if when they where not attached to power
While with the M6 version I had to repeatedly reconfigure the ZWave controller.

FloatingBoater · December 12, 2019, 7:54pm

I hate ‘me too’ posts on existing threads, however in this case can I add another sample

Running openHAB 2.5.0.M5 Milestone Build on a RPI4 openHABian, I see occasional random periods where it can take 2-5minutes for a Zwave command to change the state of a device, then multiple events fire immediately. The logs don’t show much - typically, the state change is noted, but the feedback that the end device has changed doesn’t arrive:

[ome.event.ItemCommandEvent] - Item 'sBed1_Lamps' received command OFF
[nt.ItemStatePredictedEvent] - sBed1_Lamps predicted to become OFF

In the normal case, the device changes state within 0.5S, and acknowledges back the change.

The random events of blocked Zwave communication happen about 1-2 a day, and don’t have much correlation between day / night, and although it could be an external factor like RF interference, the issue can disappear as quickly as it arrives. It could well be RF, or a multi-hop mesh taking 2 mins to time out and blocking other commands.

The Z-Wave network viewer in HABmin shows 30 devices, with 9 with a single-hop to the controller, which is interesting. One battery device (a door FGK101 door sensor) seems to have bad connectivity (red lines), which is slightly odd as I’d not expect a battery device which spends most of its time sleeping to be doing routing. The rest of the devices are densely connected with many routes to the controller.

This makes me wonder about moving a mains powered ZWave device and waiting for the nightly network heal to see if the network viewer changes.

The transient nature of the issue makes it harder to capture useful logs - which I must agree is no help. I probably should run with debug on the ZWave binding, but haven’t done so to date. Chris’s instructions to do this via the Karaf shell are here for reference:
https://www.cd-jackson.com/index.php/openhab/5-zwave-debugging-openhab

log:set debug org.openhab.binding.zwave
log:display zwave
log:set default org.openhab.binding.zwave

Bruce_Osborne · December 12, 2019, 8:06pm

Official instructions here.

Here is the log viewer.

https://www.cd-jackson.com/index.php/openhab/zwave-log-viewer

mhilbush · December 13, 2019, 12:40am

Just turn on debug mode. With 30 nodes in your network, if it happens 1-2 times per day you can be assured that at least one of the log files will contain something that shows what’s going on.

MrBov · December 15, 2019, 5:13pm

Busy weekend… just grabbing a few minutes to play with debug mode etc. Before I did that, I noticed a bunch of warnings and errors in the log file… not sure where to begin… see the attached

First bunch are about “javax.servlet.ServletException: javax.servlet.ServletException: A MultiException has 1 exceptions.”… Is that linked to one of my bindings?

Also, “2019-12-15 14:25:27.687 [WARN ] [lipse.smarthome.io.net.exec.ExecUtil] - Execution failed (Exit value: -559038737. Caused by java.io.IOException: Cannot run program “arping” (in directory “.”): error=2, No such file or directory)” - is that to do with not having a network.cfg file in services?

Then “2019-12-15 14:25:35.476 [WARN ] [ve.internal.protocol.ZWaveController] - NODE 72: Restore from config: Error. Data invalid, ignoring config.”. Also applies to nodes 11 and 29. these are nodes that are in my inbox and show as unknown devices. Any easy way to remove them?

And… “2019-12-15 14:27:03.662 [WARN ] [wave.discovery.ZWaveDiscoveryService] - NODE 4: Device discovery could not resolve to a thingType! Manufacturer data not known.”. Same for nodes 11, 29, 56, 72 and 77. Some of those are the ones I need to remove. Node 4 is a real Fibaro dimmer that currently does not seem to work.

And “2019-12-15 14:36:07.180 [ERROR] [ersey.server.ServerRuntime$Responder] - An I/O error has occurred while writing a response message entity to the container output stream.
org.glassfish.jersey.server.internal.process.MappableException: java.lang.IllegalStateException: Could not find an active context for org.glassfish.jersey.process.internal.RequestScoped”

So rusty… I seriously need to learn more here :-). Hope I can get some pointers… Got to start somewhere on fixing this, before my wife throws me out and rewires the whole place

openhab.log (362.2 KB)

mhilbush · December 15, 2019, 6:07pm

I’ve not seen any of those errors before, except for the last one. That’s a known issue and has no relation to zwave.

An I/O error has occurred while writing a response message entity to the container output stream.

BTW, the openhab.log file has no zwave DEBUG information in it.

MrBov · December 15, 2019, 10:09pm

Hi @mhilbush, no those were the issues I spotted ahead of turning on debug

Here is a sample with some debug. Only a small slice due to file sizes. Although it all seemed to be generally behaving itself (as one would expect!). It’s node 63 (a Fibaro double switch) that’s generally the issue. Node 62 is another double switch in the same consumer unit in the garage. Both are a trek from the house, but I have a number of units in waterproof boxes in the garden to relay).

Now I know how to turn debug on and off I’ll try to grab another log file while the problem is in full force.

Sooo appreciative of any help anyone can give…

openhab.log (984.0 KB)