Zwave Wall Plugs - occasionally do not switch as rules request

I’m using arround 30 wall plugs (zwave) in my house. Every few weeks when normally “hot water” system should start at 6:30am, it seems to not do anything at all. And i have no idea what could be the problem.
Not sure what is the reason (reset Raspi with v2.5.2, unplug the wall plugs,) but after this happens all is fine for the next few weeks.

Its also not only the FRISTAR_hotwater system but also other switches in the basement that do not work…Connectivity is good, so its all arround 10m apart from the controller.

@chris , any idea what i could try…its hard to log more detailed as it would take weeks until it happens again. At 6:53 i realized in the shower that again i ran into this issue so i turned on manually.

2020-04-30 06:30:00.113 [ome.event.ItemCommandEvent] - Item 'FRISTAR_Switch' received command ON
2020-04-30 06:30:00.166 [nt.ItemStatePredictedEvent] - FRISTAR_Switch predicted to become ON
2020-04-30 06:30:00.217 [vent.ItemStateChangedEvent] - FRISTAR_Switch changed from OFF to ON
2020-04-30 06:31:22.698 [vent.ItemStateChangedEvent] - FRISTAR_Switch changed from ON to OFF
2020-04-30 06:53:18.728 [ome.event.ItemCommandEvent] - Item 'FRISTAR_Switch' received command ON
2020-04-30 06:53:18.761 [nt.ItemStatePredictedEvent] - FRISTAR_Switch predicted to become ON
2020-04-30 06:53:18.809 [vent.ItemStateChangedEvent] - FRISTAR_Switch changed from OFF to ON

in addition - interesting to see…
two Switches close to the FRISTAR (WellWater should start at 6:00…and it does, but MaxKalk should start at 6:30 as well, but it does not).

All of them stay togehter in a 10cm distance.

2020-04-30 06:30:00.134 [ome.event.ItemCommandEvent] - Item 'MaxKalk_Switch' received command ON
2020-04-30 06:30:00.193 [nt.ItemStatePredictedEvent] - MaxKalk_Switch predicted to become ON
2020-04-30 06:30:00.229 [vent.ItemStateChangedEvent] - MaxKalk_Switch changed from OFF to ON
2020-04-30 06:31:23.097 [vent.ItemStateChangedEvent] - MaxKalk_Switch changed from ON to OFF
2020-04-30 06:53:20.496 [ome.event.ItemCommandEvent] - Item 'MaxKalk_Switch' received command ON
2020-04-30 06:53:20.525 [nt.ItemStatePredictedEvent] - MaxKalk_Switch predicted to become ON
2020-04-30 06:53:20.539 [vent.ItemStateChangedEvent] - MaxKalk_Switch changed from OFF to ON



2020-04-30 06:00:00.078 [ome.event.ItemCommandEvent] - Item 'WellWater_Switch' received command ON
2020-04-30 06:00:00.175 [nt.ItemStatePredictedEvent] - WellWater_Switch predicted to become ON
2020-04-30 06:00:00.197 [vent.ItemStateChangedEvent] - WellWater_Switch changed from OFF to ON

I offer no help, but a bit of explanation -

First we issue the command, of course, and that should be picked up by the binding and sent over zwave.

At the same time, openHAB’s autoupdate feature also picks up the command, and guesses the likely outcome on your Item, so as to give a fast responding UI etc.
That’s the “predicted” log entry.
The prediction causes an Item state change, the first “changed” entry. Remember, that’s really just a hope that it will come out like that.

Over a minute later, we get a report back from the actual zwave device. It is not possible to tell here if that was in response to the command, or if it is just a routine scheduled report. (I think in truth there is no command response as such, and it is the binding that polls the device after sending command)

Either way, it says it’s still OFF and the Item gets updated accordingly with truth instead of hope.

Good news - that’s an indicator the target zwave device is alive and well and working in the mesh.
I think a binding DEBUG log should tell you if this update was a command response (i.e. the device should have got the command but could not implement it) or a scheduled report (maybe the command never got there)

If it causes confusion, you could disable autoupdate for each Item, so that your rules and UI tell the truth rather than guesses. The penalty is a delay between command (or UI click) and update.

As the truth does eventually get reported, it would be possible to construct rules to detect a command failure and make a re-try. But of course you should first try to get it working as reliably as it should be.

EDIT - have you ever noticed this behaviour outside of morning start ups? I am thinking about things “going to sleep” overnight, or nightly mesh heal involvement.

That is one reason the official binding documentation says to turn on and collect debug logs when things do not go as planned.

Check for and remove ghost nodes. Disable the daily heal or at least change the time. IIRC, the default is 2am, so your network could still be sluggish at 6am, depending on how many devices and their types. Do you have a backup running that could slow things down?

Experiment with the Command Polling Interval. Default is 1.5s and increasing or disabling may help out, especially if you are turning 30 devices on at the same time.

If you find it is not OH causing this, you may want to look into a zwave sniffer to help troubleshoot where the communication is being lost. Cost is only a spare zwave dongle.

hm, i now tried your inputs but ran in another even more annoying issue. Especially i removed 2 items that were dead. Not sure what is the difference between “mark failed” and “remove from controller”. i guess a soft and hard removal from the network.

The second thing i did was to change the healing time from 2am to 1am.

After that in the morning i realized on PAN11 (POPP & PHILIO) wall plugs that they all did blink red like “hell” but did not do their switching as they should. After unplug and replug all works fine, until the next morning, same “feature” every day…once in the morning.

image

Any idea what now is the problem…i did not remove any of these switches and cannot believe that changing to 1am would have such an impact. The documentation does not help a lot, only say that blinking fast means overload which is not the case…

Any idea? (v2.5.2-1)

Have you tried disabling the daily network heal? If your network is not changing it is not really needed. Some devices do not handle the heal well. @5iver may be able to provide more details.

Even just to test, set the heal to disabled.

The controller won’t remove a device that has not been marked as failed. Did you successfully remove them using just OH? Try a zwave discovery to be sure. Often, a third-party tool is needed, like PC Controller from Silabs… search the forum for details.

Another thing to try is to delete the Thing and then run discovery again. I don’t mean excluding it from the network, just delete the Thing in OH. This will update the Thing definition that is used. If you use a .things file, add .txt to the file extension, wait a minute, then change it back.

I’m fairly confident that disabling the heal will put things right. Did you try doing anything with the Command Poll Interval?

1 Like

i now first changed back to 2.5.2 as it seems that 2.5.4 brings some major issues with zwave from what i read today…nevertheless my problems started with 2.5.2.

I did not change the commnd poll intervall as i did never change it before (fir 2-3 year now) and i did not want to change too many things in parallel so the effects are unclear in the end. but definitely will have a look at that soon, changed it for now back to 2am (as it was) but next is to disable the healing.