[SOLVED] Z-Wave unreliable in 2.5.0.M4

Celaeno1 · October 25, 2019, 3:21am

I have not finished my investigations because of time constraints. I’ll post the results here as soon as I find the exact reason.

Celaeno1 · October 25, 2019, 3:51am

The issues (#1195, #1178) you reported on GitHub are still open.
If it turns out that they still exist in M2, M3, M4 and the snapshots, why haven't they been resolved so far, or why was it decided to release the milestones anyway?

5iver · October 25, 2019, 5:28am

I did not report the first, and it was a known issue. The second one was mine. My workaround was to disable the daily network heal and restart OH. I can still heal individual devices, if needed.

It’s a rare issue, possibly related to large networks (>120 devices). Chris is extremely busy, and with all the changes this year, the IDE is not what it was, especially when it comes to debugging.

I’m not the right person to ask, but the 2.5M2 preparation topic may shed some light on that.

Celaeno1 · October 25, 2019, 11:07am

I only have 10 nodes: 1 controller, 1 repeater, 5 FLiRS and 3 “non-listening” nodes (3 motion sensors). Only the last 3 cause trouble after the daily heal.

mhilbush · October 25, 2019, 11:36am

The workaround to both issues is easy. Just disable the nightly heal. My network of about 100 nodes has been running fine without the nightly heal for a couple months.

Celaeno1 · October 25, 2019, 11:42am

Did you do a manual heal for each node? How often? How do you know if it’s necessary? Do you restart the OH server each time?

mhilbush · October 25, 2019, 11:46am

I’d do that if necessary, but I haven’t needed to do it yet. My network is pretty stable. If I add some new nodes I might need to do it. If I see a node starting to become flaky, I might try running a heal on that node. But I haven’t needed to do that either.

mhilbush · October 25, 2019, 12:30pm

I should note that after disabling the nightly heal, a restart of openHAB is advised just in case the binding is in a weird state due to a previously run heal.

Celaeno1 · October 25, 2019, 1:35pm

@mhilbush

One last question.

Suppose the nightly heal runs at 2:00 A.M. and, within the next 24 hours, all nodes have been healed except one or two. Should I then manually heal the missing nodes until the heal is complete, then disable the heal, and then restart OH?

mhilbush · October 25, 2019, 1:40pm

In my analysis, if a heal doesn’t complete in 24 hours, it will never complete. In this scenario, I’ve seen the binding get stuck in the “initialize/heal” state for that node. Once in this state, it will never get out of that state until the binding is restarted. Once stuck in this state, the device will never be polled again until you do a restart. That’s why I disable the nightly heal, then do an openHAB restart.

This is just my experience, of course. OTOH, it’s backed by log files that show this behavior.
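The 24-hour rule of thumb above can be turned into a quick check. This is a minimal Python sketch, not part of openHAB or HABmin: it assumes you copy each node's “Last Heal Time” from HABmin by hand, and it assumes (not confirmed in this thread) that the field is written when a heal finishes rather than when it starts. The function name `stuck_nodes` and the node IDs in the example are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: a heal that has not finished within this window is treated as
# stuck, following the 24-hour observation in the post above.
HEAL_TIMEOUT = timedelta(hours=24)


def stuck_nodes(heal_started, last_heal_times, now):
    """Return the (hypothetical) node IDs whose heal appears to be stuck.

    heal_started:    when the scheduled heal began (e.g. 2:00 A.M.).
    last_heal_times: dict of node ID -> "Last Heal Time" copied from HABmin,
                     or None if the node shows no heal time at all.
    now:             the time of the check.

    Assumes (unconfirmed) that "Last Heal Time" is updated on completion.
    """
    if now - heal_started < HEAL_TIMEOUT:
        return []  # too early to tell; the heal may still be running

    stuck = []
    for node_id in sorted(last_heal_times):
        last_heal = last_heal_times[node_id]
        # Never healed, or last healed before this heal run began.
        if last_heal is None or last_heal < heal_started:
            stuck.append(node_id)
    return stuck


if __name__ == "__main__":
    start = datetime(2019, 10, 24, 2, 0)
    times = {
        2: datetime(2019, 10, 24, 2, 5),  # healed during this run
        3: datetime(2019, 10, 23, 2, 4),  # last healed in the previous run
        7: None,                          # no heal time shown
    }
    print(stuck_nodes(start, times, datetime(2019, 10, 25, 3, 0)))
```

With a heal started at 2:00 A.M. on October 24 and a check 25 hours later, nodes 3 and 7 would be flagged. Per the post above, such nodes may only recover after an openHAB restart.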

Bruce_Osborne · October 25, 2019, 1:44pm

In a battery operated sensor, I found removing & replacing the battery & making sure it was awake brought it back online fully.

Celaeno1 · October 25, 2019, 1:48pm

@mhilbush @Bruce_Osborne

In my case, I can do a “manual heal” by waking the node up several times (triple-pressing the button), without restarting OH!

EDIT: I still have to check whether a single press has the same effect (i.e. the heal succeeds).

Bruce_Osborne · October 25, 2019, 1:51pm

If a node is hung trying to heal, that may not work.
Verify how to wake up your device. Triple-clicking on some devices is used for inclusion/exclusion mode.

Celaeno1 · October 25, 2019, 1:53pm

But then wouldn't I have many ghost nodes? I don't. And the controller is NOT in inclusion/exclusion mode during a heal, is it?

But you are right: press 3x to include/exclude, press 1x to wake up.

Bruce_Osborne · October 25, 2019, 2:05pm

The mode likely times out if the controller is not in inclusion or exclusion mode.
Some of my devices take a single press & I have one where you are supposed to press & hold for at least 5 seconds to wake up.

mhilbush · October 25, 2019, 2:16pm

Hmm. I’m not sure why cycling the power and waking up the device fixes the issue completely. The primary symptom of the issue is that the initialization thread (which is used for device initialization, as well as heal), never completes. While it might’ve shown the node with a thing status of ONLINE, how do you know the binding killed the node initialization thread?

In the issue I opened, and as documented in the log file, once in this state, wake ups have no effect on resolving the issue. I’ve found nothing short of an OH restart will fix the issue. Even a binding restart won’t fix it, because the binding never kills the orphaned initialization threads. This behavior can be demonstrated by looking at the zwave threads using the karaf console. I’ve observed numerous node initialization threads even after a binding restart.

Edit: I should be more clear, after stopping the binding, the init threads are still there.

mhilbush · October 25, 2019, 2:18pm

See discussion here.

Celaeno1 · October 25, 2019, 2:20pm

But I can see in HABmin that “Last Heal Time” is updated with the current date/time if the heal was successful. Is this info wrong?

mhilbush · October 25, 2019, 2:24pm

I don’t know. It’s unclear to me when the binding updates Last Heal Time – at the beginning of the heal, or at the end of the heal.

Celaeno1 · October 25, 2019, 2:26pm

Here is why I thought it was “triple-pressing”: it was not translated correctly!

English manual: (screenshot not preserved)

German manual: (screenshot not preserved)

3) … “dreimal” = three times
6) … “drücken” = press (but it doesn't say how often!) … “aufzuwecken” = “to wake it up”

It would be better to write “drücken Sie einmal” = “press once”.