There has been some recent talk about the nightly heal, which can on occasion (usually when a node is misbehaving) hang the network and in some cases knock it completely offline. Here is one thread where Chris comments on the nightly heal.
Also, please check the thread Bruce linked to for info about removing a node in HABmin and using the Zensys tool. Mark and Chris have been working on getting HABmin to do a better job of getting rid of dead/ghost nodes.
If you consider your issue fixed, please mark the thread title as [SOLVED] and put a check mark on the post that helped you fix the issue.
In my case: I’ve checked all nodes with “zensys tool 4.7” and “Z-Wave PC Controller 5.38” by sending a NOP to each of them. All were responding. None had failed. And it worked for 10 months without any problems. But after I updated to Snapshot 1731 and the first network heal ran, the troubles began.
That is a new and terrible bug. This was never the case before M3.
I do not, and I will not know for certain for at least 24 hours.
I know @5iver had some issues related to healing a while ago, before M3 IIRC.
I reported the same/similar before 2.5M2 as something that should be considered a blocker…
Thank you for that, and let us know. Do you already have your logs set to debug?
If you need to know how, check here.
Thank you. When I had discovered it was broken for me in M3, I tried unsuccessfully to find if someone had a similar issue, but didn’t have time to thoroughly look into it. I have no idea why, but I opted to roll back to M1 instead of M2, so I was unaware if the issue affected M2. So all I knew was the issue existed in M3, but not M1.
I do now… Thank you for the reminder.
I’m not sure I understand what M1-2-3-4 means … I have 2.5.0~S1733-1 (Build #1733) on openhabian and I can say this bug is not visible (at the moment).
No issues so far with zwave
I have the same USB key … and healing at 2 AM … updated yesterday so I’ve already passed this point
There are three current release versions of openHAB.
The Stable release, normally released every 6 months, is currently 2.4, released late last year.
The Testing releases, also known as Milestone releases, are released every month.
2.5 Milestone 4 is the current version.
The Unstable, or Snapshot, releases are built & released every night and are only designed for developers to test all pieces working together. Snapshot releases are not expected to totally work or be stable.
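If it helps to see the naming scheme spelled out, here is a minimal Python sketch that sorts a version string into one of the three channels. The only version format actually quoted in this thread is the snapshot form `2.5.0~S1733-1`; the milestone forms (`2.5M2`, `2.5.0.M4`) and the helper name `release_channel` are assumptions made for illustration, not anything openHAB itself provides.

```python
import re

# Hypothetical patterns. Only the snapshot form "2.5.0~S1733-1" is quoted
# in this thread; the milestone and stable forms are assumed for illustration.
SNAPSHOT = re.compile(r"^\d+\.\d+(\.\d+)?~S(\d+)")
MILESTONE = re.compile(r"^\d+\.\d+(\.\d+)?[.~ ]?M(\d+)$", re.IGNORECASE)
STABLE = re.compile(r"^\d+\.\d+(\.\d+)?$")


def release_channel(version: str) -> str:
    """Return 'snapshot', 'milestone', 'stable', or 'unknown' for a version string."""
    v = version.strip()
    if SNAPSHOT.match(v):
        return "snapshot"   # nightly build, e.g. S1733
    if MILESTONE.match(v):
        return "milestone"  # monthly testing build, e.g. M4
    if STABLE.match(v):
        return "stable"     # plain version number, e.g. 2.4
    return "unknown"


if __name__ == "__main__":
    for v in ["2.4", "2.5M2", "2.5.0.M4", "2.5.0~S1733-1"]:
        print(v, "->", release_channel(v))
```

So a build like 2.5.0~S1733-1 is a Snapshot (nightly) build, not a Milestone.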
OK folks… before we all grab our pitchforks and start calling this a bug… these are common traits of a failed node. Sometimes when things have been working well for a long time and we upgrade and things suddenly break, it is easy to blame the upgrade itself, when in fact it is a low battery, a node slightly out of range, or some other edge case. In a system with a lot of nodes, it could have gone unnoticed previously.
I have not finished my investigations because of time constraints. I’ll post the results here as soon as I find the exact reason.
The issues (#1195, #1178) you reported on GitHub are still open.
If it turns out that they still exist in M2, M3, M4 and snapshots, why have they not been resolved so far, or why was it decided to publish the milestones anyway?
I did not report the first, and it was a known issue. The second one was mine. My workaround was to disable the daily network heal and restart OH. I can still heal individual devices, if needed.
It’s a rare issue, possibly related to large networks (>120 devices). Chris is extremely busy, and with all the changes this year, the IDE is not what it was, especially when it comes to debugging.
I’m not the right person to ask, but the 2.5M2 preparation topic may shed some light on that.
I only have 10 nodes (1 controller, 1 repeater, 5 FLiRS and 3 “non listening” nodes, i.e. 3 motion sensors). Only the last 3 cause trouble after the “daily healing”.
The workaround to both issues is easy. Just disable the nightly heal. My network of about 100 nodes has been running fine without the nightly heal for a couple months.
Did you do a manual heal for each node? How often? How do you know if it’s necessary? Do you restart the OH server each time?
I’d do that if necessary, but I haven’t needed to do it yet. My network is pretty stable. If I add some new nodes I might need to do it. If I see a node starting to become flaky, I might try running a heal on that node. But I haven’t needed to do that either.