Some bindings get into a non-responsive state every few hours (in v2.5.6 - v2.5.8)

I have the same issue with my Hue motion sensors (20+ of them). I'm running two different bridges to create redundancy, with groups spanning both bridges. I initially thought it was a network-routing issue, since I don't have bulbs acting as repeaters, so I started buying innr plugs (4x) to give each bridge some repeaters on the network. I'm not yet convinced whether that step has helped.

I'm running an older version of the Hue binding, and I've also swapped out the Amazon binding for the fixed version.

openhab> list -s | grep hue
212 │ Active │ 80 │ 2.5.0.201908160411 │ org.openhab.binding.hue

Best, Jay

So I put up my own post here:

I first noticed this issue after upgrading to 2.5.3 a few weeks ago, but didn't have time to investigate.
Then I saw my vendor (QNAP Club) release the 2.5.6 update, but it's still happening.

It's always the Hue binding that brings my whole openHAB down; there's nothing else in the logs other than what I put in the link above about the discovery service and too many open files.

I have 3 genuine Hue bridges linked and did not have these issues on 2.5.2.

I also have the Amazon Echo fix installed.

Please consider that there were no changes to the Hue binding between 2.5.0 and 2.5.3; the first enhancements arrived in 2.5.4.
So if your problem started with 2.5.3, it cannot be directly due to the Hue binding.

I'm sure it was 2.5.3, but I can't be 100% certain, as there were a few updates on my system over the course of a couple of weeks. I've removed the Amazon Echo binding for now to test. Will report back.

So I have had the Amazon Echo fix removed for a couple of days now, and I still have a working openHAB. Seems the fix didn't fix it.

Quite a while ago, but I do have a new update…

My system had been running fine (as described above) on 2.5.5 ever since the downgrade.
With 2.5.8 I wanted to give it another try. Unfortunately that failed: after the upgrade, the hanging-bindings error occurs again.

I dug into my Grafana charts of persisted items as well as into events.log and found that some bindings (incl. Hue) stopped working while others kept working.

Still working:

  • Novelan
  • ComfoAir
  • MQTT
  • Z-Wave
  • Homematic

Not working:

  • Astro
  • OpenWeatherMap
  • Hue
  • System Info

For those of you who encountered the same issue: did you find a cause, or does anyone else have an idea what this could be?

Comment only: those look like bindings with scheduled activities, i.e. polling.

If bindings stop updating, it's very likely a deadlock on the scheduled tasks. There is a limited set of threads that is shared between bindings; if bindings don't play nice, or even lock up those threads, all other bindings will also stop. To see whether there are any deadlocks, you can use the following command in the Karaf console:

threads --monitors --locks

It's a bit verbose, but it gives information about locks and should show where things are blocked.
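
To make that concrete, here is a minimal, self-contained Java sketch (not openHAB code; the pool size and the "binding" labels are invented for the demo) of how a small shared scheduler pool gets starved: once a few tasks block and never return, every other scheduled job on that pool stops firing, which is exactly what "bindings stopped updating" looks like.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerStarvation {
    public static void main(String[] args) throws InterruptedException {
        // Stand-in for the shared scheduler pool; the pool size (3) is invented for the demo.
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(3);

        // Tasks from a "misbehaving binding" that never return occupy every pool thread.
        for (int i = 0; i < 3; i++) {
            pool.submit(() -> {
                try { Thread.sleep(Long.MAX_VALUE); } catch (InterruptedException ignored) { }
            });
        }

        // A "well-behaved binding" schedules its polling job on the same pool...
        pool.scheduleAtFixedRate(() -> System.out.println("polling..."), 0, 1, TimeUnit.SECONDS);

        // ...but nothing is printed within 5 seconds, because the pool is exhausted.
        Thread.sleep(5000);
        pool.shutdownNow();
    }
}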

What is the meaning of this output? I've been having some seemingly odd issues of late, and I know that my MQTT Homie binding is probably one that is holding up a lot of resources. Would love some info on deciphering this. Thanks!

Really? Doesn’t that just lie there passively until someone publishes a message for it?

True - I just know I have a lot of devices here (over 50) that are fairly active, and I've had a few issues where they stopped updating from MQTT (commands still work). There's a separate thread where that is being looked into, though.

So your broker might be pants.

It isn't the broker - everything in MQTT works fine. This has its own discussion thread on GitHub, though.

Hi hilbrand,

thanks a lot for that excellent hint - a deadlock sounds very reasonable!
I wouldn't have expected it to be schedule-related, as the lock situation always occurs at a different time.
But I found something that looks like the right way to the root of the problem.

This is the result of my threads --monitors --locks:

openhab> threads --monitors --locks|grep BLOCKED
“OH-thingHandler-1” Id=222 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“OH-thingHandler-3” Id=229 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“OH-thingHandler-4” Id=232 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“OH-thingHandler-5” Id=233 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“upnp-main-243” Id=2133 in BLOCKED on lock=java.lang.Object@197628b
“OH-discovery-75” Id=2193 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“upnp-main-258” Id=2242 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“OH-discovery-77” Id=2244 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“HarmonyDiscoveryServer(tcp/39185)” Id=2245 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“upnp-main-262” Id=2261 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“OH-discovery-83” Id=2550 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“OH-discovery-84” Id=2590 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“OH-discovery-85” Id=2642 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“HarmonyDiscoveryServer(tcp/40065)” Id=2690 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
“HarmonyDiscoveryServer(tcp/35275)” Id=3111 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9

That shows that org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9 is the object blocking the other tasks.

Regarding the Hue binding I can't find anything at all in the threads information, so the OH-thingHandler threads seem to be what blocks the thing actions.

After a service restart there are no locks anymore.

Do you know how to dig deeper into DiscoveryServiceRegistryImpl and what @1a7e1c9 means?
Meanwhile I'll try stopping the discovery services in Karaf and see whether the thing actions hang again.

There are blocks on thing handler threads (OH-thingHandler-*) and discovery threads (OH-discovery-*), and the blocks mention the HarmonyDiscoveryServer. It's very likely the discovery process and the handler of that binding don't work well together: in the parent class of the handler a number of methods are synchronized, so it's possible the handler and discovery call different synchronized methods at the same time, causing them to wait for each other.
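
A minimal sketch of that mechanism (hypothetical classes, not the actual binding code): two threads call different synchronized methods of the same object, so the second one ends up BLOCKED on the first one's monitor, just like the OH-thingHandler-* entries above.

public class SynchronizedHandlerDemo {

    static class Handler {
        // Called from a discovery thread; holds the handler's monitor while doing something slow.
        synchronized void onDeviceDiscovered() throws InterruptedException {
            Thread.sleep(10_000); // simulated slow network call while the lock is held
        }

        // Called from a thing-handler thread; needs the same monitor, so it shows up as BLOCKED.
        synchronized void handleCommand() {
            System.out.println("command handled");
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Handler handler = new Handler();

        Thread discovery = new Thread(() -> {
            try { handler.onDeviceDiscovered(); } catch (InterruptedException ignored) { }
        }, "OH-discovery-demo");
        Thread thingHandler = new Thread(handler::handleCommand, "OH-thingHandler-demo");

        discovery.start();
        Thread.sleep(200);   // let the discovery thread grab the monitor first
        thingHandler.start();
        Thread.sleep(500);
        // This prints BLOCKED: the same state the "threads --monitors --locks" output shows.
        System.out.println("OH-thingHandler-demo state: " + thingHandler.getState());
        discovery.interrupt(); // release the monitor so the demo terminates
        thingHandler.join();
    }
}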

The binding also does background discovery. If you disable background discovery for that binding, the problem might not happen (it could still be triggered by a manual discovery, but that is less likely). But that is only a temporary solution.

I also suggest reporting this issue on the openhab-addons GitHub.

This is an internal runtime hash of the object (its identity hash code in hex).
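
As a tiny illustration: the ClassName@hex form comes from the default Object.toString(), which appends the identity hash code in hex, so the same hex value (here 1a7e1c9) in several BLOCKED lines means all those threads are waiting on the very same object instance.

public class IdentityHashDemo {
    public static void main(String[] args) {
        Object lock = new Object();
        // Default toString() is getClass().getName() + "@" + Integer.toHexString(hashCode()),
        // and for a plain Object, hashCode() is the identity hash code of that instance.
        System.out.println(lock); // e.g. java.lang.Object@1a7e1c9 (the hex part varies per run)
        System.out.println("identity hash: " + Integer.toHexString(System.identityHashCode(lock)));
    }
}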

I configured the Harmony binding inside PaperUI, which does not expose any settings for discovery in either the binding configuration or the hub thing configuration.
To my understanding, disabling general discovery in OH should disable auto-discovery for all bindings.

Anyway, I will monitor the behaviour with the Harmony binding and the OH discovery services disabled completely for the first run (before disabling discovery via .cfg):

openhab> bundle:list|grep Resolved
139 │ Resolved │ 80 │ 2.5.0 │ openHAB Core :: Bundles :: Configuration Discovery
140 │ Resolved │ 80 │ 2.5.0 │ openHAB Core :: Bundles :: Configuration mDNS Discovery
247 │ Resolved │ 80 │ 2.5.8 │ openHAB Add-ons :: Bundles :: HarmonyHub Binding
264 │ Resolved │ 80 │ 2.5.0 │ openHAB Core :: Bundles :: Configuration UPnP Discovery
265 │ Resolved │ 80 │ 2.5.0 │ openHAB Core :: Bundles :: Configuration USB-Serial Discovery
266 │ Resolved │ 80 │ 2.5.0 │ openHAB Core :: Bundles :: Configuration USB-Serial Discovery for Linux using sysfs scanning
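
For reference, a Resolved (stopped) state like the above can be produced by stopping the bundles from the Karaf console; the bundle IDs are the ones from the listing above and are specific to this installation (e.g. 247 is the HarmonyHub binding here):

openhab> bundle:stop 247
openhab> bundle:stop 139
openhab> bundle:list|grep Resolved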

I will bring this to the openhab-addons GitHub once I have a more precise description based on these results.
Again, thanks for pointing me in this direction!

Do you know whether the behaviour in the parent class changed since 2.5.6 (I can't find anything in the release notes)?
On 2.5.5 the exact same environment ran, and still runs (when downgrading), flawlessly.

You should be able to disable discovery for a specific binding at least via the Karaf console: Runtime Commands | openHAB
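
Background discovery can also be switched off per binding in the text configuration; a sketch for conf/services/runtime.cfg, assuming the discovery.<bindingID>:background syntax from the shipped sample config still applies in 2.5.x and that harmonyhub is the binding ID:

# disable background discovery for a single binding (assumed binding ID: harmonyhub)
discovery.harmonyhub:background=false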

I also see upnp being in the block list, so it might also be caused by that binding. The HarmonyHub binding hasn't changed since that version.
The threads command also gives the full stack trace; that should show more precisely at what point the block actually happens and give some insight into where to look.

That's not related to a specific binding, but to any binding that uses UPnP for discovery. In your list of bindings, that would point to the Hue binding.

Just a first wrap-up:
With the discovery processes and the Harmony service disabled, the system has kept running properly for >24h now :slight_smile:

I enabled the Harmony service again and will keep analyzing whether the lock is due to the UPnP discovery service in conjunction with the Harmony binding. Will report back soon.

After a couple more rounds of testing, I found that the combination of UPnP discovery and the Harmony binding causes the deadlock for the other bindings.

I opened up an issue in the GitHub repo here.

Thanks for all your help!
