Some bindings get into a non-responsive state every few hours (in v2.5.6 - v2.5.8)

No problem noticed on my side. OH has been running for several days without a restart and I can still control my hue lights.

So, I have now monitored my system for more than a week, and with 2.5.5 plus the amazonechofix running I did NOT encounter this issue again.

Even though I can’t provide log proof for this, I assume that it is due to a change between version 2.5.5 and 2.5.6.
@Kai, would you mind reading through this thread, as this information might be important for the fixes in the next release? Let me know if I can help in any way.

This is probably a combination of bindings or your setup that leads to your problem.
I would be very surprised if there were a particular issue with the hue binding, as I am using it myself without any problem.

If you do not own any hue sensors, I would recommend disabling this feature by setting the refresh interval to 0 (2.5.6 version) or to a very high value (<2.5.6 version - value is in milliseconds) to avoid very frequent requests to the hue bridge (twice per second by default).

It is true that mine is disabled.

Good hint with the sensor interval, which I changed immediately.

It might indeed be an issue resulting from the combination of bindings. Still, the combination had been working for a very long time and hasn’t worked since 2.5.6.
I’ll keep on watching this in further releases and will report back.

Thanks for sharing ideas!

Hi, are you still experiencing this issue? I am also having problems with Hue devices and have had them for approx. 2 weeks. I cannot pin down what has changed. It happened around the time that I applied the Amazon binding fix, but I have disabled the binding and still have the same issues, so I am not sure if it is related. I am having to reboot my system about once a day. The symptoms vary, but the Hue motion sensors always cease to work after a period of time, while the Hue lights are sometimes operational and sometimes not (from the Openhab mobile App). When they are, my Zwave sensors will still operate the lights, but the Hue sensors will not. I thought it might have something to do with my Hue hub, as it is getting pretty busy with all the lights, sensors and switches attached. I set up a 2nd Hue hub with just one sensor and 6 lights to see if that solved the issue. It did not.

I have the same issue with my HUE motion sensors (20+ of them). I’m actually running 2 different bridges to create redundancy with groups across the bridges. I thought it was an issue with network routing since I don’t have bulbs as repeaters, so I started buying innr plugs (4x) to act as repeaters for each bridge. I’m not yet sure whether this step has helped.

I’m running an older version of HUE and I’ve also changed out the Amazon binding for the fix.

openhab> list -s | grep hue
212 │ Active │ 80 │ 2.5.0.201908160411 │ org.openhab.binding.hue

Best, Jay

So I did put my own post up here:

I first noticed this issue upgrading to 2.5.3 a few weeks ago but didn’t have time to investigate.
Then I saw my vendor (qnap club) release the 2.5.6 update, but it’s still happening.

It’s always the hue binding that causes my whole openhab to crash, nothing else in the logs other than what I put in the link above about the discovery service and too many open files.

I have 3 x genuine hue bridges linked and did not have these issues on 2.5.2

I also have the amazon echo fix installed

Please consider there were no changes of the hue binding between 2.5.0 and 2.5.3.
First enhancements started in 2.5.4.
So if your problem started with 2.5.3, it cannot be directly due to the hue binding.

I’m fairly sure it was 2.5.3 but cannot be 100% certain, as there were a few updates over the course of a couple of weeks on my system. I’ve removed the amazon echo binding for now to test. Will report back.

So I have had the amazon echo fix removed for a couple of days now and I still have a working openhab. Seems the fix didn’t fix it.

Quite a while ago, but I do have a new update…

My system was running fine (as described above) on 2.5.5 for the time since the downgrade.
With 2.5.8 I wanted to give it another try. Unfortunately that failed: after the upgrade the hanging bindings error occurs again.

I dug into my Grafana charts of persisted items as well as the events.log and found that some bindings (incl. Hue) stopped working while others kept working.

Still working:

  • Novelan
  • ComfoAir
  • MQTT
  • Z-Wave
  • Homematic

Not working:

  • Astro
  • OpenWeatherMap
  • Hue
  • System Info

Those of you who encountered the same issue, did you find a cause, or does anyone else have an idea of what this could be?

Comment only - That looks like bindings with scheduled activities, polling.

If bindings stop updating it’s very likely a deadlock on the scheduled tasks. There is a limited set of threads available that is shared between bindings. If some bindings don’t play nice or even lock those threads, all other bindings will also stop. To see if there are any deadlocks you can use the following command in the karaf console:

threads --monitors --locks

It’s a bit verbose, but it will give information about locks and should show where things are blocked.
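
To illustrate the shared-pool point, here is a minimal standalone Java sketch (not openHAB code). The pool size of 2, the 10 ms polling delay and all names are arbitrary assumptions for illustration. It only shows the general effect: once every thread of a shared scheduled pool is stuck, a well-behaved polling task on the same pool never gets to run.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolStarvation {

    /**
     * Starts `stuckTasks` tasks that never finish on a shared pool of `poolSize`
     * threads, then schedules a polling task on the same pool and returns how
     * often that polling task ran within `observeMillis`.
     */
    static int pollsCompleted(int poolSize, int stuckTasks, long observeMillis) {
        ScheduledExecutorService pool = Executors.newScheduledThreadPool(poolSize);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger polls = new AtomicInteger();
        try {
            // Hypothetical misbehaving tasks: each one occupies a pool thread indefinitely.
            for (int i = 0; i < stuckTasks; i++) {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            }
            // A well-behaved task that just wants to poll every 10 ms.
            pool.scheduleWithFixedDelay(polls::incrementAndGet, 0, 10, TimeUnit.MILLISECONDS);
            try {
                Thread.sleep(observeMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            return polls.get();
        } finally {
            release.countDown();
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("one thread still free: " + pollsCompleted(2, 1, 300) + " polls");
        System.out.println("all threads stuck:     " + pollsCompleted(2, 2, 300) + " polls");
    }
}
```

With one thread free the polling task keeps running; with both threads stuck it stays queued and never runs, which matches the symptom of polling bindings going quiet while event-driven ones carry on.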

What is the meaning of this output? I’ve been having some seemingly odd issues of late and I know that my mqtt homie binding is one that is probably holding up resources a lot. Would love some info on deciphering this. Thanks!

Really? Doesn’t that just lie there passively until someone publishes a message for it?

True - I just know I have a lot of devices here (over 50) that are fairly active and I’ve had a few issues where they have stopped updating from mqtt (commands still work). Have a separate thread where that is being looked into though.

So your broker might be pants.

It isn’t the broker - everything in mqtt works fine. This has its own thread discussion on github going though.

Hi hilbrand,

thanks a lot for that excellent hint, a deadlock sounds very plausible!
I wouldn’t have expected a schedule to be the cause, as the lock-up always occurs at a different time.
But I found something that looks like the right way to the root of the problem.

This is the result of my threads --monitors --locks:

openhab> threads --monitors --locks|grep BLOCKED
"OH-thingHandler-1" Id=222 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"OH-thingHandler-3" Id=229 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"OH-thingHandler-4" Id=232 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"OH-thingHandler-5" Id=233 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"upnp-main-243" Id=2133 in BLOCKED on lock=java.lang.Object@197628b
"OH-discovery-75" Id=2193 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"upnp-main-258" Id=2242 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"OH-discovery-77" Id=2244 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"HarmonyDiscoveryServer(tcp/39185)" Id=2245 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"upnp-main-262" Id=2261 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"OH-discovery-83" Id=2550 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"OH-discovery-84" Id=2590 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"OH-discovery-85" Id=2642 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"HarmonyDiscoveryServer(tcp/40065)" Id=2690 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9
"HarmonyDiscoveryServer(tcp/35275)" Id=3111 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9

That shows that org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9 is the lock that almost all of the blocked threads are waiting on.
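
For longer dumps, the grep output can be summarised per lock. Below is a small standalone Java sketch that counts BLOCKED threads by lock name. The class and method names are made up for illustration, the sample lines are copied from the output above, and the regex assumes the line format shown there.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BlockedLockSummary {

    // Assumes the "in BLOCKED on lock=<name>" format seen in the grep output.
    private static final Pattern BLOCKED = Pattern.compile("in BLOCKED on lock=(\\S+)");

    /** Counts how many BLOCKED lines refer to each lock, in order of first appearance. */
    static Map<String, Integer> countByLock(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = BLOCKED.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "\"OH-thingHandler-1\" Id=222 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9",
            "\"upnp-main-243\" Id=2133 in BLOCKED on lock=java.lang.Object@197628b",
            "\"OH-discovery-75\" Id=2193 in BLOCKED on lock=org.eclipse.smarthome.config.discovery.internal.DiscoveryServiceRegistryImpl@1a7e1c9");
        countByLock(sample).forEach((lock, n) -> System.out.println(n + " x " + lock));
    }
}
```

A lock that shows up many times, as DiscoveryServiceRegistryImpl does here, is the first place to look. The full (unfiltered) dump should then show which thread currently owns it.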

Regarding the Hue binding I can’t find anything at all in the threads information, so the blocked OH-thingHandler threads seem to be what holds up the thing actions.

After a service restart there are no locks anymore.

Do you know how to dig deeper into DiscoveryServiceRegistryImpl and what @1a7e1c9 means?
Meanwhile I will try stopping the Discovery services in Karaf to see whether the thing actions hang again.

There are blocks on thing handler threads (OH-thingHandler-*) and discovery threads (OH-discovery-*), and the blocks mention the HarmonyDiscoveryServer. It’s very likely the discovery process and handler of that binding don’t work well together. In the parent class of the handler a number of methods are synchronized. It’s possible both the handler and discovery access different methods at the same time, causing them to wait for each other.
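
As an illustration of that general mechanism (this is not the Harmony binding's code, and all class and method names are hypothetical), here is a minimal Java sketch in which two objects with synchronized methods call into each other from two threads. Each thread holds its own object's monitor while waiting for the other's, and the JVM's ThreadMXBean then reports both threads as deadlocked.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;

class SynchronizedDeadlockSketch {

    /** Stand-in for a handler or a discovery service with synchronized methods. */
    static class Component {
        synchronized void callInto(Component other, CountDownLatch bothLocked) {
            // Wait until both threads hold their own monitor, so the outcome is deterministic.
            bothLocked.countDown();
            try {
                bothLocked.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            other.update(); // needs the other object's monitor, which the other thread holds
        }

        synchronized void update() {
            // no-op; acquiring the monitor is the point
        }
    }

    /** Returns true if the two sketch threads end up in a monitor deadlock. */
    static boolean demonstratesDeadlock() {
        Component handler = new Component();
        Component discovery = new Component();
        CountDownLatch bothLocked = new CountDownLatch(2);
        Thread a = new Thread(() -> handler.callInto(discovery, bothLocked), "sketch-handler");
        Thread b = new Thread(() -> discovery.callInto(handler, bothLocked), "sketch-discovery");
        a.setDaemon(true); // daemon threads, so the JVM can still exit
        b.setDaemon(true);
        a.start();
        b.start();

        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (int i = 0; i < 100; i++) {
            long[] ids = mx.findMonitorDeadlockedThreads();
            if (ids != null && contains(ids, a.getId()) && contains(ids, b.getId())) {
                return true;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    private static boolean contains(long[] ids, long id) {
        for (long candidate : ids) {
            if (candidate == id) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("deadlock detected: " + demonstratesDeadlock());
    }
}
```

In a thread dump, both threads of such a pair would show up as BLOCKED, each on the lock the other one owns.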

The binding also does background discovery. If you disable background discovery on that binding the problem might not happen (could be triggered if you do a manual discovery, but less likely). But that is only a temporary solution.

I also suggest reporting this issue on the openhab-addons GitHub.

The @1a7e1c9 part is an internal runtime hash of the object.
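
A tiny Java sketch of where such a suffix comes from (the helper name is made up for illustration): for an object that does not override toString() or hashCode(), Java prints the class name, an @, and the identity hash code in hexadecimal. The value is only meaningful within one running JVM and will differ after a restart.

```java
class IdentityHashSketch {

    /** Builds the "ClassName@hex" label from the class name and the identity hash code. */
    static String defaultLabel(Object o) {
        return o.getClass().getName() + "@" + Integer.toHexString(System.identityHashCode(o));
    }

    public static void main(String[] args) {
        Object lock = new Object();
        System.out.println(lock);               // e.g. java.lang.Object@<some hex value>
        System.out.println(defaultLabel(lock)); // same label, built by hand
    }
}
```

So the same @1a7e1c9 on many lines simply means that all those threads are waiting on one and the same object instance.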