AndroidTV binding toxic to my system

Hello.

Short version: I am a new openHAB user, coming from more than a decade on MisterHouse. I set up openHAB a couple of weeks ago with the goal of bringing my devices and automations from MisterHouse to openHAB. One device class I use is 1-Wire. It had been 100% stable for years on MisterHouse and after setting it up on openHAB, it would work perfectly for a short period of time (from minutes to hours) and then completely stop working. There would be absolutely nothing in the logs, even at the TRACE level. Other bindings would also stop working, e.g. IKEA TRÅDFRI, UPnP Control. Initially, I did not think that these different bindings that stopped working had a common root cause, but after a couple of days of intense troubleshooting, I know now that it was a single binding that was wreaking havoc: the AndroidTV binding – it would exhaust the thing handler thread pools, which is what caused the OneWire binding’s refresh function to not be called anymore. Uninstalling the AndroidTV binding fixed all the problems.

Long version: this is related to the following post that I made in the beginners section last week:

“OneWire binding stops working”

I know now that there was no configuration/beginner issue but a problem with a specific binding, which is why I decided to post here.

The issue was that the OneWire binding would stop working after working fine for a period of minutes to hours. There was nothing in the openHAB logs, even when logging at the TRACE level.

How did the issue manifest itself? The OneWire binding has a refresh function that is scheduled like this, which I understand is the correct way for a binding to schedule tasks to be run at a later time:

refreshTask = scheduler.scheduleWithFixedDelay(this::refresh, 1, 1000, TimeUnit.MILLISECONDS);

The refresh() method is then responsible, depending on the polling interval (by default 5 minutes), of polling the 1-Wire devices and bridge channels.

The problem was that this refresh() method would work fine for a little while but then it would stop being called, and then no more 1-Wire data would come into openHAB. This was a very frustrating issue because not only would data stop coming in, which is a big problem for a home automation system, but also because the only way to recover was to restart openHAB. Troubleshooting was hard because the binding itself did not generate any log messages, even at the TRACE level, that could shed any light into what was happening.

After a lot of reading community posts and Github issues, and adding some instrumentation to the OneWire binding to be able to follow what was going on, I found out that the reason the OneWire binding’s refresh method was not getting called anymore was that the thing handler thread pools became exhausted, apparently all blocked by the AndroidTV binding.

I reached this conclusion after inspecting the “threads” command output in the Karaf console – in the bad/nothing works state, the thing handler threads would look like this:

$ grep thingHandler threads2.txt
“OH-thingHandler-1” Id=738 in RUNNABLE (running in native)
“OH-thingHandler-2” Id=739 in RUNNABLE (running in native)
“OH-thingHandler-3” Id=740 in RUNNABLE (running in native)
“OH-thingHandler-4” Id=742 in RUNNABLE (running in native)
“OH-thingHandler-5” Id=743 in RUNNABLE (running in native)
“OH-thingHandler-6” Id=746 in RUNNABLE (running in native)
“OH-thingHandler-7” Id=750 in RUNNABLE (running in native)
“OH-thingHandler-8” Id=751 in RUNNABLE (running in native)
“OH-thingHandler-9” Id=752 in RUNNABLE (running in native)
“OH-thingHandler-10” Id=756 in RUNNABLE (running in native)
“OH-thingHandler-11” Id=757 in RUNNABLE (running in native)
“OH-thingHandler-12” Id=759 in RUNNABLE (running in native)
“OH-thingHandler-13” Id=760 in RUNNABLE (running in native)
“OH-thingHandler-14” Id=761 in RUNNABLE (running in native)
“OH-thingHandler-15” Id=762 in RUNNABLE (running in native)

(Note the “RUNNABLE” state.)

Furthermore, when running “threads --locks”, each one of the above threads would show that they were locked by:

  - locked java.lang.Object@227f1989
at org.openhab.binding.androidtv.internal.protocol.googletv.GoogleTVConnectionManager$$Lambda/0x00000001012e9a30.run(Unknown Source)
at java.base@21.0.10/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
at java.base@21.0.10/java.util.concurrent.FutureTask.run(FutureTask.java:317)
at java.base@21.0.10/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304)
at java.base@21.0.10/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
at java.base@21.0.10/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
at java.base@21.0.10/java.lang.Thread.runWith(Thread.java:1596)
at java.base@21.0.10/java.lang.Thread.run(Thread.java:1583)

Locked synchronizers: count = 3
  - java.util.concurrent.ThreadPoolExecutor$Worker@2ce77b3a
  - java.util.concurrent.locks.ReentrantLock$NonfairSync@4c6cec4f
  - java.util.concurrent.locks.ReentrantLock$NonfairSync@13b278c6

A healthy system would show something like the following and no locks:

$ grep thingHandler threads3.txt
“OH-thingHandler-22” Id=879 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-34” Id=922 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-37” Id=940 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-39” Id=942 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-40” Id=943 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-41” Id=944 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-44” Id=949 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-45” Id=968 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-46” Id=969 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-47” Id=972 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-48” Id=974 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-49” Id=975 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-50” Id=976 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-51” Id=977 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f
“OH-thingHandler-52” Id=979 in TIMED_WAITING on lock=java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@7f8df40f

(Note “TIMED_WAITING”.)

After I found this out, I uninstalled the AndroidTV binding, restarted openHAB, and OneWire data did not stop coming into openHAB anymore. And the other bindings that had previously stopped working randomly, like IKEA TRĂ…DFRI, and that I thought had stopped working due to issues with the bindings, did not stop working anymore.

What is wrong with the AndroidTV binding? I have no idea. It might be a problem with my two Android TVs not being online all the time, or with the fact that I had not finished openHAB thing or item configuration of one of my two Android TVs (to finish the configuration it is necessary to interact with the TV and enter a PIN or code somewhere, which I did for one of my TVs but not the other). Whatever the case, something is not right with it because it should not bring the system down to its knees because of some anomalous condition like TV not online or related thing or item not fully configured.

Why did I have the AndroidTV binding installed? Well, I really don’t need it, but when I first installed openHAB and logged into the UI for the first time, the system said “look, we have discovered all these things for you; we recommend you install this and that add-on”. And I was like “wow, this is nice; MisterHouse didn’t have any of this so I’ll just install the suggested add-ons so I can play with them later; perhaps I have been missing out”. Little did I know that one of the suggested bindings (AndroidTV) would give me so much grief.

To wrap up:

  1. Something is not right with the AndroidTV binding; under some circumstances it causes serious problems by locking all threads in the thing handler thread pool. I don’t have that binding installed anymore but if a maintainer would like to dig in, I am happy to install it to help get it fixed. Happy to create a Github issue for this. (I just don’t know the protocol for filing an issue on this project given that I have been a user for just a couple of weeks.)

  2. I did a lot of reading of community posts and Github issues to be able to get some clues on how to troubleshoot an issue like this.

  3. I learned that to troubleshoot this type of issues where a binding seems to stop working with no helpful log messages anywhere, the “threads --locks” command is really useful.

Hope this helps someone in the future.

Cheers,

Eloy.-

1 Like

Please file an issue on this on the openhab-addons repo. It’s known that some bindings are not properly using the shared thread pool which is only intended for not long-running/blocking actions. We’ve found a few bindings already and even added some warning logging to core to report when a binding appears to be holding on to the shared threads for too long. But I don’t know how robust that logging is and it might not catch it in every case.

It’s doing long-running and/or blocking activities using the shared thread pool. It needs to create its own thread pool for such activities. It very well could be that it gets into a loop trying to connect to a device that is not online, consuming more and more threads.

The hardest part with this specific problem is the part you’ve already done, identify which binding is the root cause of the problem. Fixing it is relatively simple, just a couple changes I think.