My openHAB service is taking a long time to stop.
I have OH v4.0.2 running on Ubuntu LTS, though I’ve had this issue on and off for a while.
openHAB frequently takes up to 2 minutes to stop. I initially thought it was MySQL, but I’ve updated my service unit to make that a prerequisite service. I normally see this issue when I’m rebooting.
How do I figure out why it’s taking so long to stop?
I remember a discussion on GitHub for OH3 where I explained that every binding using a serial connection needs a few seconds, around 5, to stop.
At that time, my OH setup required 34 seconds to stop, if I remember correctly.
I will check again with 4.0.2 this evening.
The problem in that case was a thread issue with jupnp. During shutdown a race condition occurred, we ran out of threads, and the unsubscribe replies were lost. That was corrected in jupnp 2.6.0, I believe, by adding an additional thread pool. If you want to determine whether this is still an issue, disable the pooling by setting all three pools to -1 in the config.
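For reference, a sketch of what that could look like in $OPENHAB_CONF/services/runtime.cfg. The three property names below are my recollection of the jupnp pool settings, not verified against the current jupnp/openHAB docs, so double-check them before use:

```
org.jupnp:threadPoolSize=-1
org.jupnp:asyncThreadPoolSize=-1
org.jupnp:remoteThreadPoolSize=-1
```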
My intent was not to point to an outdated/resolved issue, but I know there were some interesting comments in that discussion, especially on serial connections.
I just checked:
$ date ; sudo systemctl stop openhab.service ; date
Thursday 7 September 2023, 08:38:38 (UTC+0200)
Thursday 7 September 2023, 08:39:26 (UTC+0200)
So it now takes 48 seconds in my case to stop OH 4.0.2 on a RPi 3, which is certainly more than before with a relatively similar OH setup (maybe 1 or 2 added bindings), as I had 34 seconds in mind with the first OH 3 version.
Now, if it takes 2 minutes in your case, that is probably not normal; you may have a problem somewhere, maybe with one binding?
I’ve been seeing the same problem forever but never had enough time to investigate.
I believe there must be some fundamental design flaw involved.
Even if there were some sort of wait-for timeout involved, those waits must not be serialized in the first place so that the times add up, must they?
Second, why wait for threads (beyond those doing core I/O to logs and persistence) to “flush” at all?
Ideally, people should be able to kill -9 the Java process or instantly switch off the power to their box.
Don’t get me wrong, I for sure do NOT advocate this, yet it seems to work, which indicates that longish shutdown times are a bug, not a housekeeping or other requirement.
In that sense, it does not make a difference whether it’s 2 minutes or half that.
(FTR I don’t have any database running other than standard rrd4j).
The issue is probably due to the dispose() method in the thing handlers. The OH developer documentation says that dispose() must return fast. However, many binding developers do not respect that rule, and the dispose() method blocks instead. This causes a chain reaction, since dispose() on subsequent things is delayed until the prior call returns. A proper implementation of dispose() should hand any task that may take a long time off to scheduler.submit().
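A minimal, self-contained sketch of the difference, using plain JDK executors to stand in for the handler’s shared scheduler. The class, method names, and the 200 ms delay are invented for illustration; the point is only that blocking teardowns add up linearly, while submitted ones return immediately:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class DisposeDemo {

    // Stand-in for a blocking cleanup, e.g. closing a serial port (~200 ms).
    static void blockingClose() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Anti-pattern: each teardown blocks its caller, so total shutdown
    // time grows linearly with the number of handlers. Returns elapsed ms.
    static long serialDispose(int handlers) {
        long t0 = System.nanoTime();
        for (int i = 0; i < handlers; i++) {
            blockingClose();
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    // Recommended pattern: the teardown only submits the slow work to a
    // pool and returns immediately, so handler teardowns do not add up.
    static long offloadedDispose(int handlers, ExecutorService pool) {
        long t0 = System.nanoTime();
        for (int i = 0; i < handlers; i++) {
            pool.submit(DisposeDemo::blockingClose);
        }
        return (System.nanoTime() - t0) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        System.out.println("serial ms:    " + serialDispose(5));
        System.out.println("offloaded ms: " + offloadedDispose(5, pool));
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
    }
}
```

With 5 simulated handlers, the serial loop takes roughly a full second while the offloaded loop returns almost instantly, which is exactly the shape of the shutdown delay described above.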
Odd question: when you run “shell:threads | grep -i httpclient | grep -i runnable | grep -i selector | wc -l” at the Karaf console, what do you get? I noticed that the neeo binding is creating a whole bunch of HttpClients and they aren’t cleaning up like they should. My system crashes when that number gets to around 4000. I opened an issue here…
…I’ve been trying to figure out why they aren’t closing, but I’ve had no luck so far.
EDIT: More specifically, I’m pretty sure it’s because HttpClient is not AutoCloseable (from what I can find on Google). My question is why this only recently started happening. I never had issues with the neeo binding crashing OH before. This seems new. I’ve been wondering if it was a more recent Karaf upgrade, but I can’t pinpoint it.
Generally, good practice for a binding is to use the common OH HTTP client or, if that is not possible, to create a single dedicated HTTP client.
If this binding is creating several thousand HTTP clients, there is a serious design issue in this binding.
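To sketch the reuse pattern being recommended: hold one client for the binding’s whole lifetime instead of constructing a new one per request, since each client owns worker and selector threads that linger until cleaned up, which is exactly the buildup the “shell:threads … wc -l” one-liner counts. This example uses the JDK’s java.net.http.HttpClient purely to stay self-contained; in a real binding you would instead take the shared Jetty client from openHAB’s HttpClientFactory:

```java
import java.net.http.HttpClient;
import java.time.Duration;

public class SharedClientDemo {

    // One client instance for the whole lifetime of the component.
    // Creating a fresh client per request leaks its selector threads.
    private static final HttpClient SHARED = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // All request sites go through this accessor and reuse SHARED.
    public static HttpClient client() {
        return SHARED;
    }

    public static void main(String[] args) {
        // Every caller gets the same instance back.
        System.out.println(client() == client());
    }
}
```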