openHAB takes 2 minutes to stop

AndrewFG · April 9, 2024, 10:32am

Every Thing handler in your system has a dispose() method. When OH shuts down, it calls each Thing handler’s dispose() in series, so OH will not terminate until all of those dispose() methods have completed. That is why the OH specification requires add-ons that implement dispose() to return “fast”. However, not all handlers comply with the “fast” requirement (perhaps because nobody really knows what that word means in practice). The consequence is that if any Thing handler blocks during dispose(), the whole OH shutdown is delayed. To find out which binding’s Things are blocking the shutdown, you have to remove Things from your system one by one and see which ones cause the delay…
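To make the serial behaviour concrete, here is a minimal Java sketch. It is not openHAB code: `HandlerStub`, `disposeAllSerially` and `handler` are hypothetical names, and each fake dispose() just sleeps. It shows that in a serial loop the total shutdown time is the sum of every dispose() call, so one handler that blocks holds up everything after it; timing each call is also one way to see which handler is the slow one.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SerialShutdownSketch {

    // Hypothetical stand-in for a Thing handler; NOT openHAB's ThingHandler interface.
    interface HandlerStub {
        String name();

        void dispose() throws Exception;
    }

    // Disposes every handler one after another on the calling thread, as a serial
    // shutdown loop would, and records how long each dispose() took in milliseconds.
    static Map<String, Long> disposeAllSerially(List<HandlerStub> handlers) {
        Map<String, Long> timings = new LinkedHashMap<>();
        for (HandlerStub h : handlers) {
            long start = System.nanoTime();
            try {
                h.dispose(); // the next handler cannot start until this returns
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            } catch (Exception e) {
                System.err.println(h.name() + " threw during dispose(): " + e);
            }
            timings.put(h.name(), (System.nanoTime() - start) / 1_000_000);
        }
        return timings;
    }

    // Builds a fake handler whose dispose() blocks for blockMillis milliseconds.
    static HandlerStub handler(String name, long blockMillis) {
        return new HandlerStub() {
            @Override
            public String name() {
                return name;
            }

            @Override
            public void dispose() throws InterruptedException {
                Thread.sleep(blockMillis);
            }
        };
    }

    public static void main(String[] args) {
        List<HandlerStub> handlers = List.of(handler("fast-1", 0), handler("slow", 300), handler("fast-2", 0));
        Map<String, Long> timings = disposeAllSerially(handlers);
        timings.forEach((n, ms) -> System.out.println(n + ": " + ms + " ms"));
        long total = timings.values().stream().mapToLong(Long::longValue).sum();
        System.out.println("total: " + total + " ms"); // dominated by the one slow handler
    }
}
```

Running `main` should report the slow handler at roughly 300 ms and the total at about the same, because nothing else can proceed while it blocks.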

mtrax · April 9, 2024, 10:50am

Is there any logging I can use to track down the cause? That said, I don’t have heaps of add-ons.
I’m looking for the best way to disable them without causing problems for my automation setup,
i.e. if I disable an add-on, I hope it won’t break my scripts when I install it again.

mstormi · April 9, 2024, 11:30am

Shouldn’t that be changed to run in parallel in the first place?
And second, shouldn’t there be a (configurable?) timeout for each dispose()?
I believe there are at least some parts of the code that seem to ignore shutdown completely.
One example is parsing large rules DSL files on slow machines like an RPi: a shutdown only ever happens after all rules files have been parsed. Maybe parallelizing makes a difference here, too.
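A rough sketch of the per-dispose() timeout idea, assuming each dispose() is handed to a worker thread and the caller waits with `Future.get(timeout, unit)`. `TimedDispose`, `disposeWithTimeout` and the 5-second `DEFAULT_TIMEOUT_MS` are hypothetical; nothing here describes how openHAB core actually behaves today.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimedDispose {

    // Hypothetical default; the thread only suggests such a timeout could be configurable.
    static final long DEFAULT_TIMEOUT_MS = 5_000;

    // Runs one dispose() on a worker thread and waits at most timeoutMs for it.
    // Returns true if it completed (normally or with an exception) in time,
    // false if it was abandoned so the shutdown can move on.
    static boolean disposeWithTimeout(ExecutorService worker, String name, Runnable dispose, long timeoutMs) {
        Future<?> future = worker.submit(dispose);
        try {
            future.get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            System.err.println("dispose() of " + name + " failed: " + e.getCause());
            return true; // it finished, albeit with an error
        } catch (TimeoutException e) {
            // Interrupt the stuck dispose(). A handler that swallows the interrupt
            // may keep running in the background anyway.
            future.cancel(true);
            System.err.println("dispose() of " + name + " exceeded " + timeoutMs + " ms; moving on");
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // we were asked to stop waiting
            future.cancel(true);
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService worker = Executors.newCachedThreadPool();
        boolean fast = disposeWithTimeout(worker, "fast", () -> { }, DEFAULT_TIMEOUT_MS);
        boolean stuck = disposeWithTimeout(worker, "stuck", () -> {
            try {
                Thread.sleep(60_000); // simulates a dispose() that blocks for a minute
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 200);
        System.out.println("fast completed: " + fast + ", stuck completed: " + stuck);
        worker.shutdownNow();
    }
}
```

Note that `cancel(true)` only interrupts the stuck thread; a dispose() that ignores interrupts keeps running in the background, and the shutdown simply stops waiting for it.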

While a core developer might be tempted to think this is a low-priority issue because OH is supposed to run 24/7, it is not.
It makes upgrades and debugging take much longer, and it annoys just about everyone, users and developers alike.

@wborn Wouter, are you aware of this, and could you maybe provide some more insight? Thanks

DrRSatzteil · April 9, 2024, 2:29pm

I experience the same thing in my Docker-based setup. I already raised the timeout for my container to a couple of minutes, but I never investigated further to get to the bottom of it. I agree, however, that it is quite annoying.

rlkoshak · April 9, 2024, 2:34pm

This statement raised a question in my mind: if a binding has to do something during dispose(), how fast that happens won’t always be in its control. For example (admittedly one that may not be relevant to a binding), if a buffer needs to be written to disk before dispose() is done, it takes as long as it takes, right?

So while the policy is to exit fast, maybe it really means “as fast as possible”, which may actually take some time if a binding has a lot to do to shut down safely.

I agree with @mstormi: why isn’t this done in parallel? If issuing dispose() to everything at once is a problem, even a thread pool that lets four or five handlers dispose at the same time would improve it tremendously.
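The bounded-pool variant could look roughly like this: `Executors.newFixedThreadPool(n)` caps how many dispose() calls run at once, and `awaitTermination` gives an overall deadline. The class and method names, the pool size of 4 and the 200 ms fake dispose() are illustrative assumptions, not openHAB code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class PooledDispose {

    // Disposes all handlers with at most `parallelism` running at once, then waits
    // up to timeoutMs for the pool to drain. Returns true if every dispose finished.
    static boolean disposeAll(List<Runnable> disposers, int parallelism, long timeoutMs) {
        ExecutorService pool = Executors.newFixedThreadPool(parallelism);
        try {
            for (Runnable d : disposers) {
                pool.execute(d);
            }
            pool.shutdown(); // accept no new work; queued disposes still run
            return pool.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow(); // interrupt whatever is still running after the deadline
        }
    }

    // A fake dispose() that takes `millis` and counts completions.
    static Runnable slowDispose(AtomicInteger done, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
                done.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        AtomicInteger done = new AtomicInteger();
        List<Runnable> disposers = new ArrayList<>();
        for (int i = 0; i < 8; i++) {
            disposers.add(slowDispose(done, 200));
        }
        long start = System.nanoTime();
        boolean finished = disposeAll(disposers, 4, 10_000);
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println("finished=" + finished + ", disposed=" + done.get() + ", took " + ms + " ms");
    }
}
```

With 8 fake handlers of 200 ms each, a serial loop would need about 1.6 s; four threads should finish in roughly 0.4 s. A bounded pool also limits the load on a slow machine compared with firing every dispose() at once.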

For another data point, shutdown on mine (Ubuntu VM running in Docker) takes about 8 seconds. 2 minutes does seem extreme.

AndrewFG · April 9, 2024, 5:18pm

I would say so. I will have a look at changing this.

wborn · April 10, 2024, 6:54am

I don’t think I have this issue. Perhaps the add-ons you use do not properly handle InterruptedException? If threads ignore the exception and keep running, the Java runtime keeps waiting for them to terminate properly until the timeout expires. It’s a common mistake: devs fail to handle this exception properly and add a comment like “ignored” or “this will not happen”, because they just want to get something working quickly and only spend time on proper exception handling once they run into such issues.
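A small, self-contained Java illustration of the point about InterruptedException, using hypothetical poller loops of the kind many bindings run in background threads (the class and method names are made up for this sketch). When `Thread.sleep` throws InterruptedException it clears the thread’s interrupt flag, so a loop that swallows the exception never notices the stop request; restoring the flag and returning lets the thread end promptly.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class InterruptHandling {

    // Correct pattern: on InterruptedException, restore the flag and leave the loop.
    static Runnable wellBehavedPoller() {
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(1_000); // stand-in for "wait until the next poll"
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // re-set the flag that sleep() cleared
                    return; // and stop promptly
                }
            }
        };
    }

    // Anti-pattern: catching and ignoring the exception leaves the flag cleared,
    // so the loop condition never sees the stop request.
    static Runnable swallowingPoller() {
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    Thread.sleep(1_000);
                } catch (InterruptedException ignored) {
                    // "this will not happen"
                }
            }
        };
    }

    // Starts the poller, asks it to stop via shutdownNow() (which interrupts the worker),
    // and reports whether it terminated within graceMillis.
    static boolean stopsWithin(Runnable poller, long graceMillis) {
        ExecutorService ex = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "poller");
            t.setDaemon(true); // so a misbehaving poller cannot keep this demo JVM alive
            return t;
        });
        ex.execute(poller);
        try {
            Thread.sleep(100); // give the poller time to enter its sleep
            ex.shutdownNow();
            return ex.awaitTermination(graceMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("well-behaved stops: " + stopsWithin(wellBehavedPoller(), 2_000));
        System.out.println("swallowing stops:   " + stopsWithin(swallowingPoller(), 2_000));
    }
}
```

`stopsWithin` should return `true` for the well-behaved poller and `false` for the swallowing one. The daemon thread factory exists only so the demo JVM can exit despite the stuck thread.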

AndrewFG · April 10, 2024, 3:35pm

There are certainly some addons that have such a fault. I don’t think you can blame the poster for that.

In the meantime, I will look into core to see if the shutdown can be optimised, although that won’t fix faulty add-ons, obviously.

AndrewFG · April 10, 2024, 5:39pm

github.com/openhab/openhab-core/blob/cbb458e0c3c35f353954f82031c5cd1b3943759c/bundles/org.openhab.core.thing/src/main/java/org/openhab/core/thing/internal/ThingManagerImpl.java#L220

    // ... (excerpt starts mid-method)
        this.thingRegistry.addThingTracker(this);
        readyService.registerTracker(this, new ReadyMarkerFilter().withType(StartLevelService.STARTLEVEL_MARKER_TYPE)
                .withIdentifier(Integer.toString(StartLevelService.STARTLEVEL_MODEL)));
    }

    @Deactivate
    protected synchronized void deactivate() {
        thingRegistry.removeThingTracker(this);
        // Serial loop: each factory is removed before the next one starts.
        for (ThingHandlerFactory factory : thingHandlerFactories) {
            removeThingHandlerFactory(factory);
        }
        readyService.unregisterTracker(this);
        ScheduledFuture<?> startLevelSetterJob = this.startLevelSetterJob;
        if (startLevelSetterJob != null) {
            startLevelSetterJob.cancel(true);
            this.startLevelSetterJob = null;
        }
        ScheduledFuture<?> prerequisiteCheckerJob = this.prerequisiteCheckerJob;
        if (prerequisiteCheckerJob != null) {
            prerequisiteCheckerJob.cancel(true);
            // ... (excerpt truncated here)

@mstormi / @wborn … this is just a quick and dirty suggestion (I don’t claim great knowledge of the intricacies of the core code), but perhaps line 220 should be modified as follows. This would at least cause the disposal of thing handlers and thing handler factories to be done in parallel rather than in series. Or not?

    scheduler.submit(() -> removeThingHandlerFactory(factory));

wborn · April 22, 2024, 4:25pm

That may cause threading issues during development if you update the org.openhab.core.thing bundle with a new version.
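The reply does not spell out which threading issues are meant. One plausible factor (an assumption, not something stated in the thread) is that a bare `scheduler.submit(...)` returns immediately, so `deactivate()` could finish while factories are still being removed in the background. Below is a sketch of a parallel removal that still waits before returning, using `CompletableFuture.runAsync` and `allOf` with a deadline. `ParallelRemoval`, `removeAllAndWait` and `slowRemoval` are hypothetical names, and this is not a tested change to openHAB core.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelRemoval {

    // Runs every removal on `scheduler` in parallel but, unlike a bare submit loop,
    // does not return until all of them have completed or timeoutMs has elapsed.
    // Returns true if all removals completed in time (some may have thrown).
    static boolean removeAllAndWait(List<Runnable> removals, ExecutorService scheduler, long timeoutMs) {
        List<CompletableFuture<Void>> futures = new ArrayList<>();
        for (Runnable removal : removals) {
            futures.add(CompletableFuture.runAsync(removal, scheduler));
        }
        try {
            CompletableFuture.allOf(futures.toArray(new CompletableFuture<?>[0]))
                    .get(timeoutMs, TimeUnit.MILLISECONDS);
            return true;
        } catch (ExecutionException e) {
            System.err.println("a removal failed: " + e.getCause());
            return true; // allOf only completes once every future is done
        } catch (TimeoutException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // A fake stand-in for removing one factory: takes `millis` and counts completions.
    static Runnable slowRemoval(AtomicInteger removed, long millis) {
        return () -> {
            try {
                Thread.sleep(millis);
                removed.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
    }

    public static void main(String[] args) {
        ExecutorService scheduler = Executors.newFixedThreadPool(4);
        AtomicInteger removed = new AtomicInteger();
        List<Runnable> removals = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            removals.add(slowRemoval(removed, 200));
        }
        boolean allDone = removeAllAndWait(removals, scheduler, 5_000);
        // With a bare scheduler.submit(...) loop, this line could run while removals are still in flight.
        System.out.println("all done: " + allDone + ", removed: " + removed.get());
        scheduler.shutdown();
    }
}
```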