I upgraded to OH 4.3.4 to resolve an issue with the Tado smart thermostat. Initially all seemed to work fine but after a certain time most items stopped responding. It seems only MQTT items keep working.
All items seem to stop reacting at the same time. I can see from data stored at what time the fault occurs, but in the log files there appears to be no error message or other indication something has gone wrong.
When restarting the OH service or rebooting the system all is working fine again, but after a certain amount of time (varying from about 30 minutes to several hours) the same fault occurs again.
If I look in Settings>Things in the UI everything appears to be working OK and in the log files I can see commands are sent when pressing a switch on the sitemap or running a rule, but nothing seems to respond, with the exception of MQTT items.
Glad to see I am not alone. I was starting to lose my mind over this. Iāll keep looking to find if there is any specific action that triggers the fault, but so far no joy. Looks like it happens at a random moment.
Unfortunately I have the same issue with 4.3.4 (openHABian on Raspberry Pi 5)
I also saw that stopping the openhab.service with systemctl took very long and according to the status it was killed because of timeouts.
...
Process: 1566441 ExecStartPre=/bin/rm -f /var/lock/LCK..ttyAMA0 /var/lock/LCK..ttyACM0 (code=exited, status=0/SUCCESS)
Process: 1566442 ExecStart=/usr/share/openhab/runtime/bin/karaf ${OPENHAB_STARTMODE} (code=killed, signal=KILL)
Process: 1585037 ExecStop=/usr/share/openhab/runtime/bin/karaf stop (code=exited, status=0/SUCCESS)
Main PID: 1566442 (code=killed, signal=KILL)
CPU: 6min 22.392s
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Killing process 1585207 (n/a) with signal SIGKILL.
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Killing process 1585208 (n/a) with signal SIGKILL.
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Killing process 1585229 (n/a) with signal SIGKILL.
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Killing process 1585231 (n/a) with signal SIGKILL.
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Killing process 1585233 (n/a) with signal SIGKILL.
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Killing process 1585235 (n/a) with signal SIGKILL.
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Main process exited, code=killed, status=9/KILL
Apr 03 12:04:57 openhabian systemd[1]: openhab.service: Failed with result 'timeout'.
After starting the openhab.service again it could be stopped ānormallyā with proper āexitedā status.
I had noticed the same, that a service restart took very long. I just checked my logs and noticed the same as in your case, that the service was killed due to time outs during restarts.
It happened again this night and when roaming through the debug logs, I noticed, that the āDecrypting AccessTokenResponseā are missing from the time on when no thing updates are happening anymore.
I suspect that this might be related to the changes done to the OAuth Handling because of the tado binding.
When doing systemctl stop openhab.service, journalctl shows that the OAuthHandler hangs:
@AndrewFG Sorry to ping you here, but did you observe something similar on your side?
In my own JavaScript API calls to the tado API, I noticed that sometimes I donāt get a new access_token when trying to refresh it, because the tado server immediately closed the connection and my refresh attempt just fails.
Could something like this probably lead to a hang like we observe it here?
It looks like cross blocking on a synchronized call. It is probably not because the tado server responds too fast with an error but rather that it doesnāt respond at all. It is complicated to debug because I donāt see the same issues myself.
@AndrewFG, Iām no pro, but also not an absolute beginner. If there is anything I can do/try/test to gather some information that would help to identify the exact issue, let me know and I will have a go.
Just a thought⦠I am running PiHole, could that cause an issue?
To get the current status of the tado geofencing I implemented the oauth mechanism via JavaScript rules in parallel to the binding (this was already discussed on Github )
I.e. the binding uses oauth to communicate with the tado server and additionally I have some rules which use a second oauth access_token to send requests to the API.
The rule which refreshes the āJavaScript access_tokenā from time to time throws this error when the endpoint https://login.tado.com/oauth2/token is called:
If the same happens for the binding, could this somehow block it?
But itās just a guess, I donāt know the internals of the binding and openHAB core regarding the oauth mechanismā¦
Probably itās possible to add some timeoutsā¦?
I disabled the tado things for now and guess this will stop the issueā¦
@Ironman / @engine I am investigating the possibility that the problem is caused by the tado authentication server blocking without any http timeout; or alternatively if the tcp connection to the server was softly broken, or not even opened at all.
So can you please provide more information about your OH serverās ethernet connection:
Is it wifi or wired?
Is it on a virtual machine i.e. where the host might softly break tcp connections?
Are there any firewalls or spam filters that might block tcp connections, of filter traffic contents?
Do you have access to Wireshark? In which case can you see anything odd in your traffic?
Anyway perhaps you could kindly log:set TRACE org.openhab.core.auth and log:set TRACE org.openhab.binding.tado, restart OH, wait for the issue to reappear, and post the respective trace logs?
Unfortunately I donāt have the experience with wireshark and it seems that some additional setup is necessary which Iād like to avoid on my openHABian.
I checked the logs from tonights fail again and the oauth debug logs end with
2025-04-04 04:00:48.130 [DEBUG] [oauth2client.internal.OAuthConnector] - grant type refresh_token to URL https://login.tado.com/oauth2/token success
Thereās no following Decrypting: AccessTokenResponse log which usually follows afterwards.
Ten minutes earlier the refresh was successful:
2025-04-04 03:50:45.793 [DEBUG] [oauth2client.internal.OAuthConnector] - grant type refresh_token to URL https://login.tado.com/oauth2/token success
2025-04-04 03:50:45.795 [DEBUG] [lient.internal.OAuthStoreHandlerImpl] - Decrypting: AccessTokenResponse [accessToken=....
This really looks like the auth core is waiting forever.
There are also no additional TRACE logs from the auth core now that I activated TRACE level.
I guess it wonāt help much if I wait now until the issue happens again. As this breaks most of my home automation, Iād rather deactivate it nowā¦
Hm youāre right; I just tried to understand the code. But itās too compliated for me
If the grant type refresh_token is logged, it means that the doRequest() in grantTypeRefreshToken() returns.
Afterwards before grantTypeRefreshToken() itself returns, the shutdownQuietly() is done - could this hang for some reason?
I cannot find out how the Decrypting: AccessTokenResponse log is usually triggered after the grant type refresh_token log.
@engine thanks for looking at the code (I was doing the sameā¦)
Indeed.
No.
That comes from the next normal request being processed for the next zone.
Actually I think the issue is probably the accessTokenRefreshListeners.forEach(l -> l.onAccessTokenResponse(accessTokenResponse)) callback to the tado binding; this causes an immediate online status refresh, which causes another API call, which causes another request to getAccessTokenResponse() ad infinitum.
So there is a new Jar where I break that loop from within the binding ā you can download the Jar for test purposes hereā¦
Thanks!!
I installed it in the addons folder and got these errors when it loaded:
2025-04-04 18:16:53.331 [ERROR] [.internal.handler.TadoHandlerFactory] - bundle org.openhab.binding.tado:4.3.4.202504041425 (331)[org.openhab.binding.tado.internal.handler.TadoHandlerFactory(456)] : Error during instantiation of the implementation object
java.lang.IllegalArgumentException: argument type mismatch
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:?]
at jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:77) ~[?:?]
at jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:?]
at java.lang.reflect.Constructor.newInstanceWithCaller(Constructor.java:500) ~[?:?]
at java.lang.reflect.Constructor.newInstance(Constructor.java:481) ~[?:?]
at org.apache.felix.scr.impl.inject.internal.ComponentConstructorImpl.newInstance(ComponentConstructorImpl.java:326) ~[?:?]
at org.apache.felix.scr.impl.manager.SingleComponentManager.createImplementationObject(SingleComponentManager.java:286) ~[?:?]
at org.apache.felix.scr.impl.manager.SingleComponentManager.createComponent(SingleComponentManager.java:115) ~[?:?]
at org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:1002) ~[?:?]
at org.apache.felix.scr.impl.manager.SingleComponentManager.getServiceInternal(SingleComponentManager.java:975) ~[?:?]
at org.apache.felix.scr.impl.manager.SingleComponentManager.getService(SingleComponentManager.java:920) ~[?:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceFactoryUse$1.run(ServiceFactoryUse.java:220) ~[org.eclipse.osgi-3.18.0.jar:?]
at java.security.AccessController.doPrivileged(AccessController.java:318) [?:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceFactoryUse.factoryGetService(ServiceFactoryUse.java:217) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceFactoryUse.getService(ServiceFactoryUse.java:118) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceConsumer$2.getService(ServiceConsumer.java:48) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.getService(ServiceRegistrationImpl.java:547) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.getService(ServiceRegistry.java:534) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.BundleContextImpl.getService(BundleContextImpl.java:660) [org.eclipse.osgi-3.18.0.jar:?]
at org.apache.felix.scr.impl.manager.SingleRefPair.getServiceObject(SingleRefPair.java:88) [bundleFile:?]
at org.apache.felix.scr.impl.inject.methods.BindMethod.getServiceObject(BindMethod.java:675) [bundleFile:?]
at org.apache.felix.scr.impl.manager.DependencyManager.getServiceObject(DependencyManager.java:2612) [bundleFile:?]
at org.apache.felix.scr.impl.manager.DependencyManager.doInvokeBindMethod(DependencyManager.java:2078) [bundleFile:?]
at org.apache.felix.scr.impl.manager.DependencyManager.invokeBindMethod(DependencyManager.java:2061) [bundleFile:?]
at org.apache.felix.scr.impl.manager.SingleComponentManager.invokeBindMethod(SingleComponentManager.java:443) [bundleFile:?]
at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.addedService(DependencyManager.java:336) [bundleFile:?]
at org.apache.felix.scr.impl.manager.DependencyManager$MultipleDynamicCustomizer.addedService(DependencyManager.java:304) [bundleFile:?]
at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerAdded(ServiceTracker.java:1232) [bundleFile:?]
at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.customizerAdded(ServiceTracker.java:1152) [bundleFile:?]
at org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.trackAdding(ServiceTracker.java:959) [bundleFile:?]
at org.apache.felix.scr.impl.manager.ServiceTracker$AbstractTracked.track(ServiceTracker.java:895) [bundleFile:?]
at org.apache.felix.scr.impl.manager.ServiceTracker$Tracked.serviceChanged(ServiceTracker.java:1184) [bundleFile:?]
at org.apache.felix.scr.impl.BundleComponentActivator$ListenerInfo.serviceChanged(BundleComponentActivator.java:116) [bundleFile:?]
at org.eclipse.osgi.internal.serviceregistry.FilteredServiceListener.serviceChanged(FilteredServiceListener.java:123) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:961) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEventPrivileged(ServiceRegistry.java:937) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.publishServiceEvent(ServiceRegistry.java:874) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceRegistrationImpl.register(ServiceRegistrationImpl.java:141) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.serviceregistry.ServiceRegistry.registerService(ServiceRegistry.java:262) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.BundleContextImpl.registerService(BundleContextImpl.java:500) [org.eclipse.osgi-3.18.0.jar:?]
at org.apache.felix.scr.impl.manager.AbstractComponentManager$3.register(AbstractComponentManager.java:929) [bundleFile:?]
at org.apache.felix.scr.impl.manager.AbstractComponentManager$3.register(AbstractComponentManager.java:915) [bundleFile:?]
at org.apache.felix.scr.impl.manager.RegistrationManager.changeRegistration(RegistrationManager.java:133) [bundleFile:?]
at org.apache.felix.scr.impl.manager.AbstractComponentManager.registerService(AbstractComponentManager.java:984) [bundleFile:?]
at org.apache.felix.scr.impl.manager.AbstractComponentManager.activateInternal(AbstractComponentManager.java:752) [bundleFile:?]
at org.apache.felix.scr.impl.manager.AbstractComponentManager.enableInternal(AbstractComponentManager.java:674) [bundleFile:?]
at org.apache.felix.scr.impl.manager.AbstractComponentManager.enable(AbstractComponentManager.java:437) [bundleFile:?]
at org.apache.felix.scr.impl.manager.ConfigurableComponentHolder.enableComponents(ConfigurableComponentHolder.java:671) [bundleFile:?]
at org.apache.felix.scr.impl.BundleComponentActivator.initialEnable(BundleComponentActivator.java:310) [bundleFile:?]
at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:593) [bundleFile:?]
at org.apache.felix.scr.impl.Activator.access$200(Activator.java:74) [bundleFile:?]
at org.apache.felix.scr.impl.Activator$ScrExtension.start(Activator.java:460) [bundleFile:?]
at org.apache.felix.scr.impl.AbstractExtender.createExtension(AbstractExtender.java:196) [bundleFile:?]
at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:169) [bundleFile:?]
at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:49) [bundleFile:?]
at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:488) [osgi.core-8.0.0.jar:?]
at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:420) [osgi.core-8.0.0.jar:?]
at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:232) [osgi.core-8.0.0.jar:?]
at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:450) [osgi.core-8.0.0.jar:?]
at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:949) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:229) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:138) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:130) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:217) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.container.Module.publishEvent(Module.java:499) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.container.Module.start(Module.java:486) [org.eclipse.osgi-3.18.0.jar:?]
at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445) [org.eclipse.osgi-3.18.0.jar:?]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundle(DirectoryWatcher.java:1260) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.startBundles(DirectoryWatcher.java:1233) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.doProcess(DirectoryWatcher.java:520) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.process(DirectoryWatcher.java:365) [bundleFile:3.7.4]
at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316) [bundleFile:3.7.4]
2025-04-04 18:16:53.339 [WARN ] [g.discovery.internal.PersistentInbox] - bundle org.openhab.core.config.discovery:4.3.4 (169)[org.openhab.core.config.discovery.internal.PersistentInbox(135)] : Could not get service from ref {org.openhab.core.thing.binding.ThingHandlerFactory}={service.id=875, service.bundleid=331, service.scope=bundle, osgi.ds.satisfying.condition.target=(osgi.condition.id=true), component.name=org.openhab.binding.tado.internal.handler.TadoHandlerFactory, component.id=456}
2025-04-04 18:16:53.339 [WARN ] [g.discovery.internal.PersistentInbox] - bundle org.openhab.core.config.discovery:4.3.4 (169)[org.openhab.core.config.discovery.internal.PersistentInbox(135)] : DependencyManager : invokeBindMethod : Service not available from service registry for ServiceReference {org.openhab.core.thing.binding.ThingHandlerFactory}={service.id=875, service.bundleid=331, service.scope=bundle, osgi.ds.satisfying.condition.target=(osgi.condition.id=true), component.name=org.openhab.binding.tado.internal.handler.TadoHandlerFactory, component.id=456} for reference ThingHandlerFactory
But it is working, I can control the tado thermostats
I still have TRACE level active. I have four zones in my home and for each periodic update every 30 seconds I see 8 Decrypting: AccessTokenResponse logs and four ApiResponse logs from the TadoZoneHandler. Is this the expected amount of logs?
That is probably only a transient error due to changed thing configuration signature, so we can probably ignore it, but do let me know if you see it again.