Openhab cloud connection stops working, does not try to reconnect

Did anyone else have debug logging enabled?

Not me, since I haven’t been having any issues until yesterday (and not again since restarting the connector). If it happens again, I’ll install the troubleshooting binding.

It happened to me also, I have trace logging enabled. I also see the OH crashed completely because of memory leak I guess.

@ssalonen What exactly are you looking for in the logs?

2022-05-06 21:01:05.430 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = xxx-xc1-xxx-xxx-xxx, base URL = http://localhost:18080)
2022-05-06 21:01:12.417 [INFO ] [del.script.irrigation:Main_algorithm] -  not started. checkUserStartTime[true] itmIrrigationAuto.stat[ON] isWatering[false] WaterLeakDetectedRainSensor.state[OFF]
2022-05-06 21:01:13.676 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO re-connecting (attempt 1)
2022-05-06 21:01:13.679 [TRACE] [io.openhabcloud.internal.CloudClient] - Manager.EVENT_TRANSPORT
2022-05-06 21:01:13.686 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:19.529 [WARN ] [okhttp3.OkHttpClient                ] - A connection to https://myopenhab.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
2022-05-06 21:01:19.530 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO re-connected successfully (attempt 1)
2022-05-06 21:01:19.531 [TRACE] [io.openhabcloud.internal.CloudClient] - Manager.EVENT_TRANSPORT
2022-05-06 21:01:19.533 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:23.694 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:26.301 [INFO ] [nternal.handler.SpotifyBridgeHandler] - Spotify returned an error during polling: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
2022-05-06 21:01:29.050 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO connected
2022-05-06 21:01:29.052 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = xxx-xxx-xxx-xxx-xxx, base URL = http://localhost:18080)
2022-05-06 21:01:29.054 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:43.184 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@129fa48' takes more than 5000ms.
2022-05-06 21:01:51.184 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://192.168.4.221/real_time_data.xml' (method='GET', content='null') failed: Total timeout 3000 ms elapsed
2022-05-06 21:01:58.266 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@129fa48' takes more than 5000ms.
2022-05-06 21:01:53.970 [ERROR] [io.openhabcloud.internal.CloudClient] - Error during communication: {}
io.socket.engineio.client.EngineIOException: xhr poll error
	at io.socket.engineio.client.Transport.onError(Transport.java:63) [bundleFile:?]
	at io.socket.engineio.client.transports.PollingXHR.access$100(PollingXHR.java:26) [bundleFile:?]
	at io.socket.engineio.client.transports.PollingXHR$6$1.run(PollingXHR.java:140) [bundleFile:?]
	at io.socket.thread.EventThread$2.run(EventThread.java:80) [bundleFile:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[?:?]
	at java.net.SocketInputStream.read(SocketInputStream.java:168) ~[?:?]
	at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478) ~[?:?]
	at sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472) ~[?:?]
	at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1449) ~[?:?]
	at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1060) ~[?:?]
	at okio.Okio$2.read(Okio.java:139) ~[?:?]
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[?:?]
	at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345) ~[?:?]
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217) ~[?:?]
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211) ~[?:?]
	at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[?:?]
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[?:?]
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[?:?]
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[?:?]
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) ~[?:?]
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) ~[?:?]
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) ~[?:?]
	... 3 more
2022-05-06 21:02:08.139 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:02:08.141 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO disconnected: transport error

.....

2022-05-06 21:52:24.408 [ERROR] [org.apache.felix.fileinstall        ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
.....

2022-05-06 23:07:53.034 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - 
java.lang.OutOfMemoryError: Java heap space


When I reproduced this issue on May 5, nothing other than ping/pong messages were present in the logs. I didn’t have any crashes, out-of-memory conditions, disconnects or anything weird at all that day, except for the Google Home commands stop working and myopenhab.org claiming my openhab was offline. I had full trace logging enabled and was using the troubleshooting binding and openhab version 3.3.0~S2784-1 on Ubuntu.

...
...
... hours and hours of ping/pong messages and nothing else since my last Google Home voice command, which was successful...
...
...
2022-05-05 11:42:08.721 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:42:08.771 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:42:33.773 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:42:33.823 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 49 ms
2022-05-05 11:42:58.825 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:42:58.875 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:43:23.878 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:43:23.927 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:43:48.929 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:43:48.979 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:44:13.981 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:44:14.030 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 49 ms

I am not sure what I am looking in the logs exactly but trying to find a common story.

It seems that we have now two confirmed cases with ping pong active but myopenhab.org saying that instance is online.

This new piece of information! To me it is saying that the client and server are connected and we can conclude that from (socket.io) client point of view everything would seem to be fine. Ping/pong watchdog is running still, and not stuck, for example.

Another piece of information is that both of you seemed to have issue at the same time, pointing to possible server side root cause.

We can also say that this is fairly uncommon scenario, I personally haven’t had any issues for months. At some point it was certainly more frequent.

Hmm. To understand the root cause, I am not sure we can do much without myopenhab.org maintainer in the loop, looking how things are connected in server side.


One more thing: when things are running normally, is there anything other than ping/pong in the logs?

EDIT: Turns out similar findings have been made already 2y ago by @wborn . Connection error recovery failed · Issue #134 · openhab/openhab-cloud · GitHub

Interesting. I’m curious about this:

The openHAB Cloud Notification Actions also work while the Cloud thinks the instance is offline. This is another indicator that the connection is working properly (see community topic).

If anyone sees their server go offline again, try triggering a notification from openHAB to see if it reaches your phone.

1 Like

Normal operation (for me at least) has just ping/pong messages in openhab.log, so nothing was out of the ordinary with the logs at the time of the recent Google Home issues.

I agree that whenever the problem occurs, a few people report here (and in other threads) having the issue at the same time.

It was more frequent in Jan/Feb 2022 that is correct. My first case of this I saw in Dec 2021. Even if it was more frequent before, it is happening way too often. For me personally an acceptable failure rate of home automation is perhaps once per year, and I’m now at 10-20 per year with this Google Home ↔ Openhab connection.

This type of issue might be very hard to diagnose, so one suggestion could be to add a secondary ping/pong mechanism on a higher network layer that run quite infrequently, perhaps once every 4 hours. This could help self-heal the broken connections when they occur and also gather diagnostics data on the server on how often and in which circumstances this type of issue occurs.

I have set this up now on one of my buttons in the home so I will try it the next time this problem happens.

I find the notifications menus in myopenhab quite weird. When I went there today all the notifications were from 4 years ago (mostly “openHAB is online”). The pager mechanism showed me there were 6 pages of notifications to list, but pressing pages 4 to 6 just showed empty lists.

Pressing the “Today”-button didn’t do anything, even though I had a single test notification I published earlier today. Nothing happened when I pressed the “Last week” button either.

When I pressed the “All” button, the page changed from “Notification” to “Events” (7 events were displayed: online/offline events from 2 days ago to 1 month ago).

I don’t know if that behavior is correct.

Yep, not sure if this has been investigated here in the community in detail. One idea that has been raised is to have HTTP GET to home.openhab.org and see if it times out. I am not sure if this is trivial to implement as you cannot (to my knowledge, at least) use uuid+secret to get through the HTTP Basic authentication.

I don’t have new information to share but can confirm that I also am affected by this. Wife acceptance factor is starting to be affected by this.

Same behavior as described:
Google home stops being able to reach openHAB.
Check my openHAB.org and my instance appears Offline.
Check my local instance and immediately after I press the “save” button in the openHAB cloud integration options, my instance switches to online in the cloud openHAB.
Google home is once again able to reach out to openHAB.
I don’t have exact dates or hours but I’ll be keeping an eye on this thread now.

I don’t know the code, but I was thinking more of having an application level ping/pong message exchange using the same socket.io connection. My understanding was that the existing ping/pong watchdog mechanism was at a lower level of the socket.io library.

I was thinking of a ping/pong exchange that actually required some verifiable data to be returned by the myopenhab.org server, and if that data is not retrieved correctly, then the openhab connector assumes the connection is dropped and reconnects. This mechanism being more cpu/net expensive and possibly requiring more server resources could be run at a (much) lower frequency than the existing socket.io ping/pong.

This is just my thinking without knowing any of the code of either server/connector, so I will apologize in advance if what I just wrote is pure gibberish. :slight_smile:

I just noticed that I’m not able to reach openhab.org, and created this post:
Openhab.org not reachable?

Do you guys reckon this might be related??
If I use a vpn in germany I can reach to openhab.org. This is weird…

No, should not be related as the two are pointing to different servers at different hosters.

The troubleshooting versions (for openHAB 3.2.0 and for openHAB 3.3.0-SNAPSHOT) have been updated.

If this works for you without issues I am proceeding to merge it to main codebase. Good improvements for everyone.

Appreciate all the testing for this, please report if it runs without issues.

You can upgrade from previous troubleshooting version as follows:

  1. Remove the addon using MainUI
  2. Reinstall the addon
1 Like

Thanks for the update. I have uninstalled the old addon and reinstalled the new one today.

For some reason I was unable to uninstall the old troubleshooting connector. Every time I clicked uninstall on either the troubleshooting connector or the regular openhab cloud connector entry I got an error message in the UI, but nothing in the logs. Restarting openhab didn’t help, so I ended up doing the following:

  • Cleared the cache and reset permissions. After that nothing worked, UI was just all black every time I tried to restart openhab.
  • Downgraded to 3.2.0 (stable).
  • Upgraded to 3.3.0~S2902 (I was using 3.3.0~S2784 before)
  • Uninstalled the connector
  • Restarted openhab and verified no connector was installed
  • Installed the new 3.3.0 snapshot troubleshooting cloud connector (bundle org.openhab.io.openhabcloud:3.3.0.202205150929)

Everything seems to be working ok now, and I see the ping/pong messages and other trace logs from CloudClient and OkHttpClient.

Note that I am not present at this house very much in the summer months. I will check the install remotely periodically and every time I am at that location.

2 Likes

Quick update: Everything has worked flawlessly for me since early May. Will continue testing with my configuration unchanged until further notice.

1 Like

Funny you should say that. I appear to have had one of the phantom disconnect/connects early this morning, after going a long time without needing to restart it.

1 Like

All, those of who experience the problem. Next time you have the offline issue, can you verify whether cloud notifications work and reach your mobile devices and web?

Old discussions in GitHub do tell that they indeed might still work.

This would further narrow down the issue on cloud side.

3 Likes

Dear, I had downtime yesterday after 7 days. The cloud suddenly stopped working. The notifications kept working though.

I noticed when I restarted my Telenet router that the problem was solved. I think it’s a problem with the reconnection of the cloud.

Hi, I am also having this issue every few weeks or so. Notifications keep working. Mobile apps and Google Home are not working (they cannot reach my local openhab instance any more via the cloud).

2 Likes