Openhab cloud connection stops working, does not try to reconnect

The binding has external dependencies specified in the feature.xml. To install bindings with dependencies you need the kar file instead of the jar file (build with mvn karaf:kar). But be aware that the kar file has a specific openHAB version dependency. So it only works on the openHAB version is was created with/for.

2 Likes

Thanks for the information! I searched the feature.xml files in the source code and found the Ambient Weather Binding also has a dependency for io.socket, and that seems to work as a hack to run the testing .jar alone without the existing cloud connector addon also there.

Yes… Better not rely on ambient weather, it might refer to slightly older version of socket io [openhabcloud] Reconnect on connection errors by ssalonen · Pull Request #11153 · openhab/openhab-addons · GitHub

I see. The only other alternative I have now is to load the .jar with the existing openhabcloud connector addon also loaded. I have verified now that this causes two simultaneous connections to the myopenhab cloud service and every request coming in is duplicated. I see how this can interfere with testing of connectivity issues, but it all comes down to what is the so-far-unknown root cause.

Neither alternative seems very good. Maybe I should wait until a new snapshot version becomes available with the required logging code included? or maybe you could supply a kar file instead of the jar? The openhab version I have installed now is 3.3.0~S2784-1 according to dpkg -l in Ubuntu.

Please advise. I will do whatever you think is best.

1 Like

Why not provide a kar file, which should solve the dependency issues ?

1 Like

@seanch try the version in [WIP] [openhabcloud] debug by ssalonen · Pull Request #12430 · openhab/openhab-addons · GitHub

UPDATE: I have now published the addon in the marketplace, to allow even more easy installation (and further updates)

You can find them from the MainUI marketplace view with “cloud connector troubleshooting version” keyword. Direct links for marketplace forum posts: openHAB 3.2.0 version and openHAB 3.3.0-SNAPSHOT version.

That’s awesome. I uninstalled the other test bindings I used, restarted openhab and installed the 3.3.0-SNAPSHOT troubleshooting version from the marketplace. Everything seems to work fine, and I will be testing this configuration now and report back if the connection problem can be reproduced again.

One last thing, you noted yesterday that that other binding, the ambient weather one, used a slightly older version of io.socket. When did the openhabcloud addon start using the new one, or has it always been using the newer version? I am curious if I am now with 3.3.0-SNAPSHOT also is using a better io.socket library compared to my other testing earlier this year with 3.1.0 and 3.2.0 (stable). (?)

Thanks,
Sean

1 Like

At least in the latest openHAB release uses the new one. Do not now recall whether it was introduced already in 3.1.0 or was it 3.2.0… EDIT: github archeology reveals that this came in 3.2.0

Probably does not make a huge difference but who knows

Just a quick update. I have not been able to reproduce this problem again yet, but I am still running the “troubleshooting version” with full logging and will continue to do so. I will report back here as soon as something happens.

1 Like

Today I “sort of” reproduced it, but not really. (Tuesday, April 12, 2022 at ~14:05 UTC) I had been away from the house for 5 days, and the first thing I tried after getting back was a Google Home voice command. After a long pause, I got the error message “…openhab not available…”. I don’t remember the exact phrase, but I think it was the one I usually hear when I have this problem. I waited 30 seconds and tried again but it was the same result.

Then I logged in with SSH to check the openhab.log file, but there was no errors in the ping/pong messages. It was still going, and nothing seemed out of the ordinary.

Then I logged on to myopenhab.org, to see if the status was offline or online, but the status was online, so that was ok as well.

Then I tried a Google Home voice command yet again, and that time it worked. I tried another 5 voice commands and all were successful.

So I don’t really know what caused those two first errors today, but somehow it fixed itself without having to go into the OpenHAB UI and clicking ‘Save’ (without any changes) in the cloud connector settings like I usually have to do.

I don’t even know if this is related, but I thought I should mention it here anyway, just to get all the data points in one place.

I’ll keep testing.

1 Like

Finally today I was able to reproduce it just like it has happened several times before (this time with full logging enabled and the experimental debug version of the cloud connector).

  • Google Home voice commands stopped working. I retried about 10 times and all failed.
  • I checked openhab.log and ping/pong messages were totally fine. I check back several hours, and everything seemed normal.
  • I logged in to myopenhab.org and the status there was offline. (“Your openHAB is not online. Please check if your installation is running or recheck the openHAB settings in your account.”)
  • I tried some more voice commands, and tried reloading myopenhab (ctrl-f5), but nothing helped.
  • I rechecked the log and the only messages there were still ping/pong messages, and they were all normal.
  • Finally I logged into the local openhab web console and clicked “Save” in the Openhab Cloud settings page. That fixed everything, like it usually does.

So what I have learned today is that definitely ping/pong messages can be fine, but still myopenhab thinks the openhab instance is offline.

The only thing that was able to fix it was clicking Save in the settings page as before.

2 Likes

This makes me think if the issue could be server side… Cc @digitaldan to review above comment

Interestingly, I’m now seeing the same thing. My server disconnected/reconnected three times over a ten-minute period earlier today, but myopenhab.org showed me as offline. This is the first time for me (as far as I can remember).

As usual, saving the openHAB Cloud config to restart the cloud connector got me back online immediately.

Just realized my cloud connection had also gone down sometime a few hours ago and not reconnected.
Did the save and now back connected and running.

Did anyone else have debug logging enabled?

Not me, since I haven’t been having any issues until yesterday (and not again since restarting the connector). If it happens again, I’ll install the troubleshooting binding.

It happened to me also, I have trace logging enabled. I also see the OH crashed completely because of memory leak I guess.

@ssalonen What exactly are you looking for in the logs?

2022-05-06 21:01:05.430 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = xxx-xc1-xxx-xxx-xxx, base URL = http://localhost:18080)
2022-05-06 21:01:12.417 [INFO ] [del.script.irrigation:Main_algorithm] -  not started. checkUserStartTime[true] itmIrrigationAuto.stat[ON] isWatering[false] WaterLeakDetectedRainSensor.state[OFF]
2022-05-06 21:01:13.676 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO re-connecting (attempt 1)
2022-05-06 21:01:13.679 [TRACE] [io.openhabcloud.internal.CloudClient] - Manager.EVENT_TRANSPORT
2022-05-06 21:01:13.686 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:19.529 [WARN ] [okhttp3.OkHttpClient                ] - A connection to https://myopenhab.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
2022-05-06 21:01:19.530 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO re-connected successfully (attempt 1)
2022-05-06 21:01:19.531 [TRACE] [io.openhabcloud.internal.CloudClient] - Manager.EVENT_TRANSPORT
2022-05-06 21:01:19.533 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:23.694 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:26.301 [INFO ] [nternal.handler.SpotifyBridgeHandler] - Spotify returned an error during polling: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
2022-05-06 21:01:29.050 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO connected
2022-05-06 21:01:29.052 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = xxx-xxx-xxx-xxx-xxx, base URL = http://localhost:18080)
2022-05-06 21:01:29.054 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:01:43.184 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@129fa48' takes more than 5000ms.
2022-05-06 21:01:51.184 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://192.168.4.221/real_time_data.xml' (method='GET', content='null') failed: Total timeout 3000 ms elapsed
2022-05-06 21:01:58.266 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@129fa48' takes more than 5000ms.
2022-05-06 21:01:53.970 [ERROR] [io.openhabcloud.internal.CloudClient] - Error during communication: {}
io.socket.engineio.client.EngineIOException: xhr poll error
	at io.socket.engineio.client.Transport.onError(Transport.java:63) [bundleFile:?]
	at io.socket.engineio.client.transports.PollingXHR.access$100(PollingXHR.java:26) [bundleFile:?]
	at io.socket.engineio.client.transports.PollingXHR$6$1.run(PollingXHR.java:140) [bundleFile:?]
	at io.socket.thread.EventThread$2.run(EventThread.java:80) [bundleFile:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
	at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.net.SocketTimeoutException: Read timed out
	at java.net.SocketInputStream.socketRead0(Native Method) ~[?:?]
	at java.net.SocketInputStream.socketRead(SocketInputStream.java:115) ~[?:?]
	at java.net.SocketInputStream.read(SocketInputStream.java:168) ~[?:?]
	at java.net.SocketInputStream.read(SocketInputStream.java:140) ~[?:?]
	at sun.security.ssl.SSLSocketInputRecord.read(SSLSocketInputRecord.java:478) ~[?:?]
	at sun.security.ssl.SSLSocketInputRecord.readHeader(SSLSocketInputRecord.java:472) ~[?:?]
	at sun.security.ssl.SSLSocketInputRecord.bytesInCompletePacket(SSLSocketInputRecord.java:70) ~[?:?]
	at sun.security.ssl.SSLSocketImpl.readApplicationRecord(SSLSocketImpl.java:1449) ~[?:?]
	at sun.security.ssl.SSLSocketImpl$AppInputStream.read(SSLSocketImpl.java:1060) ~[?:?]
	at okio.Okio$2.read(Okio.java:139) ~[?:?]
	at okio.AsyncTimeout$2.read(AsyncTimeout.java:237) ~[?:?]
	at okio.RealBufferedSource.indexOf(RealBufferedSource.java:345) ~[?:?]
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:217) ~[?:?]
	at okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:211) ~[?:?]
	at okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189) ~[?:?]
	at okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[?:?]
	at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[?:?]
	at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92) ~[?:?]
	at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67) ~[?:?]
	at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185) ~[?:?]
	at okhttp3.RealCall$AsyncCall.execute(RealCall.java:135) ~[?:?]
	at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) ~[?:?]
	... 3 more
2022-05-06 21:02:08.139 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
2022-05-06 21:02:08.141 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO disconnected: transport error

.....

2022-05-06 21:52:24.408 [ERROR] [org.apache.felix.fileinstall        ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
.....

2022-05-06 23:07:53.034 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - 
java.lang.OutOfMemoryError: Java heap space


When I reproduced this issue on May 5, nothing other than ping/pong messages were present in the logs. I didn’t have any crashes, out-of-memory conditions, disconnects or anything weird at all that day, except for the Google Home commands stop working and myopenhab.org claiming my openhab was offline. I had full trace logging enabled and was using the troubleshooting binding and openhab version 3.3.0~S2784-1 on Ubuntu.

...
...
... hours and hours of ping/pong messages and nothing else since my last Google Home voice command, which was successful...
...
...
2022-05-05 11:42:08.721 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:42:08.771 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:42:33.773 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:42:33.823 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 49 ms
2022-05-05 11:42:58.825 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:42:58.875 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:43:23.878 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:43:23.927 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:43:48.929 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:43:48.979 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 50 ms
2022-05-05 11:44:13.981 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
2022-05-05 11:44:14.030 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 49 ms

I am not sure what I am looking in the logs exactly but trying to find a common story.

It seems that we have now two confirmed cases with ping pong active but myopenhab.org saying that instance is online.

This new piece of information! To me it is saying that the client and server are connected and we can conclude that from (socket.io) client point of view everything would seem to be fine. Ping/pong watchdog is running still, and not stuck, for example.

Another piece of information is that both of you seemed to have issue at the same time, pointing to possible server side root cause.

We can also say that this is fairly uncommon scenario, I personally haven’t had any issues for months. At some point it was certainly more frequent.

Hmm. To understand the root cause, I am not sure we can do much without myopenhab.org maintainer in the loop, looking how things are connected in server side.


One more thing: when things are running normally, is there anything other than ping/pong in the logs?

EDIT: Turns out similar findings have been made already 2y ago by @wborn . Connection error recovery failed · Issue #134 · openhab/openhab-cloud · GitHub

Interesting. I’m curious about this:

The openHAB Cloud Notification Actions also work while the Cloud thinks the instance is offline. This is another indicator that the connection is working properly (see community topic).

If anyone sees their server go offline again, try triggering a notification from openHAB to see if it reaches your phone.

1 Like