Are there any issues with myopenhab.org? Since August 23rd, my cloud connector monitor has been regularly reporting connection errors and restarting the Cloud Connector. The openhab.log was full of messages such as:
2024-08-28 15:24:21.095 [WARN ] [io.openhabcloud.internal.CloudClient] - Socket.IO disconnected: transport error
2024-08-28 15:24:21.100 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 7f...4e, base URL = http://localhost:8080)
2024-08-28 15:24:21.201 [WARN ] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance: EngineIOException xhr poll error. Reconnecting after 30000 ms.
2024-08-28 15:24:22.658 [WARN ] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance: EngineIOException xhr poll error. Reconnecting after 30000 ms.
2024-08-28 15:24:31.213 [WARN ] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance: EngineIOException xhr poll error. Reconnecting after 30000 ms.
2024-08-28 15:24:49.760 [WARN ] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance: already connected. Reconnecting after 30000 ms.
2024-08-28 15:25:36.902 [WARN ] [okhttp3.OkHttpClient ] - A connection to https://myopenhab.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
2024-08-28 15:25:36.909 [WARN ] [okhttp3.OkHttpClient ] - A connection to https://myopenhab.org/ was leaked. Did you forget to close a response body? To see where this was allocated, set the OkHttpClient logger level to FINE: Logger.getLogger(OkHttpClient.class.getName()).setLevel(Level.FINE);
I disabled my cloud connector monitor in case it was causing problems and rebooted the openHAB server, but I am still seeing:
2024-08-28 15:41:22.784 [WARN ] [io.openhabcloud.internal.CloudClient] - Socket.IO disconnected: ping timeout
2024-08-28 15:41:22.788 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 7f...4e, base URL = http://localhost:8080)
2024-08-28 15:41:41.951 [WARN ] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance: EngineIOException xhr poll error. Reconnecting after 1490 ms.
2024-08-28 15:42:10.935 [WARN ] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance: already connected. Reconnecting after 2173 ms.
2024-08-28 15:42:16.701 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = 7f...4e, base URL = http://localhost:8080)
Connecting to https://myopenhab.org/ via a browser shows me as online, although the history shows lots of offline periods. https://home.myopenhab.org/ sometimes works, but sometimes it hangs and then displays an empty screen, “504 Gateway Time-out”, or “openHAB connection error: HttpClient@1b4c146{STOPPED} is stopped”. https://status.openhab.org/ claims 100% availability for the myopenHAB Cloud Service.
Today’s issues may be unrelated to what happened earlier. This morning I added code to my cloud connector monitor to run “netcat -vz myopenhab.org 443”, which always showed a successful connection even when attempts to fetch data failed with “Read timed out” or “500 Server Error: Internal Server Error”.
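For completeness, the port check versus data fetch is roughly along these lines (a simplified Python sketch, not my exact script; the timeouts are illustrative):

import socket
import urllib.request

def port_open(host, port, timeout=5.0):
    # Equivalent of "netcat -vz host port": can a TCP connection be opened at all?
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def fetch_ok(url, timeout=5.0):
    # The actual HTTPS request; this is what fails with read timeouts or 500s.
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except Exception:
        return False

print("port 443 reachable:", port_open("myopenhab.org", 443))
print("HTTPS fetch OK:", fetch_ok("https://myopenhab.org/"))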
I just looked at my logs and I am seeing a lot of connection timed out errors. I tried pinging myopenhab.org and got nothing back, but if I use nc to connect to port 443 it succeeds.
I think something is going wrong, but perhaps it’s subtle. I restarted OH earlier today when I updated Docker, and I just restarted again now and it appears I’m connected. I’ll watch it for a bit to see for how long.
@rlkoshak thanks for confirming my observations. Restarting OH helps sometimes, but not for long. Going over my logs, I see “Read timed out” errors going back months, but usually a second attempt was successful, so I might only need to restart the Cloud Connector once every few days when a second attempt failed. From August 19 to 22 I started seeing 5 to 10 restarts a day (I monitor every 15 minutes, or 96 times a day). August 23 had 48, August 24 had 79, August 25 and 26 dropped to 9 and 10, August 27 had 47, and August 28 had 54 until I disabled monitoring.
Some more observations. There were no cloud errors in the openhab.log since 18:07 EDT August 28, so I started my cloud connector monitor at 19:08. It runs every 15 minutes, updating a local heartbeat with a random value, fetching the value from myopenhab.org, and comparing that to what was set. If there are issues or a mismatch, it waits 60 seconds before trying again. On the second failure, it restarts the cloud connector and repeats the above process before exiting. Here is the log with successful checks removed.
19:30:01 single timeout
22:00:00 first timeout
22:01:06 second timeout, restarted cloud connector
22:02:45 third timeout
22:03:50 500 Server Error, restarted cloud connector
22:05:16 first timeout, reported multiple successive failures
22:15:00 single timeout
22:30:00 first timeout
22:31:06 second timeout, restarted cloud connector
22:32:37 third timeout, successful match
23:00:00 first timeout
23:01:06 second timeout, restarted cloud connector
23:02:36 third timeout
23:03:42 fourth timeout, restarted cloud connector
23:05:13 500 Server Error, reported multiple successive failures
23:15:00 single timeout
23:30:00 single timeout
00:00:01 first timeout
00:01:06 second timeout, restarted cloud connector
00:04:00 single timeout
05:15:00 single timeout
05:30:00 single timeout
08:45:00 single timeout
09:00:00 single timeout
10:00:00 single timeout
After a run of multiple successive issues between 22:00 yesterday and 01:06 this morning, I am seeing good health checks with the odd single timeout. I can increase the timeout value to 10 seconds to see if that helps.
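For reference, the check loop described above is roughly the following (a simplified Python sketch; the heartbeat Item name, the myopenhab.org credentials, and the restart command are placeholders, not my exact setup):

import base64
import random
import subprocess
import time
import urllib.request

LOCAL_ITEM = "http://localhost:8080/rest/items/CloudHeartbeat"               # local REST API
REMOTE_STATE = "https://home.myopenhab.org/rest/items/CloudHeartbeat/state"  # same Item via the cloud
AUTH = base64.b64encode(b"user@example.com:password").decode()               # myopenhab.org login (placeholder)
TIMEOUT = 5  # seconds; I may raise this to 10

def set_local(value):
    # Send the random value as a command to the local heartbeat Item.
    req = urllib.request.Request(LOCAL_ITEM, data=value.encode(),
                                 headers={"Content-Type": "text/plain"})
    urllib.request.urlopen(req, timeout=TIMEOUT).close()

def matches_remote(value):
    # Fetch the Item state through myopenhab.org and compare it with what was set.
    req = urllib.request.Request(REMOTE_STATE,
                                 headers={"Authorization": "Basic " + AUTH})
    try:
        with urllib.request.urlopen(req, timeout=TIMEOUT) as resp:
            return resp.read().decode().strip() == value
    except Exception:
        return False

def restart_cloud_connector():
    # Placeholder: however you restart the Cloud Connector on your system
    # (for example, restarting the openHAB Cloud add-on from the console).
    subprocess.run(["/usr/local/bin/restart-cloud-connector"], check=False)

value = str(random.random())
set_local(value)
if not matches_remote(value):        # first failure
    time.sleep(60)
    if not matches_remote(value):    # second failure: restart and re-check
        restart_cloud_connector()
        time.sleep(60)
        if not matches_remote(value):
            print("multiple successive failures")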
I’m not sure what the issue is; there were no errors in the logs, and all servers were running. I went ahead and rebooted our socket.io servers that OH connects to. If that clears this up, then there is probably a very slow object leak in the code somewhere, since those processes have been running for a long time.
The last disconnect I see in my logs was at 2024-08-28 16:36:58.522 MDT. It’s been smooth sailing since then for me. I don’t know whether that corresponds to when you restarted the socket.io servers or not.
I too have been seeing some disconnects recently, but I would like to send a big thank you to @digitaldan for his work improving the reconnect process. It used to be that these disconnects and reconnects would eventually cause a permanent loss of connection, but now it always seems to reconnect gracefully. This is so much better. Thank you @digitaldan!