Openhab cloud connection stops working, does not try to reconnect

…just to clarify, when the Google Home connection doesn’t work, the last CloudClient log line in openhab.log is:

Connected to the openHAB Cloud service (UUID = .....

…so according to the log, the connection is up, but when checking on myopenhab it says I am offline. This can go on for several days, until I do the Save-clicking trick to force a reconnect. During the downtime, no reconnect is attempted. I have to do it manually. It seemingly doesn’t know that the connection is broken.

It is the third time this happens for me with openhab 3.2.0. Once in 2021, then after the January 24 myopenhab downtime and now today.

In January after all the problems, the cloud connection finally reconnected 2022-01-25 07:36:43, but the connection was actually broken until 2022-01-27 13:15:55 when I clicked Save.

This time, there was a reconnect log entry at 2022-02-22 20:55:22, but the connection was actually broken (myopenhab offline and google home not working) until 2022-02-23 11:45:13 when I clicked Save.

Is the cloud connection utilizing keep-alive functionality to validate the connection is actually up?

Let me know if there is anything I can do to help diagnose this issue.

Thanks,

  • Sean

There is keepaline mechanism in the library used byt the openHAB and server-side. However, it seems clear it is not working as it should, at least not always.

The best users can do is to have the TRACE (best) or DEBUG (ok) level logs to tell what is really happening and whether the client side is noticing that connection has been “lost”.

(…second attempt, needed to hide the UUID/SECRET in the log file)

Hello. I now have it reproduced again, this time with TRACE logs.
openhabcloud_trace.txt (55.2 KB)

After a ping timeout it successfully (?) reconnected on the 7th attempt at: 2022-03-01 08:03:11.891

I am not sure when it stopped working as I was away for several days. When I got back March 4, it didn’t work.

I pressed “Save” to make it work again at: 2022-03-04 19:07:44.717.

Let me know if there is anything else I can test.

Thanks,
Sean

1 Like

(…second attempt, needed to hide the UUID/SECRET in the log file)

Probably we should remove logging of secrets…

I can see few things from the log

  1. The enhanched functionality to reconnect (introduced recently) is “kicking in”, Error connecting to the openHAB Cloud instance: <error detail>. Reconnecting after <xxxx>ms.. This is good
  2. Comparing the successful reconnection on Ma4ch 1 and ‘forced’ reconnection (by pressing Save) on March 4, looks exactly the same
2022-03-01 08:03:11.989 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO connected
2022-03-01 08:03:11.990 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = ***UUID***, base URL = http://localhost:8080)
2022-03-01 08:03:11.991 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
<nothing logged after this for days. No errors, no nothing>
2022-03-04 19:07:45.086 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO connected
2022-03-04 19:07:45.087 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = ***UUID***, base URL = http://localhost:8080)
2022-03-04 19:07:45.088 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
<assumedly nothing logged after this. End of log file attached>

I need to see if it would be possible to add even more log to understand what is happening

Thanks for the feedback.

I am assuming that the connection was working after that 7th reconnect on March 1 (marked in the log as successful), but I can’t say for sure since I wasn’t home.

My guess is that the connection was lost some time after that between March 1 and March 4, but I can’t say that for sure either.

I agree it would be great to have some more logging, especially on those pings and the lower level Socket.IO stuff. Let me know, and I will reconfigure, run test versions, etc.

I never had any problems with this until some time in the end of 2021. This year it has happened quite frequently. Now almost every week. I was running openhab 3.1 the first time it happened. Upgrading to 3.2.0 didn’t help.

Some additional info that might not be relevant, but I am adding it regardless.

  • Mini-PC running openhab: ASUS VIVO-Mini UN45-VM065M (4GB RAM, 256GB SSD)
  • OS: Ubuntu 16.04.7 LTS (GNU/Linux 4.4.0-219-generic x86_64) with extended updates
  • java package: Azul Zulu 11.54+23 (11.0.14-b9) JDK
  • openhab package: openhab 3.2.0-1
  • Internet connection: 350mbit fiber (Telenor)
  • Network hardware: Ubiquiti Unifi
  • Cat 6A wired connection, not using WiFi, static IPs
  • Location: South of Oslo, Norway
  • Traceroute to myopenhab.org shows packets are routed through:
    - Oslo, Norway
    - London, England
    - Frankfurt am Main, Germany
1 Like

Try with this one:

org.openhab.io.openhabcloud-3.3.0-SNAPSHOT.jar.pdf (38.2 KB) (commit da5bd42e5fd9b8db34c6cc118d895ac43b212d78, git branch lives here). Rename file as .jar. (OUTDATED. See instead openHAB marketplace entries)

Debug logging has been enhached and it now logs ping/pong from the Socket.IO library as well as HTTP calls, example below.

I have also “sensored” UUID and secrets, printing only first and last 2 characters of each.

5:46:51.693 [TRACE] [io.openhabcloud.internal.CloudClient] - Manager.EVENT_TRANSPORT
15:46:51.693 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
15:46:51.699 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO connecting
15:46:51.707 [INFO ] [okhttp3.OkHttpClient                ] - --> GET https://myopenhab.org/socket.io/?EIO=3&transport=polling http/1.1
15:46:51.846 [INFO ] [okhttp3.OkHttpClient                ] - --> GET https://myopenhab.org/socket.io/?EIO=3&transport=polling http/1.1
15:46:51.880 [INFO ] [okhttp3.OkHttpClient                ] - <-- 200 OK https://myopenhab.org/socket.io/?EIO=3&transport=polling (33ms, 99-byte body)
15:46:51.880 [INFO ] [okhttp3.OkHttpClient                ] - <-- 200 OK https://myopenhab.org/socket.io/?EIO=3&transport=polling (172ms, 99-byte body)
15:46:51.880 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO OPEN
15:46:51.880 [TRACE] [io.openhabcloud.internal.CloudClient] - Manager.EVENT_TRANSPORT
15:46:51.880 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
15:46:51.881 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
15:46:51.882 [INFO ] [okhttp3.OkHttpClient                ] - --> GET https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED http/1.1
15:46:51.882 [INFO ] [okhttp3.OkHttpClient                ] - --> GET https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED http/1.1
15:46:51.977 [INFO ] [okhttp3.OkHttpClient                ] - <-- 200 OK https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED (94ms, 4-byte body)
15:46:51.977 [INFO ] [okhttp3.OkHttpClient                ] - <-- 200 OK https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED (95ms, 4-byte body)
15:46:51.977 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO connected
15:46:51.977 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = 56...3a, base URL = http://localhost:8080)
15:46:51.977 [TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
15:46:51.978 [INFO ] [okhttp3.OkHttpClient                ] - --> GET https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED http/1.1
15:46:51.978 [INFO ] [okhttp3.OkHttpClient                ] - --> GET https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED http/1.1
15:46:52.311 [INFO ] [okhttp3.OkHttpClient                ] - <-- 200 OK https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED (333ms, 3-byte body)
15:46:52.311 [INFO ] [okhttp3.OkHttpClient                ] - <-- 200 OK https://myopenhab.org/socket.io/?EIO=3&transport=polling&sid=SENSORED (333ms, 3-byte body)
15:47:16.882 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
15:47:16.916 [DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong
1 Like

Slightly improved version:

org.openhab.io.openhabcloud-3.3.0-SNAPSHOT.jar.pdf (38.6 KB) (commit 4ce01d530e) (OUTDATED. See instead openHAB marketplace entries)

Logs connection “upgrade” from HTTP to websocket, PACKETs socket.io messages (on connect), pong milliseconds.

EDIT: beware! Some secrets are still logged.
EDIT2: You could also try to see if TRACE level logging works for “io.socket”

1 Like

I have never tried installing such a thing manually in openhab before, so I had some unsuccessful attempts at first. Was finally able to get it in by upgrading to 3.3.0 SNAPSHOT and not uninstalling the existing addon first before copying the new .jar to /usr/share/openhab/addons. If I uninstalled the existing addon first there was resolve error, not finding io.socket.backo.

Now I see the ping/pong messages in the log:

[TRACE] [io.openhabcloud.internal.CloudClient] - Manager.EVENT_TRANSPORT
[TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
[TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO Packet: CONNECT (0)
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO connected
[INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = aa...c3, base URL = http://localhost:8080)
[TRACE] [io.openhabcloud.internal.CloudClient] - Transport.EVENT_REQUEST_HEADERS
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 44 ms
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 58 ms
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 44 ms
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO ping
[DEBUG] [io.openhabcloud.internal.CloudClient] - Socket.IO pong: 44 ms

In the middle of all of this I also tried to enable more socket.io logging. I did the following, but I don’t think it accomplished anything:

(/usr/share/openhab/runtime/bin/client)
log:set TRACE io.socket.backo
log:set TRACE io.socket

I used that approach to successfully enable TRACE logging on org.openhab.io.openhabcloud. Let me know if you know any other approach to enable more logging.

It will be interesting to see if it can be reproduced again now with the new version and more logging. Usually I try avoiding changing too much for each troubleshooting step, but now also my entire system has been upgraded from 3.2.0-STABLE to 3.3.0-SNAPSHOT, so hopefully that doesn’t interfere with this testing.

To be clear, I installed the .jar you labeled “Slightly improved version”.

1 Like

Thanks!

In the middle of all of this I also tried to enable more socket.io logging.

OK it might be it’s not just working since the library uses different logging framework and I am not sure if openHAB is configured to handle those.

now also my entire system has been upgraded from 3.2.0-STABLE to 3.3.0-SNAPSHOT, so hopefully that doesn’t interfere with this testing.

I am sorry to hear that you had to update the whole system. It should not be necessary, it should work with openhab 3.2.0 stable as well. Indeed the dependency issues are a thing, we can have a closer look if you decide to revert back to stable (not sure if it is possible even!)

1 Like

Don’t worry, I am ok with updating to 3.3.0-SNAPSHOT to do this testing. I think perhaps the issue I had all along was that the custom .jar won’t load without having the existing openhabcloud addon there already, if not I get a resolve error for the module io.socket.backo. I tried installing 3-4 other addons in the hope that some of those would have a dependency towards io.socket.backo and pull it in that way, but I wasn’t successful.

I am not sure why I can’t run the new .jar alone. Now I am installing it by copying it to /usr/share/openhab/addons. Let me know if that is the incorrect procedure. I have never done this before. I am a retired software engineer, but have hardly touched Java in my career and also never done any openhab debugging, testing or development.

I also wonder if there is another way to pull in the io.socket.backo module. I don’t even know how module dependencies are configured in openhab.

What slightly worries me about having both the existing addon installed and the new .jar too is that I now have two “openhab Cloud” entries in the UI. It would be bad to have dual connections to the myopenhab cloud service I guess. Everything seems to work fine though.

NOTE: I have also now added TRACE logging for okhttp3.OkHttpClient, and I can see the protocol switching message:

101 Switching Protocols https://myopenhab.org/socket.io/?EIO=3&transport=websocket&sid=WG......nc (587ms, 0-byte body)
1 Like

The binding has external dependencies specified in the feature.xml. To install bindings with dependencies you need the kar file instead of the jar file (build with mvn karaf:kar). But be aware that the kar file has a specific openHAB version dependency. So it only works on the openHAB version is was created with/for.

2 Likes

Thanks for the information! I searched the feature.xml files in the source code and found the Ambient Weather Binding also has a dependency for io.socket, and that seems to work as a hack to run the testing .jar alone without the existing cloud connector addon also there.

Yes… Better not rely on ambient weather, it might refer to slightly older version of socket io [openhabcloud] Reconnect on connection errors by ssalonen · Pull Request #11153 · openhab/openhab-addons · GitHub

I see. The only other alternative I have now is to load the .jar with the existing openhabcloud connector addon also loaded. I have verified now that this causes two simultaneous connections to the myopenhab cloud service and every request coming in is duplicated. I see how this can interfere with testing of connectivity issues, but it all comes down to what is the so-far-unknown root cause.

Neither alternative seems very good. Maybe I should wait until a new snapshot version becomes available with the required logging code included? or maybe you could supply a kar file instead of the jar? The openhab version I have installed now is 3.3.0~S2784-1 according to dpkg -l in Ubuntu.

Please advise. I will do whatever you think is best.

1 Like

Why not provide a kar file, which should solve the dependency issues ?

1 Like

@seanch try the version in [WIP] [openhabcloud] debug by ssalonen · Pull Request #12430 · openhab/openhab-addons · GitHub

UPDATE: I have now published the addon in the marketplace, to allow even more easy installation (and further updates)

You can find them from the MainUI marketplace view with “cloud connector troubleshooting version” keyword. Direct links for marketplace forum posts: openHAB 3.2.0 version and openHAB 3.3.0-SNAPSHOT version.

That’s awesome. I uninstalled the other test bindings I used, restarted openhab and installed the 3.3.0-SNAPSHOT troubleshooting version from the marketplace. Everything seems to work fine, and I will be testing this configuration now and report back if the connection problem can be reproduced again.

One last thing, you noted yesterday that that other binding, the ambient weather one, used a slightly older version of io.socket. When did the openhabcloud addon start using the new one, or has it always been using the newer version? I am curious if I am now with 3.3.0-SNAPSHOT also is using a better io.socket library compared to my other testing earlier this year with 3.1.0 and 3.2.0 (stable). (?)

Thanks,
Sean

1 Like

At least in the latest openHAB release uses the new one. Do not now recall whether it was introduced already in 3.1.0 or was it 3.2.0… EDIT: github archeology reveals that this came in 3.2.0

Probably does not make a huge difference but who knows

Just a quick update. I have not been able to reproduce this problem again yet, but I am still running the “troubleshooting version” with full logging and will continue to do so. I will report back here as soon as something happens.

1 Like

Today I “sort of” reproduced it, but not really. (Tuesday, April 12, 2022 at ~14:05 UTC) I had been away from the house for 5 days, and the first thing I tried after getting back was a Google Home voice command. After a long pause, I got the error message “…openhab not available…”. I don’t remember the exact phrase, but I think it was the one I usually hear when I have this problem. I waited 30 seconds and tried again but it was the same result.

Then I logged in with SSH to check the openhab.log file, but there was no errors in the ping/pong messages. It was still going, and nothing seemed out of the ordinary.

Then I logged on to myopenhab.org, to see if the status was offline or online, but the status was online, so that was ok as well.

Then I tried a Google Home voice command yet again, and that time it worked. I tried another 5 voice commands and all were successful.

So I don’t really know what caused those two first errors today, but somehow it fixed itself without having to go into the OpenHAB UI and clicking ‘Save’ (without any changes) in the cloud connector settings like I usually have to do.

I don’t even know if this is related, but I thought I should mention it here anyway, just to get all the data points in one place.

I’ll keep testing.

1 Like