Thanks for addressing what has been a mildly vexing issue and for providing a guide that a user who is not a developer can follow and understand what is going on. Often when someone provides help, it is in the form of "do xyz". Then I spend an hour or two searching the docs and the forum to figure out how to do xyz.
Once I get a little runtime, I will report back on how this is working.
Yeah, and I do that too. When we're explaining something in response to a question/problem, it's very easy to gloss over steps that seem like common sense but are really learned through repetition. "Writing a tutorial" is a different mindset from "responding to a question".
I've added a version of the rule that has a counter so that users can get a sense of how frequent the restarts are (as opposed to successful reconnections). I stopped short of actually calculating the success rate of reconnections.
Oh, I see. That wasn't in the old version, but it was more relevant to a draft I didn't publish. I had used a switch, but replaced it with a string. I'll update that.
Yes, we need to eventually solve this ongoing issue, which is fairly recent (relative to my four years using openHAB). Hence:
No, it doesn't make sense to me for myopenHAB to have a recovery mechanism.
Individual OH servers go offline all of the time for various reasons (reboots, power outages, Internet outages, upgrades, etc.). We wouldn't want myopenHAB to keep trying to reconnect to servers that are actually offline, and there's no way for it to know if that's the case. I'm actually not sure if that would even be possible (but I'm not a developer). I suspect that it's not.
Hi, this makes no sense to me. Why can't we have a recovery mechanism within the cloud connector binding?
I am having the same issue, and many months ago I implemented an item that calls an exec script to restart the cloud connector. That suggests the issue is not the server but the local cloud connector.
Every time this happens, the cloud connector has had a disconnect and reports that it has reconnected when it actually has not.
A simple handshake between the local cloud connector binding and the server should be enough.
openHAB reconnects: "Hello myopenHAB, I am still here."
Actually, there may not be enough evidence to draw a conclusion either way. What we know is:
myopenHAB does not think the OH server is connected
the OH server thinks it's connected to myopenHAB
the OH server can still send notifications through myopenHAB (only verified by some users)
If the server weren't actually connected, that third point wouldn't be possible. That's why some of us think that the problem is with myopenHAB.
If it's simple, then I'd encourage you to take a look at the code for the cloud connector and try adding it. I'm not a developer, so I don't have the ability to contribute on that end. I would if I could.
Short of that, the solution I've posted above is essentially a handshake. I just chose to test whether a command can be received through the REST API instead.
I think we have pretty well established that the client re-establishes the connection successfully. There is a handshake in which the client talks to the server and the server talks to the client. There is also a regular check that communication works, with one party sending a ping and the other responding with a pong. These basic health checks and handshakes all come from the Socket.IO protocol, which is based on WebSocket technology.
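To make the ping/pong health check above concrete, here is a minimal sketch of the idea, not Socket.IO's actual implementation. The class name `HeartbeatMonitor`, the 20-second timeout, and passing timestamps in explicitly (instead of reading a clock) are all assumptions made for illustration. A connection counts as alive while the latest ping has been answered, or while the timeout has not yet expired; the difference between ping and pong is the round-trip time.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical ping/pong liveness check; not the actual Socket.IO code."""

    ping_timeout: float = 20.0          # seconds to wait for a pong (assumed value)
    last_ping: Optional[float] = None   # when the most recent ping was sent
    last_pong: Optional[float] = None   # when the most recent pong arrived

    def send_ping(self, now: float) -> None:
        self.last_ping = now

    def receive_pong(self, now: float) -> float:
        """Record a pong and return the ping/pong round-trip time in seconds."""
        if self.last_ping is None:
            raise RuntimeError("pong received without an outstanding ping")
        self.last_pong = now
        return now - self.last_ping

    def is_alive(self, now: float) -> bool:
        if self.last_ping is None:
            return True  # nothing sent yet, so nothing can have timed out
        if self.last_pong is not None and self.last_pong >= self.last_ping:
            return True  # the latest ping has been answered
        # Still waiting for a pong: alive only until the timeout expires.
        return now - self.last_ping <= self.ping_timeout


monitor = HeartbeatMonitor(ping_timeout=20.0)
monitor.send_ping(now=0.0)
print(monitor.receive_pong(now=0.5))  # 0.5: the round-trip time
monitor.send_ping(now=30.0)
print(monitor.is_alive(now=45.0))     # True: no pong yet, but within the timeout
print(monitor.is_alive(now=55.0))     # False: 25 s without a pong exceeds the timeout
```

Note that a check like this only says whether the socket itself is healthy; it says nothing about whether the cloud's separately stored online/offline status agrees.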
We have several reports that notifications go through even while the cloud shows that the instance is offline. Actually, to my knowledge, there are zero reports that notifications do not work in this weird state.
The thing is that openHAB Cloud separately tracks which clients (UUIDs) are online and which are offline. Whenever we get a new connection handshake, we update the status to online. Whenever there is a disconnect (whether due to a ping/pong failure or a "clean" disconnect; it does not matter), the status is updated to offline.
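A minimal sketch of the tracking described above helps show how it could go wrong. It assumes a plain dict in place of the database and a hypothetical `NaiveStatusTracker` class; it is not the openhab-cloud code. Because every event simply overwrites the stored status, a disconnect event for an old socket that is handled after the connect event for its replacement leaves the client marked offline while it is actually connected. This is only one possible ordering problem, not a confirmed root cause.

```python
from typing import Dict


class NaiveStatusTracker:
    """Hypothetical model of per-UUID online/offline tracking.

    A dict stands in for the database; every event overwrites the stored status.
    """

    def __init__(self) -> None:
        self.status: Dict[str, str] = {}

    def on_connect(self, uuid: str) -> None:
        self.status[uuid] = "online"

    def on_disconnect(self, uuid: str) -> None:
        self.status[uuid] = "offline"


tracker = NaiveStatusTracker()
tracker.on_connect("client-1")     # the new socket's handshake is processed first...
tracker.on_disconnect("client-1")  # ...then the old socket's disconnect arrives late
print(tracker.status["client-1"])  # offline, even though the new connection is live
```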
This online/offline status tracking is implemented within the openHAB project (openhab-cloud repo), backed by a database.
Unfortunately, it remains a mystery why the online/offline tracking is not working as expected. There could be a bug in the Socket.IO library on the server or client (e.g. a missing "connection"/"disconnect" event). Or perhaps there are race conditions that lead to the wrong state being stored on the cloud side (**). The Socket.IO library versions are somewhat old; perhaps an update would help? An update is not trivial: backwards compatibility with old clients needs to be considered, and there is no proper means to test this out safely.
This is quite hard to solve because:
this is a volunteer project with limited hours put into it; it is a "charity", not a business. The maintainers probably all have day jobs.
debugging on the cloud side depends on those volunteer hours, and there are limited opportunities to debug "end-to-end", e.g. following one specific client and seeing how it looks on the cloud side
there are performance concerns to consider; as I understand it, the free myopenhab.org service is actually quite heavily used
**) I spent some time staring at the cloud-side code with help from digitaldan, and we did find one race condition. This was fixed some time ago, but clearly it was not the (main/common) root cause people are experiencing.
Thinking back to @SeeAge's comment, would it be possible to make a version of the cloud connector that sends an online message to myopenHAB periodically (without having to disconnect and reconnect)?
This is already done in practice in the form of "pong" messages, which are part of the Socket.IO protocol.
On the client side, one can build custom logic based on those messages. For example, OH nowadays logs the time between ping and pong. I presume the same is possible on the server side.
I guess the thing is that making a database call (that is where the online/offline status is stored) all the time might not… fly in practice. That comes back to the topic of limited pre-production testing capabilities.
Then again, this gave me an optimization idea: could we check only new connections on the cloud side and verify that the online status is correct?
Such checks should probably be spread out over time (for example, no earlier than 30 s and no later than 2 min after connecting), so that things do not crash and burn when large swarms of clients connect at once (e.g. when the server is restarted).
And of course, what I described above is only my hunch… who knows whether something else is actually broken.
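The idea of spreading the checks out can be sketched by giving each new connection a random verification time between 30 s and 2 min after it connects. The function name `schedule_checks`, the uniform distribution, and the fixed seed are assumptions for illustration only; this is not code from openhab-cloud.

```python
import random
from typing import Iterable, List, Tuple

MIN_DELAY_S = 30.0   # earliest verification: 30 s after connect (example value from the post)
MAX_DELAY_S = 120.0  # latest verification: 2 min after connect


def schedule_checks(
    uuids: Iterable[str], connect_time: float, seed: int = 0
) -> List[Tuple[str, float]]:
    """Give each newly connected UUID a random verification time.

    Spreading the checks out means a swarm of reconnects (e.g. after a cloud
    restart) does not trigger all status lookups at the same moment.
    """
    rng = random.Random(seed)  # seeded only so the example is reproducible
    plan = [(uuid, connect_time + rng.uniform(MIN_DELAY_S, MAX_DELAY_S)) for uuid in uuids]
    return sorted(plan, key=lambda item: item[1])  # in the order the checks would run


swarm = schedule_checks([f"client-{i}" for i in range(1000)], connect_time=0.0)
# About 1000 / 90 ≈ 11 checks per second on average, instead of 1000 at once.
print(swarm[0][1] >= MIN_DELAY_S, swarm[-1][1] <= MAX_DELAY_S)  # True True
```

A uniform window is the simplest choice; any scheme that bounds the earliest and latest check time would serve the same purpose.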
Hi all, I may have a solution to the "split brain" issue going on here. There are a number of changes in the works right now, but a big one is a push I'm doing tomorrow that will hopefully ensure our cloud code only allows one connection attempt at a time from an authorized UUID. I think the issue we have now is that the underlying Socket.IO retry logic tries to reconnect a couple of times in the background. While those attempts happen serially on the client, the cloud service can actually process them in parallel (due to load balancing, DB calls, Redis calls, etc.). This means that a connection the client gave up on may finish after a good connection is made, which then overwrites the DB with the wrong server address, and we no longer know how to route proxy connections to it.
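The race described above, and the "one connection per UUID" guard, can be modeled with asyncio. This is a minimal sketch under stated assumptions: a dict stands in for the database, `asyncio.sleep` stands in for the DB/Redis work during a handshake, and the `Registry` class and its method names are hypothetical, not the openhab-cloud implementation. Without the guard, the slow, abandoned attempt finishes last and overwrites the route. With the guard, the second attempt is rejected (compare the "already connected" error reported later in this thread) and succeeds once the first connection has been torn down.

```python
import asyncio
from typing import Dict, Set, Tuple


class Registry:
    """Hypothetical model of the cloud's UUID -> server-address routing table."""

    def __init__(self, one_per_uuid: bool) -> None:
        self.one_per_uuid = one_per_uuid
        self._busy: Set[str] = set()      # UUIDs with an in-flight or active connection
        self.routes: Dict[str, str] = {}  # where proxy requests for each UUID are sent

    async def handshake(self, uuid: str, address: str, work_s: float) -> bool:
        if self.one_per_uuid:
            if uuid in self._busy:
                return False              # reject: "already connected"
            self._busy.add(uuid)          # no await between check and add, so no interleaving
        await asyncio.sleep(work_s)       # simulated DB/Redis calls during the handshake
        self.routes[uuid] = address       # last writer wins
        return True

    def disconnect(self, uuid: str) -> None:
        self._busy.discard(uuid)
        self.routes.pop(uuid, None)


async def unguarded() -> str:
    reg = Registry(one_per_uuid=False)
    # Attempt A (already abandoned by the client) is slow; attempt B (the good one) is fast.
    await asyncio.gather(
        reg.handshake("client-1", "socket-A", work_s=0.05),
        reg.handshake("client-1", "socket-B", work_s=0.01),
    )
    return reg.routes["client-1"]  # A finished last and overwrote B's route


async def guarded() -> Tuple[bool, bool, str]:
    reg = Registry(one_per_uuid=True)
    task_a = asyncio.create_task(reg.handshake("client-1", "socket-A", work_s=0.05))
    await asyncio.sleep(0)  # let attempt A claim the UUID first
    first_try = await reg.handshake("client-1", "socket-B", work_s=0.01)
    await task_a
    reg.disconnect("client-1")  # the server eventually drops A's dead socket
    retry = await reg.handshake("client-1", "socket-B", work_s=0.01)
    return first_try, retry, reg.routes["client-1"]


print(asyncio.run(unguarded()))  # socket-A
print(asyncio.run(guarded()))    # (False, True, 'socket-B')
```

The trade-off in this model is that a rejected client has to retry, so the route is briefly wrong until the stale connection is detected and removed, but it is never silently overwritten by a parallel attempt.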
We also have some changes to the cloud add-on to connect more gracefully that we will try to get into the next 3.4.x release as well as 4.0.
Is this a fix on the cloud server, or do I have to upgrade my openHAB server? My myopenHAB connection is still breaking after some time.
The next OH 3.4 release (which may be this weekend or next) will have a reconnect logic fix in the binding. Any other fixes needed will be done on the cloud side.
Thanks for your workaround; it saved my nerves during the continuous disconnections and missing notifications through OH Cloud. In the end, it also pointed me to the cause of the issue.
My connection breaks down with the following log message:
"Error connecting to the openHAB Cloud instance: already connected"
By checking the times and events in my network, I figured out that it always happens during the forced disconnection of my Internet connection by my provider. Does anyone know whether there is a way to solve this without restarting the cloud service?
The reason I ask is that when the service is restarted, all rules are reloaded, so some runtime data gets lost. Or is there a way to restart the service without reloading all rules?
My setup:
openHABian 2.5.12 on RPi 3B+
Thanks again for your nice workaround
Greetings Andy
Hey Russ, thanks for your reply.
To be honest… I have been thinking about upgrading to OH 3.x for a long time… looks like this is the trigger to start the work.
Greetings Andy
There are a lot of conversations about upgrading right now, as some users realize that they're on 2.5 and see that OH4 is targeting a release later this year. If you have time, I'd suggest moving to 3.4.2, which is a significant leap forward and will prepare you for OH4. Otherwise, you'll be dealing with breaking changes from two major releases at once.