How to automatically restart cloud connector after an unexpected disconnection

Update: February 4, 2023

The issues with myopenhab connections appear to have been solved. Hooray!


Over the past year or so, users have found that their openHAB servers are unexpectedly disconnecting from myopenHAB and then immediately reconnecting. However, myopenHAB continues to think that the OH server is offline, so you can’t access it remotely.

While our developers continue working on the problem, I’ve cobbled together a solution for automatically restarting the cloud connector when this happens. It’s based on these posts:

Constructive feedback and suggestions are welcome and appreciated.

Note that there are other solutions in the community that use different methods. Feel free to mention them in this post so that we can collect them all in one spot.

1. Install bindings and transformations

Start by installing these add-ons, if you don’t already have them:

  • Exec Binding
  • Log Reader Binding
  • REGEX Transformation

2. Turn on Implicit User Role

Check that the Implicit User Role is ON in your API Security settings (it most likely is).

3. Make an unbound item to store the testing message

Create a string item that has no channel attached to it. Mine is called myopenHAB_Connection.

When the connection test starts, the string will be updated to Testing. We’ll then send a command through myopenHAB to change the string to Online. If the command succeeds, it’s not necessary to restart the cloud connector.

If you want, you can add this item to a page or sitemap as an indicator of the cloud status.

4. Make a thing and item to send a POST command to myopenHAB

  1. In MainUI, manually add a thing using the Exec Binding. You can edit the ID and label to be whatever you want them to be (mine is called Post_Test_Command).

  2. Enter the following statement in the Command field, substituting in your myopenHAB username and password:

    curl -u username:password -H "Content-Type: text/plain" -X POST -d "Online" https://myopenhab.org/rest/items/myopenHAB_Connection

    This curl command changes the string in the myopenHAB_Connection item to be Online.

  3. Click the “Create thing” button to save your work.

  4. Create a switch item that’s connected to the Running channel of your exec thing (I’ve called my item Post_Test_Command). When this item is turned ON, it will execute the shell command and then reset itself to OFF.
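If you want to try the curl command from a terminal before relying on the exec thing, a small wrapper can make the result easier to read. This is only a sketch: `post_item_command` is a hypothetical helper name, and treating any 2xx status as success is an assumption. It posts a value to an item through myopenHAB and prints the HTTP status code that curl reports.

```shell
#!/bin/sh
# Hypothetical helper (not part of openHAB): post a value to an item through
# myopenHAB and report whether the request was accepted.
post_item_command() {
    user="$1"; pass="$2"; item="$3"; value="$4"
    # -s hides the progress meter, -o /dev/null discards the response body,
    # -w '%{http_code}' prints only the HTTP status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        -u "$user:$pass" \
        -H "Content-Type: text/plain" \
        -X POST -d "$value" \
        "https://myopenhab.org/rest/items/$item")
    case "$code" in
        2??) echo "accepted ($code)"; return 0 ;;
        *)   echo "rejected or unreachable ($code)"; return 1 ;;
    esac
}
```

For example, `post_item_command username password myopenHAB_Connection Online` should report an accepted status when the cloud connection is working. A status of 000 means curl did not get any HTTP response at all.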

5. Make a thing and item to restart the cloud connector

  1. In MainUI, add another thing using the Exec Binding. You can edit the ID and label to be whatever you want them to be.

  2. Enter the following statement in the Command field:

    openhab-cli console -p habopen bundle:restart org.openhab.io.openhabcloud
    

    This shell command will log into the console and restart the openhabcloud bundle.

    NOTE: habopen is the default password for the console (which is different from the password you use to log into your server via SSH). Most users leave it as habopen, but if you’ve changed it then you’ll need to change it here as well.

  3. Click the “Create thing” button to save your work.

  4. Create a switch item that’s connected to the Running channel of your exec thing (I’ve called my item Restart_Cloud_Connector).

6. Add the exec commands to the exec.whitelist

Before you can execute a shell command, it has to be added to the exec.whitelist file, which you’ll find in the openhab-conf/misc folder on your OH server.

  1. Open exec.whitelist in a text editor and insert the commands we used in our exec things:

    curl -u username:password -H "Content-Type: text/plain" -X POST -d "Online" https://myopenhab.org/rest/items/myopenHAB_Connection

    openhab-cli console -p habopen bundle:restart org.openhab.io.openhabcloud

    Don’t forget to update your username, password, and item name (if you changed it from myopenHAB_Connection). The commands in your exec things must be identical to the whitelist.

  2. Save and close exec.whitelist.
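Because the match has to be exact, a stray space or a changed password is enough to make the exec thing fail. The sketch below checks whether a command appears in the whitelist as a complete, identical line. `is_whitelisted` is a hypothetical helper name, and the path to exec.whitelist is passed in as an argument because it depends on how you installed openHAB.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given command appears in the
# whitelist file as a complete, identical line.
is_whitelisted() {
    whitelist="$1"; cmd="$2"
    # -F: treat the command as a fixed string, not a regular expression
    # -x: the whole line must match
    # -q: print nothing, just set the exit status
    grep -Fxq -e "$cmd" "$whitelist"
}
```

A call such as `is_whitelisted /path/to/exec.whitelist 'openhab-cli console -p habopen bundle:restart org.openhab.io.openhabcloud' && echo found` prints `found` only when the line in the file is character-for-character the same.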

7. Test your exec items

At this point, your items should work. You can add them to a page or sitemap and toggle them on to see the results in your log.

8. Make a thing to monitor the log

  1. In MainUI, manually add a thing using the Log Reader Binding. For the ID and label, I’ve used:

    • ID: openhabcloud
    • Label: LogReader: openHAB Cloud
  2. In the Custom Patterns field, enter the following string:

    Connected to the openHAB Cloud service
    

    The thing will monitor the log and update whenever it sees this exact string of text.

    NOTE: We installed the REGEX Transformation earlier so that it can be used by this item (using the default settings). You don’t need to do anything else with REGEX.

  3. Click the “Create thing” button to save your work.

    The log reader thing has an advanced channel called New Custom Event. It can be used to trigger a rule without needing an item (though you can also make a switch if you prefer).

    So, let’s make that rule.
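Before moving on, it can help to confirm that the pattern really shows up in your log. A minimal sketch, assuming you know where your openhab.log lives (the path is passed in as an argument, and `count_cloud_connects` is a hypothetical helper name):

```shell
#!/bin/sh
# Hypothetical helper: count the log lines that contain the same text the
# log reader thing is watching for.
count_cloud_connects() {
    logfile="$1"
    # -F treats the pattern as a fixed string; -c prints the number of matching lines.
    # grep exits non-zero when nothing matches, so "|| true" keeps the status clean.
    grep -cF "Connected to the openHAB Cloud service" "$logfile" || true
}
```

If this prints 0 for a log that covers a known reconnection, the custom pattern in the log reader thing probably doesn’t match what your system writes.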

9. Restart the cloud connector whenever a failed reconnection is detected

Apologies for the fact that I’m still using Rules DSL. In a nutshell, this rule does the following:

  1. Triggers whenever openHAB reconnects to myopenHAB.
  2. Updates the myopenHAB_Connection string item to Testing.
  3. Sends a command through the cloud to change the myopenHAB_Connection string item to Online.
  4. Checks whether myopenHAB_Connection changed to Online.
  5. Restarts the cloud connector if the Online command was not received.

And that’s all. The hard part is really getting the things/items set up, but only if you’ve never used the exec or log reader bindings before.

Some things to note:

  • The rule will trigger every time it sees the “Connected to…” message in the log. So, it will loop continuously until it successfully reconnects. For this reason, I’ve set a five-minute waiting period in case there are multiple disconnects/reconnects in a brief span of time.
  • The rule will not run unless it sees the “Connected to…” message. So, if there’s an actual myopenHAB outage, we won’t keep pinging it with this rule and potentially making things worse.

As I said earlier, constructive feedback and suggestions are welcome and appreciated.

var Timer Cloud_Test_Timer = null

rule "Restart cloud connector following an unsuccessful reconnection"
when
    Channel "logreader:reader:openhabcloud:newCustomEvent" triggered
then
    //Cancel any running timers. This is in case you have multiple disconnections/reconnections in a short time frame.
    Cloud_Test_Timer?.cancel

    //Set the testing status message
    myopenHAB_Connection.postUpdate("Testing")

    //Post a command to reset myopenHAB_Connection through the cloud
    Post_Test_Command.sendCommand(ON)

    //Wait 300 seconds, then restart the cloud connector if myopenHAB_Connection has not been updated to "Online" by the REST command
    Cloud_Test_Timer = createTimer(now.plusSeconds(300),
    [|
        if (myopenHAB_Connection.state == "Testing")
        {
            logInfo("openHAB Cloud", "Restarting cloud connector due to unsuccessful reconnection")
            sendNotification("russ@scatterthought.com", "openHAB Cloud connector restarted")
            Restart_Cloud_Connector.sendCommand(ON)
        }
        else
        {
            logInfo("openHAB Cloud", "Successful reconnection to myopenHAB")
        }
    ])
end

Bonus rule with counter

Just for fun, this version has a counter that increments every time there’s a successful reconnection and resets when the cloud connector is restarted. The counter requires another unbound item (similar to myopenHAB_Connection), which I’ve called myopenHAB_Connection_Success.

var Timer Cloud_Test_Timer = null
var Counter = null

rule "Restart cloud connector following an unsuccessful reconnection"
when
    Channel "logreader:reader:openhabcloud:newCustomEvent" triggered
then
    //Cancel any running timers. This is in case you have multiple disconnections/reconnections in a short time frame.
    Cloud_Test_Timer?.cancel

    //Set the testing status message
    myopenHAB_Connection.postUpdate("Testing")

    //Post a command to reset myopenHAB_Connection through the cloud
    Post_Test_Command.sendCommand(ON)

    //Wait 300 seconds, then restart the cloud connector if myopenHAB_Connection has not been updated to "Online" by the REST command
    Cloud_Test_Timer = createTimer(now.plusSeconds(300),
    [|
        if (myopenHAB_Connection.state == "Testing")
        {
            logInfo("openHAB Cloud", "Restarting cloud connector due to unsuccessful reconnection")
            sendNotification("russ@scatterthought.com", "Cloud connector restarted after " + myopenHAB_Connection_Success.state.toString + " successful reconnections")
            Restart_Cloud_Connector.sendCommand(ON)
            myopenHAB_Connection_Success.postUpdate(0)
        }
        else
        {
            //Treat an uninitialized (NULL/UNDEF) counter as 0 so that the rule won't fail on the cast
            var Number Previous = 0
            if (myopenHAB_Connection_Success.state instanceof Number) { Previous = myopenHAB_Connection_Success.state as Number }
            //Increment the counter after a successful reconnection
            var Counter = Previous.intValue + 1
            myopenHAB_Connection_Success.postUpdate(Counter.toString)
            // logInfo("openHAB Cloud", "Successful reconnections to myopenHAB: " + Counter.toString)
        }
    ])
end

Some others have implemented a check to actually verify a “broken” connection by making an HTTP call to myopenhab.org.

The approach here is simpler, and simply restarts when the connection is disconnected. The logic is “conservative” in the sense that many disconnects are reconnected successfully, but this rule still restarts the add-on, just to ensure it is working 100%.

I am concerned that this elevates the “thundering herd” problem even more (assuming the rule would be used in masses). Exponential backoff or random jitter would be quite a good thing to add here? Tagging cloud side expert @digitaldan for comments as well

The more complex approach (HTTP call to myopenhab.org) has the benefit that it avoids excessive add-on restarts, only acting in the rare case that the connection is “broken”.

That’s an excellent point. Anecdotally, I think most/all of my disconnections are not successful in reconnecting (and it happens very infrequently). But I don’t have data to back that up and I wouldn’t want to put unnecessary stress on myopenhab.

I know what all of those words mean individually, but not when they’re used together. :wink:

I was going for the simpler approach to help out users with less expertise, but I don’t imagine it would be too hard to add in a step to do an HTTP call before triggering the restart. I just don’t know how to do it. Do you have that info handy?

We could even count how often a forced reconnection is necessary.


The more complex approach with HTTP is described here, for example: Check, if openHAB cloud Connector is online? - #11 by abal

Re exponential backoff with jitter, the concept is explained fairly well in Exponential Backoff And Jitter | AWS Architecture Blog
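As a rough illustration of the idea (it is not something the rule in this thread implements), the delay before each retry can be drawn at random from a range that doubles with every attempt, up to a cap. The sketch below uses the “full jitter” variant. `backoff_delay`, the 5-second base, and the 300-second cap are all hypothetical choices, and reading random bytes from /dev/urandom assumes a Linux or similar system.

```shell
#!/bin/sh
# Hypothetical helper: print a retry delay in seconds for the given attempt
# number (0, 1, 2, ...), using exponential backoff with full jitter.
backoff_delay() {
    attempt="$1"; base="${2:-5}"; cap="${3:-300}"
    # Exponential ceiling: base * 2^attempt, never more than cap.
    ceiling="$base"
    i=0
    while [ "$i" -lt "$attempt" ] && [ "$ceiling" -lt "$cap" ]; do
        ceiling=$((ceiling * 2))
        i=$((i + 1))
    done
    if [ "$ceiling" -gt "$cap" ]; then
        ceiling="$cap"
    fi
    # Full jitter: pick a random whole number between 0 and the ceiling.
    # Two random bytes give 0..65535; the modulo adds a slight bias,
    # which is acceptable for spreading out retries.
    rand=$(od -An -N2 -tu2 /dev/urandom | tr -dc '0-9')
    echo $((rand % (ceiling + 1)))
}
```

With the defaults, attempt 0 waits between 0 and 5 seconds, attempt 3 between 0 and 40 seconds, and later attempts never wait more than 300 seconds. The randomness is the point: if many servers reconnect at the same moment, they won’t all retry in lockstep.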

Thanks. Rather than checking a heartbeat, I’m thinking that I’ll toggle a switch over HTTP, and then check if the switch has successfully toggled before restarting the connector. Do you see any issues with that?


There are probably many ways. I think even an HTTP GET item request (i.e., trying to get the item state) should fail and trigger the logic?
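A minimal sketch of such a GET check from a shell, assuming the item’s state is readable through myopenHAB at the usual REST path with the same basic authentication as the POST command. `cloud_item_reachable` is a hypothetical name, and treating only HTTP 200 as “reachable” is an assumption.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the item's state can be fetched
# through myopenHAB (HTTP 200). Prints nothing; use it in an if statement.
cloud_item_reachable() {
    user="$1"; pass="$2"; item="$3"
    # Discard the body and capture only the HTTP status code.
    code=$(curl -s -o /dev/null -w '%{http_code}' \
        -u "$user:$pass" \
        "https://myopenhab.org/rest/items/$item/state")
    [ "$code" = "200" ]
}
```

It could be used as `if cloud_item_reachable username password myopenHAB_Connection; then echo "cloud OK"; else echo "cloud broken"; fi`.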

I’ve rewritten the rule with the “toggle an item” solution. Seems like the most straightforward solution for users who are less experienced, and also allows for a useful status message in a page/sitemap. Plus, it just amuses me more.

I haven’t introduced exponential backoff and random jitter. I now grasp the concept thanks to the article you showed me, though. :wink:


I think this comment is still for the old version? I got confused when reading the code.

I think now with this script it should not try to restart unless it is really necessary. I like that the logreader trigger is “connected”, not “disconnected”: the add-on itself tries to reconnect with exponential backoff, so this is now more of a fail-safe for when we end up with a “broken” connection.


//Wait for 10 seconds, then post a command to reset myopenHAB_Connection through the cloud
    //It shouldn't be necessary to wait longer than 10 seconds, since we're just sending a command
    Cloud_Test_Timer = createTimer(now.plusSeconds(10),
    [|

Could you avoid the wait by having another rule, triggered by “myopenHAB_Connection” state update to “Testing”?

I tried to copy your setup for automatically restoring the connection, but I have one issue.
When I click on “Create Thing” at point 4 (1-3 are OK), something happens in the background, but I cannot find the created Thing from the Exec Binding in my “Things” list.

But the thing is definitely created, which I can see in the log.

My question now is: how can I create a switch that refers to the Running channel? I’m stuck between points 4.3 and 4.4.

My alternative approach would be to create a rule that turns on the switch in my UI that I have used to restart the cloud connector manually in the past, at the moment the switch from point 4 of your description turns ON.

Short Update:
The Thing was created but not under the name I expected.
All Exec things are initially named “Base” in the Things list. I had named the exec thing TestCloud and searched under T for it.
So the problem is solved.

Love it. Great work and a very good setup guide. :+1:

The new version seems to be a bit more complex but I think it’s the better way of checking the connection to myopenhab. Maybe „more complex“ because of the curl command against the API.

Will the myopenHAB_Connection item be marked as „Online“ after the system starts? Or will the rule engine not be ready to run rules before the cloud connection gets established when the system starts?

That’s for the 300-second waiting period before checking the status of myopenHAB_Connection. It’s just there to give time for multiple disconnect/reconnects. It’s not absolutely necessary, but I wanted to give some time for everything to settle before potentially restarting the cloud connector.

Actually, I think the 10-second wait period just isn’t necessary at all, since we’re now triggering the rule on “Connected to…” If it’s connected, the command can be sent immediately. I’ll remove that.

I’m not sure what you mean. When you create an exec thing, you’ll get something like this:

image

The thing list uses labels, but I don’t know why it would include “Base” in the name.

I’m also not sure what you’re saying here. If you want, you can put your item directly into a UI so that you can manually trigger it.

Thanks! I try to make my tutorials easy for beginners, and that mostly means taking time to explain why we’re doing something (without getting too technical). When someone is very familiar with a task, it becomes easy to skip steps that would not be obvious to others. In this case, I don’t spend much time on creating items since OH users should know how to do that. The challenging parts are really the unfamiliar exec and logreader things.

Yeah, exactly. It’s a simpler rule than the first version, but the curl command is daunting if you haven’t done it before. I relied on this post to figure it out.

That’s a good question. Since the general state of myopenHAB_Connection is Online, I believe that it will be persisted as such on a system shutdown/restart. I don’t think the rule would run when the system is shutting down, so it won’t be set to Testing. On a restart, I think the cloud connection will be reestablished before the rule runs.

I’m not using myopenHAB_Connection in a UI, so I haven’t tested this. Even if it is tested on restart, the rule will still run properly afterward.

Thanks for addressing what has been a mildly vexing issue and for providing a guide that a user who is not a developer can follow and understand what is going on. Often when someone provides help, it is in the form of “do xyz”. Then I spend an hour or two searching the docs and the forum to figure out how to do xyz.

Once I get a little runtime, I will report back on how this is working.

Yeah, and I do that too. When we’re explaining something in response to a question/problem, it’s very easy to gloss over steps that seem like common sense, but are really learned through repetition. “Writing a tutorial” is a different mindset from “responding to a question”.

I’ve added a version of the rule that has a counter so that users can get a sense of how frequent the restarts are (as opposed to successful reconnections). I stopped short of actually calculating the success rate of reconnections. :wink:

I get it, but the comment still talks about openHAB_Cloud_Status and “toggled off by the REST command”.

Instead, I think the REST commands set myopenHAB_Connection as “Online”?

Oh, I see. That wasn’t in the old version, but it was more relevant to a draft I didn’t publish. I had used a switch, but replaced it with a string. I’ll update that. :wink:


I use this Item in my UI. Works fine for now and looks good :wink:

While I really appreciate your effort, wouldn’t it make more sense if the cloud connector implemented some sort of recovery mechanism by itself?

Yes and no.

Yes, we need to eventually solve this ongoing issue, which is fairly recent (relative to my four years using openHAB). Hence:

No, it doesn’t make sense to me for myopenHAB to have a recovery mechanism.

Individual OH servers go offline all of the time for various reasons (reboots, power outages, Internet outages, upgrades, etc.). We wouldn’t want myopenHAB to keep trying to reconnect to servers that are actually offline, and there’s no way for it to know if that’s the case. I’m actually not sure if that would even be possible (but I’m not a developer). I suspect that it’s not.