Check if openHAB Cloud Connector is online?

Didn’t think about that! Of course, myopenhab.org also has a REST API… That would be a nice item for my Zabbix installation :wink:


I added an item to my Zabbix monitoring that checks, via the myopenhab.org REST API, whether one item is “ON”.
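The Zabbix side of such a check boils down to turning the item JSON returned by https://myopenhab.org/rest/items/<item> into a number. A minimal sketch in plain JavaScript, assuming Zabbix only needs 1 (item is ON) or 0 (anything else, including errors); the function name is hypothetical and not part of openHAB or Zabbix:

```javascript
// Hypothetical helper: map the JSON body returned by
// GET https://myopenhab.org/rest/items/<ItemName> to a Zabbix-friendly value.
// 1 = item reported "ON" through the cloud, 0 = anything else (OFF, NULL, errors).
function itemStateToZabbixValue(responseText) {
  let parsed;
  try {
    parsed = JSON.parse(responseText);
  } catch (err) {
    return 0; // not JSON, e.g. an HTML error page from the cloud proxy
  }
  // JSON.parse(null) yields null, so guard before reading .state
  if (parsed === null || typeof parsed !== "object") {
    return 0;
  }
  return parsed.state === "ON" ? 1 : 0;
}

console.log(itemStateToZabbixValue('{"name":"Switch_Test","state":"ON"}')); // 1
```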

It still feels a bit off. Shouldn’t there be some functionality in the OH3 local API that tells you whether everything works as intended? Perhaps, analogous to a Thing’s status, the REST API could report when something isn’t in its normal state? That could be difficult, though, because AFAIK there’s no central status for each bundle.

I think the big problem you will face is this: if the Cloud Connector add-on had known that it was disconnected from the cloud server, it would have logged it. So even if there were some API to query, the add-on likely still thinks it’s connected, and you’d still get the wrong answer.

Hmm. I’m no developer, but shouldn’t an endpoint know when it’s no longer connected? I thought the connection originates from the local openHAB 3 installation, so if it’s not connected, it should throw an error one way or another. On the other hand, if it’s technically difficult to get that information from within openHAB core, then of course it makes sense that, as I experienced, the log file showed nothing about the disconnected state…

It depends. But if it did, wouldn’t it log that it’s no longer connected or failed to connect? Normally it does, but it didn’t in this case. So either the logging itself failed somehow or the add-on still thinks it’s connected.

It does, but only if there is some sort of regular heartbeat; otherwise one side could close the connection and the other side wouldn’t know. Even with a heartbeat, the other side only finds out after a couple of heartbeats have been missed.
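To illustrate the heartbeat point: the side that stops receiving beats can only declare the link dead after a few intervals have gone by. A sketch under assumed values (a fixed interval and a tolerance of `maxMissed` intervals, neither taken from the Cloud Connector's actual code):

```javascript
// Generic heartbeat liveness sketch (NOT the Cloud Connector's implementation).
// Count how many whole heartbeat intervals have passed since the last beat.
function missedHeartbeats(lastSeenMs, nowMs, intervalMs) {
  if (nowMs <= lastSeenMs) return 0;
  return Math.floor((nowMs - lastSeenMs) / intervalMs);
}

// The peer is considered alive until maxMissed intervals pass without a beat.
function isPeerAlive(lastSeenMs, nowMs, intervalMs, maxMissed) {
  return missedHeartbeats(lastSeenMs, nowMs, intervalMs) < maxMissed;
}

// Example: heartbeats every 25 s, tolerate 2 missed beats (assumed values)
const interval = 25000;
console.log(isPeerAlive(0, 40000, interval, 2)); // true  (1 missed)
console.log(isPeerAlive(0, 60000, interval, 2)); // false (2 missed)
```

The gap between the real disconnect and its detection is up to `maxMissed` intervals, which is exactly the window in which the add-on can still believe it is connected.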

This doesn’t come from core. It comes from the Cloud Connector add-on.

It’s not so much that the log file is unaware. The lack of a log statement shows that the part of the Cloud Connector add-on that handles the case where it discovers it is no longer connected didn’t run (or failed to run to completion). Hence my warning. If the code that processes a disconnect didn’t run, who knows if the add-on itself knows whether it’s still connected or not?

Obviously there could be ways to force it to check, but my point is: if the log entry is missing, I see no reason to think that the add-on itself knows it has been disconnected at that time.

If anyone is interested, here is a JS 2021 rule I made:

Item definition (.items file):

String String_heartbeat "Cloud Connector connectivity heartbeat"

Rule:

configuration: {}
triggers:
  - id: "1"
    configuration:
      cronExpression: 0/27 * * * * ? *
    type: timer.GenericCronTrigger
conditions: []
actions:
  - inputs: {}
    id: "2"
    configuration:
      type: application/javascript;version=ECMAScript-2021
      script: |-
        // Generate a random string and set it as state for the heartbeat item
        var rnd = (Math.random() + 1).toString(36);
        var heartbeathItem = items.getItem("String_heartbeat");
        heartbeathItem.postUpdate(rnd);

        // Get the heartbeat item through cloud connector and compare the value with the generated value
        var response = actions.HTTP.sendHttpGetRequest('https://myopenhab.org/rest/items/String_heartbeat', {"Authorization": "Basic YOUR_BASE64_USER:PW"}, 1000);
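        var isOk = false;
        try {
          var responseJson = JSON.parse(response);
          isOk = responseJson.state === rnd;
        } catch (err) {
          // A failed request returns null, and reading .state of null throws
          isOk = false;
        }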
        //console.log("REST RESULT: " + responseJson.state + "; EXPECTED: " + rnd + "; IsOk: " + isOk);

        if(isOk){
          // If values match, do a heartbeat on the remote monitoring
          var response2 = actions.HTTP.sendHttpGetRequest('https://pulse.webgazer.io/YOUR_GENERATED_ID');
          //console.log("HEARTBEAT SENT: " + response2);
        } else {
          console.log("Cloud connector is down or something bad happened. (API response: " + response + ")");
        }
    type: script.ScriptAction
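The `Authorization` value in the rule is a placeholder. HTTP Basic authentication expects `Basic ` followed by the base64 encoding of `user:password` (presumably your myopenhab.org account credentials). A minimal Node.js sketch to compute it once and paste it into the rule; the function name is hypothetical:

```javascript
// Compute the value for the "Authorization" header used in the rule.
// HTTP Basic auth: "Basic " + base64("<user>:<password>").
// Runs in Node.js (uses Buffer); run it once and copy the result.
function basicAuthHeader(user, password) {
  const token = Buffer.from(`${user}:${password}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

console.log(basicAuthHeader("user", "pw")); // Basic dXNlcjpwdw==
```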

And here is the (free) monitoring service I use: WebGazer (pulse.webgazer.io).

Looks like a good candidate to be published as a rule template in the Marketplace.

@abal, thanks for sharing your script. I got it working, but I wanted to add code to restart the Cloud Connector and also keep a detailed log; learning ECMAScript is not a priority for me at the moment. I converted the first part of your script to Blockly, then called a Python script (passing the expected value) via the Exec Binding.

It might be worth increasing the timeout value on your sendHttpGetRequest; I was getting intermittent timeouts (sometimes several in a row) at 1000 ms.

I am checking every 15 minutes. Since the 25th, there have been 11 timeouts (with the timeout currently set to 4 seconds), but retrying the HTTP GET works. If I disable the Cloud Connector on my openHAB server, I get error 500 “OpenHAB is offline”. On October 24 between 08:15 and 09:05, I had a mix of error 500 and error 504 “Gateway Time-out” errors; for two cycles, stopping and restarting the Cloud Connector on my side did not resolve the problem. So far, I have not seen any case where the retrieved random value did not match the expected value.
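Retrying before raising an alarm can be sketched as a small wrapper. This assumes, as the later version of the rule does (`res !== null`), that a failed `sendHttpGetRequest` yields `null`; `fetchWithRetries` is a hypothetical helper, not an openHAB API:

```javascript
// Retry a flaky synchronous request a few times before declaring failure.
// `request` is any function returning the response body, or null/undefined on
// timeout or error (assumed to match how openHAB's HTTP actions report failure).
function fetchWithRetries(request, attempts) {
  for (let i = 1; i <= attempts; i++) {
    const body = request();
    if (body !== null && body !== undefined) {
      return { body: body, attempt: i }; // success on attempt i
    }
  }
  return { body: null, attempt: attempts }; // every attempt failed
}

// Inside a rule, usage could look like this (timeout raised to 5000 ms):
// var result = fetchWithRetries(
//   () => actions.HTTP.sendHttpGetRequest(url, headers, 5000), 2);
```

Because openHAB's HTTP actions block until a response or the timeout, a plain loop is enough here; no promises are involved.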

@nh905, increasing the timeout is a good idea, but I think a better solution is to build in some tolerance for failures. I’m also extending my script right now with an action that fixes the problem when the Cloud Connector goes offline. I’m running OH in a container and using Portainer as the Docker management UI. Portainer has an API through which I can restart the whole OH container, and restarting OH always helps here. I’m not sure whether restarting the binding is enough, but if it is, that would be a better solution. You could also share your Python script here.
If you are interested, I can share the current script. It has been debugged but not yet verified in real use :slight_smile:
In my experience, the Cloud Connector disconnects at least once every two days and recovers within 5 minutes in about 90% of cases. That means it hangs permanently roughly once every two weeks and cannot recover automatically.
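One way to express such a tolerance is to act only after a run of consecutive failed checks that lasts longer than the usual self-recovery time. A minimal sketch with a hypothetical `FailureTolerance` class; the threshold is an assumption you would tune (with a check roughly every 30 s, 20 failures is about 10 minutes):

```javascript
// Failure-tolerance policy: ignore short outages that usually self-heal and
// only request a recovery action after `threshold` consecutive failed checks.
class FailureTolerance {
  constructor(threshold) {
    this.threshold = threshold;
    this.consecutiveFailures = 0;
  }

  // Record one check result; returns true when a recovery action should run.
  record(ok) {
    if (ok) {
      this.consecutiveFailures = 0; // any success resets the streak
      return false;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.threshold) {
      this.consecutiveFailures = 0; // start counting again after acting
      return true;
    }
    return false;
  }
}
```

In an openHAB rule the counter would need to survive between rule runs, for example in an item; the rule below achieves a similar effect by storing the timestamp of the last good heartbeat instead.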

@abal, my code tries twice to fetch the heartbeat value, restarts the Cloud Connector, then repeats the sequence before sending me an email.
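One reading of that escalation sequence, sketched in JavaScript with hypothetical callbacks (`checkOnce`, `restartConnector`, `notify`) standing in for the real HTTP check, add-on restart and email; a real version would presumably also wait for the connector to reconnect after the restart before checking again:

```javascript
// Escalation sketch: two attempts, restart, two more attempts, then notify.
// checkOnce() returns true when the heartbeat value round-trips correctly.
function runEscalation(checkOnce, restartConnector, notify) {
  const tryTwice = () => checkOnce() || checkOnce(); // second try only if first fails
  if (tryTwice()) return "ok";
  restartConnector();
  // (a real implementation would wait here for the connector to come back)
  if (tryTwice()) return "recovered-after-restart";
  notify("Cloud Connector still unreachable after restart");
  return "notified";
}
```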

My Python code got a good workout on October 28-29 (see https://community.openhab.org/t/disconnected-from-the-openhab-cloud-service/132335/64, which suggested a problem with myopenhab.org itself). Since then, my Python code has reported the odd timeout, but it retries successfully without requiring any restarts. I have considered restarting openHAB if all else fails; I have seen cases where the local Cloud Connector showed as connected while myopenhab.org showed it as offline, but there have been no recurrences since I put the monitoring in place.

I need to fix a minor issue in the Python code and also tidy it up a bit. When I get a chance, I will post the code to GitHub and write up how to invoke the Python script. I can see some openHAB users wanting your native solution, while others may be more familiar with Python.

Here is the updated code, which restarts the OH container through the Portainer API if the outage lasts longer than 10 minutes:

configuration: {}
triggers:
  - id: "1"
    configuration:
      cronExpression: 0/27 * * * * ? *
    type: timer.GenericCronTrigger
conditions: []
actions:
  - inputs: {}
    id: "2"
    configuration:
      type: application/javascript;version=ECMAScript-2021
      script: >-
        var rnd = (Math.random() + 1).toString(36);
        var heartbeathItem = items.getItem("String_heartbeat");
        heartbeathItem.postUpdate(rnd);

        var response = actions.HTTP.sendHttpGetRequest('https://myopenhab.org/rest/items/String_heartbeat', {"Authorization": "Basic __AUTH_COMES_HERE__"}, 5000);

        var isOk = false;
        try{
          var responseJson = JSON.parse(response);
          isOk = responseJson.state === rnd;
          //console.log("REST RESULT: " + responseJson.state + "; EXPECTED: " + rnd + "; IsOk: " + isOk);
        } catch (err){
          isOk = false;
        }

        var lastHeartbeatItem = items.getItem("Num_heartbeat_last_live");

        if(isOk){
          var response2 = actions.HTTP.sendHttpGetRequest('https://pulse.webgazer.io/__WEBGAZER_ID_COMES_HERE__');
          // Update the last heartbeat date, we will use it for downtime calculation
          lastHeartbeatItem.postUpdate((new Date()).getTime());
        } else {
          console.log("Cloud connector is down or something bad happened. (API response: " + response + ")");
          
          // If the connection has been broken for more than 10 minutes, restart the OH container (internet connectivity itself is not checked here)
          var now = new Date();
          var lastHeartbeat = now;
          if(lastHeartbeatItem.state !== 'NULL'){
            lastHeartbeat = new Date(parseInt(lastHeartbeatItem.state));
          }
          var delta = now - lastHeartbeat;
          var downForSec = Math.abs(delta)/1000;
          console.log("Last heartbeat: " + downForSec + " sec ago");
          // If down for more than 10 min, restart the container (checking that the internet connection is alive, i.e. that only the Cloud Connector is broken, is not implemented)
          if(downForSec > (10 * 60)){
            console.log("Cloud connector is down for more than 10 minutes, restarting OH container ...");
            // Update the last beat, to avoid cyclic restarts right after the startup (first script execution)
            lastHeartbeatItem.postUpdate((new Date()).getTime());
            var res = actions.HTTP.sendHttpPostRequest('http://__PORTAINER_IP__:__PORTAINER_PORT__/api/endpoints/2/docker/containers/OpenHAB/restart', '', '', {"X-API-Key": "__PORTAINER_API_KEY__"}, 60000);
            console.log("Restart container: " + (res !== null ? "OK" : "FAILED"));
          }
        }
    type: script.ScriptAction
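The restart decision in this rule reduces to a small pure function. A sketch that mirrors it (the 'NULL' handling and 10-minute threshold come from the rule; the function name is hypothetical), which makes the threshold logic easy to test outside openHAB:

```javascript
// Pure version of the rule's downtime check.
// lastLiveState is the item state string as the rule sees it: a millisecond
// timestamp, or 'NULL' before the item has ever been updated.
function shouldRestart(lastLiveState, nowMs, thresholdSec) {
  // Like the rule, treat an uninitialised item as "last seen now",
  // so a fresh start never triggers an immediate restart.
  const lastMs = lastLiveState === "NULL" ? nowMs : parseInt(lastLiveState, 10);
  const downForSec = Math.abs(nowMs - lastMs) / 1000;
  // A non-numeric state (e.g. UNDEF) parses to NaN; NaN > x is false, so no restart.
  return downForSec > thresholdSec;
}

// shouldRestart("0", 601000, 600) returns true (down for 601 s > 600 s)
```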

This would be a great submission for a rule template in the marketplace.


A restart of the binding seems to work as well.

Does anyone know whether any of the issues documented on GitHub reflects this behaviour?

Doesn’t the binding reconnect automatically?

The main issue is here: “Connection error recovery failed”, issue #134 in openhab/openhab-cloud on GitHub.


Thank you @abal. I’ve implemented your script, and it works like a charm.

  • openHAB also runs in a container (Docker). Since there is no Portainer in my installation, I just stop openHAB and Docker then automatically restarts the container (restart policy: unless-stopped);
  • I trigger the script every minute.

Question: To stop openHAB, I used “executeCommandLine” in a DSL rule (see below). Would it be possible to do this directly in JS 2021? Sorry for the amateurish question.

JS 2021:

var rnd = (Math.random() + 1).toString(36);
var heartbeathItem = items.getItem("String_heartbeat");
heartbeathItem.postUpdate(rnd);

var response = actions.HTTP.sendHttpGetRequest('https://myopenhab.org/rest/items/String_heartbeat', {"Authorization": "Basic __Acc_PW___"}, 5000);

var isOk = false;
try{
  var responseJson = JSON.parse(response);
  isOk = responseJson.state === rnd;
  // console.log("REST RESULT: " + responseJson.state + "; EXPECTED: " + rnd + "; IsOk: " + isOk);
} catch (err){
  isOk = false;
}

var lastHeartbeatItem = items.getItem("Num_heartbeat_last_live");

if(isOk){
  // Update the last heartbeat date, we will use it for downtime calculation
  lastHeartbeatItem.postUpdate((new Date()).getTime());
} else {
  console.log("Cloud connector is down or something bad happened. (API response: " + response + ")");

  // If the connection has been broken for more than 10 minutes, restart the OH container (internet connectivity itself is not checked here)
  var now = new Date();
  var lastHeartbeat = now;
  if(lastHeartbeatItem.state !== 'NULL'){
    lastHeartbeat = new Date(parseInt(lastHeartbeatItem.state));
  }
  var delta = now - lastHeartbeat;
  var downForSec = Math.abs(delta)/1000;
  console.log("Last heartbeat: " + downForSec + " sec ago");
  // If down for more than 10 min, restart the container (checking that the internet connection is alive, i.e. that only the Cloud Connector is broken, is not implemented)
  if(downForSec > (10 * 60)){
    console.log("Cloud connector is down for more than 10 minutes, restarting OH container ...");
    // Update the last beat, to avoid cyclic restarts right after the startup (first script execution)
    lastHeartbeatItem.postUpdate((new Date()).getTime());

    // Run the rule with UID "reboot" (the DSL rule below), which stops openHAB; Docker then restarts the container
    osgi.getService("org.openhab.core.automation.RuleManager").runNow("reboot", true, {"trigger_type": "event"});

    console.log("Restart container");
  }
}

DSL rule:

executeCommandLine(Duration.ofSeconds(10),"bash","/openhab/runtime/bin/stop");

In ECMAScript 2021 it should be something like this:

var Duration = Java.type("java.time.Duration");
var stopOH = actions.Exec.executeCommandLine(Duration.ofSeconds(10), 'bash', '/openhab/runtime/bin/stop');

Theoretically it should also work with time.Duration, so you don’t have to import the Java Duration. If that doesn’t work, an issue should be filed to apply the same conversion magic to Duration that is already done for ZonedDateTime.
