I have openHAB running on Windows 11. Yesterday Windows had to install an update and then, of course, restart. After the restart openHAB got stuck at start level 70, which I didn’t notice right away, but that’s just a side note.
My actual question is: how do you check whether openHAB is still running? My wall display shows it, but I don’t always look at it.
My family tells me very quickly.
Two key questions here:
- How do you want to define that openHAB is not running (e.g. one binding not working? two? three? one Thing offline? two or three?)
- What is the action once you find out openHAB is not running?
This will tell you whether you need to monitor openHAB internally (by openHAB itself) or externally.
As an example, you could write a shell script that sends a command to an item via the REST API and, if the response is not 200, starts further actions.
But as I said, this is only one indicator. A failed request might mean that openHAB is not running, but it could also have another cause while openHAB itself is running fine.
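Since this thread is about Windows, here is the same idea as a minimal PowerShell sketch rather than a shell script. It assumes the REST endpoint POST /rest/items/{itemName} with the command sent as a plain-text body; the helper name, the test item name and the host are hypothetical placeholders, and the token is only needed if your instance requires one. As noted above, a 200 only proves the REST layer accepted the command.

```powershell
# Hypothetical helper: send a command to an openHAB item via the REST API
# and report whether openHAB answered with HTTP 200.
function Test-OpenhabCommand {
    param(
        [string]$BaseUri,            # e.g. "http://192.168.1.10:8080" (placeholder)
        [string]$ItemName,           # hypothetical test item, e.g. "Watchdog_Switch"
        [string]$Command = "ON",
        [string]$AccessToken = "",   # optional API token
        [int]$TimeoutSec = 10
    )
    $headers = @{}
    if ($AccessToken) { $headers["Authorization"] = "Bearer $AccessToken" }
    try {
        # Invoke-WebRequest throws on connection errors and on non-success status codes
        $response = Invoke-WebRequest -Method Post -Uri "$BaseUri/rest/items/$ItemName" `
            -Body $Command -ContentType "text/plain" -Headers $headers `
            -TimeoutSec $TimeoutSec -UseBasicParsing -ErrorAction Stop
        return ($response.StatusCode -eq 200)
    }
    catch {
        # Unreachable host, timeout or HTTP error: treat as "not healthy"
        return $false
    }
}

# Usage (placeholder values):
# if (-not (Test-OpenhabCommand -BaseUri "http://192.168.1.10:8080" -ItemName "Watchdog_Switch")) {
#     Write-Warning "openHAB did not accept the command; start further actions here"
# }
```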
The most reliable solution is the WAF, as hafniumzinc already said 😀
- Zabbix sends me an email if the Docker container is not running or has crashed
- Lights do not turn on/off when expected
- I bring up the app on my phone or the browser and it doesn’t come up
- someone in the family asks me why the lights didn’t turn on/off when expected
Beyond the lights, most of my automations are more subtle and their failure wouldn’t necessarily be noticed right away.
I can’t remember the last time OH went offline unexpectedly for me though.
If openHAB fails because of an update, it probably can’t report that itself, so it has to be checked externally.
Yes, the family does report when something isn’t working, but if it fails at night and I don’t notice it in the rush early in the morning, I’m really stressed when I get home from work…
You might try something like this. Assuming you want automatic recovery, you can make calls to the API as others have mentioned. Since you are running on Windows 11, you could create a scheduled task that periodically runs a PowerShell script, which calls the API and takes a recovery action: if openHAB is installed as a service and the call returns an error, the script can run net stop openHAB and then net start openHAB.
I use something like this example to handle automatic recovery:
- Create an API token and copy it into the script.
- Obviously, update the log file location to the one you use.
- Make sure your PowerShell execution policy allows the script to run.
- Call the script from a recurring scheduled task (you may need to run it elevated as administrator, depending on how your PC is set up).
When the scheduled task runs at your defined interval, the script checks that openHAB is up and responsive, retrieves the system info (which includes openHAB’s uptime, start level and current resource usage), logs it and exits.
However, if it hits an error and finds openHAB is not behaving normally, it tries to stop the openHAB service and start it again, then appends the entire PowerShell output to your log file.
If openHAB fails to start, the next scheduled run will try again; if it comes back up cleanly, the next run will simply log the system info.
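```powershell
# Log file location: adjust to your setup; the folder must already exist
$Logfile = "C:\openhab\Logs\proc_$env:computername.log"

function WriteLog
{
    Param ([string]$LogString)
    $Stamp = (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
    $LogMessage = "$Stamp $LogString"
    Add-Content -Path $Logfile -Value $LogMessage
}

try
{
    $APIUri = "http://your-openhab-ip:8080/rest/systeminfo"   # replace with your openHAB host
    $AccessToken = ""   # replace with your openHAB API token
    $Headers = @{Authorization = "Bearer $AccessToken"}
    # -TimeoutSec limits how long a hung openHAB can block the task;
    # -ErrorAction Stop makes sure every failure lands in the catch block
    $Response = Invoke-RestMethod -Method Get -Uri $APIUri -Headers $Headers -TimeoutSec 30 -ErrorAction Stop
    # -Depth 5 keeps the nested systemInfo values from being truncated
    $Result = $Response | ConvertTo-Json -Depth 5
    WriteLog $Result
}
catch
{
    # Capture everything below (including the net stop/start output) in the log file
    Start-Transcript -Path $Logfile -Append
    (Get-Date).ToString("yyyy/MM/dd HH:mm:ss")
    if ($_.Exception -is [System.Net.WebException] -and $null -ne $_.Exception.Response)
    {
        # Windows PowerShell 5.1: openHAB answered with an HTTP error, so read its body
        $errorContent = $_.Exception.Response.GetResponseStream()
        $reader = New-Object System.IO.StreamReader($errorContent)
        $errorDetails = $reader.ReadToEnd()
        Write-Host "HTTP error details: $errorDetails" -ForegroundColor Red
    }
    else
    {
        # No HTTP response (timeout, connection refused, ...); PowerShell 7 HTTP errors also land here
        Write-Host "An error occurred: $($_.Exception.Message)" -ForegroundColor Red
    }
    # Both cases end the same way: restart the openHAB Windows service
    net stop openHAB
    net start openHAB
    Stop-Transcript
}
```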
Obviously you can make it more elegant and take all kinds of other actions, but this basic script should spark some ideas.
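Since the original problem was openHAB hanging at start level 70, a REST API that answers may not be enough. A minimal sketch, assuming the /rest/systeminfo response nests its values under a systemInfo object with a numeric startLevel, and that 100 means fully started; the helper name is hypothetical.

```powershell
# Hypothetical helper: decide whether a /rest/systeminfo response describes
# a fully started openHAB (start level 100 assumed as "fully started").
function Test-OpenhabStarted {
    param(
        [Parameter(Mandatory)] $SystemInfoResponse,
        [int]$RequiredLevel = 100
    )
    $level = $SystemInfoResponse.systemInfo.startLevel
    if ($null -eq $level) { return $false }   # field missing: treat as unhealthy
    return ([int]$level -ge $RequiredLevel)
}

# Usage inside the try block of the script above, right after Invoke-RestMethod.
# Throwing sends execution to the catch block, which restarts the service:
# if (-not (Test-OpenhabStarted $Response)) {
#     throw "openHAB is stuck at start level $($Response.systemInfo.startLevel)"
# }
```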
Thanks for your detailed answer, I will try to implement it this weekend.
I have only had openHAB stop completely because of:
- A memory leak in Java.
- Running out of disk space on the device running openHAB, because camera recordings built up locally instead of being stored on a NAS.
- A RAM drive or other space-constrained ZRAM running completely out of space.
Causes 2 and 3 are outside of openHAB and need monitoring and careful thought when setting up how the system works.
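For the disk-space causes, a check that runs outside openHAB can watch free space. A minimal sketch for Windows, assuming the drive letter C is where openHAB stores its data; the 5 GB threshold and both function names are hypothetical placeholders.

```powershell
# Hypothetical helper: true when the free space is at or above the threshold.
function Test-FreeSpace {
    param(
        [long]$FreeBytes,
        [long]$MinFreeBytes = 5GB   # arbitrary placeholder threshold
    )
    return ($FreeBytes -ge $MinFreeBytes)
}

# Hypothetical helper: free bytes on a drive (FileSystem drives expose .Free).
function Get-DriveFreeBytes {
    param([string]$DriveName = "C")
    return [long](Get-PSDrive -Name $DriveName -PSProvider FileSystem).Free
}

# Usage from the same scheduled task:
# if (-not (Test-FreeSpace -FreeBytes (Get-DriveFreeBytes "C"))) {
#     Write-Warning "Low disk space on the openHAB drive"
# }
```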
For cause 1, I monitor the system using the System Doctor binding from the marketplace…
The red lines are where I rebooted the system when upgrading to the latest milestone, and the circled red area is where I got a memory leak from bad code I had written in a binding. The graph lets me catch these issues before committing code to the project. After every update I look at the graph over the next few days before I go more hands-off on the system.
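Without the System Doctor binding, a rough stand-in (a different technique, not what the binding does) is to append the JVM memory figures from /rest/systeminfo to a CSV on every scheduled run and chart them afterwards. This sketch assumes the response exposes totalMemory and freeMemory in bytes under systemInfo; check your own response first. The function name and CSV path are hypothetical.

```powershell
# Hypothetical helper: append one row of JVM memory figures to a CSV file.
function Add-MemorySample {
    param(
        [Parameter(Mandatory)] $SystemInfoResponse,
        [Parameter(Mandatory)] [string]$CsvPath
    )
    $info = $SystemInfoResponse.systemInfo
    $row = [pscustomobject]@{
        Timestamp = (Get-Date).ToString("yyyy-MM-dd HH:mm:ss")
        UsedMB    = [math]::Round(([double]$info.totalMemory - [double]$info.freeMemory) / 1MB, 1)
        TotalMB   = [math]::Round([double]$info.totalMemory / 1MB, 1)
    }
    # Export-Csv -Append writes the header only when the file does not exist yet
    $row | Export-Csv -Path $CsvPath -Append -NoTypeInformation
    return $row
}

# Usage in the try block of the monitoring script (placeholder path):
# Add-MemorySample $Response "C:\openhab\Logs\memory.csv" | Out-Null
```

A steadily rising UsedMB over several days after an update is the kind of pattern the graph above is used to spot.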
As others have mentioned, you might also mean a single Thing going down. That happens when:
- A required cloud service goes down.
- The battery of a wireless sensor goes flat.
- A bug in a binding fails to handle something unexpected and does not bring the device back online.
- A firmware update to a device breaks something that needs a binding update to fix.
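If "not running" means individual Things dropping out, the REST API can list them. A minimal sketch, assuming GET /rest/things returns an array whose entries carry UID and statusInfo.status (for example ONLINE or OFFLINE); host, token and function names are hypothetical. It cannot catch a binding that is stuck but still reports ONLINE.

```powershell
# Hypothetical helper: return the Things whose status is not ONLINE.
function Get-ThingsNotOnline {
    param([Parameter(Mandatory)] [AllowEmptyCollection()] [object[]]$Things)
    $Things |
        Where-Object { $_.statusInfo.status -ne "ONLINE" } |
        ForEach-Object { [pscustomobject]@{ UID = $_.UID; Status = $_.statusInfo.status } }
}

# Hypothetical helper: fetch all Things from openHAB (placeholder host and token).
function Get-OpenhabThings {
    param([string]$BaseUri, [string]$AccessToken = "")
    $headers = @{}
    if ($AccessToken) { $headers["Authorization"] = "Bearer $AccessToken" }
    return Invoke-RestMethod -Method Get -Uri "$BaseUri/rest/things" -Headers $headers -TimeoutSec 30
}

# Usage:
# $down = @(Get-ThingsNotOnline (Get-OpenhabThings "http://192.168.1.10:8080" $token))
# if ($down.Count -gt 0) { $down | Format-Table }
```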
Since things are going to break, I now deliberately design and choose gear so that it keeps working when openHAB goes down. I aim for gear that is fully standalone yet can be augmented by openHAB, and I try to stay well clear of solutions that rely totally on openHAB being the brain that does all the heavy lifting.
An example of a bad setup I don’t like: one Wi-Fi device measures the temperature and another Wi-Fi device heats, with openHAB doing all the logic. I still chose to do it, knowing it will fail at some point…
An example of a system I like is my watering system. It waters the garden fully standalone and automatically varies the watering amount based on weather forecasts. I use openHAB so that, if the rain gauge reports enough rain has fallen, it stops any more watering for 3 days and stops the warning icon from showing on my TV that warns us not to hang washing out on the days the grass gets watered. If I shut down openHAB my grass still gets watered, but with openHAB it is a nicer experience: it saves water and our washing does not get drenched by the sprinklers.
A couple of things to consider and keep in mind:
- The folder where you want the log to be saved must already exist.
- The scheduled task needs to call powershell.exe as the program.
Let’s assume you save the script as monitor.ps1 in a folder named ps on the C drive.
Here are some options that should give you a pretty reliable scheduled task configuration:
- Select "Run whether user is logged on or not".
- Check the "Run with highest privileges" box.
- Set the trigger to "One time" and repeat the task every 5 minutes, indefinitely.
- Set the action to "Start a program" with powershell.exe as the program to start.
- In the optional arguments, you may want to add: -nologo -noninteractive -noprofile -ExecutionPolicy Bypass -file c:\ps\monitor.ps1
- Select whatever additional conditions you want to be evaluated.
- Check the "Allow task to be run on demand" box.
- Check the "Run task as soon as possible after a scheduled start is missed" box.
- Check the "If the running task does not end when requested, force it to stop" box.
- Under "If the task is already running, then the following rule applies", select "Do not start a new instance".
When you click OK after setting this up, make sure you enter the valid password for the user you chose to run the task as.
Right-click the task, run it once manually, and check that its status shows "The operation completed successfully (0x0)" and that the next run time is the correct date and time for your chosen interval. Also review your log and make sure it was updated with all the info and date/time stamps.
This ensures the task runs without privilege-elevation trouble and reruns as defined.
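The same task can also be created from an elevated PowerShell prompt with the built-in ScheduledTasks cmdlets. A sketch, assuming the script lives at C:\ps\monitor.ps1; the task name and the 10-minute run limit are placeholders, and running as the SYSTEM account is a swapped-in choice that avoids entering a password (the steps above use a named user instead). Omitting -RepetitionDuration should make the repetition indefinite on Windows 10/11, but check the trigger in Task Scheduler afterwards.

```powershell
# Hypothetical helper: register the monitor script as a repeating scheduled task.
# Must be run on Windows from an elevated PowerShell prompt.
function Register-OpenhabMonitorTask {
    param(
        [string]$ScriptPath = "C:\ps\monitor.ps1",
        [string]$TaskName = "openHAB monitor",   # placeholder name
        [int]$IntervalMinutes = 5
    )
    $action = New-ScheduledTaskAction -Execute "powershell.exe" `
        -Argument "-NoLogo -NonInteractive -NoProfile -ExecutionPolicy Bypass -File `"$ScriptPath`""
    # One-time trigger that repeats; no -RepetitionDuration is intended to mean "indefinitely"
    $trigger = New-ScheduledTaskTrigger -Once -At (Get-Date).AddMinutes(1) `
        -RepetitionInterval (New-TimeSpan -Minutes $IntervalMinutes)
    # Catch up after a missed start, never start a second instance, stop runs that hang.
    # On-demand runs and forced stop are already the defaults of New-ScheduledTaskSettingsSet.
    $settings = New-ScheduledTaskSettingsSet -StartWhenAvailable -MultipleInstances IgnoreNew `
        -ExecutionTimeLimit (New-TimeSpan -Minutes 10)
    # SYSTEM runs whether or not anyone is logged on, with highest privileges
    $principal = New-ScheduledTaskPrincipal -UserId "SYSTEM" -LogonType ServiceAccount -RunLevel Highest
    Register-ScheduledTask -TaskName $TaskName -Action $action -Trigger $trigger `
        -Settings $settings -Principal $principal
}

# Usage, then trigger one run by hand to check it:
# Register-OpenhabMonitorTask
# Start-ScheduledTask -TaskName "openHAB monitor"
```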
Regards
I have also come to this way of thinking and design in my home.
As an example, all of my closets have a light socket with a pull chain and no switch. Replacing the socket, modifying the closet wiring, adding a smart light switch in a convenient location and a sensor on the door, for 5 closets, would have cost much more than the value I would get out of it.
I added a Zigbee sensor to the door, replaced the light bulb with one controllable by OH, and set up an automation that turns the light on/off when the door opens… Works just fine. If the OH automation is not working, the light can still be operated with 1 or 2 pulls of the chain.
This is the ONLY use case for a smart light bulb I have ever found.
Yes, that is the "build escalators, not elevators" approach @rlkoshak always talks about.
You should never design your home automation solution (or really anything considered critical) with a single point of failure.
I use a combination of a shell script + cron job + Pushbullet + openHAB rule:
The shell script is called both by cron and by the openHAB rule, each calling it once a minute. The script knows who is calling it from the UID of the process; the cron job side checks that OH is running. I implemented this with a small RAM filesystem: every minute, the cron job side writes a ‘down’ file containing an integer if the file doesn’t exist, or decrements the number in the file if it does. If the number reaches 0, the cron job uses Pushbullet to send me a message and can optionally run the systemctl command to restart OH.
The openHAB rule removes the ‘down’ file every minute, so when both sides are working the file generally isn’t found on the file system and all is well. This has caught OH failures several times and has worked well for me for a few years.
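The ‘down’ file countdown carries over to Windows as well. A sketch of the external half only (the part cron runs), rewritten in PowerShell for a scheduled task: an ordinary folder stands in for the RAM filesystem, the openHAB rule is assumed to delete the file regularly, and the path, start count and function name are hypothetical. The alert or restart action (Pushbullet and systemctl in the original setup) is left to you.

```powershell
# Hypothetical helper: one tick of the external watchdog.
# openHAB is expected to delete $Path regularly; if it doesn't, the count runs down.
# Returns $true once the count reaches 0 (and on every later tick until openHAB deletes the file).
function Invoke-WatchdogTick {
    param(
        [Parameter(Mandatory)] [string]$Path,
        [int]$StartCount = 5      # ticks allowed without a check-in (placeholder)
    )
    if (-not (Test-Path -LiteralPath $Path)) {
        # openHAB removed the file since the last tick: start a fresh countdown
        Set-Content -LiteralPath $Path -Value $StartCount
        return $false
    }
    # File still there: openHAB has not checked in, so decrement the counter
    $count = [int](Get-Content -LiteralPath $Path -TotalCount 1) - 1
    Set-Content -LiteralPath $Path -Value $count
    return ($count -le 0)
}

# Usage from a scheduled task running every minute (placeholder path):
# if (Invoke-WatchdogTick "C:\openhab\watchdog\down") { net stop openHAB; net start openHAB }
```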
I also have my firewall alert me if the OH server goes offline.
I am running Node-RED on a separate Pi (in my shed). I send an MQTT message to it every 5 minutes to reset a watchdog timer; if Node-RED doesn’t get the message, it emails me. In turn, Node-RED sends the reset back to openHAB, and if OH doesn’t get the message, it emails me. So far it has caught Mosquitto stopping once, Node-RED crashing twice, and me accidentally unplugging the OH machine a few times. After an issue with memory leaks in Docker, I now also have a USB switch (controlled from Node-RED) on the input to the OH machine.