Autoreboot of OpenHABianPi when things go wrong?

My main OpenHAB system is 2.1 running on a Windows box. At the moment it’s like that because I’m comfortable running and supporting Windows machines.

However OpenHABianPi appeals to me because it’s easy to upgrade and to set up things out of the box, like an MQTT broker etc. So I have a Raspberry Pi up and running with the latest version of OH2.

I came home today and couldn’t access the sitemaps or in fact anything on the OpenHABianPi. I could ping it but anything else failed and SSH just seemed to do nothing when trying to connect.

Obviously the quick fix was just to power cycle the pi, which I eventually did, and it came back good as gold with no issues.

However if this was my production system I would not have been able to open the gate, disable the alarm and open the garage without reverting back to manual controls.

So my question is what kind of “watchdog” type plans do people have in place to recover from issues like this? I guess I could set something up that power cycles the Pi if I can’t SSH into it or something for a set amount of time. However a cold reboot like this makes me worry about SD card corruption and all that.

Any comments and suggestions would be appreciated.

At the OS level the RPi should be very stable. Even if openHAB were to crash out or just get messed up in its own world, it really shouldn’t touch the OS and SSH should be fine. I guess SSH could be slow to respond if a process has maxed out the CPU or memory, so you might want to check the logs and see what was on screen if you’ve got one plugged in.

My guess would be that you’ve got more problems with the RPi than just openHAB crashing it, be it power supply, cable or SD card.

I’ve been running an RPi 2 (plain raspbian light headless) 24/7 without openHAB for 16 months or so without issue but plenty would say I got lucky and I’d be tempted to agree with them.

Personally I’d only consider using an RPi 3 now for OH if you’re looking for more production level stability. That’s because I would USB boot it from an SSD ideally, or an HD if you’re pushed. SD cards are known to wear and be unpredictable, hence unreliable, but also don’t just boot it from a USB stick because lots of USB sticks are just SD cards in disguise.

I happened to have a 32GB mSATA card sitting around so I bought a USB enclosure and a properly rated RPi 3 power supply and have upgraded my RPi 2 to a 3. There is lots of discussion around the place about power supplies and USB cables also not being of decent quality, so again don’t just grab any old one you have lying around. The Pi is great and I’ve got a stack of them, but it’s cheap because it doesn’t have all the fancy power management and electronics you get when paying more.

I’ve set up heartbeat style items in some areas, like the Pi Zero wired in to my heating system. It’s running a simple Python script that is subscribed to OH via MQTT and turns a relay on my boiler ON/OFF when OH business logic sends the right command. But the Pi also sends an ON command (heartbeat) every five minutes to an OH item, and rules on the OH side monitor this and react appropriately, so I know if my Pi Zero is in need of CPR. That’s fine for OH monitoring external devices but who monitors the monitor?! I did the same in OH, so it does a sendCommand ON to an item every five minutes, and as I have several client devices like my heating Pi, they also monitor and report via email if OH has gone quiet.
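To make the heartbeat idea a bit more concrete, here is a minimal sketch of just the "who has gone quiet" bookkeeping, in plain Python. It is not my actual script: the class name `HeartbeatMonitor`, the device names and the 10 minute timeout are all made up for illustration, and the MQTT and email parts are left out entirely. You would call `beat()` whenever a heartbeat message arrives and poll `silent_devices()` on a timer.

```python
import time


class HeartbeatMonitor:
    """Tracks when each device last reported and flags the silent ones.

    Hypothetical helper for illustration; names and timeout are assumptions.
    """

    def __init__(self, timeout_seconds=600, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self._clock = clock          # injectable so the logic can be tested
        self._last_seen = {}         # device name -> time of last heartbeat

    def beat(self, device):
        """Record a heartbeat from the given device."""
        self._last_seen[device] = self._clock()

    def silent_devices(self):
        """Return the devices whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            device
            for device, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

With heartbeats every five minutes, a timeout of roughly two missed beats avoids false alarms from a single delayed message. Note that a device only shows up once it has sent at least one heartbeat, so you may want to register expected devices at startup.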

Appreciate I just said only use RPi 3 but then say I’m using a Pi Zero but in my setup it is nothing more than a glorified dumb switch. It’s about risk levels and I don’t run OH on the Pi Zero. If it failed I’d know quickly due to the heartbeats, it can be turned off and the boiler reverts to manual so still functional for the user and I can rebuild it easily enough, raspbian light core OS install, few apt-gets, drop in the python script and a little cron mod.

Ultimately though any instability in OH should be resolved with a service restart and NOT a system restart. A forced reboot/power cycle etc should be a very rare requirement and as you say carries its own risks re corruption.
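As a rough sketch of "restart the service, not the system": a small Python check that probes openHAB over HTTP and only restarts the service if it gets no answer. The URL (port 8080, `/rest/`) and the unit name `openhab2.service` are assumptions about a default OH2 install, so check yours before using anything like this. The function names are made up for the example.

```python
import http.client
import subprocess
import urllib.error
import urllib.request

# Assumptions: default openHAB HTTP port and the OH2 systemd unit name.
OPENHAB_URL = "http://localhost:8080/rest/"
SERVICE_NAME = "openhab2.service"


def is_responding(url, timeout=10):
    """Return True if the URL answers with a 2xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        return False


def restart_command(service):
    """Build the systemctl command that restarts a single service."""
    return ["systemctl", "restart", service]


def check_and_restart(probe=is_responding, run=subprocess.run):
    """Restart the openHAB service if the probe fails.

    Returns True if a restart was triggered, False if openHAB was healthy.
    """
    if probe(OPENHAB_URL):
        return False
    run(restart_command(SERVICE_NAME), check=True)
    return True
```

You could call `check_and_restart()` from a root cron job every few minutes. In practice you would probably want a few consecutive failures before restarting, since openHAB can take a while to come up on a Pi and a single slow response shouldn’t trigger a restart loop.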

The Raspberry Pi has a built-in watchdog. It’s a shame I don’t know of any English website about the functionality but at first sight I think this will do:
The watchdog is configurable and can react to various events. You could write a rule in openHAB which touches a file once a minute. You would have to configure the watchdog to wait 2 or 3 minutes for a touch on this file before doing a reboot.
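Just to illustrate the logic of the file check (the watchdog does this itself, so this is not something you need to run): the file counts as stale if it is missing or hasn’t been touched within the allowed window. The function name and the heartbeat file path in the usage note are made up for the example, and 180 seconds corresponds to the "3 minutes" above.

```python
import os
import time


def is_stale(path, max_age_seconds=180, now=None):
    """Return True if the file is missing or was last touched too long ago."""
    if now is None:
        now = time.time()
    try:
        modified = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file is treated the same as a stale one.
        return True
    return now - modified > max_age_seconds
```

For example, `is_stale("/tmp/openhab-heartbeat")` (a hypothetical path) would be True once the openHAB rule has stopped touching the file for more than three minutes. Allowing two or three missed touches avoids rebooting because of a single delayed rule run.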

If using an SD card the best way would be to mount it read-only. There are several howtos about this on the internet. :slight_smile:
Of course you should then write any data to an external drive or, even better, an NFS share on your LAN.