Failover System for a Raspberry Pi at a Remote Location

FWIW, there’s an upcoming feature in openHABian that lets you mirror your SD card to another SD card in a card reader.
Yes, that’ll require some helping hands at the remote site, but combined with a full set of (cold) spare hardware, it will let you recover quickly from almost any type of outage.


Thanks all for the responses!
Really appreciate the offer of scripts. Will need to think about this some more before reaching out.

Two identical Pis is an interesting solution. Not failover as such, but perhaps achieves the same results for my situation (there’s absolutely no onsite help available at the remote location!).
Presumably I could also use the second Pi to reboot the first Pi in case of a crash.
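
A minimal sketch of the decision logic the second Pi could use, assuming it periodically runs some health check against the first (for example an HTTP request to its openHAB UI) and has some way to power-cycle it, such as a network-controlled smart plug. Both of those are assumptions and are left to the caller; `PeerWatchdog` is a hypothetical name. The idea is to act only after several consecutive failures, and not again within a cooldown, so a slow boot doesn't turn into a reboot loop.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PeerWatchdog:
    """Hypothetical helper: decides when the standby Pi should reboot its peer.

    The health check itself and the reboot mechanism (e.g. a smart plug)
    are assumptions and are not implemented here.
    """
    failure_threshold: int = 3          # consecutive failed checks before acting
    cooldown_s: float = 600.0           # minimum seconds between reboot attempts
    consecutive_failures: int = 0
    last_reboot_at: Optional[float] = None

    def record_check(self, healthy: bool, now: float) -> bool:
        """Record one health-check result; return True if a reboot should be triggered now."""
        if healthy:
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures < self.failure_threshold:
            return False
        if self.last_reboot_at is not None and now - self.last_reboot_at < self.cooldown_s:
            # The peer may still be booting after the last reboot; wait.
            return False
        self.last_reboot_at = now
        self.consecutive_failures = 0
        return True
```

In a loop you would call `record_check(check_peer(), time.monotonic())` every 30 seconds or so, and cut the power only when it returns `True` (`check_peer` being whatever health check you choose).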

Does this two-identical-Pi configuration cause issues if all of the openHAB clients/things are (only) HTTP-based? Presumably there will be more network traffic, but perhaps that would be manageable… I’m thinking it may just work… Thoughts very welcome!

Just for information’s sake:

If you have a Velbus infrastructure, you can run as many instances of openHAB (and other software) as you wish.
(Either through 1 TCP gateway, or multiple USB interfaces)

That largely depends on the devices you want to operate. How do they respond when they get the same trigger twice? For example, a Z-Wave switch doesn’t care if you tell it to switch on twice. On the other hand, if you’d like to receive a notification when something occurs, you will now get two notifications (one from each openHAB instance). It becomes troublesome when you have a device that toggles on/off with a single command: the first openHAB instance may toggle it on and the second instance will toggle it off again (or vice versa).
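
To make the difference concrete, here is a toy simulation of the same rule firing once on each of two instances. It is plain Python, not openHAB code; `Light` and `run_rule_on_every_instance` are made up for illustration. An absolute "switch on" command ends in the intended state no matter how often it is sent, while a toggle sent by both instances cancels itself out.

```python
from dataclasses import dataclass


@dataclass
class Light:
    on: bool = False

    def set(self, value: bool) -> None:
        # Absolute command ("switch ON"): repeating it changes nothing.
        self.on = value

    def toggle(self) -> None:
        # Relative command ("flip the state"): a repeat undoes the first one.
        self.on = not self.on


def run_rule_on_every_instance(light: Light, instances: int, use_toggle: bool) -> bool:
    """Simulate one rule firing on every openHAB instance; return the final state."""
    for _ in range(instances):
        if use_toggle:
            light.toggle()
        else:
            light.set(True)
    return light.on
```

With two instances, `run_rule_on_every_instance(Light(), 2, use_toggle=False)` leaves the light on, while `use_toggle=True` leaves it off, which is exactly the problem described above.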

As I tried to explain in my post above:
Each of the identical Pis knows whether it is active or passive.
Only the active one executes the switching, notifications, etc.
So I see no harm.
It’s running like a charm.
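
The active/passive idea can be sketched like this. How each Pi actually learns its role isn't stated in the post, so here it is simply an attribute, and `Node`, `Role` and `fail_over` are hypothetical names. Both nodes can evaluate the same rules, but only the active one lets a side effect (a switching command or notification) through, so nothing is sent twice.

```python
import enum
from typing import List


class Role(enum.Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class Node:
    """One of the two identical Pis: both evaluate rules, only the active one acts."""

    def __init__(self, name: str, role: Role) -> None:
        self.name = name
        self.role = role
        self.sent: List[str] = []  # stands in for switching commands / notifications

    def act(self, command: str) -> bool:
        """Execute a side effect only if this node is active; return True if it ran."""
        if self.role is not Role.ACTIVE:
            return False
        self.sent.append(command)
        return True


def fail_over(primary: Node, standby: Node) -> None:
    """Swap roles, e.g. after the standby has decided the primary is down."""
    primary.role, standby.role = Role.PASSIVE, Role.ACTIVE
```

Keeping the role check in one place like this means a failover only has to flip one flag on each node, not touch every rule.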

If you use bindings that “hiccup” when more than one openHAB instance is connected, then you may have to take another approach. But all my bindings work without “hiccups”.


How do you test the failure?

At work we use different ways of making problems quick to fix, because hundreds of people stand idle while I run around fixing them.

Spending your effort on getting the system stable is more beneficial than writing more unstable code that is hard to test.

One method we use is to run VMs on enterprise servers. When a memory DIMM died, I replaced it with zero downtime.

Another is to have a 100% cold backup: two identical pieces of hardware with a physical changeover switch, so you can only power one at a time. I used it once because of a corrupted SD card. Flick one switch and fix it later.

One is automated, the other is manual. Which one do you want to put your time towards?

Writing software is like making love, because sometimes you have to look after it for the rest of its life.

