How to Restore Second System Remotely

Hi there,
I had a similar discussion here:
Redundant OH

Maybe there is something in it for you.

For me, that self-made failover works perfectly.
I haven't had an out-of-service situation since.

CU Thomas

Well, assuming you are moderately tech savvy: build the second Pi, configure a secondary IP on the 'active' openHAB Pi, and install openHAB with all the necessary hubs (Z-Wave, Zigbee, etc.) already joined to the fabric as secondaries. Configure openHAB and install Nagios monitoring. You can then use Nagios to monitor various situations, such as no ping response for more than 10 minutes (thus allowing a reboot cycle to complete), at which point you assume a hardware failure, add the floating IP on the secondary, and you're back in business.
You could simultaneously monitor a URL, parsing the response to make sure openHAB is running and responding. If it fails to respond, say twice in a row, again move the floating IP to the secondary and start openHAB… bang, you have monitored HA, and you can have the monitor SMS or email you about the condition as well as take corrective action.
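The monitoring-plus-takeover loop described above can be sketched as a small shell watchdog. In a real setup Nagios would do the monitoring and call the takeover as an event handler; the IP addresses, interface name, and thresholds below are assumptions for illustration:

```shell
#!/bin/sh
# Minimal failover watchdog sketch. In practice Nagios would do the
# monitoring and invoke the takeover as an event handler; this shows the
# same logic standalone. Addresses, interface, and thresholds are assumed.

PRIMARY_IP="192.168.1.10"    # assumed address of the active openHAB Pi
FLOATING_IP="192.168.1.100"  # assumed floating service address
FAIL_LIMIT=10                # minutes without a ping reply before takeover

# Pure decision helper: succeed once the consecutive failure count
# reaches the limit (i.e. the standby should take over).
should_failover() {
  [ "$1" -ge "$2" ]
}

take_over() {
  # Claim the floating IP on this (standby) Pi and start openHAB.
  sudo ip addr add "$FLOATING_IP/24" dev eth0
  sudo systemctl start openhab
}

monitor() {
  fails=0
  while true; do
    if ping -c 1 -W 2 "$PRIMARY_IP" >/dev/null 2>&1; then
      fails=0
    else
      fails=$((fails + 1))
    fi
    if should_failover "$fails" "$FAIL_LIMIT"; then
      take_over
      break
    fi
    sleep 60   # one probe per minute
  done
}

# Only start monitoring when invoked with --run, so the helpers can be
# sourced or tested without side effects.
if [ "${1-}" = "--run" ]; then monitor; fi
```

The 10-minute window matters: it is what lets a legitimate reboot of the primary complete without the standby stealing the floating IP.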

I have two Aeotec Z-Wave hubs in my Z-Wave network, one as primary and the other as secondary…
It runs like a champ.

This is good to hear. To clarify, can the same z-wave device pair with multiple Aeotec controllers? If not, how do the primary and secondary Aeotec sticks work together?

I’m even okay not automating the fail over switch to a secondary Pi and OH instance. If I detect the first OH system is offline and I can’t get it running again remotely, then I can SSH into the second Pi remotely and start that OH service.
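That manual cut-over can be as small as a couple of SSH calls. A hedged sketch, assuming both Pis run openHAB as a systemd service and that the hostnames below resolve over your VPN (openHAB 2 installs name the unit `openhab2` instead):

```shell
#!/bin/sh
# Manual failover sketch: stop openHAB on the (possibly dead) primary,
# then start it on the standby. Hostnames and the service name "openhab"
# are assumptions for this setup.

PRIMARY="openhab-primary.local"
STANDBY="openhab-standby.local"

# Helper building the remote command; kept pure so it is easy to test.
remote_cmd() {
  printf 'sudo systemctl %s openhab' "$1"
}

manual_failover() {
  # Best effort on the primary: it may already be unreachable.
  ssh "$PRIMARY" "$(remote_cmd stop)" || echo "primary unreachable (expected if it failed)"
  ssh "$STANDBY" "$(remote_cmd start)"
}

if [ "${1-}" = "--run" ]; then manual_failover; fi
```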

I’m just not sure how to share one Aeotec stick or if I should have a second Aeotec stick.

Good to know. I assume you mean the Smart Hubs? I'd see more value in adding that than a third RPi that's only there to be a Z-Wave controller, but it depends on the cost.

I’m also curious about this. I would assume that securely included devices can only be connected to one or the other, but perhaps it’s in Aeotec’s model to allow for that.

I think you could make this even simpler. Just put them both on WiFi smart plugs. If your primary fails and you can’t reach it, turn it off and turn on the other one.

Power-cycling your Pi is a bad way to reboot or power down.

Z-Wave devices don't so much pair as join the mesh. Z-Wave is a wireless mesh protocol, which is why most Z-Wave devices can act as relays for other Z-Wave devices. This is unlike Bluetooth, where one device pairs with another and the two only see each other through that pairing.
Therefore, having two hubs in the mesh is not a problem unless both start sending conflicting commands.

Having remote updates is important, and it's not a bad thing to do. But you need a fail-safe setup which can perform an automatic rollback.
Most production setups in the wild (i.e. the smart-home gateways people buy) use a basic strategy of keeping a "double" copy of the system. A special tool (mender, swupdate, or rauc, to name a few) then manages the boot process and its variables: when a new update is installed, it switches the partition used to launch the system. If you combine that with a hardware watchdog (to force a restart on boot issues), you always have a way to roll back.
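The "double copy" (A/B) scheme with a watchdog can be pictured as a boot-selection fragment in the style of a U-Boot environment script; the variable names and retry limit here are illustrative assumptions, and tools like rauc or swupdate manage exactly this kind of state for you:

```
# Illustrative boot-selection logic (U-Boot-style pseudocode; variable
# names are assumptions). bootcount is incremented on every boot attempt
# and reset by the running system once it is known healthy; the hardware
# watchdog forces a reset if boot hangs, so a bad update eventually
# falls through to the fallback branch.
if test ${bootcount} -gt 3; then
    # Too many failed attempts: switch back to the other system copy.
    setenv rootpart ${altrootpart}
    saveenv
fi
```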
While openHABian does a lot to keep updates seamless, it can't give you any guarantees, since the operating system as a whole is under your control, not the tool's. Having another Pi just to watch the one you intend to update will give you some visibility across the network, but it will not automatically fix the root cause of a failure.
Operating systems nowadays are quite complex and have many ways to fail. I'd say that if the network is up, the problem is solvable: you can always go in over VPN and tweak configurations. It is much worse when the network does not come up. Then your second Pi should have access to the serial console so you can fix the network.

Best,
Łukasz

The biggest flaw I see is running two instances with the same bindings connecting to identical hardware. Depending on how the hardware is designed and how the binding works, this can easily lead to non-binary situations. For example, while moving from OH2 to OH3, my DoorBird was live on both instances; the DoorBird hardware could not handle polling from two instances in parallel plus the app, so it crashed now and then and was very flaky. Deactivating the binding on the old OH2 solved the issue instantly.
So be careful with parallel use of the same hardware, apart from obvious cases like hardware that addresses a specific openHAB instance (e.g. the Nuki binding, which uses a specific callback), where you need to either change the callback's IP address within the hardware itself or boot up the second Pi with the same IP address.

As pointed out, openHAB isn't designed for that, but for quick manual intervention.
What would ease the pain, but still introduces another SPOF:

  1. place a Raspberry Pi at the remote location
  2. let it run openHABian and equip it with redundant copies of the necessary hardware (e.g. a second Z-Wave stick, …)
  3. connect it to a smart switch so you can power it up remotely
  4. accept that hardware which only allows one connection at a time (e.g. RS232 serial connections, smart-meter connections, …) cannot be auto-swapped

Long story short: we're talking smartHOME, not smartHIGHAVAILABILITY, so we're all dependent on manual work to cope with failure situations…
I run a remote openHAB (250 km away in a remote mountain cottage, LTE only) and even upgraded openHAB 2 to openHAB 3 from home via VPN. But! After 3 years, something's wrong with the power adapter and my remote Pi suffers from under-voltage. For Covid reasons my last visit was exactly a year ago; apart from that, the whole setup has been stable for 4 years now.

Nope. Any Z-Wave device has only a single controller it'll send its messages to (the lifeline association).
That is one reason why this "just make it redundant" approach won't work.
As @rlkoshak quoted me, this stuff is hard. These aren't dumb fridges.

And this is just for one specific technology (zwave). It’ll be different for Wi-Fi, KNX, ZigBee, …

I've only ever had the one Z-Wave controller, but shouldn't it be possible to use the backup-and-restore capability of some Z-Wave controllers to create a clone? Basically, back up the production controller and restore it to the backup. Then have the backup plugged into a clone of the RPi, but leave this RPi turned off. In a failure situation, one can power off the production RPi and power on the backup RPi, and everything should be OK (assuming the networking is set up correctly).

Of course this too adds a new system to turn on/off the power which could itself fail. TANSTAAFL
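The power swap itself could be scripted if both Pis sit behind WiFi smart plugs. A sketch assuming Tasmota-flashed plugs (which accept commands over their HTTP `/cm` endpoint); the two plug addresses are placeholders:

```shell
#!/bin/sh
# Sketch of the cut-over: hard power off the production Pi via its smart
# plug, then power on the standby clone. Assumes Tasmota-style plugs;
# the IP addresses are placeholders for your own.

PROD_PLUG="192.168.1.20"
BACKUP_PLUG="192.168.1.21"

# Pure helper building the Tasmota command URL (%20 encodes the space
# in e.g. "Power On"); kept separate so it is easy to test.
plug_url() {
  printf 'http://%s/cm?cmnd=Power%%20%s' "$1" "$2"
}

plug_cmd() {
  curl -fsS "$(plug_url "$1" "$2")" >/dev/null
}

failover() {
  plug_cmd "$PROD_PLUG" Off     # hard power-off of the failed primary
  sleep 5                       # give the plug a moment before booting the clone
  plug_cmd "$BACKUP_PLUG" On    # boot the standby Pi with the cloned stick
}

if [ "${1-}" = "--run" ]; then failover; fi
```

Note the hard power-off is exactly the ungraceful shutdown cautioned against earlier in the thread, so this is a last resort for a Pi that is already unresponsive.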


Of course it should, yes, but it has its quirks. Do routes get restored as well, or will your home take half a day to regenerate them? Does it work across hardware and software versions? Across vendors?
I manually sync my Z-Wave controller to a backup device at times, but it has never once worked without some issue of one kind or another.

Just like there exist active-standby carrier-class redundant firewalls, load balancers, and the like that do work (without human intervention being required in most cases).
But those only exist because a hell of a lot of design thinking, engineering, testing effort AND TIME went into making them work reliably, and that's the thing here: the OP greatly underestimates the effort required, and the risk remaining, in getting that stuff built and working right [you know this; I'm not telling you but the OP and others who read this].

Yep. It’s not even cheap.


Yep, that’s why I wrote a tutorial for a reboot/shutdown switch, but if the Pi has already failed then your options are pretty much limited to “turn off the power”. :wink:

Another thought on this, taking a different approach, at least to mitigate the risk of openHAB system software problems bringing down the system in a way that's unfixable remotely:

Consider a virtualized system, where openHAB is deployed within a VM (or perhaps via Docker) and communicates with the Z-Wave adapter plugged into the host via USB passthrough. Get the VM host into a state where it's rock solid, and don't mess with it when you're not on-site to maintain it.

Then, you can leverage VM snapshotting and cloning capabilities to allow for roll-back (or even cut-over to a spare) should updates break things.

This sort of approach can make sense if the VM host and its hardware are very stable and reliable themselves, and the greatest concern is instability of the openHAB software (and/or its dependencies) upon configuration changes and upgrades.
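On a libvirt/KVM host, the snapshot-before-upgrade workflow might look like the following; the domain name `openhab` is an assumption for this setup, and only stock `virsh` snapshot commands are used:

```shell
#!/bin/sh
# Sketch of a snapshot-guarded upgrade on a libvirt/KVM host. The domain
# name "openhab" is an assumption for this setup.

DOMAIN="openhab"

# Pure helper building a dated snapshot name; kept separate for testing.
snap_name() {
  printf 'pre-upgrade-%s' "$1"
}

guarded_upgrade() {
  snap="$(snap_name "$(date +%Y%m%d)")"

  # Take a snapshot of the whole VM before touching anything.
  virsh snapshot-create-as "$DOMAIN" "$snap"

  # ... perform the openHAB upgrade inside the guest here ...

  # If the upgrade goes wrong, roll the whole VM back:
  #   virsh snapshot-revert "$DOMAIN" "$snap"
}

if [ "${1-}" = "--run" ]; then guarded_upgrade; fi
```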
