[SOLVED] OpenHAB Server Redundancy - any Ideas?

Hi there.
Assume I have an openHAB server running on a Raspberry Pi or similar hardware. We all know that nothing lasts forever, and at some point the Raspberry Pi will crash due to an SD card fault, a power supply failure, or some other reason. I have all configuration files backed up, but when it happens I will probably lose hours re-establishing my home automation.
So what about a second openHAB server, running on different hardware but with the same rules? It would run in hot standby, taking control only if the primary server stops working, which is relatively easy to implement.
The only question is the binding, particularly the popular Z-Wave. Is it possible to have two Z-Wave masters in the same network, with automatic reconfiguration if one server fails?
If not, which home radio protocol allows that?

I've seen some discussion on this in the old forum (can't find it at the moment). My impression was that it is clunky, very difficult to get working, and has severe limitations.

If I were just worried about the time to set up and configure it again in the case of a crash, I would run openHAB in a VM or something like Docker and keep my configs in something like git.

Hopefully someone who has seriously looked into this has some better advice.

I recently moved my openHAB server into an OpenVZ container for this very reason. I had one home server with all my apps and services running on it, and as became evident when I moved everything to VZ containers, it would have taken me days and days to rebuild after a failure.

So now that I have done it once, I have a set of containers, running a small subset of apps/services (including OH), and they all get snapshots backed up weekly to my NAS.

If anything falls over I can restore a VZ backup in a matter of minutes. No hot failover, but still very good peace of mind!

Out of curiosity, what made you choose OpenVZ over the other container technologies? I usually use Docker as the example because, at least in my environment, it is the one most people have heard of, but I'm pretty ignorant of the others.

I too currently have everything running on one server (an old laptop so at least I have battery backup :wink:) like you did and the thought of setting it all up again from scratch after a crash is kind of scary.

Thanks

Rich

Had a look at Docker but it was a bit too lightweight, i.e. they recommend each and every process should run in a separate instance, whereas I wanted to model mine more like VMs, where I have a 'net01' box which has dnsmasq, openvpn etc.

Plus, when I was investigating, the networking side of things looked a bit tricky to set up, but I never really gave it a go.

And VMs seemed a little heavy - since all apps/services run on the same OS - Debian 8. So OpenVZ seemed like the best choice, lightweight, easy to configure, and easy to backup/maintain.

YMMV.

Seems to me there are a couple of issues here (I've worried about both). One is a crash of the main computer (RPi or other), and the second is automatically passing control to a "backup" server.

If you have multiple Pis, it's conceivable to have the secondary also running OH and watching the primary for (lack of) responsiveness (on several levels) and, if it is non-responsive, cycling the power on the aberrant box (assuming you've got separate control paths for the boxes). If that does not work, the blocking factor is the controller (e.g. z-stick gen X) plugged into the primary Pi. I do not know of a way to have a second z-stick be a hot spare. It is not even clear whether there is a way to back up a z-stick's internal binding info for restoration. This is a single point of failure.
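
A minimal sketch of the watchdog half of this idea, in Python and under stated assumptions: the secondary probes the primary at a fixed interval and only acts after several consecutive failures, so a single missed probe does not trigger a power cycle. The `Watchdog` class, the threshold of 3, and the `on_failure` callback are all hypothetical names chosen for illustration; how the power is actually cycled depends on whatever separate control path you have.

```python
import socket
from typing import Callable


def tcp_alive(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class Watchdog:
    """Counts consecutive failed probes and fires an action at a threshold."""

    def __init__(self, probe: Callable[[], bool],
                 on_failure: Callable[[], None],
                 max_failures: int = 3) -> None:
        self.probe = probe
        self.on_failure = on_failure      # hypothetical: e.g. cycle a smart plug
        self.max_failures = max_failures  # assumed threshold, tune to taste
        self.failures = 0

    def tick(self) -> bool:
        """Run one probe; return True if the failure action fired."""
        if self.probe():
            self.failures = 0
            return False
        self.failures += 1
        if self.failures >= self.max_failures:
            self.failures = 0  # start counting again after acting
            self.on_failure()
            return True
        return False
```

On the secondary this might be wired up as `Watchdog(lambda: tcp_alive("primary.local", 8080), power_cycle_primary)` with `tick()` called every 30 seconds or so; the host name and `power_cycle_primary` are placeholders. A TCP probe only covers one of the "several levels" mentioned above, and none of this addresses the z-stick problem.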

If anyone knows how to do the z-stick internal binding backup, that would be helpful info.

Thank you all for the replies. I know that the main problem will be the binding. Z-Wave is probably not the best choice here, but maybe some other radio bindings allow that.
A hot-standby home server is a key element in increasing the reliability of home automation, and higher reliability is needed before handing more critical functions to openHAB. For example, I'm currently wary of giving heating control to openHAB: if it fails in winter, I don't know whether I will be there to fix it before it gets too cold.

Well, I have the same situation (Pi, Z-Wave). I, too, gave it a lot of thought, and came to the conclusion that building a fully automatic high-availability system is a lot of work and will probably still not be robust, i.e. it will still fail in some situations or may even cause trouble that you wouldn't have without a second instance (of the Pi, of the Z-Wave controller, of openHAB, … of whatever).

For HW failures, you can (and should) simply buy spare parts and exchange them if needed.
By far the worst case is a disk (SD card) crash, because you have to reinstall and configure many different packages from scratch. That's tedious work and takes even longer than you imagine. My SD card crashed twice, so I know what I'm talking about, and even after getting the system to work again, I was amazed how often and for how long I kept finding bits and pieces of the system that I hadn't remembered to reinstall properly, just because they had had no immediately visible impact.

What I'm doing now is to back up the config/rules files, PLUS, after major OS or SW changes, I take backups of the Pi's SD card at the raw SD card level using a tool like dd or Win32 Disk Imager. You can keep 2 or 3 SD cards with copies. Any of them will still work to recover your home, even if the one you grab doesn't hold the latest version. It's just a matter of minutes, and you can even instruct someone in your family how to do it. Remember, you won't be home when it happens, but your wife will :smile:
Just remember to always verify the latest SD backup (put it into your spare Pi and boot just once).
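
For the raw image step, a rough Python sketch of "copy, then check the copy" might look like the following. It is an illustration rather than a replacement for dd: `copy_and_verify` and `sha256_of` are names made up here, and the source path you would pass (for example a card reader device node) is an assumption that varies by system and usually needs root. It is best run with the card in another machine, since a card that is in use can change between the copy and the checksum.

```python
import hashlib
import shutil
from pathlib import Path

CHUNK = 1024 * 1024  # read and write in 1 MiB pieces


def sha256_of(path: Path, chunk_size: int = CHUNK) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, image: Path) -> str:
    """Copy source to image byte for byte, then compare checksums.

    Returns the digest on success and raises OSError on a mismatch.
    """
    with source.open("rb") as src, image.open("wb") as dst:
        shutil.copyfileobj(src, dst, CHUNK)
    source_sum = sha256_of(source)
    image_sum = sha256_of(image)
    if source_sum != image_sum:
        raise OSError(f"backup mismatch: {source_sum} != {image_sum}")
    return image_sum
```

A matching checksum only shows that the image equals what was read from the card. It does not show that the card boots, so the boot test in a spare Pi is still worth doing.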

regards
Markus

Better yet, configure your Pi with a USB drive (or stick) and keep everything but the core boot logic there. You still need the SD card for core boot, but having everything else on a more robust medium is a good idea.

There are several pointers to how to do this on the forum.

Agreed in principle, but it can be a lot of work, too, to get there and to maintain that setup over time. So find a trade-off that you feel comfortable with. For me, this was to move write-intensive logging to my NAS, but keep the rest on SD. Once you don't write that much any more, the cards don't crash that often any more.

I've solved some of this with Heartbeat, GlusterFS, and some of the Linux HA tools. The USB device is def a sticking point. I personally use an Insteon Hub that's network attached and have a backup device in case of failure.
https://groups.google.com/forum/m/#!category-topic/openhab/discussions/39L6C4Du7OM

I'm currently exploring 6LoWPAN (802.15.4) boards and may roll my own solution which incorporates HA in its core design. It will be a while before it's prod worthy though. I'm also exploring out-of-band door strikes, sensors and such from a hard-wired, DR perspective. Not full automation, but core access and control is maintained in a DR situation.

Honestly I think the only way to have real redundancy is to rewrite openHAB from the Spring framework to the Java EE standard framework. I suspect this is not going to happen for many reasons.

I think people tend to stare at the server itself, racking their brains over how to make it 'redundant', and overlook other risks to availability.
One thing you should definitely take care of is the Z-Wave controller.
Note that all Z-Wave nodes store the controller's ID. If you have to replace the controller, the new one has a different ID, and your nodes won't talk to it unless you reset and (re-)include them with the new controller.
Now check for yourself what that will mean: you have to physically access all devices again. That was fairly easy when you installed them one by one, but now that you've put new wallpaper and paint on top, will you still be able to access them without leaving visible traces? Not to mention the work and time required to do so.
AFAIK you can have a secondary controller running, but it will not be able to fully take over all of the primary's functions should that one fail.

Long story short, make sure you back up your Z-Wave controller.
Many people still use the Aeon S2 stick without knowing that it can NOT be backed up.
The Aeon Gen5 stick is said to support backup, or to gain it eventually, but I don't know if that works by now.
I'm using a RaZberry board, which you can back up using the Z-Way software.

@chris, would it be possible to set up two RPis, each with its own z-stick (Gen-?), and have one of them set as SUC to handle this failover situation? Or was I "absent from class that day"?

While the SUC might provide the backup for the network layer, I don't think this will actually do what you want… The SUC will (in theory!) keep track of where nodes are so they can talk to each other etc., so in this respect it provides a backup for the network layer…

However, what you really want is the next layer up, and that is the reporting of all the "stuff that happens". So, for example, if your primary controller goes down, the secondary controller will not be notified about things like associations. When your motion sensor detects motion, it will inform the controller you've configured in its respective association group. It won't know that this controller is dead and that it should talk to a different controller…

For associations, you might be able to work around this in some cases, since there's often the possibility to configure multiple nodes into a group, or to use multiple groups for different notifications. However, this is not always the case: some devices only allow a single node in a group (for example, the Fibaro devices have a 'controller update' group, and this only has 1 node).

Assuming you can get a reasonable configuration that notifies both controllers of updates through associations, your next problem is with battery devices. The WAKEUP command class, which is used to notify the controller that a battery device is awake and can be configured, can only be set to a single node. So if the primary controller dies, you won't be able to configure battery devices from the secondary controller without manually waking them up.

I'll add one caveat to the above regarding wakeup. I say that you can't set multiple nodes; you can, however, set the node to 255, which means the wakeup is broadcast to everyone. You might think that this solves the problem, but broadcasts cannot be routed (to avoid loops), so this only works for devices that are in direct communication with the controller(s).
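
To make the wakeup limitation concrete, here is a toy Python model of it. This is not Z-Wave code and uses no real Z-Wave API: the `Network` class, the node numbers, and the routing rule (any live node can forward a routed message) are simplifying assumptions for illustration. It only encodes the two rules stated above: a wakeup goes to a single configured node, and a wakeup sent to 255 is heard only by nodes in direct range.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field

BROADCAST = 255  # wakeup target that means "broadcast to everyone"


@dataclass
class Network:
    # Direct radio links: node id -> ids of nodes within direct range.
    links: dict[int, set[int]] = field(default_factory=dict)

    def connect(self, a: int, b: int) -> None:
        """Record that nodes a and b are in direct range of each other."""
        self.links.setdefault(a, set()).add(b)
        self.links.setdefault(b, set()).add(a)

    def routed_reachable(self, src: int, dst: int, alive: set[int]) -> bool:
        """Breadth-first search from src to dst, hopping only via live nodes."""
        seen = {src}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nxt in self.links.get(node, set()):
                if nxt == dst:
                    return True
                if nxt in alive and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return False

    def deliver_wakeup(self, sensor: int, target: int,
                       controllers: set[int], alive: set[int]) -> set[int]:
        """Return the controllers that actually hear the sensor's wakeup."""
        if target == BROADCAST:
            # Broadcasts are not routed: direct neighbours only.
            direct = self.links.get(sensor, set())
            return {c for c in controllers if c in alive and c in direct}
        if target in alive and self.routed_reachable(sensor, target, alive):
            return {target}
        return set()
```

With a primary (node 1), a secondary (node 2), a repeater (node 10) in range of both, and a battery sensor (node 20) that can only reach the repeater, a wakeup configured for node 1 is lost once node 1 is dead, and a wakeup configured for 255 never reaches node 2 because it would need to be routed through node 10. Only a sensor in direct range of node 2 would be heard.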

So, unfortunately, while it is in theory possible to configure the system as you describe, it probably won't give you the redundancy that you're looking for, or at least not in a simple way…

Cheers
Chris

Maybe you can use keepalived to create a master/slave cluster
that starts openHAB on your slave when your master crashes.
For your configuration files, you can use rsyncd to sync all files.
But for your Z-Wave module, I don't know.
Bye
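
As a rough illustration of the config-sync part only, here is a small Python stand-in for what rsync would do in one direction: copy files that are new or whose contents changed from the master's config directory to the standby's. It is a sketch, not rsync: `mirror_configs` is a made-up name, both directories are assumed to be reachable as local paths (e.g. via a network mount), and files deleted on the master are not removed on the standby.

```python
import filecmp
import shutil
from pathlib import Path


def mirror_configs(src: Path, dst: Path) -> list[str]:
    """Copy new or changed files from src to dst.

    Returns the relative paths that were copied, in sorted order.
    """
    copied = []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        rel = path.relative_to(src)
        target = dst / rel
        # shallow=False compares file contents, not just size and mtime.
        if target.exists() and filecmp.cmp(path, target, shallow=False):
            continue
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves timestamps
        copied.append(rel.as_posix())
    return copied
```

Run from cron or a timer on the standby, this keeps the rules and items files close to current. The failover itself (deciding which box is active and moving the service address) is what keepalived would handle, and is not shown here.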

Maybe two RPis with shared DRBD for the configuration files and a shared virtual IP?

I use Proxmox and run openHAB in a Debian VM.

Hi Chris,

I was thinking of setting up another Raspberry Pi unit with a RaZberry controller and including it in the network.

In case the primary controller dies, can I just move the microSD card to the backup Raspberry Pi and promote it to primary controller? Would that work?

I am unable to get a backup of the RaZberry I currently have running, and I am too far along to start from scratch, but at the same time I do not want to expand any further until I have a solid backup plan in place.

Thanks

No, that wouldn't work, because as I wrote, the devices store the primary controller's ID.
That's why I said you need to back it up and eventually restore it.
You can move the RaZberry board or Aeon stick when your Pi fails, of course.
But if that's the component that fails, you're out of luck.
(That's at least my understanding of Z-Wave. If anyone can prove me wrong, please do!)

How about this: I add a secondary controller to the network.

If the RaZberry dies, I add a new one to the network, and since the secondary controller is there, all I have to do is hook up the new controller to the secondary controller to make it part of the network.

Then I log in to HABmin and make the new RaZberry the primary controller for all my devices.

Thoughts?

This might sound similar to the previous plan, but in this case I am not moving openHAB.