[SOLVED] openHAB Server Redundancy - Any Ideas?

Artyom_Syomushkin · October 1, 2015, 2:36pm

Hi there.
Assume I have an openHAB server running on a Raspberry Pi or similar hardware. We all know that nothing lasts forever, and at some point the Raspberry will crash because of an SD card fault, a failing power supply, or some other reason. I have all configuration files backed up, but when it happens I will probably lose hours re-establishing my home automation.
So what about running a second openHAB server on different hardware but with the same rules? It would run in hot standby, taking control only if the primary server stops working, which should be relatively easy to implement.
The only open question is the bindings, particularly the popular Z-Wave binding. Is there any way to have two Z-Wave masters in the same network, with automatic reconfiguration if one server fails?
If not, which home radio protocol allows that?
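The hot-standby part of the idea can be sketched as a small state machine on the secondary: it records heartbeats from the primary and takes over once none has arrived within a timeout. This is a minimal sketch, assuming some transport already delivers the heartbeats; the class name `HotStandby` and the `timeout_s` parameter are hypothetical, and actually starting openHAB is left as a comment.

```python
import time
from typing import Callable


class HotStandby:
    """Hypothetical standby-side logic: stay passive while the primary's
    heartbeats keep arriving, take over once they stop."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock              # injectable so the logic can be tested
        self.last_heartbeat = clock()   # treat start-up as a fresh heartbeat
        self.active = False

    def on_heartbeat(self) -> None:
        # Call this whenever a heartbeat from the primary is received.
        self.last_heartbeat = self.clock()

    def tick(self) -> bool:
        # Call this periodically; returns True once this node should be in control.
        silent_for = self.clock() - self.last_heartbeat
        if not self.active and silent_for > self.timeout_s:
            self.active = True
            # A real setup would start openHAB (and its bindings) here.
        return self.active
```

Once active, this sketch never hands control back by itself: a primary that comes back while the standby is running would otherwise leave two servers driving the same devices, so switching back is left as a manual step.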

rlkoshak · October 1, 2015, 7:38pm

I’ve seen some discussion on this in the old forum (can’t find it at the moment). My impression was that it is clunky, very difficult to get working, and has severe limitations.

If I were just worried about the time to set up and configure it again in the case of a crash, I would run openHAB in a VM or something like Docker and keep my configs in something like git.

Hopefully someone who has seriously looked into this has some better advice.

ben_jones12 · October 1, 2015, 7:50pm

I recently moved my openHAB server into an OpenVZ container for this very reason. I had one home server with all my apps and services running on it, and, as became evident when I moved everything to VZ containers, rebuilding it after a failure would have taken me days and days.

So now that I have done it once, I have a set of containers, each running a small subset of apps/services (including OH), and they all get snapshots backed up weekly to my NAS.

If anything falls over, I can restore a VZ backup in a matter of minutes. No hot failover, but still very good peace of mind!

rlkoshak · October 1, 2015, 8:46pm

Out of curiosity, what made you choose OpenVZ over the other container technologies? I usually use Docker as the example because, at least in my environment, it is the one most people have heard of, but I’m pretty ignorant of the others.

I, too, currently have everything running on one server (an old laptop, so at least I have battery backup), like you did, and the thought of setting it all up again from scratch after a crash is kind of scary.

Thanks

Rich

ben_jones12 · October 1, 2015, 8:49pm

I had a look at Docker, but it was a bit too lightweight: they recommend running each and every process in a separate container, whereas I wanted to model mine more like VMs, where I have a ‘net01’ box running dnsmasq, openvpn, etc.

Plus, when I was investigating, the networking side of things looked a bit tricky to set up, though I never really gave it a go.

And VMs seemed a little heavy, since all my apps/services run on the same OS (Debian 8). So OpenVZ seemed like the best choice: lightweight, easy to configure, and easy to back up and maintain.

YMMV.

bob_dickenson · October 1, 2015, 11:02pm

Seems to me there are a couple of issues here (I’ve worried about both). One is a crash of the main computer (RPi or other); the second is automatically passing control to a “backup” server.

If you have multiple Pis, it’s conceivable to have the secondary also running OH and watching the primary for (lack of) responsiveness (on several levels) and, if it is non-responsive, power-cycling the aberrant box (assuming you’ve got separate control paths for the boxes). If that does not work, the blocking factor is the controller (e.g. a Z-Stick Gen X) plugged into the primary Pi. I do not know of a way to have a second Z-Stick act as a hot spare. It is not even clear whether there is a way to back up a Z-Stick’s internal binding info for restoration. This is a single point of failure.
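Watching the primary “on several levels” and escalating to a power cycle could look roughly like the sketch below. It rests on assumptions: the TCP check against openHAB’s HTTP port (commonly 8080; adjust to your install) is just one example level, and the function names and action strings are hypothetical. Whatever actually cuts the power (a relay, a switchable socket) is not shown.

```python
import socket
from typing import Dict


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """One possible health-check level: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, ...
        return False


def next_action(checks: Dict[str, bool], already_power_cycled: bool) -> str:
    """Decide what the secondary should do given the results of all checks.

    Returns "ok", "power_cycle" or "manual" (hypothetical action names).
    """
    if all(checks.values()):
        return "ok"
    if not already_power_cycled:
        return "power_cycle"  # first try to revive the primary
    # Still failing after a power cycle: the Z-Wave controller is plugged
    # into the primary, so the secondary cannot simply take over.
    return "manual"
```

A caller would collect something like `{"http": port_open("192.168.1.10", 8080)}` (a placeholder address) plus any other checks, and pass the result to `next_action`.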

If anyone knows how to back up the Z-Stick’s internal binding info, that would be helpful.

Artyom_Syomushkin · October 2, 2015, 5:41am

Thank you all for the replies. I know that the main problem lies in the binding. Z-Wave is probably not the best choice here, but maybe some other radio bindings allow it.
Having a hot-standby home server is a key element in increasing the reliability of home automation, and higher reliability is needed before handing more critical functions to openHAB. For example, I’m currently worried about giving heating control to openHAB: if it fails in winter, I don’t know whether I will be there to fix it before the house gets too cold.

mstormi · October 2, 2015, 5:49am

Well, I have the same situation (Pi, Z-Wave). I, too, gave it a lot of thought and came to the conclusion that building a fully automatic high-availability system is a lot of work and will probably still not be robust, i.e. it will still fail in some situations or may even cause trouble that you wouldn’t have without a second instance (of the Pi, of the Z-Wave controller, of openHAB, … of whatever).

For hardware failures, you can (and should) simply buy spare parts and swap them in when needed.
By far the worst case is a disk (SD card) crash, because you have to reinstall and configure many different packages from scratch. That’s tedious work and takes even longer than you imagine. My SD card crashed twice, so I know what I’m talking about: even after getting the system to work again, I was amazed how often, and for how long, I kept finding bits and pieces of the system that I hadn’t remembered to reinstall properly, simply because they had no immediate visible impact.

What I’m doing now is backing up the config/rules files, PLUS, after major OS or software changes, taking raw-level backups of the Pi’s SD card with a tool like dd or Win32 Disk Imager. You can keep 2 or 3 SD cards with copies. Any of them will still work to recover your home, even if the one you grab doesn’t have the latest version. It’s just a matter of minutes, and you can even teach someone in your family how to do it. Remember, you won’t be home when it happens, but your wife will.
Just remember to always verify the latest SD backup (put it into your spare Pi and boot it once).
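Booting the spare Pi remains the real test of a backup. As a cheaper first check, you can compare a checksum of the dd image with the card it was written to. This is a sketch, assuming the card can be read as a raw device (a placeholder path like `/dev/sdX`, which usually needs root); because a card is normally larger than the image, only the first image-size bytes are compared. The function names are hypothetical.

```python
import hashlib
import os


def sha256_prefix(path: str, length: int, chunk: int = 4 * 1024 * 1024) -> str:
    """SHA-256 of the first `length` bytes of a file or raw device."""
    h = hashlib.sha256()
    remaining = length
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(chunk, remaining))
            if not block:  # source is shorter than expected
                break
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()


def card_matches_image(image_path: str, device_path: str) -> bool:
    """True if the card starts with exactly the bytes of the image."""
    size = os.path.getsize(image_path)
    return sha256_prefix(image_path, size) == sha256_prefix(device_path, size)
```

A matching checksum only proves the bytes were copied; it says nothing about whether the system on them boots, so the boot test is still worth doing.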

regards
Markus

bob_dickenson · October 2, 2015, 12:17pm

Better yet, configure your Pi with a USB drive (or stick) and put everything but the core boot logic there. You still need the SD card for booting, but keeping everything else on a more robust medium is a good idea.

There are several pointers to how to do this on the forum.

mstormi · October 2, 2015, 1:42pm

Agreed in principle, but it can be a lot of work, too, to get there and to maintain that setup over time. So find a trade-off you feel comfortable with. For me, this was moving write-intensive logging to my NAS but keeping the rest on the SD card. Once you don’t write that much anymore, the cards don’t crash nearly as often.

jbags81 · October 2, 2015, 2:19pm

I’ve solved some of this with Heartbeat, GlusterFS, and some of the Linux-HA tools. The USB device is definitely a sticking point. I personally use a network-attached Insteon Hub and have a backup device in case of failure.
https://groups.google.com/forum/m/#!category-topic/openhab/discussions/39L6C4Du7OM

I’m currently exploring 6LoWPAN (802.15.4) boards and may roll my own solution that incorporates HA in its core design. It will be a while before it’s production-worthy, though. I’m also exploring out-of-band door strikes, sensors and such from a hard-wired, disaster-recovery (DR) perspective. Not full automation, but core access and control are maintained in a DR situation.

kevin1 · October 2, 2015, 5:21pm

Honestly I think the only way to have real redundancy is to rewrite openHAB from the Spring framework to the Java EE standard framework. I suspect this is not going to happen for many reasons.

mstormi · October 2, 2015, 6:28pm

I think people tend to stare at the server itself, racking their brains over how to make it ‘redundant’, and overlook other risks to availability.
One thing you should definitely take care of is the Z-Wave controller.
Note that all Z-Wave nodes store the controller’s ID. If you have to replace the controller, the new one has a different ID, and your nodes won’t talk to it unless you reset them and (re-)include them with the new controller.
Now consider what that means: you have to physically access every device again. That was fairly easy when you installed them one by one, but now that you’ve put new wallpaper and paint on top, will you still be able to reach them without leaving visible traces? Not to mention the work and time required to do so.
AFAIK you can run a secondary controller, but it will not be able to fully take over all of the primary’s functions should the primary fail.

Long story short, make sure you back up your Z-Wave controller.
Many people still use the Aeon S2 stick without knowing that it can NOT be backed up.
The Aeon Gen5 stick is said to support backups (or to be about to), but I don’t know whether that works yet.
I’m using a RaZberry board, which you can back up using the Z-Way software.
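Why a replacement controller is ignored can be illustrated with a toy model. In Z-Wave terms the “controller’s ID” is the network’s Home ID, which each node adopts when it is included. The classes below are hypothetical and model only that one rule, not the real protocol; a “restored” controller here simply means one that carries the old Home ID again, which is the point of backing up the controller as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Controller:
    home_id: int  # network identifier the controller hands out on inclusion


@dataclass
class Node:
    node_id: int
    home_id: Optional[int] = None  # None until the node is included

    def include(self, controller: Controller) -> None:
        # In reality this needs physical access (a button press on the device).
        self.home_id = controller.home_id

    def accepts(self, controller: Controller) -> bool:
        # A node only talks to the network whose Home ID it stored.
        return self.home_id is not None and self.home_id == controller.home_id
```

In this model a brand-new stick is rejected by every node, while a stick restored with the old Home ID is accepted without touching any device.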

bob_dickenson · October 3, 2015, 12:21am

@chris, would it be possible to set up two RPis, each with its own Z-Stick (Gen-?), and have one of them set as SUC (Static Update Controller) to handle this failover situation? Or was I “absent from class that day”?

chris · October 3, 2015, 8:01am

While the SUC might provide a backup for the network layer, I don’t think it will actually do what you want. The SUC will (in theory!) keep track of where nodes are so they can talk to each other etc., so in this respect it provides a backup for the network layer.

However, what you really want is the next layer up, and that is the reporting of all the “stuff that happens”. For example, if your primary controller goes down, the secondary controller will not receive the notifications that are sent through associations. When your motion sensor detects motion, it will inform the controller you’ve configured in its respective association group; it won’t know that this controller is dead and that it should talk to a different controller.

For associations, you might be able to work around this in some cases, since there’s often the possibility to configure multiple nodes into a group, or to use multiple groups for different notifications; however, this is not always the case. Some devices only allow a single node in a group (for example, Fibaro devices have a ‘controller update’ group that holds only 1 node).
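The association limitation can be modelled as a group with a fixed capacity: the device reports only to the node IDs stored in the group, whether or not those nodes are still alive. This is a hypothetical toy model (the class and its methods are not part of any binding); a capacity of 1 mirrors the single-node ‘controller update’ group mentioned for Fibaro devices.

```python
from typing import List, Set


class AssociationGroup:
    """Toy model of one association group on a device."""

    def __init__(self, max_nodes: int) -> None:
        self.max_nodes = max_nodes
        self.members: List[int] = []

    def add(self, node_id: int) -> bool:
        # Returns False when the device has no room left in this group.
        if node_id in self.members:
            return True
        if len(self.members) >= self.max_nodes:
            return False
        self.members.append(node_id)
        return True

    def delivered_to(self, alive: Set[int]) -> List[int]:
        # The device sends to every member; only live nodes actually receive it.
        return [n for n in self.members if n in alive]
```

With a one-slot group the secondary controller (node 2 here) can never be added, so once the primary (node 1) dies, nobody hears the motion report.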

Assuming you can get a reasonable configuration that notifies both controllers of updates through associations, your next problem is battery devices. The WAKEUP command class, which is used to notify the controller that a battery device is awake and can be configured, can only be set to a single node. So if the primary controller dies, you won’t be able to configure battery devices from the secondary controller without manually waking them up.

I’ll add one caveat regarding wakeup: I said you can’t set multiple nodes, but you can set the node to 255, which means the wakeup is broadcast to everyone. You might think this solves the problem; however, broadcasts cannot be routed (to avoid loops), so this only works for devices that are in direct communication with the controller(s).
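Chris’s wakeup caveat fits the same kind of toy model: a wakeup goes either to one configured node or, with destination 255, as a broadcast that only reaches controllers within direct radio range. The function below is hypothetical and not real protocol code; the node numbers in the usage are illustrative.

```python
from typing import Set

BROADCAST = 255  # wakeup destination meaning "everyone"


def wakeup_receivers(destination: int, alive_controllers: Set[int],
                     direct_neighbours: Set[int]) -> Set[int]:
    """Which live controllers hear a battery device's wakeup notification.

    direct_neighbours: controllers within direct radio range of the device.
    Broadcasts are not routed, so controllers outside that range miss them.
    """
    if destination == BROADCAST:
        return alive_controllers & direct_neighbours
    # Single destination: routed if needed, but only that one node is told.
    return {destination} & alive_controllers
```

With the destination set to the dead primary, the secondary never hears the wakeup; with 255, it hears it only if it sits within direct range of the device.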

So, unfortunately, while it is in theory possible to configure the system as you describe, it probably won’t give you the redundancy you’re looking for, or at least not in a simple way…

Cheers
Chris

mickael · October 3, 2015, 8:41am

Maybe you can use keepalived to create a master/slave cluster,
so that openHAB starts on your slave when your master crashes.
For your configuration files, you can use rsyncd to sync all files.
But for your Z-Wave module, I don’t know.
Bye
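keepalived implements VRRP: each node is configured with a priority, and the highest-priority node that is still up holds the shared virtual IP (and, through a notify script, could start openHAB). The function below is a deliberately simplified sketch of that election, not keepalived’s code: it ignores advertisement timing and preemption settings and breaks ties by node name, whereas VRRP uses the IP address. Node names and priorities are hypothetical.

```python
from typing import Dict, Optional, Set


def vip_holder(priorities: Dict[str, int], alive: Set[str]) -> Optional[str]:
    """Return the node that should own the virtual IP, or None if all are down."""
    candidates = [node for node in priorities if node in alive]
    if not candidates:
        return None
    # Highest priority wins; ties broken by name (VRRP itself uses the IP address).
    return max(candidates, key=lambda node: (priorities[node], node))
```

Recomputing the holder whenever the set of live nodes changes means the primary takes the IP back when it returns; the election does nothing for a Z-Wave stick plugged into the failed box.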

raffaeletani · October 7, 2015, 4:00pm

Maybe two RPis with shared DRBD storage for the configuration files and a shared virtual IP?

I use Proxmox and run openHAB in a Debian VM.

rm65453 · October 7, 2015, 9:16pm

Hi Chris,

I was thinking of setting up another Raspberry Pi with a RaZberry controller and including it in the network.

In case the primary controller dies, can I just move the microSD card to the backup Raspberry Pi and promote it to primary controller? Would that work?

I am unable to back up the RaZberry I currently have running, and I’m too far along to start from scratch, but at the same time I don’t want to expand any further until I have a solid backup plan in place.

Thanks

mstormi · October 8, 2015, 3:41pm

No, that wouldn’t work because, as I wrote, the devices store the primary controller’s ID.
That’s why I said you need to back it up and, if necessary, restore it.
You can move the RaZberry board or Aeon stick when your Pi fails, of course.
But if that’s the component that fails, you’re out of luck.
(That’s at least my understanding of Z-Wave; if anyone can prove me wrong, please do!)

rm65453 · October 8, 2015, 5:03pm

How about this: I add a secondary controller to the network.

If the RaZberry dies, I add a new one to the network, and since the secondary controller is there, all I have to do is connect the new controller to the secondary controller to make it part of the network.

Then I log in to HABmin and make the new RaZberry the primary controller for all my devices.

Thoughts?

It might sound similar to the previous plan, but in this case I am not moving openHAB.