[SOLVED] OpenHAB Server Redundancy - any Ideas?

mstormi · October 2, 2015, 5:49am

Well, I have the same situation (Pi, zwave) and gave it a lot of thought, too. I came to the conclusion that building a fully automatic high-availability system is a lot of work and will probably still not be robust, i.e. it will still fail in some situations or may even cause trouble that you wouldn’t have without a second instance (of the Pi, of the zwave controller, of openHAB, … of whatever).

For HW failures, you can (and should) simply buy spare parts and swap them in when needed.
By far the worst case is a disk (SD card) crash, because you have to reinstall and configure many different packages from scratch. That’s tedious work and takes even longer than you imagine. My SD card crashed twice, so I know what I’m talking about. Even after getting the system to work again, I was amazed how often and for how long I kept finding bits and pieces of the system that I had forgotten to properly reinstall, just because their absence had no immediately visible impact.

What I’m doing now is to back up the config/rules files, PLUS, after major OS or SW changes, I take backups of the Pi’s SD card at raw device level using a tool like dd or WinDiskImager. You can keep 2 or 3 SD cards with copies. Any of them will still work to recover your home, even if the one you grab does not hold the latest version. Swapping the card is just a matter of minutes, and you can even show someone in your family how to do it. Remember, you won’t be home when it happens, but your wife will.
Just remember to always verify the latest SD backup (put it into your spare Pi and boot it once).
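
For anyone who prefers to script the raw copy, here is a minimal Python sketch of the same idea as dd: read the card block by block, write an image file, then re-read the image and compare checksums. The function names and the device path in the usage note are made up for illustration. A matching checksum only shows that the copy was written correctly; it does not replace booting the card once in a spare Pi.

```python
import hashlib

CHUNK = 4 * 1024 * 1024  # read 4 MiB at a time


def sha256_of(path, chunk=CHUNK):
    """Return the SHA-256 hex digest of a file or block device."""
    digest = hashlib.sha256()
    with open(path, "rb") as src:
        for block in iter(lambda: src.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def backup_image(device, image, chunk=CHUNK):
    """Copy `device` to `image` block by block; return the digest of what was read."""
    digest = hashlib.sha256()
    with open(device, "rb") as src, open(image, "wb") as dst:
        for block in iter(lambda: src.read(chunk), b""):
            dst.write(block)
            digest.update(block)
    return digest.hexdigest()


def backup_and_verify(device, image, chunk=CHUNK):
    """Back up, then re-read the image and compare it with what was read."""
    return backup_image(device, image, chunk) == sha256_of(image, chunk)
```

Usage would look something like `backup_and_verify("/dev/mmcblk0", "pi-backup.img")`, run as root with the card in a reader rather than in the running Pi. The device path is an assumption and differs between systems, so check it before running anything.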

regards
Markus

bob_dickenson · October 2, 2015, 12:17pm

Better yet, configure your Pi with a USB drive (or stick) and keep everything but the core boot logic there. You still need the SD card for the core boot, but having everything else on a more robust medium is a good idea.

There are several pointers to how to do this on the forum.

mstormi · October 2, 2015, 1:42pm

Agreed in principle, but it can be a lot of work, too, to get there and to maintain that setup over time. So find a trade-off that you feel comfortable with. For me, this was to outsource write-intensive logging to my NAS, but keep the rest on SD. Once you no longer write that much to them, SD cards don’t crash that often any more.

jbags81 · October 2, 2015, 2:19pm

I’ve solved some of this with Heartbeat, GlusterFS, and some of the Linux HA tools. The USB device is definitely a sticking point. I personally use an Insteon Hub that’s network attached and have a backup device in case of failure.
https://groups.google.com/forum/m/#!category-topic/openhab/discussions/39L6C4Du7OM

I’m currently exploring 6LoWPAN (802.15.4) boards and may roll my own solution which incorporates HA in its core design. It will be a while before it’s prod worthy though. I’m also exploring out-of-band door strikes, sensors and such from a hard-wired, DR perspective. Not full automation, but core access and control are maintained in a DR situation.

kevin1 · October 2, 2015, 5:21pm

Honestly I think the only way to have real redundancy is to rewrite openHAB from the Spring framework to the Java EE standard framework. I suspect this is not going to happen for many reasons.

mstormi · October 2, 2015, 6:28pm

I think people tend to stare at the server itself, racking their brains over how to make it ‘redundant’, and overlook other risks to availability.
One thing you should definitely take care of is the zwave controller.
Note that all zwave nodes store the controller’s ID. If you have to exchange the controller, the replacement has a different ID, and your nodes won’t talk to it unless you reset and (re-)include them with the new controller.
Now check for yourself what that’ll mean: you have to physically access all devices again. That was fairly easy when you installed them one by one, but now that you’ve put new wallpaper and paint on top, will you still be able to access them without leaving visible traces? Not to mention the work and time required to do so.
AFAIK you can have a secondary controller running, but it’ll not be able to fully take over all of the primary’s functions should that one fail.

Long story short, make sure you back up your zwave controller.
Many people still use the Aeon S2 stick without knowing that it can NOT be backed up.
The Aeon gen5 Stick is said to support backups, or to get that ability in the future, but I don’t know if that works by now.
I’m using a RaZberry board, which you can back up using the z-way software.

bob_dickenson · October 3, 2015, 12:21am

@chris, would it be possible to set up two RPis, each with its own z-stick (Gen-?), and have one of them set as SUC to handle this failover situation? Or was I “absent from class that day”?

chris · October 3, 2015, 8:01am

While the SUC might provide the backup for the network layer, I don’t think this will actually do what you want… The SUC will (in theory!) keep track of where nodes are so they can talk to each other etc, so in this respect, it provides a backup for the network layer…

However, what you really want is the next layer up - and that is reporting of all the “stuff that happens”. So, for example, if your primary controller goes down, then the secondary controller will not be notified about things like associations. So, when your motion sensor detects motion, it will inform the controller you’ve configured in its respective association group - it won’t know that this controller is dead and it should talk to a different controller…

For associations, you might be able to work around this in some cases since there’s often the possibility to configure multiple nodes into a group, or use multiple groups for different notifications, however this is not always the case. Some devices only have the possibility to set a single node in a group (for example, the Fibaro devices have a ‘controller update’ group, and this only has 1 node).

Assuming you can get a reasonable configuration that notifies both controllers of updates through associations, your next problem is with battery devices. The WAKEUP command class, which is used to notify the controller that a battery device is awake and can be configured, can only be set to a single node - so, if the primary controller dies, then you won’t be able to configure battery devices from the secondary controller without manually waking them up.

I’ll add one caveat to the above regarding wakeup - I say that you can’t set multiple nodes - you can however set the node to 255 which means the wakeup is broadcast to everyone. You might think that this solves the problem, however broadcasts can not be routed (to avoid loops) so this only works for devices that are in direct communication with the controller(s).

So, unfortunately, while it is in theory possible to configure the system as you describe, it probably won’t allow you to get the redundancy that you’re probably looking for - or at least not in a simple way…
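
To make the wakeup limitation concrete, here is a toy Python model of the rule described above. It is only an illustration of the reasoning, not code from the zwave binding, and the class and function names are made up: a wakeup target is either one node ID (which can be routed, but reaches exactly that node) or 255 (a broadcast, which is not routed and therefore only reaches controllers in direct range).

```python
from dataclasses import dataclass, field

BROADCAST = 255  # node ID meaning "send the wakeup to everyone"


@dataclass
class BatteryDevice:
    name: str
    wakeup_node: int  # a single target node ID, or BROADCAST
    # controller IDs this device can reach without routing (illustrative)
    in_direct_range: set = field(default_factory=set)


def wakeup_receivers(device, controller_ids):
    """Return the controller IDs that would hear this device wake up."""
    controllers = set(controller_ids)
    if device.wakeup_node == BROADCAST:
        # Broadcasts are not routed, so only direct neighbours hear them.
        return controllers & device.in_direct_range
    # A unicast wakeup can be routed, but it reaches exactly one node.
    return controllers & {device.wakeup_node}


# Sensor pointed at controller 1: controller 2 never hears it.
sensor = BatteryDevice("hall motion", wakeup_node=1)
print(wakeup_receivers(sensor, [1, 2]))

# Broadcasting sensor that is only in direct range of controller 2.
far_sensor = BatteryDevice("attic motion", wakeup_node=BROADCAST, in_direct_range={2})
print(wakeup_receivers(far_sensor, [1, 2]))
```

In this model a device that broadcasts but has no controller in direct range reaches nobody, which is the case that makes the 255 workaround unreliable.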

Cheers
Chris

mickael · October 3, 2015, 8:41am

Maybe you can use keepalived to create a master/slave cluster that starts openHAB on your slave when your master crashes.
For your configuration files, you can use rsyncd to sync all files.
But for your zwave module, I don’t know.
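
As a rough illustration of what such a master/slave check boils down to, here is a small Python sketch. It is not how keepalived itself works internally (keepalived uses VRRP); it only shows the decision the slave has to make: probe the master and take over after several consecutive failures, so that one lost probe does not trigger a failover. The function names and the threshold are assumptions for the example.

```python
import socket


def master_is_up(host, port, timeout=2.0):
    """Return True if a TCP connection to the master succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def should_take_over(probe_results, threshold=3):
    """Take over only after `threshold` consecutive failed probes."""
    recent = probe_results[-threshold:]
    return len(recent) == threshold and not any(recent)
```

The slave would call `master_is_up()` every few seconds against the master's openHAB HTTP port (8080 by default, if I remember correctly), append the result to a list, and start its own openHAB once `should_take_over()` returns True. This says nothing about the zwave stick, which stays the hard part.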
Bye

raffaeletani · October 7, 2015, 4:00pm

Maybe two RPis with shared DRBD for the configuration files and a shared virtual IP?

I use Proxmox and run openHAB in a Debian VM.

rm65453 · October 7, 2015, 9:16pm

Hi Chris,

I was thinking of setting up another Raspberry Pi unit with a RaZberry controller and including it in the network.

In case the primary controller dies, can I just move the microsd card to the backup raspberrypi and promote it to primary controller? Would that work?

I am unable to get a backup of the RaZberry I currently have running and am too far down the road to start from scratch, but at the same time I do not want to expand any more until I have a solid backup plan in place.

Thanks

mstormi · October 8, 2015, 3:41pm

No, that wouldn’t work, because, as I wrote, the devices store the primary controller’s ID.
That’s why I said you need to back it up and restore it if needed.
You can move the RaZberry board or Aeon stick when your Pi fails, of course.
But if the controller itself is the component that fails, you’re out of luck.
(That’s at least my understanding of zwave - if anyone can prove me wrong, please do!)

rm65453 · October 8, 2015, 5:03pm

How about this: I add a secondary controller to the network.

If the RaZberry dies, I add a new one to the network, and since the secondary controller is there, all I have to do is hook up the new controller to the secondary controller to make it part of the network.

Then I log in to HABmin and make the new RaZberry the primary controller for all my devices.

Thoughts?

This might sound similar to the previous plan, but in this case I am not moving the openHAB installation.

chris · October 8, 2015, 5:18pm

The issue isn’t linked (directly) to it being a primary controller if all you want to do is to get the system up and running again quickly. The issue is that the links from the devices (associations, wakeup) are linked to the controller ID used by openHAB. So far, it’s as Markus said…

However, there is a setting in the openHAB config called “masterController”. If you set this to true, the binding will automatically update the wakeup configuration of the devices in your network (which sorts out battery device issues) and will also update the associations to point to the new controller…

However, there are (at least) 3 issues:

Wake classes will only update when the device is in direct range of the controller, and then only when woken manually.
Associations will only update if there is room in a group. So some devices have an association group with only 1 device (eg the Fibaro devices). In this case, it won’t work as the old controller ID will be configured and currently we don’t remove the existing node. This could be changed easily.
The association setting will only work for devices that are configured correctly in the database. If the <SetToController> setting isn’t set in the database, then it won’t get set automatically and you’ll need to update manually.
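
The association part of these issues can be written down as a small decision function. The following Python sketch is only a toy model of the behaviour described in this post, not the binding's actual code, and the names are made up: a group is pointed at the new controller only if the device is flagged in the database and the group still has room, because the old controller ID is not removed.

```python
from dataclasses import dataclass, field


@dataclass
class AssociationGroup:
    max_nodes: int
    nodes: list = field(default_factory=list)
    # stands in for the <SetToController> flag in the device database
    set_to_controller: bool = False


def point_group_at(group, new_controller_id):
    """Return True if the new controller ends up in the group."""
    if not group.set_to_controller:
        return False  # not flagged in the database: manual update needed
    if new_controller_id in group.nodes:
        return True  # already associated
    if len(group.nodes) >= group.max_nodes:
        return False  # group is full; the old controller ID is not removed
    group.nodes.append(new_controller_id)
    return True


# A single-node group (like the Fibaro example) still holding old controller 1:
single = AssociationGroup(max_nodes=1, nodes=[1], set_to_controller=True)
print(point_group_at(single, 2), single.nodes)

# A group with spare capacity picks up the new controller:
roomy = AssociationGroup(max_nodes=5, nodes=[1], set_to_controller=True)
print(point_group_at(roomy, 2), roomy.nodes)
```

The single-node case is exactly the one that would need the "remove the existing node" change mentioned above.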

So, it’s not too bad to update things ‘quickly’ as far as the system is concerned - assuming there are no other underlying issues…

There may however be other issues that mean you’re better off reinitialising the network if this happens - it’s a bit of a pain, but not too bad (a couple of hours’ work I guess).

Chris

rm65453 · October 8, 2015, 6:25pm

Hi Chris,

Thanks for the reply. My issue is not getting up and running quickly. My main issue is that I do not want to have to reinclude all the nodes again.

I am aware that the RaZberry allows backups, but I have been unable to access its GUI (I have been trying for the last 4 days).

My main goal is to ensure that the network stays intact when the primary controller dies. The entire purpose of the secondary controller would be to allow a new primary controller to be added and then to update the nodes’ primary controller.

I understand that the primary controller ID is stored in the device, but I should still be able to access the devices through HABmin when the first controller dies if I have a secondary controller. If that assumption is correct, then there should be no problem in promoting the secondary controller to primary or adding a new primary controller to the network.

Sorry if i am missing something.

chris · October 8, 2015, 7:03pm

The system will continue to work even if you have NO controller. A secondary controller won’t help here… Where you need a controller is when the network changes, or you want to add new devices.

Yes, if you have an SUC, then in theory you can create a new primary controller, but it might be just easier to re-include your devices. In most cases, it’s not too much hassle… I see a lot of messages where people say their devices are hidden in the walls so they can’t do it - I guess this might be an issue for some devices, but all devices I’ve seen (light switches at least) allow the external switches to be used for inclusion…

I suspect that if your primary controller dies, it’s going to be a pain one way or the other. Thankfully, it doesn’t happen often, and if you’re lucky, you won’t have to worry about it.

rm65453 · October 8, 2015, 7:16pm

I feel so stupid now. Forgot about the external switch lol

Thanks

shenson007 · October 14, 2015, 4:48am

Hi,
I am running openHAB in a Docker container on a 3-node Consul cluster.
Two of the nodes are running GlusterFS, which replicates the data between them, and the Docker container uses this GlusterFS volume as its storage target (actually all of /opt/openhab is in GlusterFS).
I had this all up and running before openHAB, so it seemed like a good fit.

I have only been using openhab for 2 weeks and have zwave items as of now.
The z-stick is only on one server so currently that is the only server which can host openhab.
I want to add an additional z-stick as a secondary controller to another server which could serve as a failover target for the openhab container …

Anyone know if this will work?
I think openHAB won’t know the difference between the z-sticks, since both will be on /dev/ttyACM0.

Sorry for the long post … and Hi all!

Artyom_Syomushkin · October 9, 2016, 10:13am

Since the Aeon Gen5 USB stick supports backups, I can make a backup of it and restore it onto another, redundant stick.
What do you think, will it work if I have both sticks operating in the same network on different instances of OH? We could arrange that a stick only issues commands while its OH instance is active. Or does that not work with the Z-Wave concept?

Artyom_Syomushkin · December 26, 2016, 2:32pm

Z-Wave stick redundancy seems to be solvable anyway, as this thread suggests: Has anyone done a successful backup of Aeon Labs Gen5 Z-Stick? - #39 by vossivossi
It’s now time to think about server redundancy again. What would be the best way to organize this, assuming that we have two sticks in the system? The ideal case would be a kind of virtual machine that looks like a single instance of openHAB and manages the changeover of the USB sticks.

This seems to be a good idea, if it is not too complex to implement. Could you share more about your setup? The USB stick failover is the only thing that would have to be added here.