[SOLVED] OpenHAB Server Redundancy - any Ideas?

The issue isn’t linked (directly) to it being a primary controller if all you want to do is to get the system up and running again quickly. The issue is that the links from the devices (associations, wakeup) are linked to the controller ID used by openHAB. So far, it’s as Markus said…

However, you can change a setting in the openHAB config called “masterController”, and set this to true to update some of the configuration. This will update the devices in your network automatically to update the wakeup (which sorts out battery device issues) and will also update the associations to point the associations to the new controller…

However, there are (at least) 3 issues -:

  • Wakeup settings will only update when the device is in direct range of the controller, and then only when it is woken manually.
  • Associations will only update if there is room in the group. Some devices have an association group that holds only one node (e.g. the Fibaro devices); in this case it won’t work, since the old controller ID remains configured and currently we don’t remove the existing node. This could be changed easily.
  • The association update will only work for devices that are configured correctly in the database. If the <SetToController> setting isn’t set in the database, it won’t be applied automatically and you’ll need to update manually.

So, it’s not too bad to update things ‘quickly’ as far as the system is concerned - assuming there’s no other underlying issues…
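For anyone searching later: in the OH1-era Z-Wave binding, the setting mentioned above goes into openhab.cfg. As a sketch (the serial port path is an example for your system):

```
# openhab.cfg (OH1 Z-Wave binding) - the port path is an example
zwave:port=/dev/ttyACM0
zwave:masterController=true
```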

There may, however, be other issues that mean you’re better off reinitialising the network if this happens - it’s a bit of a pain, but not too bad (a couple of hours’ work, I guess).

Chris


Hi Chris,

Thanks for the reply. My issue is not getting up and running quickly. My main issue is that I do not want to have to reinclude all the nodes again.

I am aware that RaZberry allows backups, but I have been unable to access their GUI (I have been trying for the last 4 days).

My main goal is to ensure that the network stays intact when the primary controller dies. The entire purpose of the secondary controller would be to allow the addition of a new primary controller and then update the nodes’ primary controller.

I understand that the primary controller ID is stored in the devices, but I should still be able to access the devices through HABmin even when the first controller dies, if I have a secondary controller. If that assumption is correct, then there should be no problem in promoting the secondary controller to primary, or adding a new primary controller to the network.

Sorry if I am missing something.

The system will continue to work even if you have NO controller. A secondary controller won’t help here… Where you need a controller is when the network changes, or you want to add new devices.

Yes, if you have an SUC, then in theory you can create a new primary controller, but it might be just easier to re-include your devices. In most cases, it’s not too much hassle… I see a lot of messages where people say their devices are hidden in the walls so they can’t do it - I guess this might be an issue for some devices, but all devices I’ve seen (light switches at least) allow the external switches to be used for inclusion…

I suspect that if your primary controller dies, it’s going to be a pain one way or the other. Thankfully, it doesn’t happen often, and if you’re lucky, you won’t have to worry about it :relieved:

I feel so stupid now. Forgot about the external switch lol

Thanks

Hi,
I am running openHAB in a Docker container on a 3-node Consul cluster.
Two of the nodes run GlusterFS, which replicates the data between them, and the Docker container uses this GlusterFS volume as its storage target (actually, all of /opt/openhab is on GlusterFS).
I had this all up and running before openHAB, so it seemed like a good fit.

I have only been using openHAB for 2 weeks and only have Z-Wave items so far.
The Z-Stick is attached to only one server, so currently that is the only server which can host openHAB.
I want to add an additional Z-Stick as a secondary controller on another server, which could serve as a failover target for the openHAB container…

Anyone know if this will work?
I think openHAB won’t know the difference between the Z-Sticks, since both will appear as /dev/ttyACM0.

Sorry for the long post … and Hi all!
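One way to sidestep the identical /dev/ttyACM0 problem is a udev rule that gives each stick a stable symlink keyed on its USB IDs. A sketch (the vendor/product values below are the ones commonly reported for the Aeotec Gen5 stick - check your own with `udevadm info`):

```
# /etc/udev/rules.d/99-zwave.rules - IDs are examples, verify with udevadm
SUBSYSTEM=="tty", ATTRS{idVendor}=="0658", ATTRS{idProduct}=="0200", SYMLINK+="zwave-primary"
```

openHAB can then be pointed at /dev/zwave-primary regardless of which ACM number the kernel hands out.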

Since the Aeon Gen5 USB stick supports backups, I can make a backup of it and put it on another, redundant stick.
What do you think - will it work if I have both sticks operating in the same network on different instances of OH? We could arrange that only one stick issues commands while its OH instance is active. Or does that not work in the Z-Wave concept?

As Z-Wave stick redundancy is solvable anyway, as this thread suggests: Has anyone done a successful backup of Aeon Labs Gen5 Z-Stick?
It’s now time to think about server redundancy again. What would be the best way to organize this, assuming we have two sticks in the system? The ideal case would be a kind of virtual machine which looks like a single instance of openHAB and manages the changeover of the USB sticks.

This seems to be a good idea, if it is not too complex to implement. Could you share more about your setup? The USB stick failover is the only thing which still needs to be added here.

@shenson007 - could you please describe your setup - I’m greatly interested in having some kind of fail-safe cluster running openHAB + NodeRED + Mosquitto combination with redundant USB sticks.

Just wanted to raise this topic once more. I would like to increase my OH server reliability by adding redundancy. E.g. I want to have two Raspberries running OH instances, each communicating with my Z-Wave installation via an Aeon USB stick. I’ve already tested that the sticks work just fine in this setup. So all I need to know is how to set up a redundant openHAB. Is it perhaps possible using virtual machines or containers?
And what I would also like to know: how could I monitor my OH installation remotely - e.g. whether it is still running or has already died?
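On the remote-monitoring question: one simple approach is a small health check run from another machine (cron, a second Pi, etc.). A minimal sketch, assuming the default openHAB REST API endpoint and hypothetical alert thresholds:

```python
# Minimal remote liveness check for an openHAB instance.
# The URL and the failure threshold below are assumptions - adapt them.
import urllib.request
import urllib.error

def oh_is_alive(base_url: str, timeout: float = 5.0) -> bool:
    """Return True if the openHAB REST API answers at all."""
    try:
        with urllib.request.urlopen(base_url + "/rest/", timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def should_alert(missed_checks: int, threshold: int = 3) -> bool:
    """Only alert after several consecutive failures, to ride out short blips."""
    return missed_checks >= threshold
```

A cron job could call `oh_is_alive("http://oh-host:8080")` once a minute, count failures, and send an email once `should_alert` fires.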

I was researching redundancy for the RPis but settled on virtualization, as it seems far more resilient and hardware with RAID storage is widely available.
I had my first SD failure after 6 months of use (80 sensors, persistence on every change, ~1 GB/year of DB data).
Now I’m on a High Endurance SD card that is supposed to be 10x more resilient, and I have already clocked 18 months on it. But it will fail eventually. So I decided to move OH2 to a VM on RAID 5/6 hardware.

After a couple of years I want to bump this topic again. My original RPi2 with Z-Wave stick still works fine, and I have another one with a cloned stick as a backup in case the first one fails.

So my idea finally materialized into quite a simple cold-standby scheme:
Each of my redundant RPis is connected to power via a watchdog relay. One RPi is powered up and the other is switched off. The active RPi continuously triggers these watchdog relays, so that the relay powering it stays ON and the relay powering the backup RPi stays OFF. If the active RPi fails, after a while the watchdog relays change over: this RPi is powered down and the backup is powered up. It boots and takes control of the watchdog relays.

I don’t need any state sharing between the redundant RPis, so that should work fine. I only need to find such watchdog-enabled Z-Wave or MQTT relays.

What do you think of my idea?

Why not just have both Pis always on, sending MQTT messages back and forth, with one as master and the other as slave? If the master dies, the slave takes over.

Having both RPis always on complicates things a lot, as it would cause IP and Z-Wave conflicts, and it also wouldn’t add much reliability, as the slave RPi would be powered on the whole time as well.
So powering it off is better for me.

Could you please describe in detail how you set up your master-slave OH configuration?
I am trying to do something similar, but with a Pi3 as master (running openHABian) and an Ubuntu VM on ESXi (also running openHABian) as slave.

I am new to OH so please treat me as a complete idiot in your description

Finally implemented.

I bought two Shelly smart plugs. One is configured as ON at power-on, with an Auto-OFF timeout of 5 minutes; my primary RPi is connected to this plug. The other Shelly is configured as OFF at power-on, with an Auto-ON timeout of 5 minutes; the backup RPi is connected to it. Both RPis have their own USB Z-Wave sticks and are exact clones of each other, with one small difference in the rules.
When both are plugged in, the primary controller is powered on and boots up. During operation it runs a watchdog rule which checks that Z-Wave comms are working (messages from sensors appear periodically) and sends MQTT messages to the Shellys every minute, keeping the first Shelly ON and the other OFF.

If for any reason the primary RPi fails, after 5 minutes at the latest the plugs change state, cutting the primary RPi’s power and powering up the backup RPi. It simply boots up and resumes home automation operations, with one difference: it no longer triggers the relays, and it also starts sending me emails notifying me that the backup controller is in action.

So the changeover takes a few minutes, but that’s absolutely fine, as I don’t expect it to happen more often than once a year.
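The changeover logic above boils down to a single rule: whoever last refreshed the plugs within the timeout window holds the power. A toy sketch of that timing (the constants and names are illustrative, not the actual rules):

```python
# Toy model of the two-Shelly cold-standby scheme:
# the primary refreshes both plugs every minute; if it goes silent
# for the Auto-OFF/Auto-ON timeout, the plugs swap power to the backup.

AUTO_TIMEOUT = 5 * 60      # seconds: both Shellys flip without a refresh
KEEPALIVE_PERIOD = 60      # seconds: healthy primary refreshes every minute

def plug_states(seconds_since_last_keepalive: int):
    """Return (primary_powered, backup_powered) for a given silence."""
    if seconds_since_last_keepalive < AUTO_TIMEOUT:
        return (True, False)   # keepalives arriving: primary ON, backup OFF
    return (False, True)       # timeout expired: plugs change over
```

Worst-case failover latency is therefore the Auto-OFF timeout plus the backup’s boot time.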

The system could be improved even further, for example by falling back to the primary RPi if the backup also fails (as most likely the primary was just hanging), but I have no plans to implement this.

This might be annoying for maintenance work. If you stop the OH instance on the primary RasPi, you will only have 5 minutes before you finally lose power…

You are right, maintenance requires some precautions to disable the watchdogs. But this is normal for this kind of installation.

Sure. I am just curious to know why you implemented this kind of installation. I am running a relatively large Z-Wave network (>130 nodes) in my home, with 3 controllers (3 network home IDs) which are connected to OH via ser2net (serial protocol over IP). I have never had any trouble with a Z-Wave stick, nor with an OH instance directly connected to a Z-Wave stick (via a local USB port). My only problems so far have resulted from temporarily unstable LAN/WLAN connections between the sticks and the OH server.
Did you ever have the problem that your OH server with a locally connected Z-Wave stick stopped working?
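For anyone curious about the ser2net arrangement mentioned above, a classic-syntax ser2net.conf entry exposing a stick over raw TCP might look like this (the TCP port, device path and baud rate are examples; newer ser2net releases use a YAML config instead):

```
# /etc/ser2net.conf - classic syntax; values are examples
3333:raw:0:/dev/ttyACM0:115200 8DATABITS NONE 1STOPBIT
```

The OH side then connects to host:3333 instead of a local serial port.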

It’s not about problems. I ran OH 1.8 on an RPi2 with an Aeon Z-Wave stick for 4 years without any trouble. Last year I migrated to an RPi3 and openHABian 2.4, and it’s also stable.
The reason for redundancy is that all hardware and software eventually fails - it’s only a question of time. I don’t know when it will happen - maybe next month, maybe in 10 years - but I want to be sure that my system has a chance to resume operation without manual action.