What's your reliable HA architecture? Mesh + central controller?

Fortunately, I am not a teacher :sweat_smile:

To put it simply: I want a solution that is as simple as possible, with lighting dimming capability, and I don't want the whole lighting system to fail because a gateway or broker fails.

Just to jump off on Markus’s great summary, you have to ask yourself how much does it cost to reach your desired availability? If you have four 9s of availability, something that I think is achievable with battery backup for critical systems and a relatively simple backup and restore approach, you are talking about 52 minutes a year of down time.

How much time and equipment is it worth to bring that time down even more? Let’s say it’s only three 9s. That’s still less than 9 hours of down time over the course of a full year. If you’re spending more than a couple of working days on this you are wasting your time (unless you really like doing this sort of engineering in which case have at it). The cost of the mitigation is higher than the cost of the risk.
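
To make the arithmetic behind those figures concrete, here is a minimal Python sketch. It assumes a 365-day year and treats "N nines" as an availability of 1 - 10^-N; the function name is only illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(nines: int) -> float:
    """Expected yearly downtime for an availability of N nines (3 -> 99.9%)."""
    unavailability = 10 ** -nines
    return MINUTES_PER_YEAR * unavailability


for nines in (3, 4):
    minutes = downtime_minutes_per_year(nines)
    print(f"{nines} nines: {minutes:.1f} min/year ({minutes / 60:.2f} h/year)")
```

Three nines comes out at a bit under 9 hours a year and four nines at a bit under an hour, which matches the numbers quoted above. Comparing that against the hours you would spend building and testing extra redundancy is the whole cost/benefit question.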

And I can say from my own experience, and from experience shared by others, that it’s very much possible to achieve between three and four nines of availability with just a UPS and a backup and restore system, perhaps with some spare hardware in case the house gets hit by a surge or something.

So spend your time where it makes a difference. Create an architecture that degrades gracefully where you need it to. For the rest, well let it be offline for half a day once a year.

Many thanks @mstormi and @chris for your advice.

I already have a UPS and a backup and restore system.
I will take a look at how to create an architecture that degrades gracefully.

That’s all it takes? Most technologies can do local dimming (i.e. the basics work without the central controller): ZigBee (“Hue”) and ZWave can do it via RF or with a locally wired input switch. Most wired systems probably can as well.

That’s the reason I suggested this in the first place. Even if my controller (openHAB) goes down, the light switches still function for dimming and on/off.

My heating (Drayton wiser) still functions without openhab


In Zigbee world you would have two solutions for this:

  • use a dimming actuator that is connected to your existing switches and can dim conventional light bulbs/LEDs without a gateway
  • use a control unit that is operated by your existing switches and create a direct binding to the smart light (Zigbee Light Link), removing the need for a gateway.

The challenge is that if you want to add intelligence on top but still keep the simpler wired controls, it makes the whole system very complicated. This is the same challenge they have in cars: for example, building a drive-by-wire system while still keeping the ability to control the wheels mechanically. This creates all those complex coupling/decoupling mechanisms and the associated problems.

So in an amateur home installation, if you do it wrong you will most likely get more problems than benefits. Architecture selection is the most critical thing here.

So that’s why I’m a supporter of the completely smart home approach, i.e. you control your home only through the controller and do not override it or back it up with wires. So yes - if your controller goes down, the lights will not work.
In this case you should concentrate on increasing the reliability and availability of the controller. I recommend a centralized system and not a mesh. Wireless actuators and sensors are not a problem - they fail independently, so an unreliable link means you lose just one lamp, which is redundant anyway. But with a central controller it is much easier to manage failure modes and recoveries.

As for a backup + restore process vs. redundant hardware: the former will not work in the above approach. The controller could fail when nobody skilled is at home (and most often it does), so nobody will be able to switch on the lights or power up the backup controller. Thus you need redundancy with automatic changeover.

Which is not a complex thing at all, regardless of what others are saying: you just need a clone of your main controller, which can be as simple as an RPi plus a cloned USB Z-Wave stick. Then you buy a couple of cheap Wi-Fi or Z-Wave relays - I use Shellys, for example. You power your main and cloned controllers via these relays and set them up as simple watchdog relays, one normally off and the other normally on: the main controller, while operating, periodically sends an ON command to one relay and an OFF command to the other to reset them. This keeps them in these states as long as the main controller is alive.
When the main controller fails for any reason, it stops sending the reset commands. The relays switch, the main controller is powered down and at the same time the backup controller is powered up. This process takes just a few minutes and your smart home functionality is recovered. Yes, you will lose the last states and persistence, but this won’t be a big issue - your house will just restart. And your backup controller doesn’t need to be as sophisticated as the main one. It can be less powerful and perform just the basic functions of the main one, e.g. no sophisticated scenarios, Google calendars or whatever. So you can save on hardware cost.
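
The watchdog behaviour described here can be sketched as a small Python simulation. This is not the Node-RED flow mentioned in this post and not any vendor's API: `WatchdogRelay`, `powered_controller`, the 60 s heartbeat and the 180 s timeout are all hypothetical names and values chosen for illustration. It only models relays that fall back to a default state when the periodic reset commands stop.

```python
from dataclasses import dataclass


@dataclass
class WatchdogRelay:
    """Relay that reverts to `fallback_on` unless refreshed within `timeout_s`."""
    held_on: bool        # state the live main controller keeps commanding
    fallback_on: bool    # state the relay reverts to once heartbeats stop
    timeout_s: float     # assumed watchdog timeout
    last_heartbeat_s: float = 0.0

    def heartbeat(self, now_s: float) -> None:
        # The main controller re-sends its ON/OFF command; the timer restarts.
        self.last_heartbeat_s = now_s

    def is_on(self, now_s: float) -> bool:
        expired = (now_s - self.last_heartbeat_s) > self.timeout_s
        return self.fallback_on if expired else self.held_on


def powered_controller(main_relay: WatchdogRelay,
                       backup_relay: WatchdogRelay,
                       now_s: float) -> str:
    """Report which controller currently has power."""
    if main_relay.is_on(now_s):
        return "main"
    if backup_relay.is_on(now_s):
        return "backup"
    return "none"


# Relay feeding the main controller: held ON, falls back to OFF.
main_relay = WatchdogRelay(held_on=True, fallback_on=False, timeout_s=180.0)
# Relay feeding the backup controller: held OFF, falls back to ON.
backup_relay = WatchdogRelay(held_on=False, fallback_on=True, timeout_s=180.0)

# The main controller sends a heartbeat every 60 s, then dies right after t = 300 s.
for t in range(0, 301, 60):
    main_relay.heartbeat(t)
    backup_relay.heartbeat(t)

for t in (300, 400, 500):
    print(t, powered_controller(main_relay, backup_relay, t))
```

Once the main relay has dropped, the main controller has no power and cannot send heartbeats any more, so in this sketch the changeover latches until someone intervenes.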

Simple, isn’t it? If it doesn’t sound like it, just ask - maybe I didn’t explain it clearly enough. IMHO this is much less effort than investing in a wired installation and solving the conflicts of local/remote control. I have a flow in Node-RED which manages this process, which I can share.

That’s pretty much wrong.
Sorry to step in but you’re recommending to go for a system that has serious flaws by design.
It’s fine to use that in your own home but you should not recommend it to others, let alone beginners.

First, it applies only to all-via-controller setups, which only you advocate - none of us does. For some of their flaws, see our posts above (and many others on the forum; all of us could give lengthy talks on that).

Second, there are many risks and pitfalls in building failover systems. You always need to keep configurations in sync - let alone that hot standby or active-active versions would even require keeping the OH state consistent. That’s a highly complex and difficult thing and the opposite of K(eep) I(t) S(imple) S(tupid).
Being a cloud & data center architect by profession I know what I’m talking about and why I don’t want to replicate that in a home owner’s context.

Third, with proper preparations, you can have others replace failing hardware.
That’s exactly what the auto backup function in openHABian was built for.
Besides, it isn’t urgent to get the replacement anyway, as the lights will still work (not to mention the hassle you will have with any DIY controller-only solution if you want to sell your home one day).

My day job as a consultant involves HA firmware on equipment you might use in your datacenter: broadband routers. At least the device I support looks, from the customer’s configuration perspective, identical to a single device. When you have an HA pair you just get some operational status and an alarm in case HA breaks for some reason.

Ofc this is a complicated problem, especially with community development only. You will also need dedicated hardware to have a controlled environment, so complexity escalates pretty quickly, but I think it’s worth pursuing this goal. Having dedicated hardware to save support resources should already be high on the list.

Markus, we always keep discussing this topic over and over.

I’m a professional designer too, but of embedded real-time control systems for critical applications, which are far closer to home automation than cloud and data centers are. And in my systems redundancy is often built in. You don’t have to explain to me what sync and failover are.

If you want to talk about flaws, risks and pitfalls - please let’s do it here. I’m sorry, I’m not on this forum too often, because my redundant OH installation just works for months and years without any attention, and BTW I donate 1 euro to the OH foundation for every month of trouble-free OH operation, even though I don’t use the OH cloud or the latest version.

Take, for example, keeping configurations in sync or syncing states. Yes, this is required if you want to have a so-called bumpless changeover, so that the user doesn’t see any difference when the changeover occurs. And this is a complex thing, I know. But we don’t need it in HA. Really - if your main controller fails and the light goes off, you just touch the panel and switch it on again - what is the problem? Also, almost all home automation features are self-recoverable, i.e. they don’t need state synchronization. If your home controller just reboots during the day or night, will you need to perform any manual steps? I don’t need to.

Also, in your backup + restore proposal you don’t have any sync between the hardware either, so in this respect my method doesn’t differ at all from yours, except that my system will do the changeover automatically, without any user interaction, within 10 minutes at the latest.

So please, let’s stop talking about authority or wrong/right. Just tell me about the design flaws in my setup: two identical cloned controllers with cloned Z-Wave sticks and two watchdog relays. The first is on, the second is off. On failure, the first goes off and the second goes on. Keep it simple, stupid.

Really? My light switches report to OH when they are changed, so openHAB always knows the states of the switches. I don’t have to deal with anything complicated. Maybe I just don’t understand what people are talking about, but if I install a Shelly 1 behind a wall switch, for example, I just have to hook it up to the existing wires and the existing wall switch, and now I know the state of the lights at all times and can control them from the switch or from openHAB. And because openHAB can control it, the light can be automatically controlled with anything from something as simple as “turn on the light at 15:00” to Christoph’s Design Pattern: Bayesian Sensor Aggregation and beyond.

Why is having the physical wall switch so much more complicated? There are no “redundant” wires or alternate control paths or anything like that. Just a smart relay and the wires that are already there.

As I’ve already said, except for controlling color (and that’s even starting to change) there is nothing automation wise that I can’t do with smart switches that is possible with a wireless configuration. And then the redundancy of manual control is built in.

I honestly don’t care what people ultimately do with their own system. That’s their business. But I do push back when people assert “X is really complicated” or “Y is really hard” when in my experience that is not the case.

Sorry, but as far as I know Shelly doesn’t report the status of its switch input. So you basically don’t know which position your switch is in. You can only see the status of the relay. Many Z-Wave relays do the same, and this is the problem of local/remote control - OH will always override the switch action. Imagine the lamp is also controlled by a motion sensor via OH. What can the switch do in this case?

Mine must be doing magic then, because when I physically flip the switch I get the status back in OH within milliseconds. The same goes for my Zwave and Zigbee switches, outlets and smart plugs.

I always know what state my smart switches are in. I’ve never had a case where the light was ON but OH thought it was OFF or the other way around.

I’ve implemented this exact case, and more.

If it’s the right time of day, when motion is detected and the light isn’t already ON, openHAB turns it ON by flipping the switch to ON. If someone physically flips it OFF, openHAB gets that the light is now OFF unexpectedly (i.e. manually flipped) and understands the person doesn’t want the light to respond to the motion sensor any more so it overrides the motion sensor events and stops turning the light back on, for a time of course.

Whether openHAB turns on the light or a person flips the switch, the Item representing the light changes to ON, corresponding with the state of the light the switch controls.
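
The motion/manual-override behaviour described in this post could look roughly like the following Python sketch. It is not openHAB rule code: `MotionLightController`, its method names and the 30-minute override window are hypothetical, and it assumes the smart switch reports every physical flip to the controller.

```python
from dataclasses import dataclass


@dataclass
class MotionLightController:
    """Turns a light on for motion, but backs off after a manual OFF."""
    override_s: float = 1800.0       # assumed: ignore motion for 30 min after a manual OFF
    light_on: bool = False           # mirrors the Item state of the light
    override_until_s: float = 0.0    # motion events are ignored before this time

    def on_motion(self, now_s: float, right_time_of_day: bool = True) -> None:
        """Called for every motion sensor event."""
        if not right_time_of_day or now_s < self.override_until_s:
            return  # wrong time of day, or someone recently said "leave it off"
        if not self.light_on:
            self.light_on = True  # the controller commands the switch ON

    def on_switch_report(self, now_s: float, is_on: bool) -> None:
        """Called when the smart switch reports a physical flip."""
        if self.light_on and not is_on:
            # Unexpected OFF: treat it as a manual override of the automation.
            self.override_until_s = now_s + self.override_s
        self.light_on = is_on  # state always follows the real switch


ctl = MotionLightController(override_s=600.0)
ctl.on_motion(now_s=0.0)                       # motion turns the light on
ctl.on_switch_report(now_s=10.0, is_on=False)  # someone flips it off by hand
ctl.on_motion(now_s=20.0)                      # ignored: override is active
print(ctl.light_on)
ctl.on_motion(now_s=700.0)                     # override expired, light comes on again
print(ctl.light_on)
```

Either way the state kept by the controller follows what the switch reports, so the automation and the wall switch never disagree about whether the light is on.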

I did not say that I want to keep wires.
For a dimming feature, or other features that imply having a control element located directly on the light element, it seems simpler not to have wires to control the light element. That is what I called an “advanced installation” in a previous post.
In a setup with only On/Off controlled lights, it seems simple to have a smart wall switch that powers the light element On or Off using the existing wires.
When I said keep wires, it was only in this specific case.

From what I understand, @mstormi and @Artyom_Syomushkin are talking about the same thing. There was a misunderstanding about state backup.

The automatic backup solution seems to be a good feature in a smart home.
The way @Artyom_Syomushkin implements it seems very interesting and simple to implement. I will take some time to implement it. Thanks for this idea.
As Artyom said, the central controller could fail when nobody skilled is at home, so if there is no way to use the lights without the central controller, nobody will be able to switch them on. Lights and other important elements must be able to work with or without the central controller. The central controller should just add more features via rules and scenarios, but should not add failure risks or reduce the availability of the lights.

My thinking at this point is as follows:

  1. For simple elements like wall switch + light bulb:
    Add a controller behind the wall switch, to make it a smart wall switch. This local controller actuates a relay that powers the light On or Off, using the existing wires. The local controller can be triggered by the wall switch wired to one of its inputs, or wirelessly via a central controller like OH. Every time it is triggered (and why not at a specific interval too), it sends its state to the central controller. That is what @rlkoshak uses. This way, the lights can be controlled even if the central controller fails.

  2. For advanced elements, like dimmed lights with a dimmer switch:
    Add a controller behind the wall switch to make it a smart wall switch, and add a controller on the light element (like Hue).
    As in 1), the smart wall switch can be triggered locally by the wall switch or wirelessly by a central controller like OH.
    The dimming value can be transmitted to the light element from the central controller or from the smart wall switch. It should be transmittable even if the central controller is down. In this case, avoiding a single point of failure is highly advisable. Using a star network with a central gateway or MQTT broker is not advisable. A mesh network like Zigbee, Z-Wave or similar seems to be the way to go. The power on and power off feature can be realized as in 1), or via this robust wireless communication protocol.
    The states of elements are transmitted on every change, as in 1), so the central controller always knows which is on and which is off.

  3. For convenience, a failover solution with a second controller could be implemented as Artyom suggested. To keep it simple, this second controller does not synchronize states, but is only synchronized when upgrades or modifications are made to the setup, like adding or deleting an element (wall switches, lights, etc.).

  4. Bonus point:
    I have started to think about graceful degradation. As I will mainly go DIY on all the elements, it could be interesting to add a forcing power button or a forcing dimming button on advanced light elements (those with a controller located directly on the light and controlled via the mesh network). This way, even if the central controller + smart wall switches + mesh network fail, the lights can always be powered. It is less convenient to use, but it avoids a complete failure.

These 4 points make it possible:

  • Not to have a single point of failure.
  • To keep nodes and elements functional if the OH central controller fails.
  • To have a degraded operating state that is easily usable by non-qualified people (point 4).
  • To keep the whole installation simple. It just needs 1 or 2 Pis with OH and some Zigbee/Z-Wave elements (and some DIY skills for advanced elements).

What do you think about this strategy?

All good points. It’s nice to see people discussing reliability and redundancy.

I use Tasmota on as many of my devices as I can. It allows you to control things at the device, so I have a distributed control system. If anything breaks, only that thing will stop working.

MQTT is used to connect all devices, and if a device goes offline it will try to use HTTP. openHAB is the puppet master.

As advanced users we can do anything we want. For the home enthusiast I recommend using an RPi running openHABian with SD card mirroring, and if you have a UPS or power brick, run it off that.

Next step would be a second Pi with a mirrored SD card, static IPs in the config, and a changeover switch so you can only power one Pi at a time. Yes, the SD card will need to have a working config on it. Maybe pull a backup from GitHub.

Anything after this is a waste of TIME, as it will take you longer to set up and test than the expected downtime.

There are "smart" dimmer switches too. I highly recommend looking into them. Only if you need to control the color of the bulb would you need a setup like the one you describe here. As with ON/OFF, these report the current dimming level. There is some added complexity for certain devices, though, around whether or not they return to their previous dimmed state when OH turns them ON. You’ll have to read up before you buy something to find out what its behavior is and whether it’s acceptable for your use case.

But for all intents and purposes, the Dimmer case is exactly the same as your 1 case.

Why if you still have the central point of failure in openHAB? You’ve not actually solved your single point of failure problem. You either need to make the switch talk directly to the light (which is sometimes possible) or you still have the single point of failure and you’ve limited your choices with no gain. And if you come up with some redundant way to have a hot swappable instance of openHAB, why can’t you do the same for the MQTT broker and your WiFi AP hardware?

I’ve found in a lot of cases, both professionally and personally, that if your system degrades gracefully you often don’t need to avoid the single point of failure. The system can limp along in a degraded state long enough for the failed part of the system to be restarted or replaced. Conversely, if you are going to go full-on redundant everything, then why worry about degrading gracefully? Striving for both can in some cases add complexity.

Gotta have two Pis for redundancy with some sort of swap over approach and that comes with added complexity in keeping the two in sync as you tinker, adjust, and build your home automation.

Yes, no and yes. I don’t question your motivation or competence, but frankly I’m tired of discussing it.
Tired of the past year, of Corona, of the last days to get OH3 ready.
The central controller is a complex computer that has many states, communication relationships and dependencies. It’s not a simple relay that can only be on or off as you have put it; it’s a server rather than an embedded controller. Think networking (MAC and IP addresses). Both servers need to be network connected to sync and to take the active role at any time.
And this is not a car you can drive to your repair shop once a year to have its controllers updated.
This is about continuous change of the home going on while you (and others, usually) live in and use it.

But you can use retractive switches? Even if openHAB is down, the likes of Fibaro dimmers can still turn on/off and dim; you just lose colour and scenes without openHAB. And you never need to know the position of the switch, as they’re retractive. Keeps the missus happy. And if you were to use, say, a Hue bulb, then you just set the default power-on behaviour to come on at full power warm white (in case it was blue when it failed).

Great. So you get back the status of the relay and not of the switch. Agreed, this works. But if your openHAB controller fails, your system returns to strictly manual mode, i.e. your motion sensor will just stop working. But what if you need to have it working? Most people expect the direct interaction between motion sensor and lamp (e.g. as in a mesh) when OH doesn’t work. Obviously this is not possible in your scenario, right? Same with heating, for example. Your heating should keep running and controlling your house temperature even when OH has died. You can’t switch it to fully manual mode; you still need some kind of sensor-based temperature control, and this will require a smart thermostat. But when OH is alive you want to have more sophisticated control, and direct “mesh”-like interaction will just interfere with your main controls. How would you resolve this conflict?

As a side note about switches - yes, you can add them anywhere to have the means of backup control. But I would need dozens of those if I want to have backup for all actuators - for example in my livingroom I have 7 Roller-shutters. Switch for each of them? Ha-ha!

Most certainly not. The system is then in (gracefully) degraded mode, where all they expect is that flipping the switch will turn on the light.

It still is, if you want it to be. If you use e.g. ZWave sensors as @delid4ve suggested, you can set them up to send commands to both an actuator and the controller, so that will still work when the controller is down. KNX would allow for this too, I believe.