Advice for High-Performance HA OpenHAB Server Hardware

Dear all, I’m reaching out to seek advice on HA hardware.
I’m quite experienced with OH, having started on various RPis a couple of years ago. My main house server now runs inside a VM on a QNAP TS-877 with an 8-core Ryzen. That is quite a robust solution, but it is not truly HA: it has a single power supply, and the motherboard can fail at any time.

My most important objective is to create a system with no single point of failure, running some virtualization software that will allow me to back up easily and revert to any past state in case of some “failure”.

It does not have to be insanely powerful, but it should be able to survive any disk/hardware failure and have redundancy on everything.

TIA
Maciej

1 Like

OpenHAB itself is not designed for Enterprise level HA.

It should run on server hardware that has redundant power supplies, network interfaces, disks, and a UPS to keep power running.

1 Like

Well, I guess you just buy a spare motherboard then, to keep around for redundancy purposes? And/or some “server” grade hardware as Bruce mentioned?

The problem with hardware like that is that it is generally noisy and hot and uses a lot of power. That’s fine for a datacenter, where fan noise doesn’t matter, but not so much for something you might have running in your bedroom or another room of your home.

So for me, I have been very happy using various Single Board Computers (SBCs) which are more powerful than the RPi. They give you more power and stability, whilst still being low-power (compared to x86) and also less expensive. A good starting point for investigating these boards is the Armbian Supported Hardware list.

Circling back to the “less expensive” aspect, and how that leads to redundancy: it is much more feasible (for me at least) to keep a few of these SBCs running, both in terms of initial purchase price and ongoing power usage, than comparable x86-based solutions. That leads naturally to your desired redundancy.

In my case I am even running a more “distributed” architecture, with the MQTT broker (Mosquitto) on an older (and less powerful) Cubietruck, separate from my main openHAB installation, which runs on a more powerful ODROID-XU4. This way some things can still work (via MQTT) even if openHAB (software or hardware) goes down for any reason.

If you plug your SBCs into Wi-Fi-capable outlets (e.g. Sonoff / Tasmota / Tuya devices, etc.), you can even remotely power-cycle your openHAB (or other) SBC should any of them become completely unresponsive (you can no longer SSH in, etc.). It would even be possible to automate this with watchdog services on your LAN, although I have not gone that far yet, as so far I haven’t felt the need.
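The watchdog idea above can be sketched in a few lines of Python, assuming a Tasmota outlet reachable over its standard HTTP command interface. The IP addresses and the failure threshold below are made-up placeholders, not values from the post:

```python
import subprocess
import time
import urllib.request

SBC_HOST = "192.168.1.50"    # hypothetical address of the openHAB SBC
PLUG_HOST = "192.168.1.60"   # hypothetical address of the Tasmota outlet
FAIL_THRESHOLD = 3           # consecutive failed pings before we power-cycle

def ping_ok(host: str) -> bool:
    """Return True if a single ping to host succeeds within 2 seconds."""
    return subprocess.run(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
    ).returncode == 0

def should_power_cycle(failed_pings: int, threshold: int = FAIL_THRESHOLD) -> bool:
    """Pure decision logic: act only after `threshold` consecutive failures."""
    return failed_pings >= threshold

def tasmota_power(plug: str, state: str) -> None:
    """Send Power On/Off via Tasmota's HTTP command interface."""
    urllib.request.urlopen(f"http://{plug}/cm?cmnd=Power%20{state}", timeout=5)

def watchdog_loop() -> None:
    failed = 0
    while True:
        failed = 0 if ping_ok(SBC_HOST) else failed + 1
        if should_power_cycle(failed):
            tasmota_power(PLUG_HOST, "Off")
            time.sleep(10)   # give the PSU a moment to drain
            tasmota_power(PLUG_HOST, "On")
            failed = 0
        time.sleep(30)
```

You would run something like `watchdog_loop()` from a systemd service on a second, independent box, so the watchdog doesn’t die together with the machine it watches.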

1 Like

Step back a second and think about how relevant that really is.
You would need a spare of every HW component ready and on location, which quickly gets you into the four-digit range, just to cover a very unlikely case (far less likely than a power outage, for example).
If you really want to double all HW, running on a Pi and having a spare unit ready is the only reasonable solution.
Or why not keep running on the QNAP, but keep an RPi ready to jump in in case of emergency?

2 Likes

Only if you also have an external configuration backup.

My post was summarized in the first line but that sometimes is not enough for some people…

Err yes, my post was directed at the OP not you sorry.

3 Likes

Out of curiosity, why is this your highest priority? You’re getting a bunch of answers saying that it shouldn’t be (and I tend to agree), but I don’t want to make assumptions as to why it’s your main concern.

I’m also in the camp of “UPS, regular backups and a spare RPi in case my system fails”, but I could live with openHAB being down for a period of time.

1 Like

It sounds like you want an enterprise system. I manage systems of load-balanced servers in 2 different data centers, load-balanced across diverse Internet connections.

I work with another system that is designed so it can be patched or upgraded without any loss of client services. A large customer told the large vendor they needed a system with zero downtime. Of course, it comes with an appropriate price tag.

My home system is nowhere near as robust: just a VM on a server, backed by enough UPS power to shut down gracefully.

2 Likes

What about a Kubernetes cluster? 3 ARM SBCs and you can have your cheap HA. I only have some doubts about peripheral devices like Zigbee or Z-Wave dongles… I don’t know if it’s possible to mount one per board and use it when the container moves between workers. I’m not suggesting, I’m asking :rofl:

Actually for this case I’m looking for a more specific SBC. I don’t need Wi-Fi, Bluetooth, 2 HDMI ports, etc.
I’m looking for 2–4 GB of ECC RAM, 1 M.2 slot, 4 USB ports, 1 Gb/s Ethernet, and full power-cycle control. I think this will be the year of ARM boards, and maybe we will see a more specific board for domestic servers.

I have two software approaches for you…

  1. Proxmox - you can have a cluster, but I think you will need 3 nodes
  2. Docker - you can back up to Dropbox and use a recovery script; a restore takes something like 15 minutes
    https://www.youtube.com/watch?v=a6mjt8tWUws
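For approach 2, the backup/restore core can be sketched as a couple of shell functions, assuming the openHAB config lives in a bind-mounted directory. The paths are placeholders, and the Dropbox upload step from the video is left out:

```shell
#!/bin/sh
# Sketch: tar up a bind-mounted openHAB config dir; restore unpacks it again.
# OPENHAB_CONF and BACKUP_DIR are assumptions -- adjust to your setup.
OPENHAB_CONF="${OPENHAB_CONF:-/opt/openhab/conf}"
BACKUP_DIR="${BACKUP_DIR:-/var/backups/openhab}"

backup_config() {
    # Creates a dated tarball of the config directory.
    mkdir -p "$BACKUP_DIR"
    tar -czf "$BACKUP_DIR/openhab-conf-$(date +%Y%m%d).tar.gz" \
        -C "$(dirname "$OPENHAB_CONF")" "$(basename "$OPENHAB_CONF")"
}

restore_config() {
    # $1: path to a backup tarball; unpacks it back in place.
    tar -xzf "$1" -C "$(dirname "$OPENHAB_CONF")"
}
```

Shipping the tarball off-box (Dropbox, another NAS, etc.) would be a separate step on top of this.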

Good luck… As people above said, it’s too much work to have this kind of system for home use.
I have a spare machine and make a backup from time to time.

1 Like

A typical IT guy’s approach… doomed to fail in home automation.
High availability is not about hardware, it’s about availability of services.
You correctly named dongles as a potential point of failure, but getting everything to work reliably and fully automatically means a LOT more work and adjustments on the logical layers.
That ultimately just isn’t worth the effort.

2 Likes

As for possible failures, what I most fear (and want to be protected against) are HDD/SD-card failures and random OH corruption (I had plenty of the latter on OH 2.1 to 2.3). Corruption can only be fixed quickly and remotely by restoring a full VM backup to the last nightly version that was known good.

A cluster of SBCs is tempting as it would be very cost-effective, but I’m afraid setting it up would be hard and maintaining it a nightmare. What would be the course of action if an SD card or eMMC on an SBC dies? Can a cluster automatically rebuild itself when a new, clean SBC is added? A triplet or quartet of ODROID N2s should be quite good for the task if there were some nice, foolproof supervisor software for it.

QNAP Virtualization Station works reasonably well for that, but every time I upgrade it I have a hard time, praying the VM will restart fine (to the point that I just stopped upgrading). Backups tend to slow the openHAB VM down to the breaking point. Sometimes (once every 3 months) I wake up in a house with all lights ON because the backup somehow crashed OH and something timed out (a Tinkerforge binding issue, I guess). Apart from that, the GUI is easy to use, and it works nicely, with the peace of mind that I can recover the whole VM. The hardware I have now has only a single PSU, but some higher-end units have two.

Unfortunately I have little experience with VMware, but I guess it is better at backup and VM recovery.
Is anyone selling a turn-key VM appliance with VMware?

Maciej

Then your post asking for HW redundancy was a bad move.
It made us victims of the XY problem. Please pay more attention next time.

That being said, go openHABian on an RPi. It has the ZRAM feature to mitigate SD wear-out problems, and it has its own backup system that you can and should use to create a spare SD card.
A dedicated system (even if it’s just an RPi) is preferable to units that share functionality with other systems, such as a NAS (even if they’re hypervisors).
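For the spare-SD-card part, one low-tech option alongside openHABian’s own backup menu is a raw clone with dd. This is a sketch; the device names `/dev/mmcblk0` and `/dev/sda` are assumptions, so verify them with `lsblk` before running anything, since dd will happily overwrite the wrong disk:

```shell
#!/bin/sh
# Sketch: raw-clone one block device (or image file) to another.
# /dev/mmcblk0 (the live SD) and /dev/sda (the spare card in a USB
# reader) are assumptions -- check with lsblk first.
clone_disk() {
    src="$1"
    dst="$2"
    dd if="$src" of="$dst" bs=4M conv=fsync
    sync
}

# Example (ideally with openHAB stopped, or from a second machine):
# clone_disk /dev/mmcblk0 /dev/sda
```

Cloning a quiesced system avoids copying files mid-write; cloning a running one risks an inconsistent filesystem on the spare.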

It helps, yes, but ZRAM is still just a hack which doesn’t change the fact that the RPi 1, 2, 3, and 4 have no option for reliable storage. Attaching an SSD via USB is better, but USB storage is sadly not among the reliable connections.

Starting with openHABian is fine, since it has backup tools (not needed if you know what to tgz, and in theory not needed if you plan an HA setup which, if well designed, should never fail). So is vanilla openHAB on an Armbian-supported board that has reliable storage: eMMC or, better, eMMC/SPI plus an NVMe drive. ZRAM is only there for performance reasons and can be disabled. On a Raspberry Pi this leads to disaster.

Going completely virtual on server-grade hardware with more redundancy levels is also possible, but probably the cheapest option is to have two or three devices (with reliable storage) running the same instance, plus some watchdog software to swap a good one in when a failure occurs … master and several slaves, or primary and several replicas in today’s political correctness :slight_smile:

3 Likes

The Pi was not designed for your application. It was designed to teach computer hardware & programming, and it does that very well. There are many kinds of disk RAID technologies that may help, but you are still constrained by:

1 Internet connection
1 USB dongle (Z-Wave & Zigbee do not support redundant controllers/coordinators)
1 Power feed to the building
1 User who understands the configuration
1 location that can be burned down

That is what the propaganda tells you, and propaganda is almost always complete bullshit. Why would this case be any different? Even they themselves admitted that reality turned out completely different. Most RPis are not used for educational purposes, and those which are …

Can you master things when they are too easy, or do you master things when facing hard challenges? RPi users are reduced to consumers who buy things for it, install some app on it, and assemble things with step-by-step instructions. Everything they purchase is near plug-and-play, while outside, in real Linux, things are much rougher. But they get the good feeling that they learned something … they assembled, installed, purchased.

Most RPi users who started learning Linux on an RPi never move anywhere else; they don’t need to. Most of them associate Linux with the RPi and with all the good things the community did for free for the RPi (this and that app runs on the RPi, when in fact it already ran on Debian 20+ years ago …), as if there were no world outside the RPi. That’s what my experience tells me …

Anyway, this is off topic and beyond the scope of this forum.

RPi storage is broken beyond repair, and RAID can’t change that. The chip has a PCI controller, but Upton thinks you are hooked enough that it’s possible to sell you yet another version of the “educational” tool … Who would be such an idiot as to not buy the RPi 5 or 6, which will perhaps have a reliable storage option?

I suspected that, but I was not sure. This means you have to look for a different radio technology in the first place when thinking about HA.

2 Likes

That “propaganda” is from the designers. I hope you are not going to look for hardware support for your Pi :wink:

Use an Intel NUC or some other computer designed more for your use case. The Pi people made design decisions; it is not broken for their initial primary application, and it has sold very successfully.

1 Like

And?

Propaganda usually comes from those who are trying to sell you something. True, fake or something in between.

Why would I need hardware support for hardware that I have no intention of using and know very well anyway? I know how these things are made and how they function; otherwise I would not be able to talk about this. You don’t have those troubles: you talk and try to persuade me even though you have little to no knowledge of the topic.

Thanks for your advice, but don’t worry about me. I use knowledge and experience to determine which hardware is best for which purpose. Factory promo videos are white noise.

HA is high-end service deployment and should not have the slightest association with hardware that is not broken in its primary application but is broken in just about every other. OMG, where has this world gone?

I think the best solution for “no single point of failure” is to have a cheaper but fully duplicated central controller in cold standby. For example, in my openHAB installation I use an RPi with an Aeon Z-Wave USB stick. I have cloned it completely to another RPi + USB stick. I have a couple of cheap Wi-Fi watchdog relays which keep only one RPi powered on at all times, while the other is kept off. When the active RPi fails for any reason, the watchdog relay trips and turns it off while switching the backup RPi on.
In the worst case I have something like a 5-minute break in service.

The total availability of such a solution:
If one RPi has 99% availability (e.g. an MTBF of 8760 hours and 3 days to get a replacement, which is quite a pessimistic scenario), the availability of two redundant RPis is 1 - (0.01 × 0.01) = 99.99%, i.e. less than one hour of downtime per year. You can sleep well.
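The arithmetic above is easy to check with a short sketch. The 99% single-node figure is the poster’s assumption, and the model assumes node failures are independent:

```python
def parallel_availability(per_node: float, nodes: int = 2) -> float:
    """Availability of N redundant nodes, assuming independent failures:
    the system is down only when every node is down at the same time."""
    return 1.0 - (1.0 - per_node) ** nodes

a = parallel_availability(0.99, nodes=2)            # two RPis at 99% each
print(f"{a:.2%}")                                   # 99.99%
print(f"{(1 - a) * 8760:.2f} h expected downtime/year")
```

The expected downtime works out to roughly 0.88 hours per year, matching the “less than one hour” claim.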

2 Likes

I agree. Let’s just acknowledge that some folks like the RPi and some folks don’t, then move on with our lives. The points for and against have been made in this thread and others, and there’s no need for us to keep debating them every time the opportunity arises.

That’s an interesting solution. How did you implement the WiFi monitoring? And how do you ensure that changes are synchronized to the backup (particularly with two ZWave controllers)?

I’d note that if you already have the spare RPi and make backups regularly, you’re only talking about a few hours of downtime. However, this would be great for anyone who travels regularly and has a lot of scheduled automation (e.g. sprinklers) and/or sensors (e.g. water leaks).