Advice for High Performance HA OpenHAB server hardware

rkrisi · July 16, 2020, 11:06am

Anyway I wanted to achieve something similar, maybe someone did something like this?

I have several RPis in my house, most of which do almost nothing, but I need them because they serve other important purposes (like a print server: we don’t use it much, but it is always needed, and the printer is at a dedicated location).
It would be great if I could install openHAB on one of these and have it act as a hot-fallback device.
What I would need to achieve, and don’t really know how to do:

Daily config copy (simply create an openHAB backup, copy it to the fallback device and unzip it there?)
Detect if the openHAB instance on the main server is not running (only openHAB, not the whole computer being down - this would also make maintenance on the main server easier and free of disruption). Maybe just check whether the Basic UI loads or the REST API is accessible?
Switch to the backup server if the main one is not available. Start openHAB there and somehow redirect openhabian.local?
Switch back to the main server once openHAB on that machine is up again.
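
The detect-and-switch steps above could be sketched roughly like this in Python. It is a minimal sketch under assumptions, not a tested setup: it assumes the main server's REST API answers at port 8080 under /rest/, that the fallback device runs openHAB as a systemd service called openhab2 (adjust to your install), and the host name main-server.local and all function names are made up for illustration. Redirecting openhabian.local is not covered.

```python
import subprocess
import urllib.error
import urllib.request

MAIN_REST_URL = "http://main-server.local:8080/rest/"  # hypothetical host name
SERVICE = "openhab2"  # assumed systemd unit name on the fallback device


def is_openhab_up(url=MAIN_REST_URL, timeout=5.0, opener=urllib.request.urlopen):
    """Return True if the REST API answers with HTTP 200 within the timeout."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure or HTTP error status.
        return False


def next_action(main_up, fallback_running):
    """Decide what the fallback device should do on this check."""
    if not main_up and not fallback_running:
        return "start"  # main openHAB is gone: take over
    if main_up and fallback_running:
        return "stop"   # main openHAB is back: hand control back
    return "none"


def fallback_is_running():
    """Ask systemd whether the local openHAB service is active."""
    result = subprocess.run(["systemctl", "is-active", "--quiet", SERVICE])
    return result.returncode == 0


def check_once():
    """Run one health check and start or stop the local service if needed."""
    action = next_action(is_openhab_up(), fallback_is_running())
    if action != "none":
        subprocess.run(["systemctl", action, SERVICE], check=True)
    return action
```

Calling check_once() from cron or a systemd timer on the fallback device would give the periodic check. Requiring a few failed checks in a row before switching would probably be wise, so that a single slow response does not trigger a takeover.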

Does anyone have ideas for this?

Thanks!

higgers · July 16, 2020, 11:14am

I also use Proxmox on three hosts: one four-core Xeon and two small Mini-ITX boards. The VM discs are all created as ZFS volumes, and I use Sanoid and Syncoid to manage them. Sanoid creates policy-based snapshots and Syncoid sends ZFS volumes between hosts. A snapshot policy can be specified like this:

[template_production]
        frequently = 0
        hourly = 0
        daily = 15
        monthly = 0
        yearly = 0
        autosnap = yes
        autoprune = yes

In the above example there will be 15 days of daily snapshots.

Syncoid is very useful for reducing downtime when sending a volume to another host. I run it twice. The first run happens whilst the VM is still running and sends the bulk of the changes since the last time the volume was sent to the destination host (or the entire volume if this is the first time). I then shut down the VM and run Syncoid again to send the changes that occurred between the first send and the end of the VM shutdown. This second send is very quick (it’s unlikely there were many changes to the VM disc in that period of time), and I can then start the VM on the destination host.

I’m sure the above process could be scripted.
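
A rough sketch of how the two-pass procedure above might be scripted in Python. The VM id, dataset name and host name are hypothetical, and it assumes passwordless root SSH to the destination and that the VM's configuration already exists there; it only builds and (optionally) runs the command sequence.

```python
import subprocess


def migration_plan(vmid, dataset, dest_host):
    """Return the commands for a two-pass syncoid migration, in order."""
    target = f"root@{dest_host}:{dataset}"
    return [
        ["syncoid", dataset, target],   # pass 1: bulk send, VM still running
        ["qm", "shutdown", str(vmid)],  # shut the VM down cleanly on the source
        ["syncoid", dataset, target],   # pass 2: only the changes since pass 1
        ["ssh", f"root@{dest_host}", "qm", "start", str(vmid)],  # start on destination
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it too unless dry_run is set."""
    for command in plan:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)  # abort on the first failure


if __name__ == "__main__":
    # Hypothetical VM id, dataset and destination host; dry run only.
    run_plan(migration_plan(100, "rpool/data/vm-100-disk-0", "pve2"))
```

With dry_run left at True the script only prints the four commands, which makes it easy to check them before letting it touch a real VM.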

papaPhiL · July 16, 2020, 12:54pm

I think I tried that, but because I moved away from an HA cluster to individual nodes, I cannot speak from experience. In the Ceph scenario, if the hosting node fails, another host will jump in, see https://www.youtube.com/watch?v=T8Wdko31JB4 . But clustering has downsides, and declustering is a mess. So I recommend not clustering and achieving redundancy in a different way.

Udo_Hartmann · July 16, 2020, 2:07pm

For clustering you will need at least three nodes (one is allowed to fail), and each of the nodes has to provide a data store for VM images.
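
The three-node minimum follows from majority voting. A small sketch of the arithmetic, assuming a plain majority quorum with one vote per node and no extra quorum device (the function names are just for illustration):

```python
def quorum(nodes):
    """Votes needed for a majority, with one vote per node."""
    return nodes // 2 + 1


def tolerated_failures(nodes):
    """How many nodes may fail while the rest still hold a majority."""
    return nodes - quorum(nodes)


for n in (2, 3, 5):
    print(n, "nodes: quorum", quorum(n), "- may lose", tolerated_failures(n))
```

With two nodes the quorum is two, so losing either node stops the cluster; three nodes is the smallest size that survives one failure.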

marcel_erkel · July 16, 2020, 4:47pm

If you’re willing to put in the effort, check out https://clusterlabs.org/.

If you can read German (or maybe use Google Translate), the following blog may be interesting for you as well:

https://herc.de/de/openhab2-hacluster-mit-pacemaker-und-corosync/

papaPhiL · July 20, 2020, 9:15pm

Here are my scripts
https://github.com/openPhiL/openHouse/tree/master/How%20to%20build%20it/4%20Solutions/Server%20Redundancy

Gad_Ofir · July 21, 2020, 7:54am

Wow, thank you, I will definitely look into this. I am building a new home just now.
I already have a Proxmox running for testing:
pfSense (with its own NIC for the firewall)
FreeNAS
Ubuntu server to run all my IoT stuff

But I am still unclear about how to set up the storage right…
storage for Proxmox itself (I have an SSD of 256)
storage for FreeNAS (I have not bought any hardware yet)
I want my NVR to go there as well

just wanted to share my headache

papaPhiL · July 21, 2020, 7:26pm

I started with pfSense as well but ended up with OPNsense (because open)…
I tried FreeNAS, but in the end it gave me nothing but overhead.
I have multiple Ubuntu servers.

As for the storage, I chose not to separate Proxmox itself from the main storage: because the main storage is redundant, using it for Proxmox as well makes the Proxmox OS failsafe. ZFS helps to get good speed even with slower drives; you could use your 256 SSD as a ZFS cache.
I have not started with a real NVR yet; I hope my CPU power is enough.

Feel free to connect with me if you have open questions.

xsherlock · July 22, 2020, 11:37am

Let’s talk storage a bit more…
If I have a 3-node Proxmox cluster where every node has a single SSD, does Proxmox somehow take care of creating a common disk space spanning those 3 nodes to host the VMs? Or does every node only have access to its own single (failure-prone) drive?
What would a setup look like that is truly resilient to one node failure?
If the node that is running the openHAB VM dies, from where do I recover the VM to run it on the spare node that was on standby?
Should I have a separate NAS holding a RAID-protected network share that the VMs run from (would that not be slow)?

xsherlock · September 7, 2020, 8:46am

There is an excellent multi-part guide over at ServeTheHome that details the construction of an HA cluster in exactly the way I was intending for an uber OH server.

binderth · September 7, 2020, 9:02am

Again, I think that especially in home automation there’s a bunch of more or less bulletproof, stable and reliable hardware actuators (as opposed to server hardware). You should aim to use these stable actuators; it would be less costly (in terms of money and time spent) to rely on these appliances instead of routines, logic and stuff within openHAB.

So, in terms of High Availability and reducing possible points of failure:

get robust, stable hardware actuators
use their interfaces to connect stable hardware to openHAB
use openHAB to let’s say orchestrate or bridge inter-hardware use cases
don’t use openHAB as a central and single point of functionality/logic

Just to get an example:

don’t connect valves to openHAB to regulate your central heating
instead: use a robust (local! not cloud-dependent!) heating solution
and use the interface of your heating solution to read the environment variables of your heating and trigger some sort of actions in openHAB for “luxury” purposes
you avoid failures along the way, ranging from a misconfigured openHAB to a failing SD card to a broken IP connection, …
if your openHAB dies, you still have a warm house, even if you can’t see temperatures or your openHAB tells you to close the windows in the room upstairs…

Don’t fall for the fallacy that openHAB has to organise everything, know everything and work as a central intelligence. It’s not designed for that. It’s a central logic which extends appliances that are already in place, along with their own logic.

mstormi · September 7, 2020, 9:28am

Ouch. Sorry but that’s total nonsense.
You want OH to be your central controller else there’s so much stuff you will not be able to implement.

What you probably wanted to say is:
have another control layer that works on lower levels and without openHAB, such as device-to-device communication in KNX or ZWave does.
But that’s a mere fallback solution that only covers the most basic functionality (such as one light per room).

PS: it’s “actuator” in English. A 230VAC powered actor is … “not the brightest idea” … well, maybe for a couple of minutes it is, but mind the smoke

lipp_markus · September 7, 2020, 9:32am

I fully understand the fascination of having a failover solution for OH, for the sake of learning and for the sake of it being really cool, and I have toyed with that thought myself quite a bit. Proxmox makes HA clusters look deceptively simple at first glance. However, once you think it through, it gets really complicated quickly. What these guides do not mention is the need for separate networks (Proxmox highly recommends putting the corosync traffic on a separate network), the need for a redundant storage option and the need for fencing devices (isolating misbehaving cluster nodes). Pair this with the need to make all switches redundant (you may need to use stacked switches for both networks, otherwise if one of your switches goes out, your whole cluster is dead), etc., and this reaches a level of complexity and price that exceeds my willingness to do so. Just my thoughts (the above is all described in the Proxmox docs in more detail).

binderth · September 7, 2020, 9:45am

yes, absolutely, changed it to be more specific on that part. and also actuator.

binderth · September 7, 2020, 10:00am

If you work in HA environments for, let’s say, enterprise e-commerce systems, then I’ll understand the need for this in your home as well:

separate networks, or at least VLANs, for all your appliances, including QoS rules for important traffic, and also separating your internal traffic (smart TV, notebook, smartphones, …) from your smart home traffic
two internet providers including failover
having every piece of hardware at least doubled, with a failover strategy (and I mean every piece of hardware, from a single light bulb to separate power cables, hardware hubs, two bus cables, network cables, two power supplies, a diesel engine on standby, …)
best case you have two different means of sending commands to your actuators, to avoid failures of one transportation medium
…

Having openHAB on an HA cluster is only the tip of the iceberg, if you ask me. And even IF you have openHAB on some sort of HA cluster, how do you make sure it is failsafe:

avoid crashing of openHAB after a misconfiguration of openHAB itself
make all openHAB cluster nodes use the same configuration on distributed systems
provide openHAB with alternating persistence for your states and item values?
provide openHAB with distributed I/O layer
…

you could do that, and I admit this would be a SHITLOAD OF FUN - but then again: it would cost a fortune - of money and spare time…

xsherlock · September 7, 2020, 11:18am

This is a bit off-topic, but indeed, master furnace on/off is the one single function in the house that I did not entrust to OH. Having a 1.5 m3 water tank in the basement start to boil, hoping that the emergency valve will do its job at 2.5 bar, is not a scenario I want to be a part of. Here I rely on the furnace thermostat probe and its controller. However, if that were to fail, I have OH to at least warn me that the tank temp is above anything that. But for the low-temperature room circuits I of course use OH to regulate the heating valves.

As for the HA costs being prohibitively expensive: I think that with 3 nodes made of the Odroid H2+ and 2 x CRS305 switches from Mikrotik to do 2.5 Gbps Eth, I can pull that off for below 1000 Euro, and that is below the price of the single-PSU QNAP I’m running the house on now.