Failover and Fault Tolerant!

crankycoder · November 16, 2015, 1:57pm

Good morning All,
I wanted to share some pretty sweet news. Over the weekend I was able to configure openHAB to be pretty fault tolerant with a 2-server setup. I documented my setup and configs on my blog.

Here are the posts specifically related to this.

So, my system does still have some points of failure but they are not 100% critical.

My Z-Wave stick does move between the 2 servers via VirtualHere, which I document, but that USB server is still running on a Raspberry Pi, so if that Pi goes bye-bye, so does Z-Wave. Z-Wave is not my entire setup, though, so that only breaks part of it.
All of my persistence goes to a load-balanced MySQL cluster, with no local storage of persistence. This is a point of failure, but again, not critically damaging to openHAB.

It sounds like I have a ton of servers in my house, but that is not the case. I have 3: a big file server and 2 decent ESXi servers. Most of my setups are VMs clustered between the 2 ESXi servers.

I will be doing some tweaks to my scripts and will post updates. Just thought you guys might want to see something I did with openhab.

ben_jones12 · November 16, 2015, 8:03pm

Nice one Jason - very interested to hear how VirtualHere works for you. I would love to decouple my USB dongles (Z-Wave and RFXCOM) from my main server and run them on a dedicated RPi2. I am running virtual OpenVZ containers and had quite a bit of difficulty getting the USB devices to appear in the containers.

crankycoder · November 17, 2015, 12:31am

So far VirtualHere has worked like a champ. I failed over back and forth a few times last night to test everything, then walked around the house turning my lights on and off.

I obviously have the virtual IP in my openHAB config, so it goes to whichever server is active.

So far I am very surprised how well it works.

With 1 device it is free, but I think beyond sharing 1 USB device you need to buy a license. At this point it seems totally worth it.

ben_jones12 · November 17, 2015, 12:33am

Brilliant - happy to pay for something that works that well - how did you configure the Z-Wave binding in openHAB to access this virtual USB device?

crankycoder · November 17, 2015, 12:40am

That was one of the beautiful pieces. The binary that runs loads a driver into the system, so the device still shows up at /dev/ttyUSB0 (or whatever it is when you have it physically plugged in). So I didn’t even change the binding!
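
Since the binding just sees an ordinary device node, a failover script may want to wait for that node to appear before starting openHAB. A minimal sketch, assuming the stick shows up at /dev/ttyUSB0 and that a 30-second wait is acceptable (the function name, path, and timeout are illustrative, not taken from the blog posts):

```bash
#!/bin/bash
# Sketch only: wait for a (possibly remote-attached) USB serial device node.
# The device path and timeout used in the example call are assumptions.

# wait_for_device PATH [TIMEOUT_SECONDS]
# Returns 0 as soon as PATH exists, 1 if it is still missing after the timeout.
wait_for_device() {
    local path="$1" timeout="${2:-30}" waited=0
    while [ ! -e "$path" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example call, commented out so the file can be sourced safely:
# wait_for_device /dev/ttyUSB0 30 || { echo "Z-Wave stick not attached" >&2; exit 1; }
```

Polling for the node keeps the openHAB side unaware of how the device is attached, which matches the "didn’t even change the binding" experience described above.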

ben_jones12 · November 17, 2015, 12:46am

Sounds too good to be true - I gotta try this!

Can imagine this will be very useful for testing as well: just shut down my Linux server and leave the Z-Stick plugged into my RPi virtual node, then fire up my Windows dev environment and point it to the virtual device.

crankycoder · November 17, 2015, 12:53am

For sure. I didn’t even think of the dev applications there, but there is a Windows driver. I have it installed, but only so I can see which box has control over the device. For dev that would be sweet.

geva · November 19, 2015, 3:15pm

A couple of virtualization servers and a file server is somewhat “a ton” of servers for most people to have at home. I only have one of each.

This does sound very interesting… I’m running OpenHAB on an RPi, but would love to move it to a lightning fast VM on my XenServer. Thanks for this info Jason.

crankycoder · November 19, 2015, 3:26pm

I agree, not everyone has multiple VM servers. But a single one will still give some redundancy for patching of the OS and whatnot.
I only have the second because I do a lot of dev/tech work that justified it.

I am happy to share whatever info I can. Please let me know if there is something I can help with.

ubergeek · November 21, 2015, 1:14pm

Interesting approach @crankycoder, I will have to give it a go on my ESXi. Where is the rsync portion shown in the topology? Is that simply keeping the openHAB folder in sync? Is it two-way?

crankycoder · November 21, 2015, 5:52pm

At the moment I don’t have rsync set up. My configs haven’t changed much lately and I use a MySQL DB on a different server for all my persistence, so I have not set it up yet. The plan is to do two-way rsync between the 2 boxes.

pelnet · November 22, 2015, 8:32pm

Ha,

That’s a pretty nice setup, there. When I first started using OpenHAB some time ago, I posted my setup back then. For some reason, I can’t seem to find that post anymore, but no worries. By now I’ve got just about everything worked out, fault tolerant and redundant. First of all, OpenHAB only uses TCP/IP to communicate with the real world, which is imperative as I have both my OpenHAB servers (let’s call them OH1 and OH2) running on separate ESXi HP DL380 servers. These are both Linux VMs using Keepalived to keep a shared IP online by means of VRRP.

The OpenHAB installation and certain custom logs as well as RRD files for CometVisu are stored on a separate virtual disk which has ‘RAID 1 over TCP’ using DRBD8. To avoid split brain and other issues with changing active node or after taking one node down, the disk/partition is configured with OCFS2, allowing DRBD to operate safely in primary/primary mode.

Although I monitor various internals and statistics by means of JMX via Zabbix, I rely on Monit to ensure the Java VM is behaving within limits/expectations (I/O access, CPU load, memory usage, etc).

As Keepalived takes care of the virtual IP, the transition scripts handle starting and stopping Monit. Monit in turn attempts to keep an instance of OpenHAB running, but the start.sh script also includes a check to prevent starting the daemon on a node without the shared IP.
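
The hand-off described above could be sketched roughly as follows. This is not pelnet’s actual script. It assumes Keepalived invokes the notify script with its usual arguments (type, name, target state), that Monit is controlled with `service monit start|stop`, and it reuses the example address 10.1.2.3 from the script later in this thread. The function names are made up for illustration.

```bash
#!/bin/bash
# Sketch of a Keepalived notify script plus the shared-IP guard for start.sh.
# Keepalived calls notify scripts as: <script> <GROUP|INSTANCE> <name> <state>
# Assumptions: Monit is managed via "service monit ..."; SHARED_IP is an example value.

SHARED_IP="${SHARED_IP:-10.1.2.3}"

# Map a Keepalived target state to the action to take on Monit.
action_for_state() {
    case "$1" in
        MASTER)       echo start ;;
        BACKUP|FAULT) echo stop ;;
        *)            echo ignore ;;
    esac
}

# has_shared_ip IP
# Reads "ip -o addr show" style output on stdin and succeeds only if IP is
# configured. Matching " IP/" as a fixed string avoids hits on e.g. 10.1.2.30.
has_shared_ip() {
    grep -qF " $1/"
}

# Only act when called with Keepalived's arguments (state is the third one).
if [ "$#" -ge 3 ]; then
    case "$(action_for_state "$3")" in
        start) service monit start ;;
        stop)  service monit stop ;;
    esac
fi

# Guard for start.sh, commented out so this file can be sourced safely:
# ip -o addr show | has_shared_ip "$SHARED_IP" || { echo "No shared IP here, not starting" >&2; exit 1; }
```

Keeping the guard in start.sh as well means that even if Monit is started by hand on the backup node, openHAB still refuses to run there.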

Persistence data from OpenHAB is stored on a MySQL cluster, consisting of three nodes doing Galera replication, each again on separate HP DL servers. RRD data, as mentioned before is replicated in real time between nodes.

The setup itself mainly communicates over a custom HTTP protocol with DIY sensor and actor combination units, doing raw I/O, IR, TTL and RF communication with devices and appliances. Whenever possible, items are controlled by means of Ethernet/network. This includes practically all my HVAC (heating and DHW is handled by means of a modified OpenTherm Gateway and a Raspberry Pi, operated via OpenHAB), lighting, music (PC + Linux + Audacious), AV equipment, home theatre with projector, ambient sensors, alarm system with logging and email + SMS alerts, IP cameras and even my custom coffee maker. It can also turn computers on and off with Wake-on-LAN and SSH (or ‘net rpc’ for Windows boxes).

For usage on the road I have OpenVPN configured on a separate pair of nodes and PPTP VPN support using a Cisco PIX firewall. This allows me to access OpenHAB with HABDroid (Android) or OpenHAB for iOS (I used to have an iPhone from my work).

At home I have some wall mounted push buttons here and there which trigger rules of varying complexity in OpenHAB, a couple of touchscreen terminals running CometVisu in Firefox, an all-inclusive master control CometVisu page, some smaller ones for other household members and low-res devices and I’m working on a HA Dashboard configuration, based on dashing.io, for a wall mounted tablet.

Still wishing for a desktop application for Linux/Windows, though… Surely, Qt would be ideal for this? If only targeting Windows, .NET would be fine, but REST/JSON support is still pretty flaky…

crankycoder · November 23, 2015, 3:18pm

First off… can we be like best friends!!! haha

Seriously though. VERY nice setup. Very elaborate. Much more mature than my setup. I would love to see how your scripts are set up and your various configurations.

About your comment on using dashing for the tablets: I would love to see how you have this set up. I have become somewhat of a fan of dashing over the last 9 months. I have a small dashing interface set up based on this

I had to reverse engineer some of it but it works very well so far. It definitely ups the WAF (wife acceptance factor)

pelnet · November 23, 2015, 6:19pm

Haha, thanks! My home network is a hobby gone wild, and the automation part had to ‘be done right’…

If you’re interested in the scripts and whatnot, drop me an email and let me know what you want. I’m not planning on putting it all on GitHub, but you’re welcome to anything you want. I can paste all the files here, but we’re talking about potentially thousands of lines (rules are 2K+ to start with…). The Keepalived/OCFS2/Monit/DRBD/start.sh would be doable…

I’m starting to like dashing, too! It’s awesome, but I’m more of a ‘function’ programmer, not a designer, so it’s rather challenging, and I’m thankful for any pre-made stuff which looks decent that I can find… each to his own.

That link (to the OpenHAB dashboard) is dynamite! Forget my crummy little thing, honestly. If you’re still interested, you can see it in the unmodified version here:

http://www.homeautomationforgeeks.com/dashboard.shtml

The reason I chose it was also WAF-influenced, but I like some eye candy too from time to time. I’ll see whether I can post some photos and screenshots later on.

Cheers,

P

crankycoder · November 23, 2015, 6:35pm

would love to see what all you have configured

marcolino7 · January 21, 2016, 11:32am

Hi @pelnet
I’m very interested in how you monitor your OpenHAB with Zabbix. Right now I monitor the Linux OS via the agent, but I would also like to monitor OpenHAB. Can you provide more details on this?

Thanks

pelnet · January 21, 2016, 6:00pm

Hi,

Basically, I’ll break this down into three categories. The reason my configuration is a little split up is that I try to keep a hard and clear line between my home automation and the rest of my servers and home network. This reduces complexity and makes things more modular with fewer dependencies.

Zabbix monitoring, classic method (What would an admin responsible for the service, but not a user himself, care about?)

ICMP: Is the shared IP of the cluster online?
TCP 8480, 80, 443, and the other ports that should be listening on the shared IP
Processes: is the JVM running? Is Monit running?
Disk space: Is something eating disk/inodes, did I leave debug on?
CPU load/utilization: Critical alerting and comparison over time of various configs and their impact
File checksum on various logs: Is something writing to them (e.g. events.log)? Are the RRD cronjobs being executed?
Generic stuff for OCFS2, O2CB, Keepalived, etc
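
As one illustration of the "is something writing to the log?" check from the list above, here is a small sketch that could back a custom Zabbix item. It swaps the file-checksum approach for a simpler modification-time test, and the function name and the 300-second threshold in the example call are assumptions, not part of the setup described here.

```bash
#!/bin/bash
# Sketch only: report whether a log file has been written to recently.
# Uses the file's modification time instead of a checksum comparison.

# file_is_fresh PATH MAX_AGE_SECONDS
# Prints 1 if PATH was modified within the last MAX_AGE_SECONDS, otherwise 0
# (also 0 if the file does not exist).
file_is_fresh() {
    local path="$1" max_age="$2" now mtime
    [ -e "$path" ] || { echo 0; return; }
    now=$(date +%s)
    mtime=$(stat -c %Y "$path")   # GNU stat; on BSD/macOS use: stat -f %m
    if [ $((now - mtime)) -le "$max_age" ]; then
        echo 1
    else
        echo 0
    fi
}

# Example call, commented out so the file can be sourced safely:
# file_is_fresh /mnt/openhab/logs/events.log 300
```

Printing a bare 1 or 0 keeps the output easy to consume as a numeric item on the monitoring side.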

Zabbix monitoring, JMX

Number of active threads
JVM memory footprint
Various other metrics that have indicated problems or given early warnings in the past (though, with 1.7.1, things are pretty damn stable)
Take a look at the JMX data of your instance - some metrics are of use, some aren’t. It depends greatly on your config.

“Internal” metrics from OpenHAB and auxiliary processes

XMPP and log alerts for failure to communicate with external data providers (weather, rubbish collection, etc)
BusOps a.k.a. operations per second on the OpenHAB bus. This is calculated by means of cronjobs which then write the value to an RRD file, which in turn can be viewed in CometVisu (eye candy).
Abnormal operations of devices (is the heating flip-flopping on and off, is the air conditioning running when the ambient temp is well within spec, that sort of stuff).
Temperature anomalies like sudden rises of ambient temp and generally exceeding pre-configured thresholds.

If I may be so bold: if you’re looking for some specifics on how to monitor your OpenHAB with Zabbix, let me know what you want, and I’m pretty sure I can whip up a proof-of-concept script which can be added to the Zabbix agent as a custom check.

Here’s a quick and dirty way to get your BusOP/s into an RRD file.

#!/bin/bash
# first check whether master or backup
SHAREDIP='10.1.2.3';
GOTIP=`ip a | grep "$SHAREDIP"`
if [[ ! $GOTIP ]]; then
   echo "Slave nodes do not run this check";
   exit 1;
fi

cd /mnt/openhab/rrd
# create database if not exists
[ -f BUS_ACTIVITY.rrd ] || {
/usr/bin/rrdtool create BUS_ACTIVITY.rrd --step 300 \
 DS:load:GAUGE:1200:U:U \
 RRA:AVERAGE:00:1:3200 \
 RRA:AVERAGE:00:6:3200 \
 RRA:AVERAGE:00:36:3200 \
 RRA:AVERAGE:00:144:3200 \
 RRA:AVERAGE:00:1008:3200 \
 RRA:AVERAGE:00:4320:3200 \
 RRA:AVERAGE:00:52560:3200 \
 RRA:AVERAGE:00:525600:3200
}

# count the lines appended to events.log during a 3-second window
LOAD=`tail -n0 -f /mnt/openhab/logs/events.log > /tmp/tmp.log & sleep 3; kill $!; wc -l < /tmp/tmp.log`
/usr/bin/rrdupdate BUS_ACTIVITY.rrd N:$LOAD

If you replace the last line (/usr/bin/rrdupdate …) with the following, you can use it as a custom Zabbix check:

echo $LOAD

lysol · April 20, 2016, 1:55pm

Did you ever get rsync configured? I’m curious how you are going about automating it. Cron job?

crankycoder · April 20, 2016, 2:22pm

Cron would definitely work. That’s what I had been doing for a while. I have been looking into doing something like DRBD for my configs lately though, mainly because of pelnet’s post above. I have been working a lot more lately on some of my network stuff and some various new HA items, like adding my standalone HVAC to openHAB via MQTT + ESP8266. After my commute nightmares, I also recently added some traffic tracking to give me a heads-up before I leave the house on whether my drive is going to be a “1 cup” or “2 cup” of coffee type of commute.

The DRBD is hopefully on the list soon, as I am putting custom-built motion sensors in the house, so I will need to bring my fault tolerance up to top level.
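
For anyone wanting the cron + rsync route in the meantime, a minimal sketch might look like this. It is a one-way push from the active box to its peer (simpler than the two-way plan mentioned earlier), and the config directory, peer hostname, script path, and schedule are all assumptions for illustration.

```bash
#!/bin/bash
# Sketch only: push the openHAB config directory to the peer node over SSH.
# CONFIG_DIR and PEER are example values; RSYNC can be overridden for a dry run.

CONFIG_DIR="${CONFIG_DIR:-/opt/openhab/configurations}"
PEER="${PEER:-openhab2.example.lan}"
RSYNC="${RSYNC:-rsync}"

# sync_configs SRC_DIR PEER_HOST
# -a preserves permissions and timestamps, -z compresses in transit,
# --delete removes files on the peer that no longer exist locally.
sync_configs() {
    local src="${1%/}" peer="$2"
    $RSYNC -az --delete "$src/" "$peer:$src/"
}

# Run only when asked to, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    sync_configs "$CONFIG_DIR" "$PEER"
fi

# Example crontab entry (hypothetical script path), every 5 minutes:
# */5 * * * * /usr/local/bin/sync-openhab-config.sh --run
```

Because `--delete` makes the peer mirror the source exactly, this should only ever run on the node that currently holds the virtual IP. Setting `RSYNC="echo rsync"` prints the command instead of executing it, which is handy for a first test.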

benjherb · October 28, 2016, 12:13am

@crankycoder, how did you manage to get VirtualHere working with your Z-Stick Series 2? I’m having a ton of trouble with it, as was the guy in this thread I found on VH forums (I’ve posted about this there as well). Did you have to do any special config in OpenHAB or VH to get this working?