Failover and Fault Tolerant!

ben_jones12 · November 17, 2015, 12:46am

Sounds too good to be true - I gotta try this!

Can imagine this will be very useful for testing as well, just shutdown my Linux server and leave the Z-Stick plugged into my RPi virtual node, then fire up my Windows dev environment and point it to the virtual device.

crankycoder · November 17, 2015, 12:53am

For sure. I didn’t even think of the dev applications there, but there is a Windows driver. I have it installed, but only so I can see which box has control over the device. For dev, though, that would be sweet.

geva · November 19, 2015, 3:15pm

A couple of virtualization servers and a file server is somewhat “a ton” of servers for most people to have at home. I only have one of each.

This does sound very interesting… I’m running OpenHAB on an RPi, but would love to move it to a lightning fast VM on my XenServer. Thanks for this info Jason.

crankycoder · November 19, 2015, 3:26pm

I agree, not everyone has multiple VM servers. But a single one will still give some redundancy for patching the OS and whatnot.
I only have the second because I do a lot of dev/tech work that justified it.

I am happy to share whatever info I can. Please let me know if there is something I can help with.

ubergeek · November 21, 2015, 1:14pm

Interesting approach @crankycoder, I will have to give it a go on my ESXi. Where is the rsync portion shown in the topology? Is that simply keeping the openhab folder in sync? Is it two-way?

crankycoder · November 21, 2015, 5:52pm

At the moment I don’t have rsync set up. My configs haven’t changed much lately and I use a MySQL DB on a different server for all my persistence, so I have not set up rsync yet. But the plan is to do two-way rsync between the 2 boxes.

pelnet · November 22, 2015, 8:32pm

Ha,

That’s a pretty nice setup, there. When I first started using OpenHAB some time ago, I posted my setup. For some reason, I can’t seem to find that post anymore, but no worries. By now I’ve got just about everything worked out, fault tolerant and redundant. First of all, OpenHAB only uses TCP/IP to communicate with the real world, which is imperative as I have both my OpenHAB servers (let’s call them OH1 and OH2) running on separate ESXi HP DL380 servers. These are both Linux VMs using Keepalived to keep a shared IP online by means of VRRP.

The OpenHAB installation and certain custom logs as well as RRD files for CometVisu are stored on a separate virtual disk which has ‘RAID 1 over TCP’ using DRBD8. To avoid split brain and other issues with changing active node or after taking one node down, the disk/partition is configured with OCFS2, allowing DRBD to operate safely in primary/primary mode.

Although I monitor various internals and statistics by means of JMX via Zabbix, I rely on Monit to ensure the Java VM is behaving within limits/expectations (I/O access, CPU load, memory usage, etc).

As Keepalived takes care of the virtual IP, the transition scripts handle starting and stopping Monit. Monit in turn attempts to keep an instance of OpenHAB running, but the start.sh script also includes a check to prevent starting the daemon on a node without the shared IP.
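
To make the transition-script idea concrete, here is a minimal sketch of a Keepalived notify script that starts or stops Monit on a state change. It is not the actual script from this setup: the function name `monit_action_for` and the use of `service monit ...` are assumptions. It relies on Keepalived calling a `notify` script with the type, the instance name and the new state as its first three arguments.

```shell
#!/bin/bash
# Sketch of a Keepalived "notify" script (names are illustrative).
# Keepalived calls it as: <script> <GROUP|INSTANCE> <name> <MASTER|BACKUP|FAULT>

# Map a VRRP state to the action Monit should take on this node.
monit_action_for() {
    case "$1" in
        MASTER)       echo start ;;   # this node now holds the shared IP
        BACKUP|FAULT) echo stop ;;    # another node is (or should be) active
        *)            echo none ;;    # unknown state: do nothing
    esac
}

# Only act when invoked with a state argument, as Keepalived does.
if [ -n "${3:-}" ]; then
    action=$(monit_action_for "$3")
    if [ "$action" != none ]; then
        service monit "$action"       # assumption: Monit is managed via "service"
    fi
fi
```

Keeping the state-to-action mapping in its own function makes it easy to check by hand before wiring it into a live VRRP instance.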

Persistence data from OpenHAB is stored on a MySQL cluster, consisting of three nodes doing Galera replication, each again on separate HP DL servers. RRD data, as mentioned before is replicated in real time between nodes.

The setup itself mainly communicates over a custom HTTP protocol with DIY sensor and actuator combination units, doing raw I/O, IR, TTL and RF communication with devices and appliances. Whenever possible, items are controlled by means of Ethernet/network. This includes practically all my HVAC (heating and DHW are handled by means of a modified OpenTherm Gateway and a Raspberry Pi, operated via OpenHAB), lighting, music (PC + Linux + Audacious), AV equipment, home theatre with projector, ambient sensors, alarm system with logging and email + SMS alerts, IP cameras and even my custom coffee maker. It can also turn computers on and off with Wake-on-LAN and SSH (or ‘net rpc’ for Windows boxes).

For usage on the road I have OpenVPN configured on a separate pair of nodes and PPTP VPN support using a Cisco PIX firewall. This allows me to access OpenHAB with HABDroid (Android) or OpenHAB for iOS (I used to have an iPhone from my work).

At home I have some wall mounted push buttons here and there which trigger rules of varying complexity in OpenHAB, a couple of touchscreen terminals running CometVisu in Firefox, an all-inclusive master control CometVisu page, some smaller ones for other household members and low-res devices and I’m working on a HA Dashboard configuration, based on dashing.io, for a wall mounted tablet.

Still wishing for a desktop application for Linux/Windows, though… Surely, Qt would be ideal for this? If only targeting Windows, .NET would be fine, but REST/JSON support is still pretty flaky…

crankycoder · November 23, 2015, 3:18pm

first off… can we be like best friends!!! haha

Seriously though. VERY nice setup. Very elaborate. Much more mature than my setup. I would love to see how your scripts and various configurations are set up.

Your comment about using dashing for the tablets: I would love to see how you have this set up. I have become somewhat of a fan of dashing over the last 9 months. I have a small dashing interface set up based on this

I had to reverse engineer some of it but it works very well so far. It definitely ups the WAF (wife acceptance factor)

pelnet · November 23, 2015, 6:19pm

Haha, thanks! My home network is a hobby gone wild, and the automation part had to ‘be done right’…

If you’re interested in the scripts and whatnot, drop me an email and let me know what you want. I’m not planning on putting it all on GitHub, but you’re welcome to anything you want. I can paste all the files here, but we’re talking about potentially thousands of lines (rules are 2K+ to start with…). The Keepalived/OCFS2/Monit/DRBD/start.sh would be doable…

I’m starting to like dashing, too! It’s awesome, but I’m more of a ‘function’ programmer, not a designer, so it’s rather challenging, and I’m thankful for any pre-made stuff which looks decent that I can find… each to his own.

That link (to the OpenHAB dashboard) is dynamite! Forget my crummy little thing, honestly. If you’re still interested, you can see it in the unmodified version here:

http://www.homeautomationforgeeks.com/dashboard.shtml

The reason I chose it was also WAF-influenced, but I like some eye candy too from time to time. I’ll see whether I can post some photos and screenshots later on.

Cheers,

P

crankycoder · November 23, 2015, 6:35pm

would love to see what all you have configured

marcolino7 · January 21, 2016, 11:32am

Hi @pelnet
I’m very interested in how you are monitoring your OpenHAB with Zabbix. Right now I monitor the Linux OS via the agent, but I would also like to monitor OpenHAB. Can you provide more details on this?

Thanks

pelnet · January 21, 2016, 6:00pm

Hi,

Basically, I’ll break this down into three categories. The reason my configuration is a little split up is because I try to keep a hard and clear line between my home automation and the rest of my servers and home network. This reduces complexity and makes things more modular with fewer dependencies.

Zabbix monitoring, classic method (What would an admin responsible for the service, but not a user himself, care about?)

ICMP: Is the shared IP of the cluster online?
TCP 8480, 80, 443, and the other ports that should be listening on the shared IP
Processes: is the JVM running? Is Monit running?
Disk space: Is something eating disk/inodes, did I leave debug on?
CPU load/utilization: Critical alerting and comparison over time of various configs and their impact
File checksum on various logs: Is something writing to the logs (e.g. events.log)? Are the RRD cronjobs being executed?
Generic stuff for OCFS2, O2CB, Keepalived, etc
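
As an illustration of the "is something writing to events.log?" style of check, here is a small sketch that could serve as a custom Zabbix item. Note that it swaps the checksum comparison for a simpler modification-time test, so it is a stand-in rather than the method described above. The function name `log_is_fresh` is made up, and it assumes GNU `stat` and `date`.

```shell
#!/bin/bash
# Print 1 if the file was modified within the last N seconds, else 0.
# Assumes GNU stat; this checks mtime, not a checksum.
log_is_fresh() {
    local file="$1" max_age="$2"
    local mtime now
    # A missing or unreadable file counts as "not fresh".
    mtime=$(stat -c %Y "$file" 2>/dev/null) || { echo 0; return; }
    now=$(date +%s)
    if [ $((now - mtime)) -le "$max_age" ]; then
        echo 1
    else
        echo 0
    fi
}

# Example call (path taken from the script later in this thread):
# log_is_fresh /mnt/openhab/logs/events.log 600
```

A script like this can be exposed to the Zabbix agent through a `UserParameter` entry; the item key is whatever you choose to call it.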

Zabbix monitoring, JMX

Number of active threads
JVM memory footprint
Various other metrics that have indicated problems or given early warnings in the past (though, with 1.7.1, things are pretty damn stable)
Take a look at the JMX data of your instance - some metrics are of use, some aren’t. It depends greatly on your config.

“Internal” metrics from OpenHAB and auxiliary processes

XMPP and log alerts for failure to communicate with external data providers (weather, rubbish collection, etc)
BusOps a.k.a. operations per second on the OpenHAB bus. This is calculated by means of cronjobs which then write the value to an RRD file, which in turn can be viewed in CometVisu (eye candy).
Abnormal operations of devices (is the heating flip-flopping on and off, is the air conditioning running when the ambient temp is well within spec, that sort of stuff).
Temperature anomalies like sudden rises of ambient temp and generally exceeding pre-configured thresholds.

If I may be so bold; if you’re looking for some specifics on how to monitor your OpenHAB with Zabbix, let me know what you want, and I’m pretty sure I can whip up a proof-of-concept script which can be added to the Zabbix agent as a custom check.

Here’s a quick and dirty way to get your BusOP/s into an RRD file.

#!/bin/bash
# first check whether master or backup
SHAREDIP='10.1.2.3';
# match the exact address (a plain grep for 10.1.2.3 would also match 10.1.2.30)
GOTIP=`ip a | grep -F "inet $SHAREDIP/"`
if [[ ! $GOTIP ]]; then
   echo "Slave nodes do not run this check";
   exit 1;
fi

cd /mnt/openhab/rrd
# create database if not exists
[ -f BUS_ACTIVITY.rrd ] || {
/usr/bin/rrdtool create BUS_ACTIVITY.rrd --step 300 \
 DS:load:GAUGE:1200:U:U \
 RRA:AVERAGE:00:1:3200 \
 RRA:AVERAGE:00:6:3200 \
 RRA:AVERAGE:00:36:3200 \
 RRA:AVERAGE:00:144:3200 \
 RRA:AVERAGE:00:1008:3200 \
 RRA:AVERAGE:00:4320:3200 \
 RRA:AVERAGE:00:52560:3200 \
 RRA:AVERAGE:00:525600:3200
}

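# count the events written to the log during a 3-second window
# (wc -l < file prints only the number, so counts of any size survive intact)
LOAD=`tail -n0 -f /mnt/openhab/logs/events.log >/tmp/tmp.log & sleep 3; kill $! ; wc -l < /tmp/tmp.log`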
/usr/bin/rrdupdate BUS_ACTIVITY.rrd N:$LOAD

If you replace the last line (/usr/bin/rrdupdate …) with the following, you can use it as a custom Zabbix check:

echo $LOAD
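
The script above samples the log for three seconds per run. A possible variation, offered only as a sketch and not as the method used here, is to remember the log's line count between cron runs and report the difference, which covers the whole interval. The function name `lines_since_last_run` and the state-file argument are hypothetical.

```shell
#!/bin/bash
# Print how many lines were appended to a log since the previous call.
# The previous line count is kept in a small state file.
lines_since_last_run() {
    local log="$1" state="$2" now prev
    now=$(( $(wc -l < "$log") ))                  # current number of lines
    prev=$(cat "$state" 2>/dev/null || echo 0)    # count from the last run, if any
    prev=${prev:-0}
    echo "$now" > "$state"
    # If the log was rotated or truncated, start counting from zero again.
    if [ "$now" -lt "$prev" ]; then
        prev=0
    fi
    echo $(( now - prev ))
}

# Example call (the state file location is an assumption):
# lines_since_last_run /mnt/openhab/logs/events.log /tmp/busops.state
```

Dividing the result by the cron interval in seconds gives an average rate over that interval instead of a three-second snapshot.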

lysol · April 20, 2016, 1:55pm

Did you ever get rsync configured? I’m curious how you are going about it for automating it. cron job?

crankycoder · April 20, 2016, 2:22pm

cron would definitely work. That’s what I had been doing for a while. I have been looking into doing something like DRBD for my configs lately though, mainly because of pelnet’s post above. I have been working a lot more lately on some of my network stuff and various new HA items, like adding my standalone HVAC to OpenHAB via MQTT + ESP8266. After my commute nightmares, I also recently added some traffic tracking to give me a heads-up before I leave the house on whether my drive is going to be a “1 cup” or “2 cup” of coffee type of commute.

DRBD is hopefully next on the list, as I am putting custom-built motion sensors in the house, so I will need to bring my fault tolerance up to top level.
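
For the cron-driven rsync approach mentioned above, a minimal sketch might look like the following. It is a one-way mirror from the active node to the standby; a true two-way sync needs conflict handling that plain rsync does not provide. The host name `oh2`, the paths, the script location and the `RSYNC_BIN` override are all assumptions for illustration.

```shell
#!/bin/bash
# One-way config sync from the active node to the standby (illustrative paths).
# Example crontab entry, every 15 minutes:
#   */15 * * * * /usr/local/bin/sync-openhab.sh

RSYNC_BIN="${RSYNC_BIN:-rsync}"   # overridable, e.g. RSYNC_BIN=echo to print the arguments

sync_configs() {
    local src="$1" dest="$2"
    # -a: preserve permissions and times, -z: compress, --delete: mirror removals
    "$RSYNC_BIN" -az --delete "$src" "$dest"
}

# Hypothetical invocation; adjust host and paths to your own layout:
# sync_configs /opt/openhab/configurations/ oh2:/opt/openhab/configurations/
```

Because `--delete` removes files on the destination that no longer exist on the source, it is worth printing the arguments first (with `RSYNC_BIN=echo`) to confirm the direction is right.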

benjherb · October 28, 2016, 12:13am

@crankycoder, how did you manage to get VirtualHere working with your Z-Stick Series 2? I’m having a ton of trouble with it, as was the guy in this thread I found on VH forums (I’ve posted about this there as well). Did you have to do any special config in OpenHAB or VH to get this working?

smar · October 28, 2016, 6:44pm

I was having some challenges with a zwave.me stick and so switched to using usbip. This is a Linux-only solution though (the Windows client is not so great and quite old), but if you are using Linux, it could work well. Mine has been in use for a few months now without any issues.

staehler67 · February 6, 2017, 8:21am

I’m having problems with monit, as it just monitors openhab, but is not able to restart it. monit runs as root and should be able to restart any service, but openhab fails. Could you please share your openhab monit config?
BTW: A manual start of openhab works fine.
Do you start openhab via systemd or sysVinit scripts?

pelnet · February 6, 2017, 5:39pm

Hi,

I can gladly share my monit config, but it might not help you much. I’m not using either systemd or sysvinit as I run it in a screen. I like to be able to log in on the server in question (H/A pair) and see what it’s actually doing. I’m fully aware of the logs, but it’s something that stuck from getting started with it way back when. The config is slightly more extensive, but this is the beef of the matter:

/etc/monit/conf.d/openhab:

check process openhab-jvm
   matching "java"
   start program = "/mnt/openhab/etc/openhab-monit-start.sh"
   stop program = "/mnt/openhab/etc/openhab-monit-stop.sh"
   if cpu usage > 10% for 5 cycles then restart

[…more conditions…]

My start.sh is also customized to check for example whether the node has the shared IP and if any locks are present.
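
A guard of the kind described here might look like the sketch below. It is not the actual start.sh: the lock file path and the function names `has_shared_ip` and `may_start` are assumptions, and the IP is the example address used earlier in this thread.

```shell
#!/bin/bash
# Guard for the top of a start script (sketch; lock path is made up).
SHARED_IP="${SHARED_IP:-10.1.2.3}"
LOCK_FILE="${LOCK_FILE:-/mnt/openhab/openhab.lock}"

# Succeeds if the address list on stdin contains exactly this IPv4 address.
has_shared_ip() {
    grep -qF " inet $1/"
}

# Succeeds only if this node holds the shared IP and no lock file exists.
may_start() {
    ip -o -4 addr show | has_shared_ip "$SHARED_IP" || return 1
    [ ! -e "$LOCK_FILE" ] || return 1
    return 0
}

# In a start script this would be used as:
# may_start || { echo "not starting OpenHAB on this node" >&2; exit 1; }
```

Matching on ` inet <ip>/` rather than the bare address avoids treating 10.1.2.30 as a match for 10.1.2.3.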

/mnt/openhab/etc/openhab-monit-start.sh:

#!/bin/sh
/usr/bin/screen -t "OpenHAB" -dmS openhab /mnt/openhab/start.sh

/mnt/openhab/etc/openhab-monit-stop.sh:

#!/bin/sh
/usr/bin/pkill java;
/bin/sleep 15;
/usr/bin/pkill -9 java;

Again, there are some environmental specifics, but this is the logic.
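
One thing to be aware of with the stop script above is that `pkill java` hits every Java process on the box and always waits the full 15 seconds. If that matters in your environment, a variation is to signal a single PID and poll until it exits. This is only a sketch; the function name `stop_gracefully` is made up and it assumes you can determine the PID yourself (for example from a PID file).

```shell
#!/bin/bash
# Ask one process to exit, wait up to N seconds, then force it (sketch).
stop_gracefully() {
    local pid="$1" timeout="$2" waited=0
    kill "$pid" 2>/dev/null || return 0        # nothing to stop
    while kill -0 "$pid" 2>/dev/null; do       # still running?
        if [ "$waited" -ge "$timeout" ]; then
            kill -9 "$pid" 2>/dev/null         # escalate after the grace period
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Example: stop_gracefully "$(cat /path/to/openhab.pid)" 15   # hypothetical PID file
```

The function returns 0 when the process exits on its own and 1 when it had to be killed, which can be handy for logging.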

Oh yeah, @crankycoder: You might want to take a look at DRBD9, it’s more like a fully featured SAN over Ethernet instead of just RAID1 with pitfalls. When I come around to rebuild the setup I might just drop OCFS2/O2CB. Stateful resource management with Corosync/Pacemaker might also be an option if you simply want primary/secondary DRBD with role switching on the fly.

Cheers,

PelliX

crankycoder · February 7, 2017, 12:15am

Thanks Pelli, I’ll take a look. My DRBD8 has been acting up and not syncing. I’ll try it with 9. I need to start my lab environment soon for 2.0!

bodomenke · July 29, 2019, 10:53pm

@pelnet Great! This seems to work quite well with Monit on macOS!

Many thanks.