Power cut recovery

Hi all,
Came back this evening to all lights on and the house automation in disarray as a power cut had occurred whilst I was out.

It takes me around 30-40 minutes to go around and reset the things I know will have issues because they boot up before my DHCP server (Google Wifi) is available to offer leases, then boot up my server and all the virtual servers, and then reset the RPis that perform tasks such as monitoring.

I then have a number of Sonoffs and other devices that do not come up cleanly and need checking.

It occurred to me that with some careful thought I could automate the recovery of the home to a desired state after such an event. It would mean some changes to my current settings, but essentially I would need a UPS on my openHAB server, not to keep the power up but to give OH time to set a persistent flag indicating ‘power failure’ and to shut down all the other virtual machines and the main virtual host server that it sits on.
With a delayed start of, say, 5 minutes after power returns, I could then code it so that at system start, if the flag is set, OH performs the ‘power recovery’ setup, automating as much as possible of the tedious task of going around checking that things work, switching lights off, etc.
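To make the flag idea concrete, here is a minimal sketch in Python rather than openHAB rules. The function names, the flag file format and the 300-second default are my own assumptions for illustration; the flag path must be somewhere that survives a reboot.

```python
import json
import time
from pathlib import Path
from typing import Callable


def mark_power_failure(flag_path: Path) -> None:
    """Record that mains power was lost (call this while running on the UPS)."""
    flag_path.write_text(json.dumps({"failed_at": time.time()}))


def recovery_needed(flag_path: Path) -> bool:
    """True if the previous shutdown was caused by a power failure."""
    return flag_path.exists()


def run_startup(
    recover: Callable[[], None],
    flag_path: Path,
    delay_s: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """At boot: if the flag is set, wait, run the recovery, then clear the flag."""
    if not recovery_needed(flag_path):
        return False          # normal start, nothing special to do
    sleep(delay_s)            # give the network and other devices time to settle
    recover()
    flag_path.unlink()        # cleared only if recover() did not raise
    return True
```

The flag is removed only after the recovery function returns, so a recovery that crashes halfway would be retried on the next start.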

So a few things:

  1. What UPS (cheap, because that’s me) is a good solid worker with openHAB?
  2. Anyone have experience or code for any of what I have described?
  3. Any other ideas or suggestions?

Thanks

Paul

You’ll need to describe your server hardware a bit for UPS suggestions.

The idea seems generally sound, if you have to take special actions after a power outage.
I don’t have any code like that; I have instead written rules that assume that every boot is potentially after some kind of disaster and then do what is needed.

I don’t assume that a light found unexpectedly on is a problem - I don’t know what the occupants are up to “while I’ve been away”.
But I do, e.g., start the usual occupancy-related timeouts so that any such light acts as though it has just been triggered by motion, rather than staying on indefinitely.

The answers will depend not just on your hardware but all sorts of things.

Personally I kind of apply the same approach as rossko57. Any restart is assumed to be after a catastrophic failure and I take all the appropriate actions accordingly.

For my setup I have all of this automated but OH isn’t involved in any of it.

I have my router, server, and external hard drives plugged into a UPS. I use NUT, with an old RPi1 plugged into the UPS as the NUT server.

I’m running ESXi for my VMs and there is a NUT plug-in that will shut down the VMs and turn the physical machine off when the UPS battery goes too low. This is more of a second-tier safety measure, as I also have all my VMs set up as NUT clients and those too will shut down when the battery gets too low (but not as low as the level at which ESXi shuts itself down).

I have ESXi configured to restart my VMs in a specific order and after a specific delay when power is restored to the machine so my NAS comes up first before my other services. The physical machine itself will turn back on automatically when power is restored.

NOTE: if power comes back when the UPS has five minutes left then I’ll be in trouble but I’ve not spent the time to make this more foolproof.

My WiFi is configured as just a dumb AP so I don’t care if it has the power cut, but it’s plugged into the UPS so we can keep internet when the power goes out, at least for an hour or so. My network is controlled by a physical machine running pfSense. I’ve gone back and forth and at this time I just let it run until the power runs out of the UPS. Probably not the best idea but again, I’ve not spent the time to make something more robust. It comes up super fast so I don’t experience networking problems. But I also assign static IPs to all my devices in pfSense which probably helps avoid some problems.

All of my other devices scattered throughout the house are left to their own devices. I’ve not invested in setting up UPS for my remote RPis and speakers and such.

On my OH machine, I suppose I should configure my Docker containers to come up in a specific order so Mosquitto and InfluxDB come up before OH, but so far I’ve not encountered any problems.
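One generic way to handle start order, sketched here in Python, is to have the dependent service wait until the ports it needs are accepting connections. The function name and default timings are assumptions for illustration; this is not something Docker or openHAB provides under this name.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout_s: float = 60.0,
                  interval_s: float = 1.0) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval_s)
```

A wrapper could call `wait_for_port("localhost", 1883)` for Mosquitto and `wait_for_port("localhost", 8086)` for InfluxDB (their default ports) before starting OH. An open port only shows the service is listening, not that it is fully ready.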

When OH starts up, I have Rules that reset certain things (e.g. assume no one is home, assume nothing is online) then do some polling to get the current status of things. But these are not actually changing anything, it’s just making sure that OH has as accurate information about the states of everything as is possible. But like rossko57 said, this runs every time OH restarts.

For your specific questions:

  1. It depends mainly on how much you need to power off of the UPS and for how long. I have a https://www.amazon.com/gp/product/B000FBK3QK/ref=oh_aui_search_asin_title?ie=UTF8&psc=1 mainly because I got it on sale. I could have gotten away with a 600w model. You will need to do some math to add up how much wattage your devices consume and then some more math to calculate how many VA you need to power it all for your desired amount of time. There are lots of resources online to help. Personally, I calculated the W and just hoped the VA was enough to power for at least an hour and I managed to just about hit it exactly.

  2. See above. With NUT and the stuff built into what you are already running this will likely just be a matter of configuration.
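The sizing math from item 1 can be sketched roughly as below. The power factor of 0.6, the 25% headroom, the 85% inverter efficiency and the example loads are all assumptions for illustration; check the data sheet of the UPS you are considering, and treat the runtime figure as optimistic since batteries deliver less at high discharge rates.

```python
def required_va(load_watts: float, power_factor: float = 0.6,
                headroom: float = 1.25) -> float:
    """Minimum VA rating for a given real-power load, with some headroom."""
    return load_watts / power_factor * headroom


def runtime_minutes(battery_wh: float, load_watts: float,
                    inverter_efficiency: float = 0.85) -> float:
    """Rough runtime estimate from battery energy and load."""
    return battery_wh * inverter_efficiency / load_watts * 60.0


# Hypothetical loads in watts, measured with a plug-in power meter.
loads = {"server": 60.0, "router": 15.0, "nas": 30.0}
total_watts = sum(loads.values())
print(f"load: {total_watts:.0f} W")
print(f"minimum rating: {required_va(total_watts):.0f} VA")
print(f"runtime on a 100 Wh battery: {runtime_minutes(100.0, total_watts):.0f} min")
```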

@rossko57
For item #1 I was thinking more about the make: APC, CyberPower or the like. I will of course calculate the capacity required based on my requirements.
I can see how timed lighting based on occupancy could assist, but I do not currently have that covered in my setup. I assume you use individual room occupancy detection such as PIRs?

@rikoshak
Some great ideas for me to consider.

The main difference for me between a normal OH power-on and a power cut recovery is that many devices boot faster than Google Wifi. This means they come up before the DHCP server, which Google mandates is enabled in GW, so controllers such as the Philips Hue bridge need to be rebooted once the wifi mesh has been established and the internet is up too. That is just one example of an order dependency I have observed; I believe there are more.

It seems NUT is the way forward here and I will be sure to delve into it. Initially I will create two modular rules. One is for diagnostics: it will go around each of the rooms, check what’s working and what’s not, and report back. This report will also be available through Alexa, per room and as a whole, so it will help me check things when I am troubleshooting.

The other module will be the recovery module, with logic that says: if power cut recovery is in play and X is on but diagnostics reports it as not available, then reboot it. This one will be a little more complex as I suspect it will need tiers of dependencies.
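The tiers of dependencies could be worked out with a topological sort. Below is a minimal sketch using Python's standard `graphlib` module (Python 3.9+); the device names are hypothetical, and each tier holds the things that can be checked or rebooted once every earlier tier is healthy.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def recovery_tiers(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group devices into tiers; each tier depends only on earlier tiers."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    tiers = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        tiers.append(ready)
        sorter.done(*ready)
    return tiers


# Hypothetical dependency map: device -> the things it needs first.
deps = {
    "wifi": set(),
    "hue_bridge": {"wifi"},
    "sonoff_lounge": {"wifi"},
    "hue_lights": {"hue_bridge"},
}
print(recovery_tiers(deps))
# [['wifi'], ['hue_bridge', 'sonoff_lounge'], ['hue_lights']]
```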

Thank you both.

Paul

Okay, there are other ways to tackle that (fixed IP most obvious).
I guess the fallout is that OH cannot “see” the device? So that is detectable at OH start. But if OH can’t “see” it, how can it reboot it?

Since this was not a tutorial, I’ve moved this to a more appropriate category.

One thing that comes to mind is that if the GW is on the power backup, only in extreme power outage cases will you lose network. It might be enough to invest in a UPS just for the GW which is large enough to keep it going longer than your typical power outage. If you don’t lose power to it then you don’t have the boot time problems. Only if you lose power for longer than your UPS can power it will you have the reboot problem.

At that point you can decide if it’s worth extra effort to handle the rest. How much work is it worth doing to solve a problem that occurs once a year or less?

Of course, if this is a fun project, then all that goes out the window. Do it all and have fun! :smiley:

@rossko57

My day job is networks so I am very comfortable with the networking side.
As it happens, the example problem devices are things like the Philips Hue bridge; when this controller is not online on its correct IP address, none of my lights function.

  • The Hue bridge does not support a static IP address
  • The Hue bridge is connected by wired ethernet to a local Google Wifi AP
  • I use a reserved IP address in Google Wifi to ensure the Hue bridge has a known IP address.
  • The Hue bridge ends up with a random 169.254.x.x address because it is unable to locate the DHCP server.

The solution I am considering is to put a Sonoff between the power and the Hue bridge. If the recovery code detects that the Hue bridge is not available on its correct IP address, it power cycles the bridge using the Sonoff. The Sonoff is reachable by MQTT and does not suffer from the issue, as it connects over the Wifi provided by the AP, so that problem is sorted and I can look at the next one.

I could reposition the Philips Hue bridge so that it connects by Wifi too, and that would also solve the issue. However, the reason it connects using wired ethernet is that the best position for it is very close to my Google Wifi point, where it struggles to connect and becomes unreliable (blasted out). Moving it further away is difficult because central positions are typically the same locations as my mesh wifi points, and due to some Zigbee gaps it needs to be in the area where it is currently located.
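The power-cycle logic could look something like the sketch below. It assumes the Sonoff runs Tasmota firmware, where a relay is switched by publishing `ON` or `OFF` to `cmnd/<device>/POWER`; the device name `hue_bridge_plug`, the timings and the function name are made up for illustration. The reachability check and the MQTT publish are passed in as callables so the logic can be tried without a network.

```python
import time
from typing import Callable


def power_cycle_if_unreachable(
    is_reachable: Callable[[], bool],
    publish: Callable[[str, str], None],
    topic: str = "cmnd/hue_bridge_plug/POWER",  # hypothetical Tasmota device name
    off_seconds: float = 10.0,
    boot_seconds: float = 60.0,
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the device answers, power-cycling it up to max_attempts times."""
    for _ in range(max_attempts):
        if is_reachable():
            return True
        publish(topic, "OFF")   # cut power to the device
        sleep(off_seconds)
        publish(topic, "ON")    # restore power
        sleep(boot_seconds)     # give it time to boot and request a DHCP lease
    return is_reachable()
```

In practice `is_reachable` might try a TCP connection to the bridge's reserved IP address, and `publish` might wrap an MQTT client's publish call. Capping the attempts avoids endlessly power cycling a bridge that is down for some other reason.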

@rikoshak

Having lived in both countries for more than a decade, I can say power cuts are far more frequent in New Zealand than in Germany.
I would say we go through spates of them, and winter is particularly bad: we can have a couple in the same week. They often last only 5 to 10 minutes, but that is enough disruption to my systems that it takes me around an hour to go through everything. And if I am not home then it gets left until I get home, with my family blaming my automation rather than the power cut.

As I have a mesh wifi system with three APs and am looking to expand to four, I am thinking that adding a UPS to each AP is not cost-effective. However, perhaps I should check whether the DHCP server is only running on the primary AP that connects to the router. If so, with careful organising I could put the Google Wifi primary AP and the main virtual host server on a UPS and remove many issues in one fell swoop. I like it.

Incidentally, I decided to go with a CyberPower UPS as it seems to have good support with NUT.

Thanks for all the great discussion.

Regards

Paul

But do you really need to add one to all of them? Would keeping one of them up during the power outage improve the boot times of the rest perhaps? I don’t use a mesh system so can’t say I know how they work. The fact that they take so long to boot certainly doesn’t recommend them.

But typically you can only have one DHCP server per subnetwork. Since all of your mesh is on the same subnetwork I suspect that indeed the DHCP server is only running on the primary AP. It’s worth doing some research to find out.

Where I live in Colorado, USA most of the power lines are buried and the power is very reliable. In the five years I’ve lived here we’ve only lost power three times. One caused by a forest fire, one caused by a car hitting a power substation, and the most recent caused by the bomb cyclone that happened a few weeks ago. The first two lasted under an hour. The last one was over four hours I think (we were actually on a cruise ship heading into Saint Thomas at the time).

For the last power outage I can say that everything did not come back up cleanly. But enough came back up cleanly that I was able to get almost everything working again using the wifi from a restaurant on the beach in about 10 minutes.

I think the APCs have good support too, but I went with CyberPower for the same reason.

In case it helps, here are the Ansible scripts I use to configure my NUT server:

---
# tasks file for nut

- name: Install nut server
  apt:
    name: "{{ item }}"
    update_cache: no
  become: true
  with_items:
    - nut
    - nut-client
    - nut-server
  register: nut_installed

- name: Configure NUT for the CyberPower CP1500AVRLCD
  ini_file:
    path: /etc/nut/ups.conf
    state: present
    section: cyberpower1
    option: "{{ item.option }}"
    value: "{{ item.value }}"
  with_items:
    - { option: "driver", value: "usbhid-ups" }
    - { option: "port", value: "auto" }
    - { option: "desc", value: "\"CyberPower CP1500AVRLCD\"" }
    - { option: "pollinterval", value: "15" }
  become: true

- name: Reboot
  include_role:
    name: reboot
  when: nut_installed.changed

- name: Start the NUT Driver
  systemd:
    daemon_reload: yes
    enabled: yes
    name: nut-driver
    state: started
  become: true

- name: Update upsd.conf
  blockinfile:
    path: /etc/nut/upsd.conf
    block: |
      LISTEN 127.0.0.1
      LISTEN {{ nutserver }}
      MAXAGE 25
  become: yes

- name: Add nut users
  ini_file:
    path: /etc/nut/upsd.users
    state: present
    section: "{{ item.section }}"
    option: "{{ item.option }}"
    value: "{{ item.value }}"
  with_items:
    - { section: "admin", option: "password", value: "{{ share_pass }}" }
    - { section: "admin", option: "actions", value: "SET" }
    - { section: "admin", option: "instcmds", value: "ALL" }
    - { section: "rich",  option: "password", value: "{{ share_pass }}" }
    - { section: "rich",  option: "upsmon", value: "master" }
  become: true

- name: Configure the server to run
  lineinfile:
    backrefs: yes
    line: MODE=standalone
    path: /etc/nut/nut.conf
    regexp: MODE=none
    state: present
  become: yes

- name: Start the NUT Server
  systemd:
    daemon_reload: yes
    enabled: yes
    name: nut-server
    state: started
  become: true

- name: Set up the NUT client
  include_role:
    name: nut-client

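# "restarted" actually cycles the services so config changes are picked up;
# "started" would be a no-op for services that are already running.
- name: Cycle the services
  systemd:
    name: "{{ item }}"
    state: restarted
  become: true
  with_items:
    - nut-driver
    - nut-server
    - nut-monitor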

And here is the nut-client Ansible script

---

- name: Install nut client
  apt:
    name: nut
    update_cache: no
  become: true

- name: Set up the NUT client
  blockinfile:
    path: /etc/nut/upsmon.conf
    block: |
      MONITOR cyberpower1@localhost 1 rich {{ share_pass }} master
      DEADTIME 25
      NOTIFYCMD /etc/nut/notifycmd.sh
      NOTIFYFLAG ONLINE     SYSLOG+WALL+EXEC
      NOTIFYFLAG ONBATT     SYSLOG+WALL+EXEC
      NOTIFYFLAG LOWBATT    SYSLOG+WALL+EXEC
      NOTIFYFLAG FSD        SYSLOG+WALL+EXEC
      NOTIFYFLAG COMMOK     SYSLOG+WALL+EXEC
      NOTIFYFLAG COMMBAD    SYSLOG+WALL+EXEC
      NOTIFYFLAG SHUTDOWN   SYSLOG+WALL+EXEC
      NOTIFYFLAG REPLBATT   SYSLOG+WALL+EXEC
      NOTIFYFLAG NOCOMM     SYSLOG+WALL+EXEC
      NOTIFYFLAG NOPARENT   SYSLOG+WALL
  become: true

- name: Copy the notify shell script
  copy:
    src: notifycmd.sh
    dest: /etc/nut/notifycmd.sh
    mode: 0755
    owner: root
    group: nut
  become: yes


#- name: Cycle the services
#  systemd:
#    name: "{{ item }}"
#    state: started
#  become: true
#  with_items:
#    - nut-monitor

And the notifycmd.sh script mentioned in the last script

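#!/bin/bash
# Called by upsmon, which sets NOTIFYTYPE in the environment.
sendmail=/usr/sbin/sendmail
email=$EMAIL_ADDRESS
to='To: '$email'\n'
from='From: '$email'\n'
hostname=$(uname -n)
subject='Subject: NUT ALERT: '$NOTIFYTYPE'\n\n'
body='Alert type: '$NOTIFYTYPE

msg=${to}${from}${subject}${body}

# The script as posted stops here; piping the message to sendmail is the assumed final step.
printf '%b\n' "$msg" | "$sendmail" "$email"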
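The same NOTIFYCMD hook could also drive the ‘power failure’ flag and VM shutdown discussed earlier in the thread. Here is a minimal sketch, assuming a Python script is used as the notify command; upsmon passes the event type in the `NOTIFYTYPE` environment variable, while the action names are purely hypothetical placeholders for whatever the real script would do.

```python
import os
from typing import Mapping


def action_for(notify_type: str) -> str:
    """Map an upsmon NOTIFYTYPE to a (hypothetical) local action name."""
    actions = {
        "ONBATT": "set_power_failure_flag",  # mains lost, running on battery
        "LOWBATT": "shutdown_vms",           # battery nearly empty
        "FSD": "shutdown_vms",               # forced shutdown in progress
        "ONLINE": "log_power_restored",      # mains is back
    }
    return actions.get(notify_type, "log_only")


def main(environ: Mapping[str, str] = os.environ) -> str:
    """Entry point when called by upsmon: look up the action for this event."""
    return action_for(environ.get("NOTIFYTYPE", ""))
```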

I don’t configure anything by hand if I can help it. Everything is scripted through Ansible, which means my entire configuration is documented and repeatable. I could rebuild all of my VMs and RPis automatically in about an hour, except for ESXi, the NAS, the AP, and pfSense, which I still need to build by hand. But those don’t take much longer than the rest to rebuild with backups.

Thank you very much for your Ansible scripts.
Apparently my UPS will be here tomorrow so I can do some testing :wink:

Cheers

Paul

@5iver
Thanks for moving this thread to the best location. I mistakenly thought ‘discover’ meant discuss and tease out solutions, and therefore selected the wrong category. Reading it again I can see the ambiguity.

“Discover and contribute solutions and instructions for your openHAB smart home automation.”

Thanks

Paul

I can set a fixed IP on my v2 bridge at least via the Android app.

If using a standard router for DHCP, most routers have an option to pin a MAC to a specific IP, so the bridge would always get the same IP :wink:

@Udo_Hartmann
That is correct, and that is what I termed a reserved address. It’s pseudo-static, but it still requires the DHCP server to be online and dishing out addresses.

OMG, I have just searched around the app and you can indeed set up a static IP.
Thanks for that.
One issue down. Assuming Google Wifi runs the DHCP function on the primary unit and I do as Rich suggests and add that to the UPS, I am starting to get an improved recovery already with minimal work.

Thanks
Paul

The guy who develops NUT works at Eaton, another UPS manufacturer. I prefer Eaton; I don’t know CyberPower’s devices, but I stopped buying APC. They started switching to a closed protocol a few years ago, and the guys at apcupsd were pretty upset.

Hi,
If you are a DIY guy and accept some work as well as lower quality (for less money, though), you can go the way I’ve taken:

I was trained as an electronics technician but those days went along with my sight. But great input for others reading the thread I have no doubt.

Thanks

Paul

It’s nice to have a UPS, but a UPS is only as good as its battery, so don’t forget to check the battery from time to time. Depending on the surrounding temperature, a battery can last 4 to 5 years at most, but it can also go bad within 2 years.
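Part of that check can be automated by reading what NUT already reports. The sketch below parses the `name: value` lines printed by NUT's `upsc` command and flags the replace-battery status (`RB`) or a low charge. The 80% threshold and the function names are assumptions, and not every UPS reports every variable.

```python
def parse_upsc(output: str) -> dict[str, str]:
    """Parse `upsc` output (one "name: value" pair per line) into a dict."""
    values = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")  # split at the first colon only
        if sep:
            values[name.strip()] = value.strip()
    return values


def battery_warnings(values: dict[str, str], min_charge: float = 80.0) -> list[str]:
    """Return human-readable warnings about the battery, if any."""
    warnings = []
    if "RB" in values.get("ups.status", "").split():
        warnings.append("UPS reports replace-battery (RB)")
    charge = values.get("battery.charge")
    if charge is not None and float(charge) < min_charge:
        warnings.append(f"battery charge is {charge}%")
    return warnings
```

The input could come from something like `subprocess.run(["upsc", "cyberpower1@localhost"], capture_output=True, text=True).stdout`, using the UPS name from the `ups.conf` shown earlier in the thread. A self-reported status is no substitute for an occasional real test on battery.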