Monitoring solution for openHAB et al

I guess this isn’t specific to openHAB in any way.

My use case is to have a monitoring system up and running that also (but not exclusively) monitors openHAB. Why? Last week I had two “out of memory” crashes on my OH3 setup (I haven’t yet found time to investigate what is causing them), and I’d also like to monitor other services in my smart home (router, DNS server, etc.) with it.

So I’m thinking of using something like an SNMP-based monitoring tool to achieve this. Are there any hints and tips out there? I have a brand new DS720+ waiting for a new Docker image, and a bunch of Raspberry Pis lying around that could be used for this purpose:

  • central monitoring
  • checking live-status of some appliances (TCP-Ping)
  • going into deeper monitoring like “available ram” on a machine like my Pi4 running OH3
  • sending emails or push notifications if something is not as it should be
  • “easy-to-install” and “not-too-complex-to-use”: e.g. a more or less decent installation and configuration of the monitoring

Something like LibreNMS, Icinga 2, Zabbix, …? What’s the best fit for a geeky nerd like me?
Thanks for your help!
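
To make the “TCP-Ping” requirement concrete, here is a minimal sketch of a liveness check in Python. It is not taken from any of the tools mentioned; the function names are made up for illustration, and any hosts or ports you feed it are your own assumptions.

```python
import socket
from typing import Callable, Dict, List, Tuple


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def find_down(
    targets: Dict[str, Tuple[str, int]],
    probe: Callable[[str, int], bool] = tcp_reachable,
) -> List[str]:
    """Return the names of all targets whose probe failed, sorted by name."""
    return sorted(
        name for name, (host, port) in targets.items() if not probe(host, port)
    )
```

Calling `find_down({"openhab": ("192.168.1.10", 8080)})` (a hypothetical address) would return the names of the unreachable targets, which a cron job could then mail out. The `probe` parameter exists only so the logic can be tried without touching the network.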

I’m using Telegraf, which pulls stats from the host and from all the Docker containers on it (openHAB is one of them), and writes them into InfluxDB, which was originally installed for openHAB.

Grafana handles visualisation and notifications.

I already thought of setting up a zabbix VM for monitoring several services in my network.
It is only the time missing for this project to become reality :wink:

Just found a blog with some information on this :
https://www.lafois.com/2020/09/22/monitoring-an-home-automation-installation-properly/

I went down the Zabbix route and I’m pretty happy with it. But setting it up and configuring it is about as complicated as openHAB itself. But I get email alerts when stuff goes down and changes unexpectedly which is what I really wanted.

I’d post a screen shot but apparently I’ve got some Zabbix problems after yesterday’s upgrade.

[Edit] OK, the problem was I thought that I was keeping up with PostgreSQL upgrades but I wasn’t. I’m running 11.4 and Zabbix now requires 13. So it’s not a Zabbix problem.

Telegraf/InfluxDB/Grafana will probably look better, but it looked like it requires even more fiddling and configuration.

openHAB has an add-on that supports Prometheus if you want detailed info about OH itself.

OK, I just found out that Zabbix 6.0 LTS has been out for a few days, and it’s got a WAYYYY better installation routine than the last time I tried. So I now have it running on a spare Raspberry Pi and have some cool first achievements: I’m at least getting the vital data from my OH3 Raspberry Pi via the Zabbix agent, and I’m now moving on to other devices and applications.
Do you have any experience with the “Zabbix Java Gateway”, Rich? It should allow some more insight into Java applications, which would help identify the cause of OOM issues a bit faster, I guess…

I looked into it briefly but there really wasn’t anything special I wanted to pull from OH or Guacamole (my only two Java applications, and I don’t even run Guacamole any more) so beyond taking note that it exists I never looked into it closely.

So far I’ve not really suffered from OOM problems personally so never had a need to monitor it. But I can somewhat monitor it by proxy by getting an alert from Zabbix when the swap starts to be used.
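
For anyone who wants to see what “alert when swap starts to be used” boils down to without a full monitoring stack, here is a rough Python sketch. It assumes a Linux machine where `/proc/meminfo` reports values in kB; the function names are made up, and this is not how the Zabbix agent does it internally.

```python
from typing import Dict


def parse_meminfo(text: str) -> Dict[str, int]:
    """Parse the contents of /proc/meminfo into {field name: value}."""
    values: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            # The first field is the number; a trailing "kB" unit is ignored.
            values[key.strip()] = int(fields[0])
    return values


def memory_summary(text: str) -> Dict[str, int]:
    """Return available RAM and used swap in MiB from /proc/meminfo text."""
    info = parse_meminfo(text)
    swap_used_kb = info["SwapTotal"] - info["SwapFree"]
    return {
        "available_mib": info["MemAvailable"] // 1024,
        "swap_used_mib": swap_used_kb // 1024,
    }
```

On the machine itself you would pass in `open("/proc/meminfo").read()` and raise an alert once `swap_used_mib` is above zero or `available_mib` drops below whatever threshold suits your setup.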

You could try netdata. By using the provided docker image it is close to zero configuration with an impressive amount of monitoring information out of the box.

First and foremost, look into a process watchdog that automatically restarts the process when it fails. Other things, such as recording failures, are still relevant, but the first part of the problem is automatic recovery, as far as that is possible. Look at “systemd process watchdog” and the automatic restart options for Docker.
Best, Łukasz

You can configure openHAB to restart automatically on OOM errors… This is what I have in place. Nowadays I do not have any OOM errors, but it is good to have the protection in case there are regressions.

In addition, I have configured the Raspberry Pi watchdog module with max CPU and memory limits, in case something else is taking the memory or the system somehow locks up.

The whole system is read-only, so a restart will restore things to a “known good state”.

The java parameter for exiting on OOM errors is
-XX:+ExitOnOutOfMemoryError

You need to configure systemd to restart the service on failures, if it is not the default.

I have switched to AdoptOpenJDK due to issues experienced with Zulu Java (see “Syslog Errors (100 GB)”, post #38 by ssalonen).

Otherwise this setup has been stable for me for the last couple of years, since openHAB1

All right. Just a heads-up: I got Zabbix working and can now check the vital signs of machines running the “Zabbix agent”. The only thing is, I haven’t yet figured out how to “copy” items and triggers so that I don’t have to recreate them for each Thing I’d like to monitor in case it goes OFFLINE or into any state other than ONLINE…

Tried to, but I guess I’m just too dumb to figure out how those Docker options for my Synology DSM7 come together; I did not manage to start a single Docker container…

The triggers and items come through the templates you apply. So, for example, most of the standard health and status for a given Linux machine would come by applying the Linux template. A periodic ping I think comes from an ICMP template. Watching the running services comes through a systemd template.

If the built in templates are not sufficient, you can create your own templates. That is probably the proper way to do this. I’ve been happy with the built in templates so I’ve only done a very little bit of looking into what it takes to create one.

Note: Upgrading PostgreSQL between major versions is a huge pain. :frowning:

Yes, I found the templates already, thank you!
But my use case is something like setting up an item that makes REST API requests for the states of multiple Things. So I’d like to copy them and only edit the name of the Thing; but then again, that’s only a handful…

I bet if you figure out how to create a template that discovers all the Things in OH, lots of people would be happy and use it. :wink: I know something like that should be possible based on how templates like systemd and MQTT work.
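
As a starting point for that idea, here is a rough Python sketch of the openHAB side of it: list all Things over the REST API, pick out the ones that are not ONLINE, and emit a JSON array in the style Zabbix low-level discovery consumes. It is only a sketch under assumptions: it assumes an OH3 API token sent as a Bearer token, the macro names `{#THING_UID}` and `{#THING_LABEL}` are made up, and the Zabbix template that would use this output is not shown and has not been built.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

Thing = Dict[str, Any]


def fetch_things(base_url: str, api_token: str) -> List[Thing]:
    """GET <base_url>/rest/things and return the decoded JSON list."""
    request = urllib.request.Request(
        f"{base_url}/rest/things",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Accept": "application/json",
        },
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def things_not_online(things: List[Thing]) -> List[Tuple[str, str]]:
    """Return (UID, status) for every Thing whose status is not ONLINE."""
    result = []
    for thing in things:
        status = thing.get("statusInfo", {}).get("status", "UNKNOWN")
        if status != "ONLINE":
            result.append((thing["UID"], status))
    return result


def to_zabbix_lld(things: List[Thing]) -> str:
    """Format Things as a JSON array of discovery rows (hypothetical macros)."""
    rows = [
        {"{#THING_UID}": t["UID"], "{#THING_LABEL}": t.get("label", t["UID"])}
        for t in things
    ]
    return json.dumps(rows)
```

Something like `things_not_online(fetch_things("http://openhab:8080", token))` (hostname and token are placeholders) gives the list you would alert on, while `to_zabbix_lld` is what an external script could print for a discovery rule.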

Haha! #noPressureBut :wink:
I don’t think I could find the time for it, as I only have two “vital” Things to check, and my bet is that if either of those is OFFLINE or the Zabbix agent finds an alarming/disastrous item, I’ll have to have a look anyway…

but yeah, what I CAN do is to make a short HowTo afterwards on how I approached it. It’s not only the ONLINE-things, but also if persistence works, or if there’s a bunch of errors in the log, …

I manage a number of OH systems and needed remote access and to know if any were down. I looked at some network monitoring services but they were all way more than I needed and/or cost$$. Then I realized I could probably do it with openHAB! So:

For remote access we use Tailscale.

We have an office system on which I installed the Network Binding, and through that a “Pingable Network Device” for each system, with an Item linked to its “Online” channel. The IP address for each system comes from its IP in Tailscale. All of those Items are part of the Ping group, and if any go down I get an email. I can then go to Tailscale and see which one is down.

Finally I did the same install on my home system which lets me know if the office system goes down. Works great.

I can post instructions if anyone is interested.

Phew. That was more of a pain than I expected. It just goes to show my mantra, the longer between upgrades the harder the upgrade will be.

Note to self: When using Ansible and Docker to deploy stuff, make sure to actually pull the latest image during upgrades. Simply restarting the container won’t do anything for you. It looks like I’ve not upgraded my PostgreSQL for three years now.

But, now I can generate those screen shots. :slight_smile:

This is part of my overview dashboard. It’s useful when I need to debug stuff but most of the time I just let it email me with problems.

fenrir is my Synology NAS. charybdis is my OPNsense firewall running FreeBSD. The rest (off screen) are various Linux machines and VMs.

Now that I have 6 working, I need to look to see what’s changed and what’s new, and use what’s helpful.

For me the most useful warnings are when swap gets low, machine no longer responds to ping (or ping is slow) and the systemd service monitoring.

What kind of agent do you use with your Synology NAS? DSM7?

DSM7 with SNMP enabled. Zabbix uses SNMP to pull its information, but I’ve only applied the SAN template. DSM7 has its own email and reporting to handle most cases, so I don’t need much info in Zabbix.

All the rest are using zabbix-agent2 to feed into zabbix.

For the curious here is my zabbix ansible playbook which handles both deploying the server and the agents based on a variable you pass to it. It doesn’t deploy the database because I handle that separately since I have four services that depend on it (Vaultwarden, Librephotos, Nextcloud, and Zabbix).

---
# tasks file for roles/zabbix

- name: Create the zabbix user and group
  include_role:
    name: create-user
  vars:
    uid: "{{ zabbix_uid }}"
    gid: "{{ zabbix_gid }}"
    user_name: zabbix
    create_home: False
    service: zabbix

- name: Check if docker group exists
  shell: /usr/bin/getent group | awk -F":" '{print $1}'
  register: etc_groups
  changed_when: False

- name: Add secondary Groups to zabbix user
  user:
    name: zabbix
    groups: docker
    append: yes
  become: true
  when: '"docker" in etc_groups.stdout_lines'

- block:
  - name: Create the folders for logging and settings
    file:
      path: "{{ item }}"
      state: directory
      owner: zabbix
      group: zabbix
      mode: u=rwx,g=rwx,o=rx
      recurse: yes
    loop:
      - "{{ zabbix_home }}"
      - "{{ zabbix_home }}/alertscripts"
      - "{{ zabbix_home }}/externalscripts"
      - "{{ zabbix_home }}/modules"
      - "{{ zabbix_home }}/enc"
      - "{{ zabbix_home }}/ssh_keys"
      - "{{ zabbix_home }}/ssl/certs"
      - "{{ zabbix_home }}/ssl/keys"
      - "{{ zabbix_home }}/ssl/ssl_ca"
      - "{{ zabbix_home }}/snmptraps"
      - "{{ zabbix_home }}/mibs"
      - "{{ zabbix_home }}/export"
    become: true

  # Create database and user
  - name: Install psycopg2
    pip:
      name: psycopg2-binary
    become: True

  - name: Create postgres database for Zabbix
    postgresql_db:
      login_host: "{{ postgresql_host }}"
      login_password: "{{ postgresql_password }}"
      login_user: "{{ postgresql_user }}"
      name: "{{ zabbix_db_name }}"

  - name: Create zabbix user for zabbix database
    postgresql_user:
      db: "{{ zabbix_db_name }}"
      login_host: "{{ postgresql_host }}"
      login_password: "{{ postgresql_password }}"
      login_user: "{{ postgresql_user }}"
      name: "{{ zabbix_db_user }}"
      password: "{{ zabbix_db_password }}"
      priv: ALL

  - name: Add mosquitto external script
    template:
      dest: "{{ zabbix_home }}/externalscripts/mosquitto"
      mode: u=rwx,g=rx,o=rx
      src: mosquitto.j2
      force: true
    become: True

  # TODO build the server container to include mosquitto_sub

  # Server: config variables https://hub.docker.com/r/zabbix/zabbix-server-pgsql/
  - name: Pull/update the zabbix server docker image
    docker_container:
      detach: True
      env:
        DB_SERVER_HOST: "{{ postgresql_ip }}"
        POSTGRES_USER: "{{ zabbix_db_user }}"
        POSTGRES_PASSWORD: "{{ zabbix_db_password }}"
        POSTGRES_DB: "{{ zabbix_db_name }}"
        ZBX_STARTVMWARECOLLECTORS: "5"
        ZBX_STARTDISCOVERERS: "5"
        ZBX_HISTORYCACHESIZE: "128M"
        ZBX_HISTORYINDEXCACHESIZE: "4M"
        ZBX_HOUSEKEEPINGFREQUENCY: "1"
        ZBX_MAXHOUSEKEEPERDELETE: "5000"
  #      ZBX_LOADMODULE: dummy1.so,dummy2.so # modules are in /var/lib/zabbix/modules
  #      ZBX_DEBUGLEVEL: 3 # 0-5, 5 is TRACE
  #      ZBX_TIMEOUT: 4 # timeout for processing checks
  #      ZBX_JAVAGATEWAY_ENABLE: false
      hostname: "{{ ansible_fqdn }}"
      image: zabbix/zabbix-server-pgsql:latest
      log_driver: "{{ docker_log_driver }}"
      name: zabbix-server
      published_ports:
        - "10051:10051"
      pull: True
      restart: False
      restart_policy: always
      volumes:
        - /etc/localtime:/etc/localtime:ro
        - /etc/timezone:/etc/timezone:ro
        - "{{ zabbix_home }}/alertscripts:/usr/lib/zabbix/alertscripts"
        - "{{ zabbix_home }}/externalscripts:/usr/lib/zabbix/externalscripts"
        - "{{ zabbix_home }}/modules:/var/lib/zabbix/modules"
        - "{{ zabbix_home }}/enc:/var/lib/zabbix/enc"
        - "{{ zabbix_home }}/ssh_keys:/var/lib/zabbix/ssh_keys"
        - "{{ zabbix_home }}/ssl/certs:/var/lib/zabbix/ssl/certs"
        - "{{ zabbix_home }}/ssl/keys:/var/lib/zabbix/ssl/keys"
        - "{{ zabbix_home }}/ssl/ssl_ca:/var/lib/zabbix/ssl/ssl_ca"
        - "{{ zabbix_home }}/snmptraps:/var/lib/zabbix/snmptraps"
        - "{{ zabbix_home }}/mibs:/var/lib/zabbix/mibs"
        - "{{ zabbix_home }}/export:/var/lib/zabbix/export"

  # Web interface
  - name: Pull/update the zabbix web docker image
    docker_container:
      detach: True
      env:
        ZBX_SERVER_HOST: "{{ zabbix_server_ip }}"
        DB_SERVER_HOST: "{{ postgresql_ip }}"
        POSTGRES_USER: "{{ zabbix_db_user }}"
        POSTGRES_PASSWORD: "{{ zabbix_db_password }}"
        POSTGRES_DB: "{{ zabbix_db_name }}"
  #      ZBX_HISTORYSTORAGEURL: elasticsearch config
  #      ZBX_HISTORYSTORAGETYPES: elasticsearch config
        PHP_TZ: America/Denver
        ZBX_SERVER_NAME: Zabbix
      hostname: "{{ ansible_fqdn }}"
      image: zabbix/zabbix-web-nginx-pgsql:latest
      log_driver: "{{ docker_log_driver }}"
      name: zabbix-web
      published_ports:
        - "9090:8080"
      pull: True
      restart: False
      restart_policy: always
      volumes:
        - /etc/localtime:/etc/localtime:ro
        - /etc/timezone:/etc/timezone:ro
  when: zabbix_server

- name: debug group and hostname
  debug:
    msg: "{{ group_names }} {{ inventory_hostname }}"

- name: debug Is it an RPi?
  debug:
    msg: "RPi"
  when: ('pis' in group_names) and (not 'muninn' in inventory_hostname)

- name: debug Is it Ubuntu?
  debug:
    msg: "Ubuntu"
  when: (not 'pis' in group_names) and (not 'muninn' in inventory_hostname)

- name: debug Is it Raspberry Pi OS 64-bit?
  debug:
    msg: "Raspberry Pi OS 64-bit"
  when: ('muninn' in inventory_hostname)

# TODO broken Make this more dynamic and work
- name: Add zabbix repo ubuntu
  apt:
    deb: https://repo.zabbix.com/zabbix/5.4/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.4-1%2Bubuntu20.04_all.deb
  when: (not 'pis' in group_names) and (not 'muninn' in inventory_hostname)
  become: true

- name: Add zabbix repo rpi
  apt:
    deb: https://repo.zabbix.com/zabbix/5.4/raspbian/pool/main/z/zabbix-release/zabbix-release_5.4-1+debian10_all.deb
  when: ('pis' in group_names) and (not 'muninn' in inventory_hostname)
  become: true

- name: Add zabbix repo 64-bit
  apt:
    deb: https://repo.zabbix.com/zabbix/5.0/ubuntu-arm64/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bubuntu20.04_all.deb
  when: ('muninn' in inventory_hostname)
  become: true

- name: Install the zabbix-agent2
  apt:
    name: zabbix-agent2
    update_cache: yes
  become: true

- name: Install the config
  template:
    dest: /etc/zabbix/zabbix_agent2.d/custom-zabbix.conf
    mode: u=rw,g=r,o=r
    src: custom-zabbix.j2
    force: true
  become: True
  notify:
    - Restart the zabbix agent to pick up config

- name: Allow zabbix to have passwordless sudo
  lineinfile:
    dest: /etc/sudoers
    state: present
    regexp: '^zabbix'
    line: 'zabbix ALL=(ALL) NOPASSWD: ALL'
    validate: 'visudo -cf %s'
  become: True

- name: Add cron job to create log directory on reboot for RPis
  block:
    - name: Add the cron shell variable
      cron:
        name: SHELL
        env: true
        job: /bin/bash

    - name: Add cron job to create the directory at boot
      cron:
        name: "Zabbix log directory"
        job: mkdir -p /var/log/zabbix && chown zabbix:zabbix /var/log/zabbix
        special_time: reboot
  become: True
  when: "'pis' in group_names"

The machine muninn is handled a little differently because it’s running Raspberry Pi OS 64-bit while the other Pis are running 32-bit, which means they need to pull the agent from different apt repos. The rest of my machines are Ubuntu VMs. OPNsense has its own way to install the Zabbix agent, and the Synology is just a configuration setting.

That mosquitto external script is just something I played with and I can’t remember if I even got it to work properly. In the end all I really cared about was whether or not it was running.

#!/bin/bash

docker exec mosquitto mosquitto_sub -i zabbix -C 1 -u {{ default_user }} -P {{ default_pass }} -t "$1"

The custom agent config file template is just

Hostname={{ ansible_hostname }}
LogFileSize=1024
Server={{ zabbix_conn_str }}
ServerActive={{ zabbix_conn_str }}

That can be done with PowerShell. It is a very “easy-to-install” solution and can run agentless. The “not-too-complex-to-use” part depends only on your PowerShell and other scripting skills.

(but, at this point, I don’t know how powershell can/will interact with Docker containers as I do not have ample knowledge of Docker)

The installation of Zabbix <6 was a real pain in the …; it pushed me past the edge of my know-how and I abandoned it last year. Zabbix 6 was easy-peasy, as the installation script and documentation were rewritten.

What I don’t get at present is how to run Zabbix or netdata within my Synology Docker. It isn’t as convenient as “drop in the provided docker script and it’ll run”: I have to enter the volumes and env variables myself, and somehow I can’t get it running…