Monitoring solution for openHAB et al

You could try netdata. By using the provided docker image it is close to zero configuration with an impressive amount of monitoring information out of the box.
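Something like this is already enough to get the dashboard up (a minimal sketch; the official netdata docs list extra volume mounts and capabilities if you want full host metrics):

docker run -d --name=netdata -p 19999:19999 netdata/netdata

The dashboard is then reachable at http://<host>:19999.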

3 Likes

First and foremost, look into a process watchdog that automatically restarts the process when it fails. Other things, such as recording failures, are still relevant, but the first part of the problem is automatic recovery, as far as that is possible. Look at “systemd process watchdog” and the automatic restart options for Docker.
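For the Docker side, the restart policy is a one-liner (container name and image are just examples):

# restart an existing container automatically whenever its process dies
docker update --restart unless-stopped openhab

# or set the policy when creating the container
docker run -d --name openhab --restart unless-stopped openhab/openhab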
Best, Łukasz

2 Likes

You can configure openHAB to restart automatically on OOM errors… This is what I have in place. Nowadays I do not have any OOM errors, but it is good to have the protection in place just in case there are some regressions.

In addition, I have configured the Raspberry Pi watchdog module with max CPU and memory thresholds, in case something else is taking the memory or the system somehow locks up.
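In case it helps anyone, the relevant bits of /etc/watchdog.conf look roughly like this when using the standard watchdog daemon (the numbers are just examples, not my exact values):

# /etc/watchdog.conf
watchdog-device = /dev/watchdog
max-load-1      = 24    # reboot if the 1-minute load average exceeds this
min-memory      = 1     # reboot if free memory drops below this many pages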

The whole system is read-only, so a restart will restore things to a “known good state”.

The java parameter for exiting on OOM errors is
-XX:+ExitOnOutOfMemoryError

You need to configure systemd to restart the service on failures, if it is not the default.
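For reference, on an apt-based install that boils down to something like the following; the paths and service name are what I would expect for openHAB 3, so double-check yours:

# systemctl edit openhab   (creates /etc/systemd/system/openhab.service.d/override.conf)
[Service]
Restart=on-failure
RestartSec=30

# the JVM flag goes into /etc/default/openhab
EXTRA_JAVA_OPTS="-XX:+ExitOnOutOfMemoryError"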

I have switched to AdoptOpenJDK due to issues I experienced with Zulu Java: Syslog Errors (100 GB) - #38 by ssalonen

Otherwise this setup has been stable for me for the last couple of years, since openHAB 1.

2 Likes

All right. Just a heads-up: I got Zabbix working and can now check the vital signs of machines running the “Zabbix agent”. Only thing is, I haven’t yet figured out how to “copy” items and triggers, so I don’t have to replicate those for each Thing I’d like to monitor in case it goes OFFLINE or into any state other than ONLINE…

Tried to, but I guess I’m just too dumb to figure out how those Docker options for my Synology DSM7 come together; I did not manage to start a single Docker container…

The triggers and items come through the templates you apply. So, for example, most of the standard health and status for a given Linux machine would come by applying the Linux template. A periodic ping I think comes from an ICMP template. Watching the running services comes through a systemd template.

If the built-in templates are not sufficient, you can create your own templates. That is probably the proper way to do this. I’ve been happy with the built-in templates, so I’ve only done a little bit of looking into what it takes to create one.

Note: Upgrading PostgreSQL between major versions is a huge pain. :frowning:

Yes, I found the templates already, thank you!
But my use case is something like setting up an item that makes REST API requests for the states of multiple Things, so that I can copy them and only edit the name of the Thing. But then again, that’s only a handful…
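For what it’s worth, a Thing’s status can be read straight from the REST API, so a Zabbix HTTP agent item per Thing only needs a different UID in the URL (the Thing UID and token below are made up):

# returns something like {"status":"ONLINE","statusDetail":"NONE"}
curl -s -H "Authorization: Bearer <api-token>" \
  http://openhab:8080/rest/things/network:pingdevice:office1/status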

I bet if you figure out how to create a template that discovers all the Things in OH, lots of people would be happy and use it. :wink: I know something like that should be possible based on how templates like systemd and MQTT work.

haha! #noPressureBut:wink:
I don’t think I could find the time for it, as I only have two “vital” Things to check - and my bet is, if either one of those is OFFLINE or the Zabbix agent finds an alarming/disastrous item, I’ll have to take a look anyway…

but yeah, what I CAN do is write a short HowTo afterwards on how I approached it. It’s not only the ONLINE Things, but also whether persistence works, or whether there’s a bunch of errors in the log, …
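For the log part, the plain Zabbix agent can already tail a file for error lines with an active-check item; the key would look roughly like this, assuming the default openHAB log path of an apt install:

log[/var/log/openhab/openhab.log,ERROR]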

I manage a number of OH systems and needed remote access and to know if any were down. I looked at some network monitoring services, but they were all way more than I needed and/or cost $$. Then I realized I could probably do it with openHAB! So:

For remote access we use Tailscale.

We have an office system on which I installed the Network Binding and then created a “Pingable Network Device” through that, with an Item linked to the “Online” channel for each system. The IP address for each system comes from the IP in Tailscale. All of those Items are part of the Ping group, and if any go down I get an email. I can then go to Tailscale and see which one is down.

Finally I did the same install on my home system which lets me know if the office system goes down. Works great.
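If anyone prefers file-based configuration, the equivalent is roughly this (the Tailscale IP and the names are placeholders):

// .things
Thing network:pingdevice:office1 "Ping - Office 1" [ hostname="100.64.0.10", refreshInterval=300000, retry=6 ]

// .items
Group:Switch:AND(ON,OFF) OH_Systems "OH Systems"
Switch Ping_Office1 "Ping - Office 1" (OH_Systems) { channel="network:pingdevice:office1:online" }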

I can post instructions if anyone is interested.

Phew. That was more of a pain than I expected. It just goes to show my mantra: the longer between upgrades, the harder the upgrade will be.

Note to self: when using Ansible and Docker to deploy stuff, make sure to actually pull the latest image during the upgrades. Simply restarting the container won’t do anything for you. It looks like I had not upgraded my PostgreSQL for three years.
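In Ansible terms that is the difference between merely ensuring the container exists and actually forcing a fresh image, roughly (a sketch using community.docker’s docker_container):

- name: Pull the newest image and recreate the container if it changed
  docker_container:
    name: zabbix-server
    image: zabbix/zabbix-server-pgsql:latest
    pull: true              # pull before comparing, so a newer :latest leads to a recreate
    restart_policy: always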

But, now I can generate those screen shots. :slight_smile:

This is part of my overview dashboard. It’s useful when I need to debug stuff but most of the time I just let it email me with problems.

fenrir is my Synology NAS. charybdis is my opnSense firewall running FreeBSD. The rest (off screen) are various Linux machines and VMs.

Now that I have 6 working, I need to look to see what’s changed and what’s new, and use what’s helpful.

For me the most useful warnings are when swap gets low, when a machine no longer responds to ping (or ping is slow), and the systemd service monitoring.

2 Likes

what kind of Agent do you use with your Synology NAS? DSM7?

DSM7 with SNMP enabled. Zabbix uses SNMP to pull its information, but I’ve only applied the SAN template. DSM7 has its own email and reporting to handle most cases, so I don’t need much info in Zabbix.

All the rest are using zabbix-agent2 to feed into zabbix.

For the curious here is my zabbix ansible playbook which handles both deploying the server and the agents based on a variable you pass to it. It doesn’t deploy the database because I handle that separately since I have four services that depend on it (Vaultwarden, Librephotos, Nextcloud, and Zabbix).

---
# tasks file for roles/zabbix

- name: Create the zabbix user and group
  include_role:
    name: create-user
  vars:
    uid: "{{ zabbix_uid }}"
    gid: "{{ zabbix_gid }}"
    user_name: zabbix
    create_home: False
    service: zabbix

- name: Check if docker group exists
  shell: /usr/bin/getent group | awk -F":" '{print $1}'
  register: etc_groups
  changed_when: False

- name: Add secondary Groups to zabbix user
  user:
    name: zabbix
    groups: docker
    append: yes
  become: true
  when: '"docker" in etc_groups.stdout_lines'

- block:
  - name: Create the folders for logging and settings
    file:
      path: "{{ item }}"
      state: directory
      owner: zabbix
      group: zabbix
      mode: u=rwx,g=rwx,o=rx
      recurse: yes
    loop:
      - "{{ zabbix_home }}"
      - "{{ zabbix_home }}/alertscripts"
      - "{{ zabbix_home }}/externalscripts"
      - "{{ zabbix_home }}/modules"
      - "{{ zabbix_home }}/enc"
      - "{{ zabbix_home }}/ssh_keys"
      - "{{ zabbix_home }}/ssl/certs"
      - "{{ zabbix_home }}/ssl/keys"
      - "{{ zabbix_home }}/ssl/ssl_ca"
      - "{{ zabbix_home }}/snmptraps"
      - "{{ zabbix_home }}/mibs"
      - "{{ zabbix_home }}/export"
    become: true

  # Create database and user
  - name: Install psycopg2
    pip:
      name: psycopg2-binary
    become: True

  - name: Create postgres database for Zabbix
    postgresql_db:
      login_host: "{{ postgresql_host }}"
      login_password: "{{ postgresql_password }}"
      login_user: "{{ postgresql_user }}"
      name: "{{ zabbix_db_name }}"

  - name: Create zabbix user for zabbix database
    postgresql_user:
      db: "{{ zabbix_db_name }}"
      login_host: "{{ postgresql_host }}"
      login_password: "{{ postgresql_password }}"
      login_user: "{{ postgresql_user }}"
      name: "{{ zabbix_db_user }}"
      password: "{{ zabbix_db_password }}"
      priv: ALL

  - name: Add mosquitto external script
    template:
      dest: "{{ zabbix_home }}/externalscripts/mosquitto"
      mode: u=rwx,g=rx,o=rx
      src: mosquitto.j2
      force: true
    become: True

  # TODO build the server container to include mosquitto_sub

  # Server: config variables https://hub.docker.com/r/zabbix/zabbix-server-pgsql/
  - name: Pull/update the zabbix server docker image
    docker_container:
      detach: True
      env:
        DB_SERVER_HOST: "{{ postgresql_ip }}"
        POSTGRES_USER: "{{ zabbix_db_user }}"
        POSTGRES_PASSWORD: "{{ zabbix_db_password }}"
        POSTGRES_DB: "{{ zabbix_db_name }}"
        ZBX_STARTVMWARECOLLECTORS: "5"
        ZBX_STARTDISCOVERERS: "5"
        ZBX_HISTORYCACHESIZE: "128M"
        ZBX_HISTORYINDEXCACHESIZE: "4M"
        ZBX_HOUSEKEEPINGFREQUENCY: "1"
        ZBX_MAXHOUSEKEEPERDELETE: "5000"
  #      ZBX_LOADMODULE: dummy1.so,dummy2.so # modules are in /var/lib/zabbix/modules
  #      ZBX_DEBUGLEVEL: 3 # 0-5, 5 is TRACE
  #      ZBX_TIMEOUT: 4 # timeout for processing checks
  #      ZBX_JAVAGATEWAY_ENABLE: false
      hostname: "{{ ansible_fqdn }}"
      image: zabbix/zabbix-server-pgsql:latest
      log_driver: "{{ docker_log_driver }}"
      name: zabbix-server
      published_ports:
        - "10051:10051"
      pull: True
      restart: False
      restart_policy: always
      volumes:
        - /etc/localtime:/etc/localtime:ro
        - /etc/timezone:/etc/timezone:ro
        - "{{ zabbix_home }}/alertscripts:/usr/lib/zabbix/alertscripts"
        - "{{ zabbix_home }}/externalscripts:/usr/lib/zabbix/externalscripts"
        - "{{ zabbix_home }}/modules:/var/lib/zabbix/modules"
        - "{{ zabbix_home }}/enc:/var/lib/zabbix/enc"
        - "{{ zabbix_home }}/ssh_keys:/var/lib/zabbix/ssh_keys"
        - "{{ zabbix_home }}/ssl/certs:/var/lib/zabbix/ssl/certs"
        - "{{ zabbix_home }}/ssl/keys:/var/lib/zabbix/ssl/keys"
        - "{{ zabbix_home }}/ssl/ssl_ca:/var/lib/zabbix/ssl/ssl_ca"
        - "{{ zabbix_home }}/snmptraps:/var/lib/zabbix/snmptraps"
        - "{{ zabbix_home }}/mibs:/var/lib/zabbix/mibs"
        - "{{ zabbix_home }}/export:/var/lib/zabbix/export"

  # Web interface
  - name: Pull/update the zabbix web docker image
    docker_container:
      detach: True
      env:
        ZBX_SERVER_HOST: "{{ zabbix_server_ip }}"
        DB_SERVER_HOST: "{{ postgresql_ip }}"
        POSTGRES_USER: "{{ zabbix_db_user }}"
        POSTGRES_PASSWORD: "{{ zabbix_db_password }}"
        POSTGRES_DB: "{{ zabbix_db_name }}"
  #      ZBX_HISTORYSTORAGEURL: elasticsearch config
  #      ZBX_HISTORYSTORAGETYPES: elasticsearch config
        PHP_TZ: America/Denver
        ZBX_SERVER_NAME: Zabbix
      hostname: "{{ ansible_fqdn }}"
      image: zabbix/zabbix-web-nginx-pgsql:latest
      log_driver: "{{ docker_log_driver }}"
      name: zabbix-web
      published_ports:
        - "9090:8080"
      pull: True
      restart: False
      restart_policy: always
      volumes:
        - /etc/localtime:/etc/localtime:ro
        - /etc/timezone:/etc/timezone:ro
  when: zabbix_server

- name: debug group and hostname
  debug:
    msg: "{{ group_names }} {{ inventory_hostname }}"

- name: debug Is it an RPi?
  debug:
    msg: "RPi"
  when: ('pis' in group_names) and (not 'muninn' in inventory_hostname)

- name: debug Is it Ubuntu?
  debug:
    msg: "Ubuntu"
  when: (not 'pis' in group_names) and (not 'muninn' in inventory_hostname)

- name: Add zabbix repo muninn
  debug:
    msg: "Raspberry Pi OS 64-bit"
  when: ('muninn' in inventory_hostname)

# TODO: broken. Make this more dynamic and working.
- name: Add zabbix repo ubuntu
  apt:
    deb: https://repo.zabbix.com/zabbix/5.4/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.4-1%2Bubuntu20.04_all.deb
  when: (not 'pis' in group_names) and (not 'muninn' in inventory_hostname)
  become: true

- name: Add zabbix repo rpi
  apt:
    deb: https://repo.zabbix.com/zabbix/5.4/raspbian/pool/main/z/zabbix-release/zabbix-release_5.4-1+debian10_all.deb
  when: ('pis' in group_names) and (not 'muninn' in inventory_hostname)
  become: true

- name: Add zabbix repo 64-bit
  apt:
    deb: https://repo.zabbix.com/zabbix/5.0/ubuntu-arm64/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bubuntu20.04_all.deb
  when: ('muninn' in inventory_hostname)
  become: true

- name: Install the zabbix-agent2
  apt:
    name: zabbix-agent2
    update_cache: yes
  become: true

- name: Install the config
  template:
    dest: /etc/zabbix/zabbix_agent2.d/custom-zabbix.conf
    mode: u=rw,g=r,o=r
    src: custom-zabbix.j2
    force: true
  become: True
  notify:
    - Restart the zabbix agent to pick up config

- name: Allow zabbix to have passwordless sudo
  lineinfile:
    dest: /etc/sudoers
    state: present
    regexp: '^zabbix'
    line: 'zabbix ALL=(ALL) NOPASSWD: ALL'
    validate: 'visudo -cf %s'
  become: True

- name: Add cron job to create log directory on reboot for RPis
  block:
    - name: Add the cron shell variable
      cron:
        name: SHELL
        env: true
        job: /bin/bash

    - name: Add cron job to create the directory at boot
      cron:
        name: "Zabbix log directory"
        job: mkdir /var/log/zabbix && chown zabbix:zabbix /var/log/zabbix
        special_time: reboot
  become: True
  when: "'pis' in group_names"

The machine muninn is handled a little differently because it’s running 64-bit Raspberry Pi OS while the rest are running 32-bit, which means they need to pull the agent from different apt repos. The rest of my machines are Ubuntu VMs. opnSense has its own way to install the Zabbix agent, and the Synology is just a configuration setting.

That mosquitto external script is just something I played with and I can’t remember if I even got it to work properly. In the end all I really cared about was whether or not it was running.

#!/bin/bash

docker exec mosquitto mosquitto_sub -i zabbix -C 1 -u {{ default_user }} -P {{ default_pass }} -t "$1"

The custom agent config file template is just:

Hostname={{ ansible_hostname }}
LogFileSize=1024
Server={{ zabbix_conn_str }}
ServerActive={{ zabbix_conn_str }}

1 Like

That can be done with PowerShell. It is a very easy-to-install solution and can run agentless. The “not-too-complex-to-use” part depends only on your PowerShell and other scripting skills.

(But at this point I don’t know how PowerShell can/will interact with Docker containers, as I do not have much knowledge of Docker.)

The installation of Zabbix <6 was a real pain in the …, which pushed me past the edge of my know-how, and I abandoned it last year. Zabbix 6 was easy-peasy, as the installation script and documentation were rewritten.

What I don’t get at present is how to run Zabbix or netdata within my Synology’s Docker, which isn’t as convenient as “drop in the provided Docker script and it’ll run”; I have to enter the volumes and environment variables myself, and somehow I don’t get this running…
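From what I’ve read, a compose file for just the agent should be about this much (untested on my DSM; the environment variable names come from the zabbix/zabbix-agent2 image description, so they may need double-checking):

version: "3"
services:
  zabbix-agent2:
    image: zabbix/zabbix-agent2:latest
    environment:
      ZBX_SERVER_HOST: 192.168.1.10   # your Zabbix server
      ZBX_HOSTNAME: synology          # must match the host name configured in Zabbix
    ports:
      - "10050:10050"
    restart: unless-stopped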

To provide a bit more detail about how I got Zabbix to see my Synology DSM7 (I’m not running the server or the web interface on the Synology, so this is monitoring only):

  1. On Synology, under Control Panel → Terminal & SNMP → SNMP, enable the SNMP service.

  2. I never could get it to work with SNMPv3, so I ended up configuring it to use v2. I’ll probably go back and try again someday.

  3. On Zabbix, go to Configuration → Hosts → Create Host and fill out the relevant info for the host.

  4. Add an SNMP interface and populate it with the Synology’s hostname/IP address.

  5. Now click on “Macros” and add the SNMP_COMMUNITY macro used by that interface, making it match the community name you configured on the Synology in step 2.

I can’t remember if I had to wait a while before the data started to flow.
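A quick way to check the SNMP side independently of Zabbix is to walk the NAS directly from the Zabbix host (community and IP are placeholders):

snmpwalk -v2c -c public 192.168.1.20 system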

1 Like

To add to the conversation…

I never looked in much detail at how you can generate graphs with openHAB, but since I already had a Zabbix instance before using openHAB, I thought I could link the two together.

If you guys are interested, I have a script that reads values from MQTT and transfers them to Zabbix, so I can use all the cool graphing stuff from Zabbix.

Although I suspect OH 3 is better at that now, I didn’t find a quick way to generate nice graphs with OH 2.4 back in the day :stuck_out_tongue:

Edit: oh, I just noticed MQTT is now supported natively since 6.0 LTS. That makes my script useless, I guess :laughing:

Within OH3 you can do this either “on the fly” with just a click in the “Items” section, or you simply add some charts to a “page” - just like in Zabbix.
E.g. here within the “Items” section:

1 Like

I have described my monitoring setup here:

1 Like

Please post the instructions. I would like to see how you did it.

Thanks

I just updated these instructions for OH3 and have not tested them yet, but they should be good. I’m heading out of town for a week and I didn’t want to wait until I was back to post. Note that if you change the Blockly script, the email command inside of it goes away, so you need to reinsert it. It’s this part:

        var Actions = Java.type("org.openhab.core.model.script.actions.Things");

        var mailActions = Actions.getActions("mail","mail:smtp:SMTP_Server");

        mailActions.sendHtmlMail("email@youremail.com", "Alert – openHAB System Offline", "An openHAB system has gone offline. Please check Tailscale to identify offline system.");

Also, you’ll need to change the name “SMTP_Server” to whatever the ID of your SMTP server is. The rule:


Alert – OH System Down

This rule will send an email if an openHAB system is offline for more than 3 hours.

Required: Tailscale installed on the system to be monitored; another OH system, with the Network Binding and Mail Binding installed, to monitor the IP addresses; and a Group Item for the OH systems you want to monitor, with the aggregation function set to “All ON then ON else OFF”.

Settings > Things > + > Network Binding > Pingable Network Device > ID = Ping_Customername# > Label = Ping – Customername# > Hostname or IP = ip from Tailscale (without :8080) > Refresh Interval = 300000 > Retry = 6 > Create Thing

Settings > Things > Thing just created > Channels > Online (switch) > Add Link To Item > Create a new Item > Name = Ping_Customername# > Label = Ping – Customername# > Parent Group = OH Systems > Close > Link > Save

Rules > New Rule > ID = Alert_System_Offline > Name = Alert – System Offline

Code > Replace existing code with code below:


configuration: {}
triggers:
  - id: "1"
    configuration:
      groupName: OH_Systems
    type: core.GroupStateChangeTrigger
conditions: []
actions:
  - inputs: {}
    id: "2"
    configuration:
      blockSource: <xml xmlns="https://developers.google.com/blockly/xml"><block
        type="oh_log" id="3ntY/cqP4OEGlO7ekWty" x="-2016" y="20"><field
        name="severity">info</field><value name="message"><shadow type="text"
        id="@OcaG9FOaYg;%cr@X4`X"><field name="TEXT">OH System Offline Rule
        Started</field></shadow></value><next><block type="controls_if"
        id="4@U|GA#I9vjU{n3s.~8R"><mutation elseif="1"></mutation><value
        name="IF0"><block type="logic_compare" id="R{MRikkH@)iW/;AFZj[;"><field
        name="OP">EQ</field><value name="A"><block type="oh_getitem_state"
        id="G)zY2,`/ZYvEBN]~{%rM"><value name="itemName"><shadow type="oh_item"
        id="GaA70K1x29}AXBTTI{-{"><field
        name="itemName">OH_Systems</field></shadow></value></block></value><value
        name="B"><block type="text" id="H=Fzh6lL;=|~mCoqAwBd"><field
        name="TEXT">OFF</field></block></value></block></value><statement
        name="DO0"><block type="oh_timer" id="NdTJ2fW$1~t6gvsjW|mp"><field
        name="delayUnits">plusHours</field><value name="delay"><shadow
        type="math_number" id="otI%iP,VT#I=+[%M8=$v"><field
        name="NUM">3</field></shadow></value><value name="timerName"><shadow
        type="text" id="hry6.e7COO#4Yrlw%PNr"><field name="TEXT">OH System
        Offline Timer</field></shadow></value><statement name="timerCode"><block
        type="oh_log" id="{k@Er3wG8e6$n.*+A_g["><field
        name="severity">info</field><value name="message"><shadow type="text"
        id="C`ZX6$9jB.8rJ+#GpmIk"><field name="TEXT">Email Sent - OH System
        Offline</field></shadow></value></block></statement></block></statement><value
        name="IF1"><block type="logic_compare" id="`0xmVOp9Yi!jHAVTY~nr"><field
        name="OP">EQ</field><value name="A"><block type="oh_getitem_state"
        id="gn_P~zcF0r4[%lu*XxNO"><value name="itemName"><shadow type="oh_item"
        id="rB6;`FJ6$r67tZ2u0Y|0"><field
        name="itemName">OH_Systems</field></shadow></value></block></value><value
        name="B"><block type="text" id="Vv,,Sxe#prf8)FVHzium"><field
        name="TEXT">ON</field></block></value></block></value><statement
        name="DO1"><block type="oh_timer_cancel"
        id="o(tjF3t^a{zsOe8!gwom"><value name="timerName"><shadow type="text"
        id="N?I[ze3evs@))+@xy@h{"><field name="TEXT">OH System Offline
        Timer</field></shadow></value></block></statement></block></next></block></xml>
      type: application/javascript
      script: >
        var logger =
        Java.type('org.slf4j.LoggerFactory').getLogger('org.openhab.rule.' +
        ctx.ruleUID);


        var scriptExecution = Java.type('org.openhab.core.model.script.actions.ScriptExecution');


        var zdt = Java.type('java.time.ZonedDateTime');


        if (typeof this.timers === 'undefined') {
          this.timers = [];
        }



        logger.info('OH System Offline Rule Started');

        var Actions = Java.type("org.openhab.core.model.script.actions.Things");

        var mailActions = Actions.getActions("mail","mail:smtp:SMTP_Server");

        if (itemRegistry.getItem('OH_Systems').getState() == 'OFF') {
          if (typeof this.timers['OH System Offline Timer'] === 'undefined' || this.timers['OH System Offline Timer'].hasTerminated()) {
            this.timers['OH System Offline Timer'] = scriptExecution.createTimer(zdt.now().plusHours(3), function () {
              // the email is only sent once the group has stayed OFF for 3 hours
              mailActions.sendHtmlMail("email@youremail.com", "Alert – openHAB System Offline", "An openHAB system has gone offline. Please check Tailscale to identify offline system.");
              logger.info('Email Sent - OH System Offline');
              })
          }
        } else if (itemRegistry.getItem('OH_Systems').getState() == 'ON') {
          if (typeof this.timers['OH System Offline Timer'] !== 'undefined') {
            this.timers['OH System Offline Timer'].cancel();
            this.timers['OH System Offline Timer'] = undefined;
          }
        }
    type: script.ScriptAction

Save > Save