I guess this is not only about openHAB, nor in any way specific to openHAB.
My use case is to have a monitoring system up and running and have openHAB monitored as well (but not exclusively). Why? Last week I had two “out of memory” crashes in my OH3 setup (I haven’t found the time yet to investigate what’s causing them), and I’d also like to monitor other services in my smarthome (router, DNS server, etc.).
So I’m thinking of using something like an SNMP-based monitoring tool to achieve this. Are there any hints and tips out there? I have a brand-new DS720+ waiting for a new Docker image, and a bunch of Raspberry Pis lying around to be used for this purpose:
central monitoring
checking live-status of some appliances (TCP-Ping)
going into deeper monitoring like “available RAM” on a machine like my Pi4 running OH3
sending emails or push notifications, if something is not as it should
“easy-to-install” - “not-too-complex-to-use”: i.e. a more or less decent installation and configuration of the monitoring
Something like LibreNMS, Icinga 2, Zabbix, …? What’s the best fit for a geeky nerd like me?
Thanks for your help!
I already thought of setting up a zabbix VM for monitoring several services in my network.
The only thing missing is the time for this project to become reality.
I went down the Zabbix route and I’m pretty happy with it. But setting it up and configuring it is about as complicated as openHAB itself. But I get email alerts when stuff goes down and changes unexpectedly which is what I really wanted.
I’d post a screen shot but apparently I’ve got some Zabbix problems after yesterday’s upgrade.
[Edit] OK, the problem was I thought that I was keeping up with PostgreSQL upgrades but I wasn’t. I’m running 11.4 and Zabbix now requires 13. So it’s not a Zabbix problem.
Telegraf/InfluxDB/Grafana will probably look better, but it seemed to require even more fiddling and configuration.
openHAB has an add-on that supports Prometheus if you want detailed info about OH itself.
OK, I just found out that Zabbix 6.0 LTS has been out for a few days - and it’s got a WAYYYY better installation routine than the last time I tried. So I now have it running on a spare Raspberry Pi and got some cool first achievements, at least getting the vital data from my OH3 Raspberry Pi via the Zabbix agent - and now on to other devices and applications.
Do you have some experience with the “Zabbix Java Gateway”, Rich? It should allow some more insight into Java applications, which would help identify OOM issues a bit faster, I guess…
I looked into it briefly but there really wasn’t anything special I wanted to pull from OH or Guacamole (my only two Java applications, and I don’t even run Guacamole any more) so beyond taking note that it exists I never looked into it closely.
So far I’ve not really suffered from OOM problems personally so never had a need to monitor it. But I can somewhat monitor it by proxy by getting an alert from Zabbix when the swap starts to be used.
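For reference, the swap signal that such a trigger watches boils down to a simple calculation; here is a minimal sketch (the function name is my own, and it assumes the standard Linux /proc/meminfo format):

```python
# Sketch: compute "swap in use" the way a Zabbix swap trigger effectively does.
# Assumes the standard Linux /proc/meminfo layout (values reported in kB).

def swap_used_kib(meminfo_text: str) -> int:
    """Return SwapTotal - SwapFree in KiB, parsed from /proc/meminfo content."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            fields[key] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]

# Example with a small /proc/meminfo excerpt:
sample = "MemTotal: 3884375 kB\nSwapTotal: 102396 kB\nSwapFree: 92380 kB\n"
print(swap_used_kib(sample))  # 10016
```

On a live box you would feed it `open("/proc/meminfo").read()` instead of the inline sample; a trigger then fires when the value stays above zero for a while.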
You could try netdata. By using the provided docker image it is close to zero configuration with an impressive amount of monitoring information out of the box.
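For what it’s worth, the deployment is roughly the following compose sketch (written from memory of the netdata docs - check the current image documentation for the exact mounts and capabilities):

```yaml
# docker-compose sketch for netdata; mounts/capabilities per the netdata docs
services:
  netdata:
    image: netdata/netdata
    ports:
      - "19999:19999"
    cap_add:
      - SYS_PTRACE
    security_opt:
      - apparmor:unconfined
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /etc/os-release:/host/etc/os-release:ro
    restart: unless-stopped
```

The host mounts are what give it the detailed per-process and hardware metrics without any configuration.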
First and foremost, set up a process watchdog to automatically restart the process when it fails. Other things such as registering failures are still relevant, but the first part of the problem is automatic recovery, as far as that is possible. Look at the systemd process watchdog and the automatic restart options for Docker.
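On the systemd side this is a couple of unit directives; a sketch (the WatchdogSec part only applies to services that ping systemd themselves):

```ini
# systemd unit sketch: restart a service when its process dies
[Service]
Restart=on-failure     # or "always"
RestartSec=10
# WatchdogSec=60       # true process watchdog; requires the service to call
#                      # sd_notify(WATCHDOG=1) periodically

# The Docker equivalent is a restart policy, e.g.
#   docker run --restart unless-stopped ...
# or "restart: unless-stopped" in a compose file.
```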
Best, Łukasz
You can configure openHAB to restart automatically on OOM errors… This is what I have in place. Nowadays I do not have any OOM errors, but it is good to have the protection just in case there are some regressions.
In addition I have configured the Raspberry Pi watchdog module with max CPU and memory limits, in case something else is taking the memory or the system somehow locks up.
The whole system is read-only, so a restart restores things to a “known good state”.
The java parameter for exiting on OOM errors is
-XX:+ExitOnOutOfMemoryError
You need to configure systemd to restart the service on failures, if it is not the default.
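Putting the two together, a sketch of what that looks like (paths assume an apt-based openHAB install; adjust to your setup):

```ini
# /etc/systemd/system/openhab.service.d/override.conf (drop-in sketch)
# run "systemctl daemon-reload" after creating it
[Service]
Restart=on-failure
RestartSec=30

# And on apt-based installs, the JVM flag goes into /etc/default/openhab:
#   EXTRA_JAVA_OPTS="-XX:+ExitOnOutOfMemoryError"
# so the process exits on OOM and systemd performs the restart.
```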
All right, just a heads-up: I got Zabbix working and can now check the vital signs of machines running the Zabbix agent. The only thing is, I haven’t yet figured out how to “copy” items and triggers, so that I don’t have to replicate them for each Thing I’d like to monitor in case it goes OFFLINE or into any state other than ONLINE…
I tried to, but I guess I’m just too dumb to figure out how those Docker options for my Synology DSM7 come together - I did not manage to start a single Docker container…
The triggers and items come through the templates you apply. So, for example, most of the standard health and status for a given Linux machine would come by applying the Linux template. A periodic ping I think comes from an ICMP template. Watching the running services comes through a systemd template.
If the built-in templates are not sufficient, you can create your own templates. That is probably the proper way to do this. I’ve been happy with the built-in templates, so I’ve only done a very little bit of looking into what it takes to create one.
Note: Upgrading PostgreSQL between major versions is a huge pain.
Yes, I found the templates already, thank you!
But my use case is something like setting up an item for REST API requests for the states of multiple Things. So I can copy them and only edit the name of the Thing - but then again, that’s only a handful…
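One way I could avoid copying the item per Thing is a single flexible UserParameter that takes the Thing UID as a key parameter; a sketch (it assumes jq on the agent host, and the token/host here are made up - the Things endpoint needs an API token in OH3):

```
# zabbix_agent2 config fragment, e.g. /etc/zabbix/zabbix_agent2.d/openhab.conf
# Item key in Zabbix would then be: oh.thing.status[mqtt:topic:mything]
UserParameter=oh.thing.status[*],curl -s -H "Authorization: Bearer YOUR_TOKEN" http://localhost:8080/rest/things/$1 | jq -r .statusInfo.status
```

A trigger can then simply compare the returned string against ONLINE.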
I bet if you figure out how to create a template that discovers all the Things in OH, lots of people would be happy and use it. I know something like that should be possible based on how templates like systemd and MQTT work.
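The discovery half of such a template is mostly about emitting Zabbix low-level discovery JSON; a minimal sketch of the transformation (the macro names are my own choice, and a real script would fetch /rest/things from openHAB instead of using an inline sample):

```python
import json

def things_to_lld(things):
    """Turn an openHAB /rest/things result (list of dicts) into Zabbix LLD JSON.

    Modern Zabbix (4.2+) accepts a plain JSON array of macro objects.
    """
    rows = [{"{#THINGUID}": t["UID"], "{#THINGLABEL}": t.get("label", "")}
            for t in things]
    return json.dumps(rows)

# Inline sample standing in for the REST call:
sample = [{"UID": "mqtt:topic:demo", "label": "Demo sensor"}]
print(things_to_lld(sample))
```

Item and trigger prototypes in the template would then use `{#THINGUID}` to stamp out one status item per discovered Thing.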
haha! #noPressureBut…
I don’t think I could find the time for it, as I only have two “vital” Things to check - and my bet is, if either one of those is OFFLINE or the Zabbix agent finds an alarming/disastrous item, I’ll have to take a look anyway…
But yeah, what I CAN do is write a short HowTo afterwards on how I approached it. It’s not only the ONLINE Things, but also whether persistence works, or whether there’s a bunch of errors in the log, …
I manage a number of OH systems and needed remote access and to know if any were down. I looked at some network monitoring services, but they were all way more than I needed and/or cost $$. Then I realized I could probably do it with openHAB! So:
For remote access we use Tailscale.
We have an office system on which I installed the Network binding, and then a “Pingable Network Device” Thing for each system, with an Item linked to its “Online” channel. The IP address for each system comes from the IP in Tailscale. All of those Items are part of the Ping group, and if any go down I get an email. I can then go to Tailscale and see which one is down.
Finally I did the same install on my home system which lets me know if the office system goes down. Works great.
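The alerting side of this can be a single rule on the Ping group; a rules DSL sketch (the mail Thing UID and addresses here are made up):

```
// rules DSL sketch: mail when any member of the Ping group goes down
rule "Ping member went offline"
when
    Member of Ping changed to OFF
then
    val mail = getActions("mail", "mail:smtp:mymailserver") // assumed Thing UID
    mail.sendMail("me@example.com", "Host down",
                  triggeringItem.name + " is offline")
end
```

Because the trigger is on the group, new systems only need their Item added to the Ping group - no new rule.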
Phew. That was more of a pain than I expected. It just goes to show my mantra, the longer between upgrades the harder the upgrade will be.
Note to self: when using Ansible and Docker to deploy stuff, make sure to actually pull the latest image during upgrades. Simply restarting the container won’t do anything for you. It looks like I’ve not upgraded my PostgreSQL for three years now.
DSM7 with SNMP enabled. Zabbix uses SNMP to pull its information, but I’ve only applied the SAN template. DSM7 has its own email and reporting to handle most cases, so I don’t need much info in Zabbix.
All the rest are using zabbix-agent2 to feed into Zabbix.
For the curious here is my zabbix ansible playbook which handles both deploying the server and the agents based on a variable you pass to it. It doesn’t deploy the database because I handle that separately since I have four services that depend on it (Vaultwarden, Librephotos, Nextcloud, and Zabbix).
---
# tasks file for roles/zabbix
- name: Create the zabbix user and group
  include_role:
    name: create-user
  vars:
    uid: "{{ zabbix_uid }}"
    gid: "{{ zabbix_gid }}"
    user_name: zabbix
    create_home: False
    service: zabbix

- name: Check if docker group exists
  shell: /usr/bin/getent group | awk -F":" '{print $1}'
  register: etc_groups
  changed_when: False

- name: Add secondary Groups to zabbix user
  user:
    name: zabbix
    groups: docker
    append: yes
  become: true
  when: '"docker" in etc_groups.stdout_lines'

- block:
    - name: Create the folders for logging and settings
      file:
        path: "{{ item }}"
        state: directory
        owner: zabbix
        group: zabbix
        mode: u=rwx,g=rwx,o=rx
        recurse: yes
      loop:
        - "{{ zabbix_home }}"
        - "{{ zabbix_home }}/alertscripts"
        - "{{ zabbix_home }}/externalscripts"
        - "{{ zabbix_home }}/modules"
        - "{{ zabbix_home }}/enc"
        - "{{ zabbix_home }}/ssh_keys"
        - "{{ zabbix_home }}/ssl/certs"
        - "{{ zabbix_home }}/ssl/keys"
        - "{{ zabbix_home }}/ssl/ssl_ca"
        - "{{ zabbix_home }}/snmptraps"
        - "{{ zabbix_home }}/mibs"
        - "{{ zabbix_home }}/export"
      become: true

    # Create database and user
    - name: Install psycopg2
      pip:
        name: psycopg2-binary
      become: True

    - name: Create postgres database for Zabbix
      postgresql_db:
        login_host: "{{ postgresql_host }}"
        login_password: "{{ postgresql_password }}"
        login_user: "{{ postgresql_user }}"
        name: "{{ zabbix_db_name }}"

    - name: Create zabbix user for zabbix database
      postgresql_user:
        db: "{{ zabbix_db_name }}"
        login_host: "{{ postgresql_host }}"
        login_password: "{{ postgresql_password }}"
        login_user: "{{ postgresql_user }}"
        name: "{{ zabbix_db_user }}"
        password: "{{ zabbix_db_password }}"
        priv: ALL

    - name: Add mosquitto external script
      template:
        dest: "{{ zabbix_home }}/externalscripts/mosquitto"
        mode: u=rwx,g=rx,o=rx
        src: mosquitto.j2
        force: true
      become: True

    # TODO build the server container to include mosquitto_sub
    # Server: config variables https://hub.docker.com/r/zabbix/zabbix-server-pgsql/
    - name: Pull/update the zabbix server docker image
      docker_container:
        detach: True
        env:
          DB_SERVER_HOST: "{{ postgresql_ip }}"
          POSTGRES_USER: "{{ zabbix_db_user }}"
          POSTGRES_PASSWORD: "{{ zabbix_db_password }}"
          POSTGRES_DB: "{{ zabbix_db_name }}"
          ZBX_STARTVMWARECOLLECTORS: "5"
          ZBX_STARTDISCOVERERS: "5"
          ZBX_HISTORYCACHESIZE: "128M"
          ZBX_HISTORYINDEXCACHESIZE: "4M"
          ZBX_HOUSEKEEPINGFREQUENCY: "1"
          ZBX_MAXHOUSEKEEPERDELETE: "5000"
          # ZBX_LOADMODULE: dummy1.so,dummy2.so # modules are in /var/lib/zabbix/modules
          # ZBX_DEBUGLEVEL: 3 # 0-5, 5 is TRACE
          # ZBX_TIMEOUT: 4 # timeout for processing checks
          # ZBX_JAVAGATEWAY_ENABLE: false
        hostname: "{{ ansible_fqdn }}"
        image: zabbix/zabbix-server-pgsql:latest
        log_driver: "{{ docker_log_driver }}"
        name: zabbix-server
        published_ports:
          - "10051:10051"
        pull: True
        restart: False
        restart_policy: always
        volumes:
          - /etc/localtime:/etc/localtime:ro
          - /etc/timezone:/etc/timezone:ro
          - "{{ zabbix_home }}/alertscripts:/usr/lib/zabbix/alertscripts"
          - "{{ zabbix_home }}/externalscripts:/usr/lib/zabbix/externalscripts"
          - "{{ zabbix_home }}/modules:/var/lib/zabbix/modules"
          - "{{ zabbix_home }}/enc:/var/lib/zabbix/enc"
          - "{{ zabbix_home }}/ssh_keys:/var/lib/zabbix/ssh_keys"
          - "{{ zabbix_home }}/ssl/certs:/var/lib/zabbix/ssl/certs"
          - "{{ zabbix_home }}/ssl/keys:/var/lib/zabbix/ssl/keys"
          - "{{ zabbix_home }}/ssl/ssl_ca:/var/lib/zabbix/ssl/ssl_ca"
          - "{{ zabbix_home }}/snmptraps:/var/lib/zabbix/snmptraps"
          - "{{ zabbix_home }}/mibs:/var/lib/zabbix/mibs"
          - "{{ zabbix_home }}/export:/var/lib/zabbix/export"

    # Web interface
    - name: Pull/update the zabbix web docker image
      docker_container:
        detach: True
        env:
          ZBX_SERVER_HOST: "{{ zabbix_server_ip }}"
          DB_SERVER_HOST: "{{ postgresql_ip }}"
          POSTGRES_USER: "{{ zabbix_db_user }}"
          POSTGRES_PASSWORD: "{{ zabbix_db_password }}"
          POSTGRES_DB: "{{ zabbix_db_name }}"
          # ZBX_HISTORYSTORAGEURL: elasticsearch config
          # ZBX_HISTORYSTORAGETYPES: elasticsearch config
          PHP_TZ: America/Denver
          ZBX_SERVER_NAME: Zabbix
        hostname: "{{ ansible_fqdn }}"
        image: zabbix/zabbix-web-nginx-pgsql:latest
        log_driver: "{{ docker_log_driver }}"
        name: zabbix-web
        published_ports:
          - "9090:8080"
        pull: True
        restart: False
        restart_policy: always
        volumes:
          - /etc/localtime:/etc/localtime:ro
          - /etc/timezone:/etc/timezone:ro
  when: zabbix_server

- name: debug group and hostname
  debug:
    msg: "{{ group_names }} {{ inventory_hostname }}"

- name: debug Is it an RPi?
  debug:
    msg: "RPi"
  when: ('pis' in group_names) and (not 'muninn' in inventory_hostname)

- name: debug Is it Ubuntu?
  debug:
    msg: "Ubuntu"
  when: (not 'pis' in group_names) and (not 'muninn' in inventory_hostname)

- name: Add zabbix repo muninn
  debug:
    msg: "Raspberry Pi OS 64-bit"
  when: ('muninn' in inventory_hostname)

# TODO broken Make this more dynamic and work
- name: Add zabbix repo ubuntu
  apt:
    deb: https://repo.zabbix.com/zabbix/5.4/ubuntu/pool/main/z/zabbix-release/zabbix-release_5.4-1%2Bubuntu20.04_all.deb
  when: (not 'pis' in group_names) and (not 'muninn' in inventory_hostname)
  become: true

- name: Add zabbix repo rpi
  apt:
    deb: https://repo.zabbix.com/zabbix/5.4/raspbian/pool/main/z/zabbix-release/zabbix-release_5.4-1+debian10_all.deb
  when: ('pis' in group_names) and (not 'muninn' in inventory_hostname)
  become: true

- name: Add zabbix repo 64-bit
  apt:
    deb: https://repo.zabbix.com/zabbix/5.0/ubuntu-arm64/pool/main/z/zabbix-release/zabbix-release_5.0-1%2Bubuntu20.04_all.deb
  when: ('muninn' in inventory_hostname)
  become: true

- name: Install the zabbix-agent2
  apt:
    name: zabbix-agent2
    update_cache: yes
  become: true

- name: Install the config
  template:
    dest: /etc/zabbix/zabbix_agent2.d/custom-zabbix.conf
    mode: u=rw,g=r,o=r
    src: custom-zabbix.j2
    force: true
  become: True
  notify:
    - Restart the zabbix agent to pick up config

- name: Allow zabbix to have passwordless sudo
  lineinfile:
    dest: /etc/sudoers
    state: present
    regexp: '^zabbix'
    line: 'zabbix ALL=(ALL) NOPASSWD: ALL'
    validate: 'visudo -cf %s'
  become: True

- name: Add cron job to create log directory on reboot for RPis
  block:
    - name: Add the cron shell variable
      cron:
        name: SHELL
        env: true
        job: /bin/bash
    - name: Add cron job to create the directory at boot
      cron:
        name: "Zabbix log directory"
        job: mkdir /var/log/zabbix && chown zabbix:zabbix /var/log/zabbix
        special_time: reboot
  become: True
  when: "'pis' in group_names"
The machine muninn is handled a little differently because it’s running Raspberry Pi OS 64-bit while the rest are running 32-bit, which means they need to pull the agent from different apt repos. The rest of my machines are Ubuntu VMs. opnSense has its own way to install the Zabbix agent, and on the Synology it’s just a configuration setting.
That mosquitto external script is just something I played with and I can’t remember if I even got it to work properly. In the end all I really cared about was whether or not it was running.
That can be done with PowerShell. It is a most “easy-to-install” solution and can run agentless. The “not-too-complex-to-use” part depends only on your PowerShell and other scripting skills.
(But at this point I don’t know how PowerShell can/will interact with Docker containers, as I do not have ample knowledge of Docker.)
The installation of Zabbix <6 was a real pain in the …, which pushed me past the edge of my know-how, and I abandoned it last year. Zabbix 6 was easy-peasy, as the installation script and docs were rewritten.
What I don’t get at present is how to run Zabbix or netdata within my Synology’s Docker. It isn’t as convenient as “drop in the provided docker script and it’ll run”; I have to enter the volumes and env variables myself, and somehow I can’t get it running…