Hello everyone,
we are developing an add-on bundle for Eclipse SmartHome/openHAB, have run into some difficulties with it, and are now looking for help here. We have tried many different approaches, but all of them failed. I think the bundle will be helpful for others too, so we are releasing it under the EPL, and maybe it is interesting enough to be incorporated directly upstream later.
The bundle exposes internal metrics of ESH/openHAB in the Prometheus format (https://prometheus.io/docs/concepts/data_model/) so that they can be scraped by Prometheus and fed into a Grafana dashboard. This gives a good picture of a system's health, which is especially useful when several systems are deployed in the field and are not physically accessible to whoever monitors them.
The metrics currently exposed are:
- openhab_bundle_state{bundle="[BUNDLENAME]"} [BUNDLESTATE]
- openhab_thing_state{thing="[THINGID]"} [THINGSTATE]
- openhab_inbox_count [INBOXCOUNT]
- smarthome_event_count{source="[BUNDLENAME]"} [EVENTCOUNT]
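To make the line format above concrete, here is a minimal sketch that renders such sample lines by hand. It uses only the JDK and is not the bundle's actual code (the bundle relies on the Prometheus Java client for this); the class name `ExpositionSketch`, the bundle name and the values are made up for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.StringJoiner;

// Illustrative only: hand-rolled rendering of Prometheus text-format sample lines.
class ExpositionSketch {

    /** Renders one sample line of the form: name{label="value",...} value */
    static String sample(String name, Map<String, String> labels, double value) {
        StringBuilder sb = new StringBuilder(name);
        if (!labels.isEmpty()) {
            StringJoiner joiner = new StringJoiner(",", "{", "}");
            for (Map.Entry<String, String> e : labels.entrySet()) {
                joiner.add(e.getKey() + "=\"" + escape(e.getValue()) + "\"");
            }
            sb.append(joiner);
        }
        return sb.append(' ').append(value).toString();
    }

    /** Escapes backslash, double quote and newline inside a label value. */
    static String escape(String v) {
        return v.replace("\\", "\\\\").replace("\"", "\\\"").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        // Hypothetical bundle name and state value.
        System.out.println(sample("openhab_bundle_state",
                Collections.singletonMap("bundle", "org.example.binding"), 32));
        // A metric without labels.
        System.out.println(sample("openhab_inbox_count", Collections.emptyMap(), 3));
    }
}
```

Running it prints `openhab_bundle_state{bundle="org.example.binding"} 32.0` followed by `openhab_inbox_count 3.0`.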
We would now like to add internal states of several bindings (most importantly Z-Wave). These are hard to acquire because they are internal, but the bindings regularly write quite a lot of interesting information (e.g. the size of the send queue or the number of send attempts) to the logs at DEBUG level.
As a first, general step, this will add more metrics to the output, such as:
- openhab_logmessages_total{level="[LOGLEVEL]"} [MESSAGECOUNT]
- openhab_logmessages_error{type="[LOGGERNAME]"} [MESSAGECOUNT]
Afterwards we will probably want to filter, process and parse a custom set of messages to get, for example, the size of the Z-Wave binding's internal message queue.
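As a rough idea of what this parsing step could look like, the sketch below pulls a queue length out of a single log message with a regular expression. The message layout ("Queue length = N") and the class name `QueueSizeParser` are assumptions for illustration; the real DEBUG output of the Z-Wave binding may look different.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: extracts a number from an assumed log message layout.
class QueueSizeParser {

    // Hypothetical message layout, e.g. "Message queued. Queue length = 3".
    private static final Pattern QUEUE_LENGTH = Pattern.compile("Queue length = (\\d{1,9})");

    /** Returns the queue length found in the message, or empty if there is none. */
    static OptionalInt parse(String message) {
        Matcher m = QUEUE_LENGTH.matcher(message);
        return m.find() ? OptionalInt.of(Integer.parseInt(m.group(1))) : OptionalInt.empty();
    }

    public static void main(String[] args) {
        System.out.println(parse("Message queued. Queue length = 3"));
        System.out.println(parse("Some unrelated message"));
    }
}
```

The parsed value could then be written into a gauge each time a matching message passes through the collector.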
The Prometheus Java client library, which we use and which ships as OSGi bundles, has logging collectors for log4j, log4j2 and logback (https://github.com/prometheus/client_java#logging), so the “only” thing we need to do is set up our bundle (or a class in it) as an appender in the openHAB logging facility, so that we receive the log events in parallel to the usual file logging.
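To illustrate the kind of collector we mean, here is a minimal stand-in built on java.util.logging from the JDK rather than on Log4j2/Logback/Pax, so it is not the solution for the openHAB runtime, only a sketch of the idea: a handler sits next to the normal log output and counts events per level. The class name `CountingHandler` and the logger name are made up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// Illustrative only: a java.util.logging handler that counts log events per level.
class CountingHandler extends Handler {

    private final Map<String, LongAdder> countsByLevel = new ConcurrentHashMap<>();

    @Override
    public void publish(LogRecord record) {
        if (record == null || !isLoggable(record)) {
            return;
        }
        // One counter per level name; this mirrors a metric with a "level" label.
        countsByLevel.computeIfAbsent(record.getLevel().getName(), k -> new LongAdder()).increment();
    }

    @Override
    public void flush() {
        // Nothing is buffered.
    }

    @Override
    public void close() {
        // No resources to release.
    }

    /** Number of events seen so far at the given level. */
    long count(Level level) {
        LongAdder adder = countsByLevel.get(level.getName());
        return adder == null ? 0 : adder.sum();
    }

    public static void main(String[] args) {
        Logger logger = Logger.getLogger("sketch.demo"); // hypothetical logger name
        logger.setUseParentHandlers(false);
        logger.setLevel(Level.ALL);
        CountingHandler handler = new CountingHandler();
        logger.addHandler(handler);

        logger.warning("first");
        logger.warning("second");
        logger.fine("third");

        System.out.println("WARNING: " + handler.count(Level.WARNING));
        System.out.println("FINE: " + handler.count(Level.FINE));
    }
}
```

The point of this design is that the handler receives the log event as an object, so nothing has to be formatted and parsed again just to count it.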
And here comes the problem: We were not able to figure out how to hook it up to the Karaf/Felix/Pax in the distribution runtime (we managed to do it for Logback/Equinox inside the development IDE).
Can anybody here point us in the right direction: how or where do we need to register our bundle/fragment so that all log messages can be configured to end up in it? We googled a lot, but it seems nobody has had exactly this requirement before; all the articles and hints we found are outdated, cover only parts of the problem, or are based on different requirements.
You can see the current state here: https://github.com/KuguHome/openhab-prometheus-metrics . Build instructions etc. are included in the README.md, and a working jar is here: https://github.com/KuguHome/openhab-prometheus-metrics/releases/download/v0.9/com.kuguhome.openhab.prometheusmetrics-2.4.0-SNAPSHOT.1.jar. To make it run from the addons/ folder, you will need to add two more files:
- http://central.maven.org/maven2/io/prometheus/simpleclient/0.4.0/simpleclient-0.4.0.jar
- http://central.maven.org/maven2/io/prometheus/simpleclient_common/0.4.0/simpleclient_common-0.4.0.jar
The master branch has the working version, which is missing the openhab_logmessages_ metrics; the eclipse-workable-logback-metrics branch has those metrics, but only works in the Eclipse IDE, not inside the standalone distribution.
An example scrape looks like this: https://github.com/KuguHome/openhab-prometheus-metrics/wiki/Example-scrape
We have set it up on a couple of running instances, and this is what it looks like when scraped by Prometheus and visualized in Grafana:
The Grafana Dashboard code is here: https://github.com/KuguHome/openhab-prometheus-metrics/wiki/Grafana-source
What we have done so far to get it running inside the openHAB distribution runtime:
- added the appender to org.ops4j.pax.logging.cfg
- put the jar bundle into the Maven repo according to org.ops4j.pax.url.mvn.cfg (so the bundle should be picked up at startup)
- added a line for the appender bundle to startup.properties:
mvn:io.prometheus/simpleclient_log4j2/0.4.0 = 8
mvn:org.ops4j.pax.logging/pax-logging-api/1.10.1 = 8 <---- was already there
This is what we got:
2018-07-04 17:34:59,960 CM Configuration Updater (Update: pid=org.ops4j.pax.logging) ERROR Unable to invoke factory method in class class org.apache.logging.log4j.core.config.AppendersPlugin for element Appenders. java.lang.NullPointerException
Please let me know if you need any more information.
We are also very interested in your opinion of the bundle and in any other kind of feedback.
Some more insight into our thought process:
- We don’t want to read the log from the files, because it seems very inefficient to parse things which existed as objects before and were then serialized/formatted.
- We don’t want to use a syslog output and pipe it into a solution like Logstash/Elasticsearch etc., because that is just as inefficient (object->format->parse), needs much more setup overhead, and means other things running on the openHAB machine.
- We looked into the openHAB Log Viewer (which runs independently on a different port, 9001) and the Log Reader Binding, but they do the same thing (read the files) and also do not help with our goal of getting internal states onto a time-based, visually appealing graph for good diagnostics.
- Rather, we want to hook into the stream of raw object data directly inside openHAB, using the APIs that exist in Logback/Log4j(2)/Pax for this exact purpose, to build our own collector.
- There was some talk about Prometheus-style metrics here earlier (Overview of expectable command latency on different hardware?), but we think the hardware/OS data (memory, CPU etc.) is easier to acquire by running a node_exporter next to openHAB and scraping it from there. But maybe @fish is interested anyway? (He said he has a soft spot for metrics.)
Looking forward to your responses,
Cheers,
Hanno - Felix Wagner (Product Manager for Kugu Home GmbH)
PS: I am cross-posting this to the ESH forum, because it seems to be a very deep technical problem (https://www.eclipse.org/forums/index.php/m/1792637/#msg_1792637)