Grafana Image Charts


I’ve been reworking how I access my home services, including Grafana. I am only willing to expose Grafana to the internet with authentication, but adding authentication makes it hard to embed the charts in my sitemap. So I decided to experiment with statically generating the charts as JPGs.

NOTE: Generating static images of charts is a really CPU intensive activity (apparently). This will bring your RPi to its knees. If you run Grafana on an RPi, I’m sorry but this tutorial is not for you.

Approach

The obvious approach is to put the rendering URL on your sitemap directly but that URL is not accessible outside of your LAN so you can’t see the charts if you are accessing OH remotely. So I opted to wget the images periodically and save them to the $OH_CONF/html folder.

Then I put them on my sitemap using Image elements and use a String Item, Switch element, and the visibility tag to switch between different time periods.

Items

String ChartVisibility "Period"

Rules

rule "Pull charts from Grafana"
when
	Time cron "0 0/5 * * * ? *"
then
	// chartLatch and logName are globals defined at the top of the .rules file
	// (chartLatch is assumed to be a ReentrantLock, as in the Rule Latching pattern).
	// tryLock already acquires the latch when it is free, so no separate lock call is needed.
	if(!chartLatch.tryLock(0, TimeUnit.SECONDS)) {
		logWarn(logName, "Pulling charts from Grafana is already running, you may need to lower the refresh rate")
		return;
	}
	try {
		logDebug(logName, "Getting chart images...")
		val argTemplate = "/usr/bin/wget 'http://10.10.1.127:3000/render/d-solo/000000001/home-automation?orgId=1&from=%FROM%&to=now&panelId=%PID%&width=1000&height=500&tz=America%2FDenver&timeout=10000' -O /openhab/conf/html/%OUT%"

		// These could be globals but I like having them here for readability
		val List<String> froms = newArrayList("h", "d", "w", "M", "y")
		val Map<String, String> panels = newHashMap("1" -> "temp",
													"2" -> "hum",
													"8" -> "light",
													"4" -> "power")

		panels.keySet().forEach[ pid | 
			val template = argTemplate.replace("%PID%", pid)
			froms.forEach[ String fr |
				val file = panels.get(pid)+fr+".jpg"
				// Every period is "now-1<letter>", e.g. now-1d
				val results = executeCommandLine(template.replace("%FROM%", "now-1"+fr).replace("%OUT%", file), 5000)
				logDebug("Chart", "Results from wget for " + file + "\n" + results)
			]
		]

		logDebug(logName, "Done grabbing charts")
	}
	catch(Exception e) {
		logError(logName, "Error in Pull charts from Grafana: " + e.toString)
	}
	finally {
		// Release the latch exactly once, whether or not the pull succeeded
		chartLatch.unlock
	}
end

Theory of operation. Generating charts will take a long time. For me on my VM it takes almost 40 seconds to generate all 20 images. The latch is there to prevent new instances of this Rule from running if a previous instance is still running. See Design Pattern: Rule Latching for details.

I put the wget command into a String and marked off a few fields I want to replace to generate different charts, denoted by %FIELD%. These fields are the only parts of the command that change to generate each chart.

  • %FROM% - the start of the time period, e.g. now-1d would be “now minus one day”. All of our time periods are -1 so the only thing we need to change is the period (i.e. the letter)
  • %PID% - the Panel ID for the chart you want to render. You can most easily find it by clicking on the share menu for the panel; it appears as panelId in the link.
  • %OUT% - the name of the file to save the chart into.

Note that your path to the html folder may be different. I run in Docker and that is where openHAB is placed inside the container.
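To see what the finished command looks like once the markers are filled in, here is a minimal shell sketch of the same substitution. It is only an illustration and not part of the Rule; the template string is copied from the Rule above, while fill_template is a hypothetical helper name and the use of sed is my own choice.

```shell
#!/bin/sh
# Sketch only: fill the %FIELD% markers of the wget template with sed.
# TEMPLATE is copied from the Rule; fill_template is a hypothetical helper.
TEMPLATE="/usr/bin/wget 'http://10.10.1.127:3000/render/d-solo/000000001/home-automation?orgId=1&from=%FROM%&to=now&panelId=%PID%&width=1000&height=500&tz=America%2FDenver&timeout=10000' -O /openhab/conf/html/%OUT%"

# fill_template <from> <panel id> <output file name>
fill_template() {
  printf '%s\n' "$TEMPLATE" | sed -e "s|%FROM%|$1|" -e "s|%PID%|$2|" -e "s|%OUT%|$3|"
}

# One day of the temperature panel (panel 1), saved as tempd.jpg
fill_template now-1d 1 tempd.jpg
```

The printed command contains from=now-1d and panelId=1 and ends with -O /openhab/conf/html/tempd.jpg; everything else in the template is untouched.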

Next we have a couple of constants. These store all of the time periods as a List and all of the panels with the corresponding root file name as a Map. We will use these to avoid a bunch of duplicated code generating each chart.

First we loop through all of the keys of the Map. This gives us the %PID%. Then we loop through the List of time periods (froms) to get the %FROM%. Finally in that loop we use the %PID% to get the base filename and build the file name using that and the %FROM%.

With all these values we can now construct the wget command using the String.replace method to replace the marked out portions of the URL with the calculated values, which we execute using executeCommandLine.

Ideally, we would be able to use the “fire-and-forget” version of the Action but when I tried it I ended up with zero length files. So we must wait for one wget to complete before calling the next one. So we give executeCommandLine a second timeout parameter. With that parameter we can capture the output from wget so we log that out as a debug.

Sitemap

				// Temperature
				Switch item=ChartVisibility mappings=[Hour=Hour,Day=Day,Week=Week,Month=Month,Year=Year] 
				Image url="http://argus:8080/static/temph.jpg" refresh=5000 visibility=[ChartVisibility == "Hour", ChartVisibility == UNDEF]
				Image url="http://argus:8080/static/tempd.jpg" refresh=5000 visibility=[ChartVisibility == "Day"]
				Image url="http://argus:8080/static/tempw.jpg" refresh=5000 visibility=[ChartVisibility == "Week"]
				Image url="http://argus:8080/static/tempM.jpg" refresh=5000 visibility=[ChartVisibility == "Month"]
				Image url="http://argus:8080/static/tempy.jpg" refresh=5000 visibility=[ChartVisibility == "Year"]

We put the charts into the sitemap and use the visibility flag to show the desired time period. ChartVisibility gets set to “Hour”, “Day” etc. using the mappings. By default, the Hour chart is shown.

Improvements

There is a lot not to like about this approach. Here are some ideas for improvements.

External Script

Tying up a Rule execution thread for 40 seconds can be a really bad idea. Unfortunately the only way to avoid this is to move the wget calls outside of openHAB. Try a shell script and a Linux cron job instead. I’m not going to try to implement this one. Hint: you can use curl to get the current state of ChartVisibility or you can use something like sensorReporter with the execActuator plugin and have OH send the current state of ChartVisibility to the script.
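For anyone who wants a starting point, here is a rough, untested-against-a-real-setup sketch of what such a script might look like. It assumes the same Grafana URL, panel IDs, file names and html folder as the Rule above. The function names, the --fetch switch and the download-then-rename step are my own choices for illustration, not something the Rule does.

```shell
#!/bin/sh
# Sketch of an external chart puller meant to be run from cron.
# Assumes wget is installed; all function names here are hypothetical.

GRAFANA='http://10.10.1.127:3000/render/d-solo/000000001/home-automation'
OUT_DIR='/openhab/conf/html'
PANEL_IDS='1 2 8 4'
PERIODS='h d w M y'

# chart_file <panel id> <period letter>: same file names as the Rule, e.g. temph.jpg
chart_file() {
  case "$1" in
    1) name=temp ;;
    2) name=hum ;;
    8) name=light ;;
    4) name=power ;;
    *) return 1 ;;
  esac
  printf '%s%s.jpg\n' "$name" "$2"
}

# render_url <panel id> <period letter>: the Grafana render URL for one chart
render_url() {
  printf '%s?orgId=1&from=now-1%s&to=now&panelId=%s&width=1000&height=500&tz=America%%2FDenver&timeout=10000\n' "$GRAFANA" "$2" "$1"
}

# Pull every panel for every period, one wget at a time
fetch_all() {
  for pid in $PANEL_IDS; do
    for period in $PERIODS; do
      file="$OUT_DIR/$(chart_file "$pid" "$period")"
      # Download to a temporary name first so a half-written image is never served
      if wget -q -O "$file.tmp" "$(render_url "$pid" "$period")"; then
        mv "$file.tmp" "$file"
      else
        rm -f "$file.tmp"
      fi
    done
  done
}

# Only fetch when asked to, so the file can be sourced without side effects
if [ "${1:-}" = "--fetch" ]; then
  fetch_all
fi
```

A crontab entry along the lines of `*/5 * * * * /path/to/pull_charts.sh --fetch` (the path and script name are placeholders) would then replace the cron trigger on the Rule.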

Only generate the time period you are currently viewing

We don’t really need to generate all the charts for all the time periods. Modify the Rule to only generate those charts for the currently selected time period. Hint: you will want to trigger the Rule when ChartVisibility changes as well as on the cron. You can change how often the charts get regenerated based on the value of ChartVisibility too. Could be a good job for Design Pattern: Looping Timers.
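If you go the external script route instead, the curl hint from the previous section could look something like the sketch below. It asks openHAB's REST API for the state of ChartVisibility and turns it into the period letter Grafana expects. The host name argus:8080 is taken from my sitemap; period_letter and current_period are hypothetical helper names, and the fallback to the hour chart is an assumption on my part.

```shell
#!/bin/sh
# Sketch only: find out which time period is currently selected in openHAB.
OPENHAB='http://argus:8080'

# period_letter <item state>: map the Item state to a Grafana period letter
period_letter() {
  case "$1" in
    Hour|h)  echo h ;;
    Day|d)   echo d ;;
    Week|w)  echo w ;;
    Month|M) echo M ;;
    Year|y)  echo y ;;
    *)       echo h ;;  # NULL, UNDEF or no answer: fall back to the hour chart
  esac
}

# current_period: read the plain text state of ChartVisibility over REST
current_period() {
  period_letter "$(curl -s --max-time 5 "$OPENHAB/rest/items/ChartVisibility/state")"
}
```

A script would call current_period once per run and only render the charts for the letter it returns.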

Items

String ChartVisibility "Period"

Rules

rule "Pull charts from Grafana"
when
        Item ChartVisibility changed or
        System started
then
        val argTemplate = "/usr/bin/wget 'http://10.10.1.127:3000/render/d-solo/000000001/home-automation?orgId=1&from=%FROM%&to=now&panelId=%PID%&width=1000&height=500&tz=America%2FDenver&timeout=10000' -O /openhab/conf/html/%OUT%"

        val Map<String, String> panels = newHashMap("1" -> "temp",
                                                    "2" -> "hum",
                                                    "8" -> "light",
                                                    "4" -> "power")
        if(chartTimer == null) {
                chartTimer = createTimer(now, [ |
                        val startTime = now.millis
                        logInfo(logName, "Getting chart images...")

                        val period = ChartVisibility.state.toString
                        panels.keySet().forEach[ pid |
                                val String template = argTemplate.replace("%PID%", pid).replace("%OUT%", panels.get(pid)+".jpg")
                                val String fr       = if( period == "NULL" || period == "UNDEF" ) "now-1h" else "now-1"+period
                                val String results  = executeCommandLine(template.replace("%FROM%", fr), 5000)
                                logInfo(logName, "Results from wget for " + pid + " and period " + period) // + "\n" + results)
                        ]

                        var Number reschedTime = 60*1000
                        switch(period) {
                                case "d": reschedTime = 5*60*1000     // 5 minutes
                                case "w": reschedTime = 15*60*1000    // 15 minutes
                                case "M": reschedTime = 60*60*1000    // 1 hour
                                case "y": reschedTime = 24*60*60*1000 // 1 day
                        }

                        reschedTime = reschedTime - (now.millis - startTime)
                        if(reschedTime.intValue < 0) reschedTime = 0
                        logInfo(logName, "Done grabbing charts, rescheduling in " + reschedTime + " milliseconds.")
                        chartTimer.reschedule(now.plusMillis(reschedTime.intValue))
                ])
        }
        else {
                chartTimer.reschedule(now)
        }
end

Instead of polling with a cron trigger, we only run the Rule when ChartVisibility changes or at System started, and use a Looping Timer to periodically pull the charts. If the Timer doesn’t exist we create one and have it execute immediately.

In the Timer we take a timestamp and then get all the charts for the currently selected time period, similar to what we do above. The difference is that we now save each chart under the name in panels without adding the time period to the end (i.e. temp.jpg instead of temph.jpg). We’ve also changed the values of ChartVisibility to match those needed by Grafana instead of mapping between Hour and h, for example.

Next we calculate when to reschedule the next pull of the chart images. We adjust the amount of time between the pulls based on the currently selected time period. For example, the year time period doesn’t change much so we only pull a new one once a day. The hour time period does change a lot so we pull a new image every minute.

We use the timestamp from the beginning of the timer and the current timestamp to adjust the reschedTime to account for how long it took to pull the images. Then we reschedule the timer.
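The arithmetic is easier to see with numbers plugged in. This is a small shell sketch of the same calculation, not the Rule itself; resched_ms is a hypothetical name, the intervals follow the description above (one minute for the hour chart up to one day for the year chart), and the elapsed times are made-up examples.

```shell
#!/bin/sh
# Sketch only: how long to wait before the next pull, in milliseconds.
# resched_ms <period letter> <elapsed ms>
resched_ms() {
  case "$1" in
    d) interval=$((5 * 60 * 1000)) ;;        # 5 minutes
    w) interval=$((15 * 60 * 1000)) ;;       # 15 minutes
    M) interval=$((60 * 60 * 1000)) ;;       # 1 hour
    y) interval=$((24 * 60 * 60 * 1000)) ;;  # 1 day
    *) interval=$((60 * 1000)) ;;            # hour chart (and NULL/UNDEF): 1 minute
  esac
  # Subtract the time the pull itself took, but never go below zero
  remaining=$((interval - $2))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}

resched_ms d 8000    # prints 292000: 5 minutes minus an 8 second pull
resched_ms h 75000   # prints 0: the pull took longer than the 1 minute interval
```

The second call shows why the clamp to zero matters: if a pull ever takes longer than the interval, the next one simply starts right away.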

If we enter the Rule and there already is a Timer, we reschedule it to run immediately because that means that a new time period has been chosen. The new time period’s charts will get pulled and the timer will start looping based on the newly selected time period’s polling time.

Sitemap

Switch item=ChartVisibility mappings=[h=Hour,d=Day,w=Week,M=Month,y=Year]
Image url="http://argus:8080/static/temp.jpg" refresh=1000

Notice how we no longer need a separate entry for each time period nor do we need to mess with visibility. The newly chosen time period will appear in no more than one second after the new chart is generated thanks to the refresh.

The good thing about this version is it only pulls N charts, instead of N*5 charts where N is the number of charts and 5 is the number of time periods. In the example above we went from 20 charts taking over half a minute to 4 charts taking around 8 seconds to generate. We also control how often we generate those charts so we are not wasting time generating the chart images more often than necessary.

The problem with the above though is we’ve moved the long running code out of the Rule’s thread pool which is five deep by default, into the Timer thread pool which is only 2 deep by default. If you have a lot of cron triggered Rules or Timers this could become a problem.

Is there a way to move the long running code back to the Rules thread pool where there is a bit more room to accommodate the long running code?

Expire binding

If we use Design Pattern: Expire Binding Based Timers we can simplify the code a bit and move the long running code back to the Rules thread pool. But it will come at the small cost of no longer being able to take account of the time taken to retrieve the images.

Items

String ChartVisibility "Period"
Group:Switch ChartPolls
Switch ChartPoll_h (ChartPolls) { expire="1m,command=OFF" }
Switch ChartPoll_d (ChartPolls) { expire="5m,command=OFF" }
Switch ChartPoll_w (ChartPolls)  { expire="15m,command=OFF" }
Switch ChartPoll_M (ChartPolls)  { expire="1h,command=OFF" }
Switch ChartPoll_y (ChartPolls)  { expire="24h,command=OFF" }

Notice the naming convention used here. See Design Pattern: Associated Items for details.

Rules

rule "Change the chart polling period"
when
        Item ChartVisibility changed or
        System started
then

        val period = if(ChartVisibility.state.toString == "NULL" || ChartVisibility.state.toString == "UNDEF") "h" else ChartVisibility.state.toString

        // Cancel any running Timers
        ChartPolls.members.filter[ c | c.state == ON ].forEach[ c | c.postUpdate(OFF) ]

        // Kick off a new pull of the charts
        sendCommand("ChartPoll_"+period, "OFF")
end

rule "Pull charts from Grafana"
when
    Member of ChartPolls received command OFF
then
        logInfo(logName, "Getting chart images...")

        val argTemplate = "/usr/bin/wget 'http://10.10.1.127:3000/render/d-solo/000000001/home-automation?orgId=1&from=%FROM%&to=now&panelId=%PID%&width=1000&height=500&tz=America%2FDenver&timeout=10000' -O /openhab/conf/html/%OUT%"

        val Map<String, String> panels = newHashMap("1" -> "temp",
                                                    "2" -> "hum",
                                                    "8" -> "light",
                                                    "4" -> "power")
        val period = if(ChartVisibility.state.toString == "NULL" || ChartVisibility.state.toString == "UNDEF") "h" else ChartVisibility.state.toString

        panels.keySet().forEach[ pid |
                val String template = argTemplate.replace("%PID%", pid).replace("%OUT%", panels.get(pid)+".jpg")
                val String fr       = if( period == "NULL" || period == "UNDEF" ) "now-1h" else "now-1"+period
                val String results  = executeCommandLine(template.replace("%FROM%", fr), 5000)
                logInfo(logName, "Results from wget for " + pid + " and period " + period) // + "\n" + results)
        ]

        // Reschedule the timer
        sendCommand("ChartPoll_"+period, "ON")
        logInfo(logName, "Done getting chart images")
end

Notice how the second half of the Timer Rule above all but disappears thanks to the use of Associated Items.

This approach has the advantage that the long running code is in the Rules thread pool instead of the Timer thread pool. The code itself is also shorter and less complex.


Nice write up… I did something similar back when I ran OH on an RPi, and now that I’m running it inside Docker on a dedicated home server I follow the same approach… It’s way more efficient than regenerating charts on the fly… There are some charts in my setup that are generated on a regular time-based interval, but there are also some that are updated on a per-event basis (but here timer resets are used to prevent flooding of wget requests…)

Rich,

excellent description as always. Kudos.

Would you be able to elaborate more on getting the rendering feature turned on?

I am running in docker as well and would love to see which route you went (grafana-image-renderer in a separate docker, enabling rendering in the grafana docker, …).

Regards
Ralf

I didn’t do anything special. I use the latest official Grafana Docker image, making sure it’s configured to allow anonymous login, expose port 3000, and use the URL for the static image you get when clicking on the share menu for the panel.

I didn’t install anything special nor did I change anything to make it work. For a while Grafana disabled the rendering library because it was running amok but it appears to be working better now.

Just trying to understand.
The idea is to have Grafana charts showing on your sitemaps over the internet, but you want to use authentication, right?
How do you access the sitemap (from the internet) in the first place? I believe you use authentication there as well, don’t you?
Isn’t this just becoming yet another authentication for the same thing then?

I am running the official image on an ARM (Raspberry 3B+). I can also ramp it up on my laptop (amd64). I need to check if rendering works on amd64 out of the box. Should be easy following your path.

Is it turned on on ARM on the official image as well? I think of having one Pi dedicated only to rendering the images (I have several running at my home for various tasks)

This is my docker-compose for grafana

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - 3000:3000
    environment:
      GF_USERS_ALLOW_SIGNUP: "false"
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_SECURITY_ADMIN_PASSWORD: "NotDisclosed"
      # GF_SECURITY_ALLOW_EMBEDDING: "true"
      # GF_RENDERING_SERVER_URL: "http://${IP}:8081/render"

Not exactly.

The root idea is to have charts on my sitemap that work whether I access openHAB from my LAN, through myopenhab.org, or directly through a reverse proxy. For the latter two the only way to get the charts directly from Grafana is if Grafana itself were available on the internet. myopenhab.org will not reverse proxy other services running on your LAN for you.

So I have two choices. I can expose Grafana to the internet directly, but I refuse to do that without some authentication, or I can take static images of the charts and put them into the html folder where openHAB can serve them itself. The latter approach is what is documented here.

Mostly through myopenhab.org. I’ve experimented some with access through a reverse proxy but I’m not sure I want to spend the effort to keep monitoring it. But yes, both have authentication. The problem isn’t adding authentication to Grafana. That’s easy. I can use the reverse proxy to provide authentication or I can use the authentication built into Grafana itself. The problem is adding authentication to Grafana but still allowing openHAB to embed the charts in the sitemap without requiring me to enter my Grafana username and password on my sitemap. There were also some problems getting the embedding to work at all in the browser (it seemed to work in the Android app). At that point I decided that I’d rather keep Grafana behind my firewall and just put the images on my sitemap.

As of a few months ago, PhantomJS would consume all available RAM on the RPi causing openHAB to crash. Maybe that has been fixed, maybe not. I don’t know.

Hmm… I don’t think I understand.
I run Grafana on my Windows server (same LAN as the openHAB server). I show rendered Grafana charts fine on my sitemaps when accessing through myopenhab.org (and of course locally as well)…

Yeah, I know… I was hit hard by it :) That’s why I moved Grafana to my Windows server instead. It works just fine.

All I can say is that has never worked for me completely. Either the Android App wouldn’t show it, or the Web UI wouldn’t show it, or both would fail to show it. And when the PhantomJS problems started, they quit working entirely (I think the timeout was too low). Maybe it will work better now.

It never really bothered me much but I decided to see if there were ways I could go around it.

The approaches above (particularly the new ones I just added) do have some advantages though over putting the link to Grafana directly on the sitemap.

  • The browser is not sitting waiting for each chart to render when you load it which does cause a noticeable delay, or at least it used to.
  • We don’t overload Grafana with a barrage of requests as each client connected to the OH will be requesting its own set of renders for all the charts they have on the sitemap (whether they are shown or not), potentially all at the same time. If you have more than one client connected with more than just a couple of charts this can be a huge load. And it’s an unnecessary load because the bulk of the renders will not be significantly different from each other. My VM is perfectly able to handle this load, but it does peg one of the two CPUs while generating charts.
  • It’s easier on the myopenhab.org server because there are fewer requests going from the clients through it to the OH instance. For example, if you look at my original example, I have 30 requests (because I have a couple of charts on my sitemap more than once) that flow through myopenhab.org every five seconds. With the newly added approaches I only have 6 requests for the entire sitemap.

Also, I had an ulterior motive. I wanted to create another example that shows a block of code undergoing gradual refinement. There are not many examples like this on the forum and I think it’s useful for users to see in an example where the first version of the code is not the final version of the code. Get it to work and then refine.

I was able to get to the refinements a little sooner than I expected. :)