[Guide] Dell server monitoring and control with iDRAC and openHAB

I recently integrated my Dell servers into openHAB for monitoring and control, and thought this guide might benefit others or spark some new ideas on how openHAB can be leveraged across network equipment.

Introduction



Background

My situation (yours will certainly be different) is as follows:

  • 3x Dell R210 II servers with iDRAC6 (v. 2.9.0) Express & Enterprise modules in a Proxmox HA cluster
  • 1x Dell R210 II server as a dedicated bare-metal pfSense box, iDRAC6 (v. 2.9.0) Express & Enterprise
  • Dell R210 II servers are usually on the quieter side (for 1U servers); with iDRAC, this can be optimized even further through manual fan RPM control
  • openHAB runs inside an LXC container on Proxmox in the HA cluster
  • All servers (Proxmox), VMs and LXCs run Debian, except pfSense, which runs FreeBSD

Beyond the technical background, a few other factors play a role:

  • I live in a country with average ambient temperatures of 35 °C (with the aircon turned off)
  • We usually limit aircon use to the evenings and nights, though not in the living room at night
  • My server rack is located in the living room, which exposes it to these temperatures, and its noise level is noticeable there

Objectives

I did not feel like setting up a whole Grafana environment to monitor my servers, and I do not need all the details, so I opted for openHAB to see what can be done with it:

  • Monitor and record historical CPU temperature at a 1-minute interval
  • Monitor and record historical fan RPM speeds, also at a 1-minute interval (could be longer)
  • Control fan RPM speeds to reduce/increase speed on certain triggers (e.g. temperature, home status)
  • Receive an alarm via Telegram and openHAB’s Location cards when the 5-minute average temperature is above 85 °C
  • Automatically increase fan RPM speeds when that alarm is triggered, and reduce them when the alarm is resolved
  • Ping my servers to see when they were last online (minor side effect)

Requirements

To achieve all of the above in openHAB, I am using a combination of mechanisms.
The following is required in my environment, but might be different for you:

Remarks

  • As mentioned above, my circumstances might be different from yours, so please adapt accordingly
  • There are also different tools that could achieve the same result; please adapt where needed
  • The guide should also work for other, more recent iDRAC versions (e.g. iDRAC7, 8, 9) and/or for other manufacturers’ products with their respective command sets (e.g. HP, Supermicro)
  • As will be clear from my code, I am sure it can be improved and simplified (though it currently works for me); I am happy to simplify where I can, so feedback is welcome
  • I will also revise/update this guide once I have implemented a few further improvements

Implementation Guide

Step 1 - Getting basic monitoring working

Monitoring CPU temperature and fan RPM speeds is obviously a prerequisite for any control mechanism.
I therefore opted for openHAB’s Exec binding in combination with lm-sensors and iDRAC’s IPMI commands.

Before going into openHAB and creating Things, Items etc., I tested the lm-sensors and IPMI commands from my openHAB machine to make sure they can be executed remotely against my Proxmox servers.
I will not go into the details of setting up iDRAC6 or lm-sensors, as there are already various guides out there: [1], [2].

Command used to query CPU temperature:

ssh USER@IP sensors | grep -A 0 'Package id 0' | cut -c17-20

Note that sensors queries the sensor data via lm-sensors on the remote machine, grep then filters the result down to the required line (the package, i.e. CPU, temperature), and cut reduces that line to the temperature value with one decimal (e.g. 23.0).
Again, the exact grep and cut commands might differ on your machine.
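The fixed cut columns depend on the exact padding of the sensors output. As an alternative, the value can be picked out by matching the line label. Below is a plain JavaScript sketch of that idea, assuming the usual lm-sensors line format "Package id 0:  +45.0°C  (high = ..., crit = ...)". The function name parsePackageTemp and the sample output are made up for illustration, and wiring it into openHAB (for example via a transformation) is not covered here.

```javascript
// Sketch: extract the CPU package temperature from raw `sensors` output by
// matching the line label instead of fixed character columns (cut -c17-20).
// Assumes a line such as "Package id 0:  +45.0°C  (high = +80.0°C, ...)".
function parsePackageTemp(sensorsOutput, packageId = 0) {
  const prefix = "Package id " + packageId + ":";
  for (const rawLine of sensorsOutput.split("\n")) {
    const line = rawLine.trim();
    if (!line.startsWith(prefix)) continue;
    // First signed decimal number after the label, e.g. "+45.0" -> 45
    const match = line.slice(prefix.length).match(/[+-]?\d+(?:\.\d+)?/);
    return match ? parseFloat(match[0]) : null;
  }
  return null; // no matching line found
}

// Hypothetical sample output
const sensorsSample = [
  "coretemp-isa-0000",
  "Adapter: ISA adapter",
  "Package id 0:  +45.0°C  (high = +80.0°C, crit = +100.0°C)",
  "Core 0:        +43.0°C  (high = +80.0°C, crit = +100.0°C)",
].join("\n");

console.log(parsePackageTemp(sensorsSample)); // 45
```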

Command used to query fan RPM speeds:

ipmitool -I lanplus -H IP -U USER -P PASSWORD sensor reading "Ambient Temp" "FAN 1 RPM" "FAN 2 RPM" "FAN 3 RPM" | grep 'FAN 1 RPM' | cut -c20-23

Again, the above command is a combination of, in this case, ipmitool, grep and cut.
The output is the RPM speed of FAN 1; the same command can be used for FAN 2, FAN 3 etc.
Please make sure to replace IP, USER and PASSWORD with your own values.
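Note that cut -c20-23 keeps at most four characters, so a five-digit reading (e.g. 12000 RPM) would be cut to 1200. A sketch of an alternative, assuming each line of the ipmitool sensor reading output has the form "FAN 1 RPM | 3600": split every line at the | separator and collect all requested sensors from a single call. parseSensorReadings is a hypothetical helper, and the sample values are illustrative only.

```javascript
// Sketch: parse "NAME | VALUE" lines from ipmitool "sensor reading" output
// into an object, e.g. { "FAN 1 RPM": 3600, ... }. Non-numeric values become null.
function parseSensorReadings(output) {
  const readings = {};
  for (const line of output.split("\n")) {
    const sep = line.indexOf("|");
    if (sep === -1) continue; // skip blank or unexpected lines
    const name = line.slice(0, sep).trim();
    const value = parseFloat(line.slice(sep + 1)); // parseFloat skips leading spaces
    readings[name] = Number.isNaN(value) ? null : value;
  }
  return readings;
}

// Hypothetical sample output of one ipmitool call covering all sensors
const ipmiSample = [
  "Ambient Temp     | 23",
  "FAN 1 RPM        | 3600",
  "FAN 2 RPM        | 12000",
  "FAN 3 RPM        | na",
].join("\n");

const readings = parseSensorReadings(ipmiSample);
console.log(readings["FAN 1 RPM"], readings["FAN 2 RPM"], readings["FAN 3 RPM"]); // 3600 12000 null
```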

Once these commands work via ssh/ipmitool, we can implement them in openHAB itself.

Adding the commands to openHAB’s exec.whitelist (!!):

This is a very important step, as openHAB by default blocks any command execution.
The commands need to be added to exec.whitelist to make them executable.
The file can be found at /etc/openhab/misc/exec.whitelist (non-Docker install, installed via apt on Debian).
Open the file, add the ipmitool and ssh sensors commands mentioned above, and save it.

Exec binding Thing setup:



Exec binding Thing’s Item setup:


Once set up, the above should query CPU temperature and fan RPM speed every 60 seconds.
I also added state description metadata to show the RPM and °C suffixes.


Step 2 - Basic fan RPM control

Basic fan RPM control can be achieved simply by creating rules in openHAB.
Note that, although it could be done with the UI’s Rule DSL, all my rules are written in ECMAScript, as this makes it easier to combine rules and implement more advanced rule actions.

Dell fan speeds can be modified via IPMI and set in a range of 0% to 100%.
Note that this requires setting fan speed control to manual (!!), and you will lose automatic fan speed adjustment in scenarios where CPU temperatures are high.
To enable (or disable again) manual fan RPM control, execute:

enable manual control: ipmitool -I lanplus -H IP -U USER -P PASSWORD raw 0x30 0x30 0x01 0x00
disable manual control (back to automatic): ipmitool -I lanplus -H IP -U USER -P PASSWORD raw 0x30 0x30 0x01 0x01

Also note that manual control is deactivated when the server restarts and needs to be re-activated.

The exact commands are documented here and here. Depending on your server, you might be able to query/control even more parameters (Dell R210 IIs are more “basic” and only show CPU temperature and fan speeds).
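To keep the raw bytes straight, here is a small JavaScript sketch that assembles the ipmitool commands used in this guide as argument arrays, the same form Exec.executeCommandLine receives in the rules below. The function names are hypothetical, and HOST/USER/PASSWORD are placeholders; 0xff in the set-speed command is, as commonly described for these Dell raw commands, the selector for all fans.

```javascript
// Sketch: the raw iDRAC fan commands from this guide as argument arrays.
// Function names are hypothetical; host/user/password are placeholders.
function ipmiBase(host, user, password) {
  return ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password];
}

// raw 0x30 0x30 0x01 0x00: switch to manual fan control (automatic control off)
function enableManualFanControl(host, user, password) {
  return ipmiBase(host, user, password).concat(["raw", "0x30", "0x30", "0x01", "0x00"]);
}

// raw 0x30 0x30 0x01 0x01: hand fan control back to the iDRAC (automatic)
function disableManualFanControl(host, user, password) {
  return ipmiBase(host, user, password).concat(["raw", "0x30", "0x30", "0x01", "0x01"]);
}

// raw 0x30 0x30 0x02 0xff <speed>: set fan speed, given as a hex percentage ("0x1e" = 30%)
function setFanSpeed(host, user, password, speedHex) {
  return ipmiBase(host, user, password).concat(["raw", "0x30", "0x30", "0x02", "0xff", speedHex]);
}

// Manual control has to be enabled (and re-enabled after a restart) before setting a speed
console.log(enableManualFanControl("IP", "USER", "PASSWORD").join(" "));
console.log(setFanSpeed("IP", "USER", "PASSWORD", "0x1e").join(" "));
// ipmitool -I lanplus -H IP -U USER -P PASSWORD raw 0x30 0x30 0x02 0xff 0x1e
```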

Create control items:

To change the RPM speeds, an Item is needed to select certain RPM settings, which can then be sent via ipmitool to the Dell’s iDRAC.



Rule triggers and ECMA rule:

The fan RPM control rule is triggered whenever the newly created control Item changes.
Based on this change, it then executes an ipmitool command that is sent to the server’s iDRAC.
I have defined the following prefix variables in my rule to simplify things:

var ChronoUnit = Java.type("java.time.temporal.ChronoUnit");
var ZonedDateTime = Java.type("java.time.ZonedDateTime");
var telegramAction = actions.get("telegram","telegram:telegramBot:telegram");
var Log = Java.type("org.openhab.core.model.script.actions.Log");
var ScriptExecution = Java.type("org.openhab.core.model.script.actions.ScriptExecution");
var PersistenceExtensions = Java.type("org.openhab.core.persistence.extensions.PersistenceExtensions");
var Exec = Java.type('org.openhab.core.model.script.actions.Exec');
var Duration = Java.type('java.time.Duration');

Remark: Not all of the above is needed; it is simply my standard copy-paste block for all rules.

I then defined variables for the IP addresses, username and password, as well as for the different fan RPM percentage settings:

//iDRAC IP and user access
    
var ip_address_pve10 = "IP1";
var ip_address_pve20 = "IP2";
var ip_address_pve30 = "IP3";
var ip_address_pfSense = "IP4";
    
var username = "USER";
var password = "PASSWORD";
    
//FAN speed percentage values in variables
    
var percentage_0 = "0x00";
var percentage_5 = "0x05";
var percentage_10 = "0x0A";
var percentage_20 = "0x14";
var percentage_30 = "0x1e";
var percentage_50 = "0x32";
var percentage_70 = "0x46";
var percentage_80 = "0x50";
var percentage_100 = "0x64";

Note that the above hex values correspond to percentages (e.g. 0x00 = 0%, 0x1e = 30%, 0x64 = 100%).
You can use an online hex converter to get your own percentages, although the list above should be enough.
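Instead of an online converter, the hex byte can also be computed directly. A minimal JavaScript sketch (percentToHex is a hypothetical name) that rejects values outside 0 to 100 and pads the result to two hex digits; its output is lowercase, while the list above mixes 0x0A and 0x1e.

```javascript
// Sketch: convert a fan speed percentage into the hex byte used as the last
// argument of "raw 0x30 0x30 0x02 0xff <speed>".
function percentToHex(percent) {
  if (!Number.isInteger(percent) || percent < 0 || percent > 100) {
    throw new RangeError("Fan speed must be an integer from 0 to 100, got " + percent);
  }
  // e.g. 30 -> "1e" -> "0x1e"; padStart keeps two hex digits (5 -> "0x05")
  return "0x" + percent.toString(16).padStart(2, "0");
}

// Rebuild the percentage list from the rule above
const percentages = [0, 5, 10, 20, 30, 50, 70, 80, 100];
console.log(percentages.map(percentToHex).join(" "));
// 0x00 0x05 0x0a 0x14 0x1e 0x32 0x46 0x50 0x64
```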

The rule then uses event.itemName to identify which fan RPM control Item was changed, i.e. for which server the speed should be switched.
It then executes the respective command via Exec.executeCommandLine (I shortened the rule below, as it just repeats for each percentage value):

    if(this.event === undefined) {
        // event does not exist
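    }
    
    // Assumed: current state of the control Item, read as a string (defined the same way for each server)
    var pve10_fanspeed_control = ir.getItem("exec_pve10_fanspeed_set").getState().toString();
    
    if(event.itemName == "exec_pve10_fanspeed_set"){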
      if(pve10_fanspeed_control == "0"){
        var execFAN = Exec.executeCommandLine(Duration.ofSeconds(5),"ipmitool","-I", "lanplus", "-H", ip_address_pve10, "-U", username, "-P", password, "raw", "0x30", "0x30", "0x02", "0xff", percentage_0);
        Log.logInfo("TEST", "PVE10 (" + ip_address_pve10 + ") set to 0% " + percentage_0);
      } else if(pve10_fanspeed_control == "5"){
        var execFAN = Exec.executeCommandLine(Duration.ofSeconds(5),"ipmitool","-I", "lanplus", "-H", ip_address_pve10, "-U", username, "-P", password, "raw", "0x30", "0x30", "0x02", "0xff", percentage_5);
        Log.logInfo("TEST", "PVE10 (" + ip_address_pve10 + ") set to 5% " + percentage_5);  
      } else if(pve10_fanspeed_control == "10"){...

Step 3 - Intermediate fan RPM control

The above rule enables fan RPM speed control via an Item; however, I also wanted to make sure that CPU temperatures are kept within certain limits.
This includes (1) increasing/decreasing fan RPM speeds based on “homestatus” (e.g. away, sleep, home) and (2) raising an alarm in emergencies where CPU temperatures are very high, while increasing fan speeds to resolve the alarm.

Increase/decrease fan RPM speeds based on homestatus:

This rule is pretty straightforward, so I will not go into much depth here.
It increases fan speed when we are not at home (away = 0) or asleep (sleep = 2), and reduces fan speeds again during the daytime when we are at home (home = 1):

// Assumed: current home status of the triggering Item (0 = away, 1 = home, 2 = sleep)
var presence_home = ir.getItem("presence_home").getState().toString();
var all_servers = [ip_address_pve10, ip_address_pve20, ip_address_pve30, ip_address_pfSense];

// Send the same raw fan speed command to every server's iDRAC
function setAllFans(speed) {
   for (var i = 0; i < all_servers.length; i++) {
      Exec.executeCommandLine(Duration.ofSeconds(5),"ipmitool","-I", "lanplus", "-H", all_servers[i], "-U", username, "-P", password, "raw", "0x30", "0x30", "0x02", "0xff", speed);
   }
}

if(event.itemName == "presence_home"){
   if(presence_home == "0" || presence_home == "2"){
      setAllFans(percentage_50);
      Log.logInfo("TEST", "Night Time or Away - All Server Fan set to 50% Speed");
   } else if (presence_home == "1"){
      setAllFans(percentage_30);
      Log.logInfo("TEST", "Day Time - All Server Fan set to 30% Speed");
   }
}

This keeps CPU temperatures in an acceptable range when we are asleep (and the living room aircon is turned off) and when we are not at home; in both scenarios we do not mind louder fan noise (it is not audible in the bedrooms). It also ensures that fan noise is reduced when we are at home.

Alarm notification and fan speed automation in emergencies (high CPU temp):

As all of the above requires setting the servers’ fan speed control to manual, and hence losing the automatic increase/decrease that prevents overheating, I needed another mechanism (or rule) to at least notify me and temporarily increase fan speeds to alleviate high temperatures.
To do so, I use the servers’ CPU temperature Items as triggers, which then execute another ECMAScript rule.
The script uses the average CPU temperature over the last 5 minutes:

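var ZonedDateTime = Java.type("java.time.ZonedDateTime");
var ScriptExecution = Java.type("org.openhab.core.model.script.actions.ScriptExecution");
var PersistenceExtensions = Java.type("org.openhab.core.persistence.extensions.PersistenceExtensions");
var now = ZonedDateTime.now(); // defined explicitly, as "now" may not be a preset in ECMAScript 5.1 rules
    
var pve10_temperature_avg = PersistenceExtensions.averageSince(ir.getItem("exec_pve10_cputemperature"), now.minusMinutes(5));
var pve20_temperature_avg = PersistenceExtensions.averageSince(ir.getItem("exec_pve20_cputemperature"), now.minusMinutes(5));
var pve30_temperature_avg = PersistenceExtensions.averageSince(ir.getItem("exec_pve30_cputemperature"), now.minusMinutes(5));
var pfSense_temperature_avg = PersistenceExtensions.averageSince(ir.getItem("exec_pfSense_cputemperature"), now.minusMinutes(5));
    
// Assumed: current alarm Item states ("ON"/"OFF") used below; add the remaining servers the same way
var pve10_temperature_alarm = ir.getItem("exec_pve10_temperature_alarm").getState().toString();
var pve20_temperature_alarm = ir.getItem("exec_pve20_temperature_alarm").getState().toString();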
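To illustrate what the 5-minute window does with 1-minute polling: it holds roughly five samples, so a single high reading is averaged with the others in the window instead of being judged on its own. The sketch below is a plain arithmetic mean over hypothetical samples; openHAB’s own averageSince may weight the values differently, so treat it as an illustration of the idea rather than a reimplementation. meanOverWindow is a hypothetical name.

```javascript
// Sketch: arithmetic mean of the samples recorded within the last
// `windowMinutes`. Note: openHAB's averageSince may weight samples by time;
// this sketch does not.
function meanOverWindow(samples, nowMs, windowMinutes) {
  const cutoff = nowMs - windowMinutes * 60 * 1000;
  const recent = samples.filter((s) => s.timestamp >= cutoff);
  if (recent.length === 0) return null; // nothing recorded in the window
  const sum = recent.reduce((acc, s) => acc + s.value, 0);
  return sum / recent.length;
}

// Hypothetical 1-minute CPU temperature samples (timestamps in milliseconds)
const nowMs = Date.UTC(2023, 0, 1, 12, 0, 0);
const minute = 60 * 1000;
const samples = [
  { timestamp: nowMs - 7 * minute, value: 60 }, // outside the 5-minute window
  { timestamp: nowMs - 4 * minute, value: 84 },
  { timestamp: nowMs - 3 * minute, value: 86 },
  { timestamp: nowMs - 2 * minute, value: 88 },
  { timestamp: nowMs - 1 * minute, value: 87 },
  { timestamp: nowMs, value: 85 },
];

console.log(meanOverWindow(samples, nowMs, 5)); // 86
```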

The script then modifies fan speeds temporarily and sets alarms in the UI and via Telegram:

if(event.itemName == "exec_pve10_cputemperature"){
   if(pve10_temperature_avg >= 85 && pve10_temperature_alarm == "OFF"){
      events.sendCommand("exec_pve10_temperature_alarm", "ON");
      events.sendCommand("exec_pve10_fanspeed_set", "50");
      telegramAction.sendTelegram("PVE10 Temperature too high: " + pve10_temperature_avg + " - FAN Speed set to 50%");
      Log.logInfo("TEST", "PVE10 Temperature Alarm!");
    } else if (pve10_temperature_avg < 80 && pve10_temperature_alarm == "ON"){
      events.sendCommand("exec_pve10_temperature_alarm", "OFF");
      events.sendCommand("exec_pve10_fanspeed_set", "30");
      telegramAction.sendTelegram("PVE10 Temperature back to normal: " + pve10_temperature_avg + " - FAN Speed set to 30%");
      Log.logInfo("TEST", "PVE10 Temperature Alarm Resolved!");
    }
} else if (event.itemName == "exec_pve20_cputemperature"){
   if(pve20_temperature_avg >= 85 && pve20_temperature_alarm == "OFF"){
      events.sendCommand("exec_pve20_temperature_alarm", "ON");
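The alarm rule uses two different thresholds: it raises the alarm at 85 °C or more and clears it only once the 5-minute average drops below 80 °C. That 5 °C gap (hysteresis) keeps the alarm and the fan speed from toggling back and forth while the temperature hovers around a single threshold. The decision logic can be sketched as a pure JavaScript function; nextAlarmAction is a hypothetical name, and the 50%/30% values mirror the rule.

```javascript
// Sketch of the alarm decision: raise at >= 85 °C, clear only below 80 °C.
const ALARM_ON_AT = 85;     // °C, threshold from the rule above
const ALARM_OFF_BELOW = 80; // °C, threshold from the rule above

function nextAlarmAction(avgTemp, alarmOn) {
  if (avgTemp === null) return { alarmOn, fanPercent: null }; // no data: change nothing
  if (!alarmOn && avgTemp >= ALARM_ON_AT) return { alarmOn: true, fanPercent: 50 };
  if (alarmOn && avgTemp < ALARM_OFF_BELOW) return { alarmOn: false, fanPercent: 30 };
  return { alarmOn, fanPercent: null }; // inside the band: keep the current state
}

// Walk through a sequence of 5-minute averages
let alarm = false;
for (const avg of [82, 85, 83, 81, 79]) {
  const action = nextAlarmAction(avg, alarm);
  alarm = action.alarmOn;
  console.log(avg, alarm, action.fanPercent);
}
// 82 false null
// 85 true 50
// 83 true null
// 81 true null
// 79 false 30
```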

Final notes

The following aspects are still WIP and I will update the guide once completed:

  • Automatic re-enabling of manual fan control when servers restart
  • More fine-grained fan RPM control based on CPU temperatures (not only in emergencies)
  • Override functionality to re-activate automatic fan RPM control on all servers (as a last resort)

I hope the above guide is helpful and can be used in other scenarios.

Changelog


Update #1

Below is a revised and simplified (manual) fan speed control rule.
Instead of hard-coding most Items, I went for a more flexible setup this time.

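//Java types used below (as in the standard prefix block)

var Log = Java.type("org.openhab.core.model.script.actions.Log");
var Exec = Java.type('org.openhab.core.model.script.actions.Exec');
var Duration = Java.type('java.time.Duration');

//Set item variables

var triggerItem   = ir.getItem(event.itemName);
var triggerState  = parseInt(triggerItem.getState().toString(), 10); //plain number, so the switch cases below can match
var triggerLabel  = triggerItem.getLabel();
var substring_pve = triggerItem.name.substring(5, 10); //you need to change this depending on your items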

//User access

var username = "USER";
var password = "PASSWORD";

if(this.event === undefined) {
    // event does not exist
}

//Set correct IP Address

switch (triggerItem.name) {
   case "YOUR-Triggering-ItemName1":
      var substring_ip = "YOUR-IP-Server1";
      break;
   case "YOUR-Triggering-ItemName2":
      var substring_ip = "YOUR-IP-Server2";
      break;
   case "YOUR-Triggering-ItemName3":
      var substring_ip = "YOUR-IP-Server3";
      break;
   case "YOUR-Triggering-ItemName4":
      var substring_ip = "YOUR-IP-ServerX";
      break;
}

//Set correct fan speed HEX value

switch (triggerState) {
   case 0:
      var speed = "0x00";
      break;
   case 5:
      var speed = "0x05";
      break;
   case 10:
      var speed = "0x0A";
      break;
   case 20:
      var speed = "0x14";
      break;
   case 30:
      var speed = "0x1e";
      break;
   case 50:
      var speed = "0x32";
      break;
   case 70:
      var speed = "0x46";
      break;
   case 80:
      var speed = "0x50";
      break;
   case 100:
      var speed = "0x64";
      break;
}

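//Actions (only if an IP address and a fan speed were found for the triggering Item/state)

if (substring_ip !== undefined && speed !== undefined) {
   var execFAN = Exec.executeCommandLine(Duration.ofSeconds(5),"ipmitool","-I", "lanplus", "-H", substring_ip, "-U", username, "-P", password, "raw", "0x30", "0x30", "0x02", "0xff", speed);
   Log.logWarn("servers", substring_pve + " (" + substring_ip + ") set to " + triggerState + "%");
} else {
   Log.logWarn("servers", "No IP or fan speed mapping for " + triggerItem.name + " (state: " + triggerState + ")");
}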

As you can see, this reduces the number of lines drastically (especially with multiple servers).
Hope this helps!
