Journey to JSR223 Python 2 of 9

EDIT: Grrrrr, second try. First version was completely lost.

Rule

This is one of my more complicated Rules but it doesn’t really matter if it doesn’t work right. This is a Rule that alerts me when a service or device goes offline.

import org.eclipse.smarthome.model.script.ScriptServiceUtil
import java.util.Map

val Map<String, Timer> timers = newHashMap

rule "A sensor changed its online state"
when
    Member of gSensorStatus changed
then
    if(previousState == NULL) return;

    // val alerted = gOfflineAlerted.members.findFirst[ a | a.name == triggeringItem.name+"_Alerted"] as SwitchItem
    val alerted = ScriptServiceUtil.getItemRegistry.getItem(triggeringItem.name+"_Alerted") as SwitchItem
    if(alerted === null) {
        logError("admin", "Cannot find Item " + triggeringItem.name+"_Alerted")
        aInfo.sendCommand(triggeringItem.name + " doesn't have an alerted flag, it is now " + transform("MAP", "admin.map", triggeringItem.state.toString) + "!")
        return;
    }

    var n = transform("MAP", "admin.map", triggeringItem.name)
    val name = if(n == "") triggeringItem.name else n

    // If we are flapping, reschedule the timer and exit
    if(timers.get(triggeringItem.name) !== null) {
        timers.get(triggeringItem.name).reschedule(now.plusMinutes(1))
        logWarn("admin", name + " is flapping!")
        return;
    }

    // If alerted == OFF and triggeringItem == OFF then sensor went offline and we have not yet alerted
    // If alerted == ON and triggeringItem == ON then the sensor came back online after we alerted that it was offline
    if(alerted.state == triggeringItem.state) {
        val currState = triggeringItem.state
        // wait one minute before alerting to make sure it isn't flapping
        timers.put(triggeringItem.name, createTimer(now.plusMinutes(1), [ |
            // If the current state of the Item matches the saved state after 5 minutes send the alert
            if(triggeringItem.state == currState) {
                aInfo.sendCommand(name + " is now " + transform("MAP", "admin.map", triggeringItem.state.toString) + "!")
                alerted.postUpdate(if(currState == ON) OFF else ON)
            }
            timers.put(triggeringItem.name, null)
        ]))
    }
end

Theory of Operation

The Rule triggers whenever a member of gSensorStatus changes state.

First we check to see if the change was from a state we don’t care about, namely NULL or UNDEF (there is a small bug in this original version as it’s missing UNDEF).

Next we use Design Pattern: Associated Items to get an Item that stores a flag indicating whether we have already alerted for this Item or not. Then we us Design Pattern: Human Readable Names in Messages to convert the Item name to a more friendly name for the logs and alert messages.

If a Timer already exists we know the device is flapping so reschedule the alert (I think this is wrong and the timer should just be canceled).

Finally, if we are not flapping, and the alerted state matches the Item state we set a timer to send the alert message. So if an Item goes OFF and we haven’t alerted, we set the timer. In the timer body we send the alert and set the alerted flag Item to ON. Then when the device comes back online alerting will match the Item state and a new alert will be sent indicating the device is back online.(I’m wondering if all the changes I’ve made over the years whether this is still necessary.)

Python

JSR223 provides access to a newish feature on Items that we can leverage with this Rule to simplify the overall configuration: metadata. Item metadata allows one to define key/value pairs on an Item and with JSR223 the metadata can be accessed and modified fro Rules. Sorry Rules DSL, this isn’t supported (yet?) and the REST API is not fine grained enough to be useful.

With metadata we can replace the associated Item and map file to translate the Item name to a friendly name. One thing to be aware of with metadata is while you can define it on an Item in the .items file, any dynamic changes made to that namespace will be overwritten on a reload of that .items file. So for this Rule I’m going to use two namespaces, a static one defined in the .items file and a dynamic one created and managed by the Rule.

Here is an example of an Item with a statically defined metadata. See the Helper Library docs for the full documentation of metadata.

Switch vGrafana_Online "Grafana Status [MAP(admin.map):%s]"
    <network> (gSensorStatus, gResetExpire)
    { channel="network:servicedevice:argusgrafana:online",
      expire="2m",
      Static="meta"[name="Grafana"]  }

Here is the Rule.

from core.rules import rule
from core.triggers import when
from core.metadata import get_key_value, set_metadata
from personal.util import send_info
from threading import Timer
from core.actions import Transformation
from core.log import log_traceback

# -----------------------------------------------------------------------------
# Python Timers for online alerts
alertTimers = {}

@log_traceback
def alert_timer_expired(itemName, name, origState):
    status_alert.log.debug("Status alert timer expired for {} {} {}".format(name, origState, items[itemName]))
    del alertTimers[itemName]
    if items[itemName] == origState:
        send_info("{} is now {}".format(name, Transformation.transform("MAP", "admin.map", str(items[itemName]))), status_alert.log)
        set_metadata(itemName, "Alert", { "alerted" : "ON"}, overwrite=False)
    else:
        status_alert.log.warn("{} is flapping!".format(itemName))

@rule("Device online/offline", description="A device we track it's online/offline status changed state", tags=["admin"])
@when("Member of gSensorStatus changed")
def status_alert(event):
    status_alert.log.info("Status alert for {} changed to {}".format(event.itemName, event.itemState))

    if isinstance(event.oldItemState, UnDefType):
        return

    alerted = get_key_value(event.itemName, "Alert", "alerted") or "OFF"
    name = get_key_value(event.itemName, "Static", "name") or event.itemName

    #If the Timer exists and the sensor changed the sensor is flapping, cancel the Timer
    if event.itemName in alertTimers:
        alertTimers[event.itemName].cancel()
        del alertTimers[event.itemName]
        status_alert.log.warning(name +  " is flapping!")
        return

    '''
    If alerted == "OFF" and event.itemName == OFF than sensor went offline and we have not yet alerted
    If alerted == "ON" and event.itemName == ON then the sensor came back online after we alerted that
    it was offline
    '''
    status_alert.log.debug("Looking to see if we need to create a Timer: {} {}".format(alerted, event.itemState))
    if alerted == str(event.itemState):
        # Wait one minute before alerting to make sure it isn't flapping
        alertTimers[event.itemName] = Timer(60, lambda: alert_timer_expired(event.itemName, name, event.itemState))
        alertTimers[event.itemName].start()

The rule works pretty much the same as the original Rules DSL Rule with a couple of minor changes to address the issues I identified above (i.e. cancelling the Timer on flapping, check for UNDEF as well as NULL).

Lessons Learned

  • Timers in Python are tricky. See the thread below for details.

  • Some errors (e.g. calling status_alert.log("Some log statement") in the function called by the timer lambda will not be reported in the logs, they will just fail silently. In fact it would seem that all errors in a Timer lambda are suppressed. I suspect one could put a try/exxcept in the function to log out the errors.

Bonus Rule

I have a simple Rule that runs at System started to reset any Design Pattern: Expire Binding Based Timers on an OH restart. This is done through a Group, Persistence, and restoreOnStartup.

Rules DSL Version

rule "Reset Expire binding at startup"
when
  System started
then
  logInfo("Startup", "Restarting Expire binding Timers")
  gResetExpire.members.forEach[ GenericItem i | i.postUpdate(i.state)]
end

At System start, send an update to each member of gResetExpire with the state that was restored. This will cause any running Expire binding Timers to start again.

Python

# -----------------------------------------------------------------------------
@rule("Reset Expire Binding Timers", description="Sends an update to all members of gResetExpire of their restored state", tags=["admin"])
@when("System started")
def reset_expire(event):
    reset_expire.log.info("Restarting Expire binding Timers")
    for timer in ir.getItem("gResetExpire").members:
        events.postUpdate(timer, timer.getState())

Previous post: Journey to JSR223 Python 1 of 9
Next post: Journey to JSR223 Python 3 of?

Edit: Applied corrections and updates based on feedback from CrazyIvan359 and 5iver below.

5 Likes

@rlkoshak why are you assigning the timer to a variable and then adding it to the dict? Have you tried adding it directly to the dict to see if the global is still required?

    if event.itemName in alertTimers:
        alertTimers[event.itemName].stop()
        del alertTimers[event.itemName]
        status_alert.log.warning(name +  " is flapping!")
        return
    if alerted == str(items[event.itemName]):
        # Wait one minute before alerting to make sure it isn't flapping
        alertTimers[event.itemName] = Timer(60, lambda: alert_timer_expired(event.itemName, name, event.itemState))
        alertTimers[event.itemName].start()

Was this str() conversion necessary? The metadata functions should be returning values of the same type as the data. Static="meta"[name="Grafana"] should return a string when fetching the name value. If you were having issues with it, try status_alert_rule.log.warn(type(get_key_value(event.itemName, "Alert", "alerted"))) to print the data type to the log.

You are doing two different None checks here. For alerted you are doing a check for the value being the string "None", is that what get_key_value() is returning? For name you are checking == None which is no Pythonic, the correct way to make this check is if nameKeyVal is not None. In this case the simplest way to write this would be:

name = nameKeyVal if nameKeyVal else event.itemName

This checks if nameKeyVal has a true-ish value (essentially anything that is not False, 0, or None)

That is what got me into trouble. That is what I did the first time around (see JSR223 Python: Timer in dict doesn't work). Scott’s recommendation was that I had to promote the Timer to a global and I couldn’t do that by assigning it to the dict directly.

When I do, I get a _Timer type which doesn’t have the getStatus() and stop() functions, the latter of which is required for my use case (I gotta be able to stop the Timer). The Timer did still work though. I just couldn’t see it’s status nor stop it.

I don’t remember. Either I got an error or that was what an example I found used and I followed it. Or it was part of some blind flailing about trying to figure out why something wasn’t working. I just tried it without the str and indeed it is not needed. I’ll update the OP.

Yep, I just caught that when I looked into the unnecessary str check you just pointed out. The name line was completly broken

And if this is the only thing I’m doing that is not Pythonic I’m not doing too bad. It takes awhile to learn the “proper” way of doing things in a new language. While not as bad as Perl, Python seems to have a lot of ways to do something with one “proper” way.

1 Like

getState()

Since getStatus() is not being used, this does not need to be done. When using global, it changes the type of timer and the two types have different methods. Without using global, you can use cancel() and isAlive(). With global, you can use stop(), getStatus(), and I think isAlive is in there too (might be is_alive). More discussion here.

Just checking if the timer is in the dict is not enough, since the timer will still be in there after it expires. Also check isAlive(). Edit: oh wait… you’re removing them :wink:.

OK, there is one place I went wrong. I didn’t realize there is both a stop and a cancel function and which one exists when depends on whether it was made a global or not. I had used cancel() and then when I changed to the code above it complained that cancel() didn’t exist. That was the first that part of the code ran and I thought I just made a dumb typo (like the getStatus() you pointed out).

But I delete it from the dict as the last line of the function called when the timer goes off.

def alert_timer_expired(itemName, name, origState):
    if items[itemName] == origState:
        send_info("{} is now {}".format(name, Transformation.transform("MAP", "admin.map", items[itemName])), status_alert.log)
        set_metadata(itemName, "Alert", {"alerted" : "ON"}, overwrite=False)
    else:
        status_alert.log.info("{} is flapping!".format(itemName))
    del alertTimers[itemName]

I’m experimenting with updates, will post them to the OP once I figure out why my timer functions is dying.

OK, all updates based on both of your recommendations appear to be working.

The more I look at this the more it makes sense why it’s not working and you’re getting strange types back. org.python.core is a Java package name, somehow the global is making Jython give you back a Java thread I think.

And if you need to check if the timer has finished executing, then if isAlive() returns False the timer has finished.

Try this:

from core.rules import rule
from core.triggers import when
from core.metadata import get_key_value, set_metadata
import personal.util
reload(personal.util)
from personal.util import send_info, send_alert
from threading import Timer
from core.actions import Transformation

# -----------------------------------------------------------------------------
# Python Timers for online alerts

alertTimers = {}

def alert_timer_expired(itemName, name, origState):
    if items[itemName] == origState:
        send_info("{} is now {}".format(name, Transformation.transform("MAP", "admin.map", items[itemName])), status_alert.log)
        set_metadata(itemName, "Alert", {"alerted" : "ON"}, overwrite=False)
    else:
        status_alert.log.info("{} is flapping!".format(itemName))
    del alertTimers[itemName]

@rule("Device online/offline", description="A device we track it's online/offline status changed state", tags=["admin"])
@when("Member of gSensorStatus changed")
def status_alert(event):

    if event.oldItemState == NULL or event.oldItemState == UNDEF:
        return

    alertedKeyVal = get_key_value(event.itemName, "Alert", "alerted")
    alerted = alertedKeyVal if alertedKeyVal else "OFF"

    nameKeyVal = get_key_value(event.itemName, "Static", "name")
    name = nameKeyVal if nameKeyVal else event.itemName

    #If the Timer exists and the sensor changed the sensor is flapping, cancel the Timer
    if event.itemName in alertTimers:
        alertTimers[event.itemName].cancel()
        del alertTimers[event.itemName]
        status_alert.log.warning(name +  " is flapping!")
        return

    '''
    If alerted == "OFF" and event.itemName == OFF than sensor went offline and we have not yet alerted
    If alerted == "ON" and event.itemName == ON then the sensor came back online after we alerted that it was offline
    '''
    if alerted == str(items[event.itemName]):
        # Wait one minute before alerting to make sure it isn't flapping
        alertTimers[event.itemName] =  Timer(60, lambda: alert_timer_expired(event.itemName, name, event.itemState))
        alertTimers[event.itemName].start()
2 Likes

Ha! I beat you to it. :slight_smile:

It works as you posted it. I just needed to use the right functions. I was lead astray by the fact that defining it one way has a stop() but the other way has a cancel(). The theory regarding the Java thread makes a lot of sense.

What I worry about though is that now we as Rules developers will need to keep track of the way that we defined the Timer because we get almost a completely different set of functions/methods depending on whether we promote them to a global or not. That will lead to a lot of confusion and posts to the forum I’m sure. Even if we document the heck out of this.

Is there anything we can do in the library that might avoid this confusion? Something like createTimer that guarantees we always get one version or the other and we can therefore only have to teach about the one?

Thanks a million to you and @5iver for all the great feedback! With just these few Rules I have a lot more confidence in continuing on.

3 Likes

Absolutely! For user-friendliness we could build a wrapper class that wraps threading.Timer and provides several analogous functions so that whichever name you use (say stop() or cancel()) it will call cancel() on the Timer object.

@5iver thoughts?

Personally, I’m a purist and prefer to stay in Python when I can. But ScriptExecution.createTimer is available and people are familiar with it from the DSL. I think this is what should be laid out for beginners. However, I do think we should document the issues that can come up with using threading.Timer, in case people venture down that path. ATM, I don’t think a wrapper class is worthwhile… there are plenty of other things that can be worked on.

We can do that, but be aware that many users are going to want to “Python all the things!” and will choose the Python Timer over OH Timers if given a choice. I certainly did and many others on the forum have done so as well.

At best I think you will have to exclusively reference the OH Timers in all docs and examples and even then it’s going to remain a problem and many threads will be created on the forum.

2 Likes

Threading presents some interesting concepts to grapple with, thankfully using timers is relatively straight-forward. When the timer thread calls the function in the lambda it is not calling it from the context of the rule function. It is a separate thread calling the lambda function and knows nothing about the rule context. There is salvation though, the rule itself exists beyond the execution of the rule when it fires, so you can pass the logger as an argument in the lambda and use it there.

Add the logger of the function you are calling from to the timer lambda call and modify said function thus:

def alert_timer_expired(itemName, name, origState, log):
    if items[itemName] == origState:
        send_info("{} is now {}".format(name, Transformation.transform("MAP", "admin.map", items[itemName])), status_alert.log)
        set_metadata(itemName, "Alert", {"alerted" : "ON"}, overwrite=False)
    else:
        log.info("{} is flapping!".format(itemName))
    del alertTimers[itemName]

The problem I had isn’t that I can’t log. The logger I have in the timer thread works actually with out needing to pass it. And I learned I could do this from the timer example pointed to by the docs.

The problem is if there is a problem in the timer function it fails silently. For example, the timer function in this very role was failing and I could tell it was failing because of the debug log statements. In fact, that is how I could tell which line was the problem. But I never did get an will in the logs.

Oh I see. I am surprised the logger worked.

@5iver still think a wrapper class for timers is unneeded? I think we can save everyone, ourselves included, a lot of headache if we wrap timers. We can try-except them and dump a stacktrace when the thread crashes. It would be fairly simple to write, I could do it this weekend even

I understand the concern, but I’m not convinced this would be necessary. It’s also bad timing for me, even to test. I’d rather wait to see if people really have issues with them. IMO, the time would be better spent working on documentation. This should help…

from core.log import log_traceback

@log_traceback
def alert_timer_expired(itemName, name, origState):
1 Like

Gah I keep forgetting about that decorator! We should recommend it’s use when writing timer triggered functions.

1 Like

I’ve added that to the Timer function above. If I understand correctly, this will bring this function into the context of the main logger so any errors will get logged out to the main logger?

The log_traceback decorator will wrap the function/class/method in a try/except and log a traceback if there is an exception. It checks to see if the decorated function has a log attribute (a rule function) or a log keyword argument was passed to the function. If so, it uses that as the logger. If not, it creates a logger using the LOG_PREFIX and uses that.

1 Like

Hi Rich,
I’ve just started considering converting some rules to Python too.
So I’ve read almost every post in this thread and some the others out of this series and try to understand the logic.

It’s a big source of inspiration to me - many thank’s for it!

Since I’m new to Python, I do not fully understand each every expression at first :wink:
e.g. the following confused me:

Is it working?
I assume it is meant doing the following, as Michael wrote:

Are the both equivalent - if yes: why ?
The or operator in Python is just a logic “or” - right?

Apologies if I’ve missed the answer somewhere in the other posts :wink:

This is a Python trick, if the first value is “falsey” you get the second one. You are correct about the two examples giving the same result, the first one is just easier to read.

2 Likes