Two rule versions running in parallel while editing a Jython rule?

Tags: #<Tag:0x00007fc3fdeceaa0> #<Tag:0x00007fc3fdece9d8>

I’m currently running today’s latest OH2.5 snapshot on OpenHABian on a RPi3B+ and while authoring an improved roller shutter automation rule in Python, I noticed that at times I have 2 instances of the authored rules running. I basically got errors from the “old” rule at places where I knew I already fixed the reported error (or the line referenced no longer made sense)… while the new rule kept silent. But at times I saw two identical lines in the log file, and some commands were clearly run twice instead of once, which made me suspicious.

For the rules, I’m using the @rule() and @when() decorators, but one rule also features a startup timer that will check that all required item registry items exist before initialising.

I’m using the system shutdown hook to properly dispose off any stale system startup timer, but apparently there’s more I should do in the scriptUnloaded() method?

The workaround is to restart OpenHAB after each edit, but that slows down my rule authoring by 3-4 minutes each time.

Olivier, you are already where angels fear to tread, be dragons, careful. Restarting OpenHAB may be a small price to pay although a pain

Maybe give HABApp a try? It’s python3 (!) and rule reloading works like a charm! :wink:

From my understanding of the code, it is unlikely that this is an OH issue. My guess is that you are not stopping a timer properly or are doing something else with threading. Your script may also have errors that are causing the javax.script.ScriptEngine (not the RuleEngine) to hang or crash.

When a script is saved, any rules previously created through that script are removed and recreated, unless there is an error in the script. The only processes that could still remain are from async threads created in the script or in a triggered rule that was created in the script. The same will happen when saving a DSL rule file containing a triggered rule with an active timer.

Without your script and logs, it’s all just guessing :slight_smile:. If you can, minimize your script to the simplest form that reproduces the issue.

Apparently I was doing something wrong with timers.

Meanwhile I am now disambiguating different runs of the same rules file by defining the following global:

logTitle = "automation_script_name.py@{ts}".format(ts=DateTime.now().toString("HH:mm:ss"))

This way I can always see which version is running by comparing timestamps when using LogAction as in the following method which describes how to launch a startup timer up to 5 times to check whether all key items in the item registry have been found:

from core.rules import rule
from core.triggers import when
from core.actions import LogAction

# Allows using the createTimer Action:
from core.actions import ScriptExecution
from org.joda.time import DateTime

# Default values and automation-specific constants:
class defaults:
    STARTUP_TIMER_RETRY_SECONDS = 30

# Global system state information:
system_startup_timer = None
system_startup_timer_retry_count = 0
system_initialized = False


def start_system_timer():
    """Start a system startup timer until all key items for this automation script have been found in the item registry,
    or until the number of retries has been exceeded.
    Unless the retry count is exceeded, initialisation will resume when all key items have been found in the item registry.
    """
    global logTitle
    logPrefix = "start_system_timer(): "

    global system_startup_timer
    global system_startup_timer_retry_count

    def system_timer_callback():
        """Callback method that is called when the timer expires. It can restart the system startup timer if needed.
        """
        global system_startup_timer
        global system_startup_timer_retry_count

        system_startup_timer_retry_count += 1
        LogAction.logDebug(
            logTitle,
            logPrefix
            + "At start of call-back (after timer expired): system_startup_timer is None? {test} -- system_startup_timer_retry_count is {retry_count}".format(
                test=str(system_startup_timer is None),
                retry_count=str(system_startup_timer_retry_count),
            ),
        )
        # Check if all key items needed for running this automation script exist:
        if ir_items.key_items_exist():
            # All key items exist, reset timer and retry counter and initialize automation:
            system_startup_timer = None
            system_startup_timer_retry_count = 0
            initialize_automation()
        elif system_startup_timer_retry_count > 5:
            # Not all key items exist in the item registry, retry counter limit reached -- nothing to do
            LogAction.logWarn(
                logTitle,
                logPrefix
                + "system_startup_timer_retry_count {retry_count} > 5 -- Nothing to do".format(
                    retry_count=str(system_startup_timer_retry_count)
                ),
            )
        else:
            # Not all key items exist in the item registry, reinitialize the timer
            LogAction.logInfo(
                logTitle,
                logPrefix
                + "At end of call-back (will re-spawn timer): system_startup_timer is None? {test} -- system_startup_timer_retry_count is {retry_count}".format(
                    test=str(system_startup_timer is None),
                    retry_count=str(system_startup_timer_retry_count),
                ),
            )
            system_startup_timer = None
            start_system_timer()

    if system_startup_timer is None:
        # This should always be the case, since we are in the callback method that runs when the timer expired
        LogAction.logInfo(
            logTitle,
            logPrefix
            + "system_startup_timer is None, retry count is {retry_count} -- Starting new timer".format(
                retry_count=str(system_startup_timer_retry_count)
            ),
        )
        system_startup_timer = ScriptExecution.createTimer(
            DateTime.now().plusSeconds(defaults.STARTUP_TIMER_RETRY_SECONDS), system_timer_callback
        )
    else:
        # This should not happen, unless there is an early restart or a double-trigger event -- do nothing
        LogAction.logWarn(
            logTitle,
            logPrefix
            + "system_startup_timer exists, retry count is {retry_count} -- THIS SHOULD NOT HAPPEN".format(
                retry_count=str(system_startup_timer_retry_count)
            ),
        )

In principle I could use a Python timer instead of an OpenHAB timer since I’m never rescheduling it.

The code above will work if (1) you define the following class (and instantiate it once), and (2) you define initialize_automation():

class my_item_registry_items:
    """Contains information on all item registry items required for this automation.
    """
    KEY_ITEM_A = "Item_name_for_key_item_A"
    KEY_ITEM_B = "Item_name_for_key_item_B"
    OPTIONAL_ITEM_C = "Item_name_for_optional_item_C"

    def key_items(self):
        return list( (
        KEY_ITEM_A,
        KEY_ITEM_B,
        )
    )

    def key_items_exist(self):
        """Check that all key items exist in the item registry. Can be used also to delay initialisation due to system startup

        Returns:
            boolean -- True if all items in the item list were found in item registry, else False
        """
        global logTitle
        logPrefix = "key_items_exist(): "

        missing_items = list()
        errors = 0

        for item_name in self.key_items():
            LogAction.logDebug(
                logTitle,
                logPrefix
                + "Checking the existence of key item '{item_name}' in the item registry".format(
                    item_name=item_name,
                ),
            )
            if len(itemRegistry.getItems(item_name)) == 0:
                LogAction.logError(
                    logTitle,
                    logPrefix
                    + "The key item '{item_name}' was not found in the item registry".format(
                        item_name=item_name
                    ),
                )
                missing_items.append(item_name)
                errors += 1
        if errors > 0:
            LogAction.logError(
                logTitle,
                logPrefix
                + "{num} key item{s} for the automation rules couldn't be retrieved in the item registry".format(
                    num=errors, s="" if errors == 1 else "s"
                ),
            )
            return False
        return True

ir_items = my_item_registry_items()

This way my System started rule can remain short:

@rule(
    "Automation - System started or automation enabled",
    description="Initialization when rules file is reloaded",
)
@when("System started")
@when(
    "Item {item} changed to ON".format(item=ir_items.AUTOMATION_SWITCH_ITEM)
)
def Rule_System_started(event):
    global logTitle
    logPrefix = "Rule_System_started(): "

    LogAction.logInfo(
        logTitle, logPrefix + "Verifying the existence of all key items in the item registry"
    )
    if ir_items.key_items_exist():
        initialize_automation()
    else:
        # Not all key items found in the item registry -- retry by using a timer:
        start_system_timer()

The actual initialization takes place in initialize_automation().

I just found out that I now have a new menu item in Paper UI: “Rules”. I can see all rules and probably also delete orphaned rules from that overview.

This comes with the new rule engine. You can’t delete rules created from scripts, but you can enable/disable them.

Good to know that rules can be disabled this way.

To allow easier identification of zombie rules, I now implemented the following trick to add a timestamp to each rule in a ruleset file:

from core.rules import rule
from core.triggers import when
from org.joda.time import DateTime


rule_init_timestamp = DateTime.now()
logTitle = "my_rule_set.py@{ts}".format(ts=rule_init_timestamp.toString("HH:mm:ss"))
ruleTimeStamp = " -- (Rule set intialised {date} at {ts})".format(
    date=rule_init_timestamp.toString("E d MMM yyyy"),
    ts=rule_init_timestamp.toString("HH:mm:ss (z)"),
)
rulePrefix = "My Rule Set | "


@rule(
    rulePrefix + "My Rule Set - System started",
    description="Initialize system state when the rule is reloaded" + ruleTimeStamp,
    tags=["my rule tag 1", ruleTimeStamp],
)
@when("System started")
def Rule_SystemStarted(event):

    initialise() # Initialisation happens in this method

Here’s how the rules overview now looks like in Paper UI:


By adding a rulePrefix to the @rule() decorators, I overcome the shortcomings of the current Paper UI implementation.

Duplicate rule instantiations can now be identified via Paper UI (note: the following screenshot predates the addition of rulePrefixto the @rule() decorators):

I didn’t see yet the tags appear in Paper UI ut I can imagine other admin UIs will allow to sort / filter on given tags, including the timestamp the rule was restarted.

Incidentally, I’ve noticed that, unlike with the Xtend-based DSL rules, editing one Python ruleset file will sometimes trigger a reload of all Python ruleset files…

By the way, I just found out how to list all NGRE rules via Karaf:

openhab> smarthome:automation listRules                                                                                                                                                                                                     
----------------------------------------------------------------------------------------------------
ID     UID                                 NAME                                STATUS         
----------------------------------------------------------------------------------------------------
1      1adb7ce4-5a50-43ae-bc32-37e23c6d151a TripleClick - Click                 IDLE           
2      2a5f8d59-019d-444d-aa28-beb2c699745f Update temperature info             IDLE           
3      3a81d373-bbc1-4306-8247-c5089a6ff3ff Initialize opening / closing times  IDLE           
4      50178d8c-f6a4-4bd8-ad0a-8725bca62cc0 Execute state machine automation    IDLE           
5      6be58a4a-bb20-4cf2-92ae-c49f7b53a617 Shutter automation - System started or automation enabled IDLE           
6      713ce26d-9b6a-454f-9901-20611a584ede Update temperature info             IDLE           
7      93807a16-bf6f-4230-a933-0d6b1fc4459b Initialize opening / closing times  IDLE           
8      9cfb8ad4-df87-4192-a061-dee6bb8ed407 TripleClick - System started        IDLE           
9      a23355d9-7433-4f80-85fb-9293a4393bf9 Shutter automation - System started or automation enabled IDLE           
10     afa48348-8dc8-4c0b-9dbd-7a30bc89b729 Buienradar - Update earliest rain forecast timestamp IDLE           
11     d70669b7-70bf-4181-8044-34974468bd2b System shutdown or automation disabled IDLE           
12     dc290dc2-6269-4d65-a683-9f8e580191f8 Buienradar - Update earliest rain forecast timestamp IDLE           
13     f6d092f3-95c0-483a-b4bf-1dd456e1de9c System shutdown or automation disabled IDLE           
14     fb827b46-6674-47e2-ab33-551e1d915083 Execute state machine automation    IDLE           
----------------------------------------------------------------------------------------------------

However, it appears I can’t remove the following existing rule:

openhab> smarthome:automation  removeRule 713ce26d-9b6a-454f-9901-20611a584ede
Rule with id '713ce26d-9b6a-454f-9901-20611a584ede' does not exist.

So it appears to me that some rules crashed without properly cleaning up their mess in the OH system.

For all scripts to have been reloaded, the files would have all needed to have been changed. Maybe there’s something in your environment that can explain both this and the duplicate rules.

Rules created through scripted automation can only be removed through the script that created them.

I’m happy to look into this for you, but…

And I now found out how I can restart the OH rule engine as well:

bundle:restart "openHAB Core :: Bundles :: Automation Script RuleSupport"