Unreliable rule triggering (cron) with openhab 2.3

I can’t say for certain that the finally can be reliably relied upon or not. I have some anecdotal evidence dating back to the 2.0 time frame that you can’t, but I’ve no recent experiences to say whether this is still the case and I didn’t have the time or expertise back when I did see this to fully explore the problem.

Anyway, I agree with Markus. You can program Fortran in any language and this looks like programming Java in Rules DSL. I can’t say if they are causing any problems per se, but that does stand out. And if you are running on an RPi there is some evidence that using primitives (.intValue) greatly increases the amount of time it takes to parse the .rules files for some reason (NOTE: I have more evidence for this then I do for finally not always running).

Anyway, as I documented here ((OH 1.x and OH 2.x Rules DSL only] Why have my Rules stopped running? Why Thread::sleep is a bad idea), any time I see Thread::sleep, locks, executeCommandLine, or sendHttp*Request calls I immediately think we have a running out of threads problem.

Another problem could be that you have too many cron triggered rules trying to trigger at the same time. There are only two threads available for cron triggered Rules so that could be a problem.

But why are you using a lock in a cron triggered Rule in the first place? What problem are you trying to solve with this lock? Are these the only two Rules that use this lock?

It isn’t. There really is no way the lock could be persistent across reloads of the same .rules file let alone a reboot of OH. Well, actually there is one way I know of. I’ve seen it reported that sometimes Timers that are created and then the .rules file is reloaded the Timers stick around and still trigger. So if you had a reference to your lock inside the body of such a Timer then it could stick around on the reload of the .rules file, but not a restart of OH.

I suspect what is happening is there is some edge case or unexpected data taking place during the Rules that trigger on System started in those cases that is preventing the lock from becoming unlocked.

Personally, were this my code, I’d focus on writing this in a way that either doesn’t require the locks in the first place or in a way that centralizes the lock in one Rule (see Design Pattern: Gate Keeper for example). The code will be less brittle and less complex overall as a result.

If you do want to debug this problem you can first look to see if you are in fact running out of cron triggered Rules threads (you only get 2 by default). After doing what Markus recommends you should be able to see if Rules are trying to trigger or not. Then you can see how many of your threads are in use using the commands Scott posted here.

Why did this code work before and doesn’t work now? I cannot say. Maybe quartz had more threads on those earlier versions and 2 is just not enough the way this code executes. Maybe the finally isn’t running (in which case I’d be very interested to know that is the case) so the lock isn’t being unlocked. Maybe it is just a change in timing and this has always been a problem but the timing is different when later versions of OH run.

hi guys,

thank you very much. my system is working now with 2.3 and it survives restarts.

I changed:

  • reduced the amount of ReentrantLock signifiantly
  • reduced the remaining locks (ReentrantLock) to the minimal scope
  • removed Thread:sleeps

Nevertheless I’m still suspicious, that this special situation shouldn’t had happened. I could see an error in the logs (RuleEngineImpl - Error during the execution), which should have been catched (and logged) be a catch in my lambda function or by a catch in my rule. Now I cannot reproduce this error any more.

But I see that the error handling is not working properly in lamda functions. The error is catched in the rule, not in the lambda function:

val Functions.Function1<NumberItem, Number> determineFactorX =
    [ NumberItem itemScaleFactor |
        val String logClass = "rrinfo"
        var String logKey = "determineFactorX"
        var Number powerScaleFactor = 0.0  // multiply with

        try {
        	powerScaleFactor = (NULL as Number).doubleValue()   // ERROR  !!!!!!!

        	// ...
        } catch (Throwable t) {
            logError(logClass, logKey + t.toString)
        }
        return powerScaleFactor
    ]


rule testCatches
when
    Time cron "11/4 * * * * ?"
then
    val String logClass = "rrdebug"
    val String logKey = "testCatches - "
    try {
        logInfo(logClass, logKey + "in")
        val Number scaleFactor = determineFactorX.apply(mbRawPvMpptPowerSf)
        // ...
        logInfo(logClass, logKey + "out")
    }
    catch (Throwable t) {
        logError(logClass, logKey + t.toString) 	// ERROR get catch here
    }
end

Log output:

// mostly
... testCatches - org.eclipse.smarthome.model.script.engine.ScriptExecutionException: Could not cast NULL to java.lang.Number ...
// sometimes (but I get this error also at calling other lambdas randomly)
... testCatches - java.lang.IllegalStateException

Filed an issue:
https://github.com/eclipse/smarthome/issues/6218

There is also a problem with lambda functions, which seems not to be thread safe.
https://community.openhab.org/t/lambda-functions-fail-not-thread-safe/51696

You’re obviously a Java programmer, but to clearly state this again: the rules DSL is NOT Java.
That implies a lot, namely that you mustn’t apply Java concepts. You expect everything to be threadsafe and capable of handling exceptions, and while it would of course be nice if that was the case, there should be nowhere that is promising you it is.
Sorry to be harsh, but get that Java picture off your head.

PS: yes still file that issue, please. Developers welcome any hints. Ask them if try/catch was supposed to work at all (I don’t think it ever has been, let alone being threadsafe).

Please read this thread again. I don’t expect thread-safe-ness. I don’t mind to use locks (ReentrantLock) to cope with threads. I removed all my locks from my OH 2.1 code, because they were not working properly at restarts (locked sections).

And no, I’m not a java programmer. Writing function and not coping code over and over again has nothing to do with java-ism!

But the functions are called within a threaded environment, all in OH is threaded. And when lambda functions are not thread safe, so I need working locks.

I don’t get out of this circle, besides

  • go back to OH 2.1
  • replace all functions by copy and paste

But NO, I’m not a copy-and-paste-programmer.

Or secret option 3, write your Rules more generically so there is no need for either the lambdas or duplicated code.

That is what I’ve been trying to say all along. There should only rarely be a need for a lambda. In all of my production Rules code I have zero lambdas (not counting lambdas passed to createTimer, forEach, and the like). And I have zero duplicated code. I don’t need lambdas and I don’t need Locks because I write one Rule to handle all my lights, one Rule to handle all my door reminders, one rule to handle presence detection, and so on.

And no, I’m not writing thousand line long Rules. My longest Rule is around 100 lines. It is relatively easy to write Rules like this in Rules DSL, but it requires thinking about the problem differently from what you may be used to. And a lot of OH users, particularly those who already know how to program, can not or will not do that. And that is fine too because there is the very excellent JSR223 add-on that lets you write your Rules in Jython, JavaScript, or Groovy. You are not required to use the Rules DSL. Another alternative is to use NodeRed to drive your Rules.

You may not be a Java programmer but I bet you are a C#, C, C++, or Python developer and you are used to breaking your problems up in a certain way. Well, your usual approach leads to excessively long, brittle, and overly complicated Rules code when applied to the Rules DSL.

So I’d recommend reviewing the Design Patterns and figure out alternative approaches to solve your problems in a way the Rules DSL supports better, or consider switching to JSR223 where you will have all the features and support for your usual approaches to solving programming problems. If you want help with Rules DSL I’m more than happy to provide what help I can. I frequently provide such help and recommendations.

Just as conclusion.

My OpenHab installation is now running stable with version 2.3. But I had to change some code!

I had to protect lambda function with locks:

var ReentrantLock gFormatTempHumi  = new ReentrantLock()

rule formatTempHumiNorthSide
when
    System started
    or Item valTempNorthSide changed
    or Item valHumiNorthSide changed
then
    val String logClass = "rrinfo"
    val String logKey = "formatTempHumiNorthSide - "

    gFormatTempHumi.lock
    try {
        formatTempHumi.apply(showTempHumiNorthSide, valTempNorthSide, valHumiNorthSide)
    } catch (Throwable t) {
        logError(logClass, logKey + t.toString)
    } finally {
        gFormatTempHumi.unlock
    }
end

Minimizing exceptions: Before doing something with item state I now check for NULL or type. (The exception handling doesn’t seem to be waterproof.) I my case there are lots of invalid states at startup I have to handle.

if (itemTemp.state instanceof Number)
    temp = String::format("%.1f °C", (itemTemp.state as Number).floatValue())
1 Like

Highlighting this as the key to success.

In my view, exception handling in lambdas is umm, flakey. So that makes Try/catch/finally handling unreliable. As a consequence of that in turn - when there’s a Lock involved, that will mess up too.

It can all work reliably - if you write bullet-proof code that does not give the exceptions in the first place.
(and that requires some thought, because the exceptions don’t give clear reports. Gahh!)

Very frustrating when you fall into this trap, but once you have established the programming habits to deal with it, all is well.

Not quite. I think you mean the right thing but fail to be explicit enough.The message rather needs to be:

Don’t use exceptions.
Instead, write code that does not fail.
If you do, you don’t need to use any exceptions, and it is of no relevance if they work or not.

Change your way of thinking and coding rather than to expect OH to deliver the methodology and toolset you are used to.

Yeah I know no programmer wants to be told that. It’s still meant to be friendly and valuable advice if the two of you (OH/Xtend and yourself) want to get married to happily live with each other at least some distant day …

That’s pure theory without practical use!

Without exception/finally handling you cannot use reliably lambda functions. See my other thread https://community.openhab.org/t/lambda-functions-fail-not-thread-safe/51696/2. The bug is confirmed by other users.

I expect documented language features to be working and not get indoctrinate by people who are not used to that a little bit “more advanced” features.

The exception handling worked more stable in OH 2.1 than in OH 2.3.

The key is in the second sentence: write code that does not fail.
You don’t need to use exceptions when your code is bullet-proof.
You don’t even need to use lambdas.
It’s just that you want to use both of them because you are used to that way of programming.

You’re actually the only person I have seen to date to use exceptions in OH rules.
Bottom line is it doesn’t work. Ok. So you can either keep expecting and insisting on this to be a bug and wait for it to get fixed some day [or never because some developer has a different opinion on whether that is supposed to work at all - I don’t know if it is a bug and in fact I don’t care].
Or you change your fundamental approach and refactor your code to get the job done as a number of experienced people here are effectively advising you to.

Not necessarily true. It depends on how the lambda is written and what, if any shared resources the lambda uses.

And when you read that thread you will find that even WITH the exception handling and the locks you can’t reliably use lambda functions because you cannot guarantee that the finally will execute every time. Some errors will kill the Rule and cannot be caught.

We should only rarely be using lambdas in the first place. Lambdas were never a documented feature of the Rules DSL. The were something that was discovered, not written with Rules DSL in mind.

Lambdas are NOT documented anywhere in the OH Docs (I just did a search, the only mention of lambda is a placeholder in the beginners tutorial). The only documentation for them are in forum postings.

Global lambdas are not recommended for general use (I should add that warning to the lambda example posting). They should only be employed in very rare circumstances. If you can’t or won’t consider an approach that doesn’t require lambdas, you should consider switching to JSR223. Otherwise you should write your Rules so they are more generically (i.e. rather than many Rules that call a common lambda, have one Rule) or trigger other Rules to implement the common code by generating an event (see Separation of Behaviors DP).

I find a bit of a stretch to call an undocumented feature of the language not working as expected being a bug. It’s an undocumented feature of the language in the first place and as the person who has probably written the most text on what lambdas are and how they work on this forum I’m saying you shouldn’t be using them. Lambdas were never really designed for our use in Rules. They just happened to be there. We shouldn’t be surprised that they don’t work well.

No it didn’t. All the same problems with exception handling and the finally clause not executing every time, particularly in lambdas, has existed since at least OH 1.8.1. It may be the case that the stricter checking introduced between OH 2.2 and 2.3 is causing more things that have always been problems to result in exceptions instead of just silently failing, but the problem with exceptions remains the same.

I’ve seen them lots, but it is almost always when the user is also using ReentrantLocks. If you are using locks, it is almost a must to use try/catch/finally. However, because you can’t trust that the finally will always be called and therefore the lock unlocked because of how some errors kill the Rule, locks, like lambdas, should only be used sparingly and everything possible must be done in such Rules to prevent exceptions from occurring. I strongly recommend against using locks in lambdas.

If I were to guess, I’d say that lambdas are inherited from Xtend but they were never intended to be used in a multi-threaded context like what exists in OH Rules DSL. Looking at all of the examples I can find, a new instance of the lambda is created for each thread. With a separate lambda per thread there is no need for the locks and no need for thread safeness. But if we create a new instance of the lambda in each Rule then there is no point in using the lambda in Rules DSL in the first place.

1 Like

To be fair, I’ve got try/catch exception handling in my lambdas.
But it’s there as belt and braces, as I have written the code so that the exceptions don’t arise. (so far as I can make it!!)

(belt and braces = English idiom meaning protected by redundancy against trousers falling down)

As you say, this a practical approach to make a working system, whether or not there are bugs in this area of DSL. I expect it to continue working whether or not those bugs get fixed one day.

Documented in the OH doc is the “finally” handling. See here:
https://www.openhab.org/docs/configuration/rules-dsl.html#concurrency-guard
But if there is a “finally” than the “catch” should work too (technically).

Just to make clear: I don’t claim exceptions should be used as substitution for carefully programming. But exception catching and logging is very helpful for debugging and finding own bugs.

In my OH installation there are a lot of invalid items state at startup. I have no real change to debug these rule (only by a restart). At least I can see that something was going wrong…

@mstorm: If you don’t use exception blocks and logging, you can’t know what’s going wrong in your system - that could lead to feeling very comfortable. But in professional environments that is an antipattern for good reason.

Topic documentation

In my eyes OH does not have an own documentation about rule syntax, there are just some examples.

https://www.openhab.org/docs/configuration/rules-dsl.html#the-syntax

Note: The rule syntax is based on Xbase and as a result it is sharing many details with Xtend, which is built on top of Xbase as well. As a result, we will often point to the Xtend documentation for details.

So I looked in Xtend docu for language features too. And there you can find lambda functions and try-catch-constructs.

[quote=“raulix, post:24, topic:51431”]
@mstorm: If you don’t use exception blocks and logging, you can’t know what’s going wrong in your system - [/quote]

Not true as a general statement.
There’s debuggers and static code analysis tools, just to name some alternative approaches.
And logging is available in OH.

Wrong, you do. There’s multiple options:

  • you can check for NULL/invalid state in your code before using (that’s part of the ‘write code that does not fail’ approach)
  • you can initialize items on system startup using ‘System started’ trigger (also part of that approach)
    You can even use persistence on items.
    FWIW, it helps to make sure your system loads items before starting on rules execution.
    There’s threads on this in the forum.

I know debugger and static code analysis tools for several other programming languages, but there are no such tools documented for OH rule files. (VSCode with LSB is something else, but helpful though.)

By “Checking for NULL states” you don’t suppress/exclude other programming mistakes.

Items can be NULL despite of persistence. See here: https://www.openhab.org/docs/configuration/persistence.html#startup-behavior

Persistence services and the Rule engine are started in parallel.

I rather feel you just want to avoid going down that route at all cost.
This is getting annoying so I’ll stop trying to convince you. Take the advice or leave it.

Only if the exception is thrown back into the Rule. It appears that in some cases the exception takes place in and is aught by the Rule Engine itself and instead of rethrowing the exceptions back up the stack trace and into the Rule it gets caught outside your Rule and the Rule gets terminated. And this problem is exacerbated by lambdas for some reason. I’ve seen, but not been able to study, that a exception that happens in a Rule can be caught in the Rule, but if it happens in the lambda it kills the Rule.

This is the root to why locks + lambdas + exception handling is such a dangerous thing in OH.

There are some times when a lock cannot be avoided, just like there are times where a lambda or a Thread::sleep cannot be avoided. But because of the way the Rules DSL works, all of these should be treated as a last resort. It is worth going to extraordinary lengths to avoid their use.

I’ll be the first to admit that there are some very significant problems with the Rules DSL. Error handling is perhaps its weakest point. I like the Rules DSL because it is easier for new users to become productive in it compared to more feature rich languages, but I would never ever recommend it for use in a “professional environment.” If you are used to and expecting something like that, I again implore you to use JSR223 instead.

Because of the limitations of exception handling outlined above, relying on exception handling in your Rules gives you a similar false sense of comfort. I think the counter argument is that the Rules DSL isn’t a processional environment. It’s not intended to be. And because of the limitations of exception handling one could make a good argument that within the Rules DSL using try/catch, and especially try/catch/finally would be an anti-pattern.

For the most part, the number of things that can cause an exception in a Rule is relatively low.

  1. The biggest one is type problems (e.g. an Item is NULL or UNDEF but the Rule just assumes it is a Number without checking first).
  2. The next biggest one is trying to parse a String to a Number or a DateTime that is not correct for that type.
  3. Next comes trying to sendCommand or postUpdate an invalid state to an Item (e.g. a Number > 100 to a Dimmer Item).

1 is the most troublesome because this exception usually just kills the Rule and cannot be caught.
3 is actually just a special case of 1.

I can’t think of others off the top of my head. I’m sure there are more. But all three of the above can be handled quite effectively by using standard defensive programming techniques (i.e. checking the type of an Item before you try to use it). And I believe both 1 and 3 can’t be caught in your Rule anyway. Your Rule will just stop.

We would welcome any contributions you can make to improve them.

There are also language features like arrays and classes which do not exist in the Rules DSL in the Xtend docs. Just because Xtend supports it doesn’t mean Rules DSL does.

Personally, I have high hopes for the Experimental Rules Engine. Not only does it introduce concepts that will make certain types of Rules really easy to create and it gives the non-technical users a nice UI based way to create Rules, but under the covers it appears to support JSR223 Rules. By default we can enter JavaScript in the UI but 5iver has discovered that the ERE can call Jython and I would assume Groovy written Rules as well. I’m hoping that a lot of these sorts of issues will simply go away.

My solution to these problems: an external rule engine based on Python 3 and the REST API.

Features:

  • connects to OpenHAB via REST API
  • gets updated by change notifications about state changes of items, groups, things
  • reads and caches states of items, groups and things.
  • change states via COMMANDs and UPDATEs
  • write your own rule classes, subscribe for state changes and cron events
  • runs as daemon (command line interface: --start/–stop)
  • programming language: Python 3 (use your favorite tools incl. debugger, unittests, code inspection)

Homepage: https://github.com/rosenloecher-it/prend