Unreliable rule triggering (cron) with openhab 2.3

So the question is, if I can rely on catch or finally!

Your answer sounds like: No !?

That’s quite interesting…

By the way, the lock with try, catch,finally is a openhab design pattern (https://community.openhab.org/t/design-pattern-rule-latching/32748

Your code works without problems and errors in the log

but I think, it’s not your whole code, because line 621.
Check this line.
I would use

or Time cron "23 0/5 * ? * * *"

Thanks Harry, but the cron rule is working. The rules get called, but I could not see that before, because I had my log-calls at the wrong places (after .lock()).

(I would also recommend: or Time cron “23 0/5 * * * ? *” # The last star is for the optional year.)

The problem is, if I can rely on this pattern for synchronization:

import java.util.concurrent.locks.ReentrantLock;
var ReentrantLock lockLedStatus  = new ReentrantLock()

logDebug(logClass, logKey + "in")
lockLedStatus.lock()
try {
	// do some stuff
	    
    logDebug(logClass, logKey + "out")
}
catch (Throwable t) {
    logError(logClass, logKey + t.toString)
}
finally {
	// has to be called always
    lockLedStatus.unlock()
}

Even in the posted logs, there is an error which got not catched by my own catch. This is weird and dangerous.

I can’t say for certain that the finally can be reliably relied upon or not. I have some anecdotal evidence dating back to the 2.0 time frame that you can’t, but I’ve no recent experiences to say whether this is still the case and I didn’t have the time or expertise back when I did see this to fully explore the problem.

Anyway, I agree with Markus. You can program Fortran in any language and this looks like programming Java in Rules DSL. I can’t say if they are causing any problems per se, but that does stand out. And if you are running on an RPi there is some evidence that using primitives (.intValue) greatly increases the amount of time it takes to parse the .rules files for some reason (NOTE: I have more evidence for this then I do for finally not always running).

Anyway, as I documented here (Why have my Rules stopped running? Why Thread::sleep is a bad idea), any time I see Thread::sleep, locks, executeCommandLine, or sendHttp*Request calls I immediately think we have a running out of threads problem.

Another problem could be that you have too many cron triggered rules trying to trigger at the same time. There are only two threads available for cron triggered Rules so that could be a problem.

But why are you using a lock in a cron triggered Rule in the first place? What problem are you trying to solve with this lock? Are these the only two Rules that use this lock?

It isn’t. There really is no way the lock could be persistent across reloads of the same .rules file let alone a reboot of OH. Well, actually there is one way I know of. I’ve seen it reported that sometimes Timers that are created and then the .rules file is reloaded the Timers stick around and still trigger. So if you had a reference to your lock inside the body of such a Timer then it could stick around on the reload of the .rules file, but not a restart of OH.

I suspect what is happening is there is some edge case or unexpected data taking place during the Rules that trigger on System started in those cases that is preventing the lock from becoming unlocked.

Personally, were this my code, I’d focus on writing this in a way that either doesn’t require the locks in the first place or in a way that centralizes the lock in one Rule (see Design Pattern: Gate Keeper for example). The code will be less brittle and less complex overall as a result.

If you do want to debug this problem you can first look to see if you are in fact running out of cron triggered Rules threads (you only get 2 by default). After doing what Markus recommends you should be able to see if Rules are trying to trigger or not. Then you can see how many of your threads are in use using the commands Scott posted here.

Why did this code work before and doesn’t work now? I cannot say. Maybe quartz had more threads on those earlier versions and 2 is just not enough the way this code executes. Maybe the finally isn’t running (in which case I’d be very interested to know that is the case) so the lock isn’t being unlocked. Maybe it is just a change in timing and this has always been a problem but the timing is different when later versions of OH run.

hi guys,

thank you very much. my system is working now with 2.3 and it survives restarts.

I changed:

  • reduced the amount of ReentrantLock signifiantly
  • reduced the remaining locks (ReentrantLock) to the minimal scope
  • removed Thread:sleeps

Nevertheless I’m still suspicious, that this special situation shouldn’t had happened. I could see an error in the logs (RuleEngineImpl - Error during the execution), which should have been catched (and logged) be a catch in my lambda function or by a catch in my rule. Now I cannot reproduce this error any more.

But I see that the error handling is not working properly in lamda functions. The error is catched in the rule, not in the lambda function:

val Functions.Function1<NumberItem, Number> determineFactorX =
    [ NumberItem itemScaleFactor |
        val String logClass = "rrinfo"
        var String logKey = "determineFactorX"
        var Number powerScaleFactor = 0.0  // multiply with

        try {
        	powerScaleFactor = (NULL as Number).doubleValue()   // ERROR  !!!!!!!

        	// ...
        } catch (Throwable t) {
            logError(logClass, logKey + t.toString)
        }
        return powerScaleFactor
    ]


rule testCatches
when
    Time cron "11/4 * * * * ?"
then
    val String logClass = "rrdebug"
    val String logKey = "testCatches - "
    try {
        logInfo(logClass, logKey + "in")
        val Number scaleFactor = determineFactorX.apply(mbRawPvMpptPowerSf)
        // ...
        logInfo(logClass, logKey + "out")
    }
    catch (Throwable t) {
        logError(logClass, logKey + t.toString) 	// ERROR get catch here
    }
end

Log output:

// mostly
... testCatches - org.eclipse.smarthome.model.script.engine.ScriptExecutionException: Could not cast NULL to java.lang.Number ...
// sometimes (but I get this error also at calling other lambdas randomly)
... testCatches - java.lang.IllegalStateException

Filed an issue:
https://github.com/eclipse/smarthome/issues/6218

There is also a problem with lambda functions, which seems not to be thread safe.
https://community.openhab.org/t/lambda-functions-fail-not-thread-safe/51696

You’re obviously a Java programmer, but to clearly state this again: the rules DSL is NOT Java.
That implies a lot, namely that you mustn’t apply Java concepts. You expect everything to be threadsafe and capable of handling exceptions, and while it would of course be nice if that was the case, there should be nowhere that is promising you it is.
Sorry to be harsh, but get that Java picture off your head.

PS: yes still file that issue, please. Developers welcome any hints. Ask them if try/catch was supposed to work at all (I don’t think it ever has been, let alone being threadsafe).

Please read this thread again. I don’t expect thread-safe-ness. I don’t mind to use locks (ReentrantLock) to cope with threads. I removed all my locks from my OH 2.1 code, because they were not working properly at restarts (locked sections).

And no, I’m not a java programmer. Writing function and not coping code over and over again has nothing to do with java-ism!

But the functions are called within a threaded environment, all in OH is threaded. And when lambda functions are not thread safe, so I need working locks.

I don’t get out of this circle, besides

  • go back to OH 2.1
  • replace all functions by copy and paste

But NO, I’m not a copy-and-paste-programmer.

Or secret option 3, write your Rules more generically so there is no need for either the lambdas or duplicated code.

That is what I’ve been trying to say all along. There should only rarely be a need for a lambda. In all of my production Rules code I have zero lambdas (not counting lambdas passed to createTimer, forEach, and the like). And I have zero duplicated code. I don’t need lambdas and I don’t need Locks because I write one Rule to handle all my lights, one Rule to handle all my door reminders, one rule to handle presence detection, and so on.

And no, I’m not writing thousand line long Rules. My longest Rule is around 100 lines. It is relatively easy to write Rules like this in Rules DSL, but it requires thinking about the problem differently from what you may be used to. And a lot of OH users, particularly those who already know how to program, can not or will not do that. And that is fine too because there is the very excellent JSR223 add-on that lets you write your Rules in Jython, JavaScript, or Groovy. You are not required to use the Rules DSL. Another alternative is to use NodeRed to drive your Rules.

You may not be a Java programmer but I bet you are a C#, C, C++, or Python developer and you are used to breaking your problems up in a certain way. Well, your usual approach leads to excessively long, brittle, and overly complicated Rules code when applied to the Rules DSL.

So I’d recommend reviewing the Design Patterns and figure out alternative approaches to solve your problems in a way the Rules DSL supports better, or consider switching to JSR223 where you will have all the features and support for your usual approaches to solving programming problems. If you want help with Rules DSL I’m more than happy to provide what help I can. I frequently provide such help and recommendations.

Just as conclusion.

My OpenHab installation is now running stable with version 2.3. But I had to change some code!

I had to protect lambda function with locks:

var ReentrantLock gFormatTempHumi  = new ReentrantLock()

rule formatTempHumiNorthSide
when
    System started
    or Item valTempNorthSide changed
    or Item valHumiNorthSide changed
then
    val String logClass = "rrinfo"
    val String logKey = "formatTempHumiNorthSide - "

    gFormatTempHumi.lock
    try {
        formatTempHumi.apply(showTempHumiNorthSide, valTempNorthSide, valHumiNorthSide)
    } catch (Throwable t) {
        logError(logClass, logKey + t.toString)
    } finally {
        gFormatTempHumi.unlock
    }
end

Minimizing exceptions: Before doing something with item state I now check for NULL or type. (The exception handling doesn’t seem to be waterproof.) I my case there are lots of invalid states at startup I have to handle.

if (itemTemp.state instanceof Number)
    temp = String::format("%.1f °C", (itemTemp.state as Number).floatValue())
1 Like

Highlighting this as the key to success.

In my view, exception handling in lambdas is umm, flakey. So that makes Try/catch/finally handling unreliable. As a consequence of that in turn - when there’s a Lock involved, that will mess up too.

It can all work reliably - if you write bullet-proof code that does not give the exceptions in the first place.
(and that requires some thought, because the exceptions don’t give clear reports. Gahh!)

Very frustrating when you fall into this trap, but once you have established the programming habits to deal with it, all is well.

Not quite. I think you mean the right thing but fail to be explicit enough.The message rather needs to be:

Don’t use exceptions.
Instead, write code that does not fail.
If you do, you don’t need to use any exceptions, and it is of no relevance if they work or not.

Change your way of thinking and coding rather than to expect OH to deliver the methodology and toolset you are used to.

Yeah I know no programmer wants to be told that. It’s still meant to be friendly and valuable advice if the two of you (OH/Xtend and yourself) want to get married to happily live with each other at least some distant day …

That’s pure theory without practical use!

Without exception/finally handling you cannot use reliably lambda functions. See my other thread https://community.openhab.org/t/lambda-functions-fail-not-thread-safe/51696/2. The bug is confirmed by other users.

I expect documented language features to be working and not get indoctrinate by people who are not used to that a little bit “more advanced” features.

The exception handling worked more stable in OH 2.1 than in OH 2.3.

The key is in the second sentence: write code that does not fail.
You don’t need to use exceptions when your code is bullet-proof.
You don’t even need to use lambdas.
It’s just that you want to use both of them because you are used to that way of programming.

You’re actually the only person I have seen to date to use exceptions in OH rules.
Bottom line is it doesn’t work. Ok. So you can either keep expecting and insisting on this to be a bug and wait for it to get fixed some day [or never because some developer has a different opinion on whether that is supposed to work at all - I don’t know if it is a bug and in fact I don’t care].
Or you change your fundamental approach and refactor your code to get the job done as a number of experienced people here are effectively advising you to.

Not necessarily true. It depends on how the lambda is written and what, if any shared resources the lambda uses.

And when you read that thread you will find that even WITH the exception handling and the locks you can’t reliably use lambda functions because you cannot guarantee that the finally will execute every time. Some errors will kill the Rule and cannot be caught.

We should only rarely be using lambdas in the first place. Lambdas were never a documented feature of the Rules DSL. The were something that was discovered, not written with Rules DSL in mind.

Lambdas are NOT documented anywhere in the OH Docs (I just did a search, the only mention of lambda is a placeholder in the beginners tutorial). The only documentation for them are in forum postings.

Global lambdas are not recommended for general use (I should add that warning to the lambda example posting). They should only be employed in very rare circumstances. If you can’t or won’t consider an approach that doesn’t require lambdas, you should consider switching to JSR223. Otherwise you should write your Rules so they are more generically (i.e. rather than many Rules that call a common lambda, have one Rule) or trigger other Rules to implement the common code by generating an event (see Separation of Behaviors DP).

I find a bit of a stretch to call an undocumented feature of the language not working as expected being a bug. It’s an undocumented feature of the language in the first place and as the person who has probably written the most text on what lambdas are and how they work on this forum I’m saying you shouldn’t be using them. Lambdas were never really designed for our use in Rules. They just happened to be there. We shouldn’t be surprised that they don’t work well.

No it didn’t. All the same problems with exception handling and the finally clause not executing every time, particularly in lambdas, has existed since at least OH 1.8.1. It may be the case that the stricter checking introduced between OH 2.2 and 2.3 is causing more things that have always been problems to result in exceptions instead of just silently failing, but the problem with exceptions remains the same.

I’ve seen them lots, but it is almost always when the user is also using ReentrantLocks. If you are using locks, it is almost a must to use try/catch/finally. However, because you can’t trust that the finally will always be called and therefore the lock unlocked because of how some errors kill the Rule, locks, like lambdas, should only be used sparingly and everything possible must be done in such Rules to prevent exceptions from occurring. I strongly recommend against using locks in lambdas.

If I were to guess, I’d say that lambdas are inherited from Xtend but they were never intended to be used in a multi-threaded context like what exists in OH Rules DSL. Looking at all of the examples I can find, a new instance of the lambda is created for each thread. With a separate lambda per thread there is no need for the locks and no need for thread safeness. But if we create a new instance of the lambda in each Rule then there is no point in using the lambda in Rules DSL in the first place.

1 Like

To be fair, I’ve got try/catch exception handling in my lambdas.
But it’s there as belt and braces, as I have written the code so that the exceptions don’t arise. (so far as I can make it!!)

(belt and braces = English idiom meaning protected by redundancy against trousers falling down)

As you say, this a practical approach to make a working system, whether or not there are bugs in this area of DSL. I expect it to continue working whether or not those bugs get fixed one day.

Documented in the OH doc is the “finally” handling. See here:
https://www.openhab.org/docs/configuration/rules-dsl.html#concurrency-guard
But if there is a “finally” than the “catch” should work too (technically).

Just to make clear: I don’t claim exceptions should be used as substitution for carefully programming. But exception catching and logging is very helpful for debugging and finding own bugs.

In my OH installation there are a lot of invalid items state at startup. I have no real change to debug these rule (only by a restart). At least I can see that something was going wrong…

@mstorm: If you don’t use exception blocks and logging, you can’t know what’s going wrong in your system - that could lead to feeling very comfortable. But in professional environments that is an antipattern for good reason.

Topic documentation

In my eyes OH does not have an own documentation about rule syntax, there are just some examples.

https://www.openhab.org/docs/configuration/rules-dsl.html#the-syntax

Note: The rule syntax is based on Xbase and as a result it is sharing many details with Xtend, which is built on top of Xbase as well. As a result, we will often point to the Xtend documentation for details.

So I looked in Xtend docu for language features too. And there you can find lambda functions and try-catch-constructs.

[quote=“raulix, post:24, topic:51431”]
@mstorm: If you don’t use exception blocks and logging, you can’t know what’s going wrong in your system - [/quote]

Not true as a general statement.
There’s debuggers and static code analysis tools, just to name some alternative approaches.
And logging is available in OH.

Wrong, you do. There’s multiple options:

  • you can check for NULL/invalid state in your code before using (that’s part of the ‘write code that does not fail’ approach)
  • you can initialize items on system startup using ‘System started’ trigger (also part of that approach)
    You can even use persistence on items.
    FWIW, it helps to make sure your system loads items before starting on rules execution.
    There’s threads on this in the forum.

I know debugger and static code analysis tools for several other programming languages, but there are no such tools documented for OH rule files. (VSCode with LSB is something else, but helpful though.)

By “Checking for NULL states” you don’t suppress/exclude other programming mistakes.

Items can be NULL despite of persistence. See here: https://www.openhab.org/docs/configuration/persistence.html#startup-behavior

Persistence services and the Rule engine are started in parallel.