Rules and concurrency

In JRuby we create and use the same timer as the one in RulesDSL, namely org.openhab.core.model.script.actions.Timer

It was created using the org.openhab.core.model.script.actions.ScriptExecution.createTimer() function, so it would be a TimerImpl instance.

Is that not thread-safe?

No, it returns the same TimerImpl I was just referring to:

    @Override
    public Timer createTimer(@Nullable String identifier, ZonedDateTime zonedDateTime, Runnable runnable) {
        return new TimerImpl(scheduler, zonedDateTime, runnable::run, identifier);
    }

But, I don’t think it takes much to make it thread-safe.
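
Conceptually it is just a matter of guarding the mutable state behind a single monitor, roughly like this (an illustrative sketch using a plain ScheduledExecutorService, not the actual core TimerImpl):

    import java.time.Duration;
    import java.time.ZonedDateTime;
    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    // Illustrative sketch only: all state transitions share one monitor, so cancel()
    // and reschedule() from one thread cannot race with the callback on another.
    public class SafeTimer {
        private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        private final Runnable callback;
        private ScheduledFuture<?> future; // guarded by "this"
        private boolean cancelled;         // guarded by "this"

        public SafeTimer(ZonedDateTime when, Runnable callback) {
            this.callback = callback;
            schedule(when);
        }

        private synchronized void schedule(ZonedDateTime when) {
            long delayMs = Math.max(0, Duration.between(ZonedDateTime.now(), when).toMillis());
            future = scheduler.schedule(this::run, delayMs, TimeUnit.MILLISECONDS);
        }

        private void run() {
            synchronized (this) {
                if (cancelled) {
                    return; // cancelled after scheduling but before the callback ran
                }
            }
            callback.run(); // run outside the monitor so cancel() is not blocked by a long callback
        }

        public synchronized boolean cancel() {
            cancelled = true;
            return future != null && future.cancel(false);
        }

        public synchronized void reschedule(ZonedDateTime when) {
            if (future != null) {
                future.cancel(false);
            }
            cancelled = false;
            schedule(when);
        }
    }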

edit: I did a quick PR, I can’t see why this change could lead to any new issues:

Concerning one and the same transformation never being executed in parallel, I proposed updating Transformations | openHAB at transformations: are executed sequentially by dilyanpalauzov · Pull Request #2570 · openhab/openhab-docs · GitHub.


I would actually expect that to work, as JRuby doesn’t have the single-threaded context-access limitation. What happens if the timer is created in JS and cancelled both from JS and from JRuby?

Unfortunately I’m on a short get-away in the mountains until Tuesday and won’t have a chance to test it myself until then.

My understanding is that the features of the GraalVM JRE/JDK are going to make it into OpenJDK and hence the GraalVM JRE/JDK is being discontinued. This would actually be really nice, since I'd expect the Graal compiler to be available in OpenJDK then, meaning we could run all Graal languages in compiled mode, not just interpreted mode, on a stock OpenJDK.

Access to the context is fully synchronised, at least I am not aware of any places where we miss synchronisation. But if you cancel a timer while it's executing I would expect an InterruptedException (whether that really occurs depends on how cancel is implemented, i.e. whether it waits for execution to finish or aborts it). EDIT: Seems that is not the case and no IE is thrown, see Jim's message.

I don't think it's something automation add-ons have to care about. When using managed rules (or hopefully soon those defined in YAML, once Nadar's PR gets merged), core handles that for you. A script action can return a value, which is then returned from ScriptActionHandler::execute, and I think the RuleEngineImpl passes that value to the next module. If the next module is a script action, the value can simply be injected into the context if core wants to do that.
I will probably give this a try later today.

The automation add-on is mainly responsible for providing the ScriptEngine to execute code and, in the case of file-based scripts, for informing core about the paths, the file extension etc.

No idea, though, what the state of all that is when using Rules DSL.

I do think if they are identical they get the same engine. The engine ID depends on the hash code of the script, and that should be identical in case the script code is identical.

I think it's an implementation choice. It is clear that some kind of synchronisation is required and that we need an engine to execute the transformation code, so using one engine per transformation makes sense, and it's straightforward to implement.
If we want to allow for truly parallel execution of the same transformation, I could imagine switching from the one engine per transformation script mechanism to having a pool of ScriptEngines for each language and executing in ScriptEngines that are currently free. Basically like a thread pool.
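
A rough sketch of what I mean (standard javax.script API, nothing openHAB-specific):

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    // Sketch only: a fixed-size pool of engines for one language. A transformation
    // borrows whichever engine is free, evaluates, and hands it back, so identical
    // transformations can run in parallel on different engines.
    public class ScriptEnginePool {
        private final BlockingQueue<ScriptEngine> pool;

        public ScriptEnginePool(String language, int size) {
            pool = new ArrayBlockingQueue<>(size);
            ScriptEngineManager manager = new ScriptEngineManager();
            for (int i = 0; i < size; i++) {
                pool.add(manager.getEngineByName(language)); // null if the language is not installed
            }
        }

        public Object eval(String script) throws ScriptException, InterruptedException {
            ScriptEngine engine = pool.take(); // blocks until an engine is free
            try {
                return engine.eval(script);
            } finally {
                pool.put(engine); // return the engine to the pool
            }
        }
    }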

Explain that to the average user, I guess you'll have a lot of fun doing that.
The SCRIPT transformation is meant to be easy to use; if you want to squeeze out maximum performance, you can provide a transformation yourself, instead of using a script, by implementing the Java interfaces and registering it as an OSGi service, see Transformations / Profiles | openHAB.
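
Roughly like this (just a sketch; the service property used to register the transformation name is an assumption here, the linked docs have the authoritative version):

    import org.openhab.core.transform.TransformationException;
    import org.openhab.core.transform.TransformationService;
    import org.osgi.service.component.annotations.Component;

    // Sketch only: the "openhab.transform" property name is an assumption,
    // check the linked docs for the real registration details.
    @Component(service = TransformationService.class, property = { "openhab.transform=UPPERCASE" })
    public class UppercaseTransformationService implements TransformationService {

        @Override
        public String transform(String function, String source) throws TransformationException {
            // "function" is whatever the user configured after the colon; this trivial
            // example ignores it and simply upper-cases the input value.
            return source == null ? null : source.toUpperCase();
        }
    }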

The engine context that's set there is not all of the context of the engine, it is only event information about the event triggering the rule execution and stuff like that. The context of the engine itself holds way more stuff than what's set and reset there, e.g. defined classes, functions, global vars.

You still need to lock at least for Graal, the requirement there is much deeper than our code.

Remember: constant engine recreation and setting and resetting of the context was there years before Graal arrived; the only thing we added for Graal is to optionally synchronise if the ScriptEngine implements Lock. So there must have been a reason to do it that way, at least I'd expect that.
Back then I think there only were NashornJS and Rules DSL, and both don't "care" about multiple threads entering the same script context at the same time.
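
In other words, roughly (a sketch of the idea, not the actual core code):

    import java.util.concurrent.locks.Lock;

    import javax.script.ScriptEngine;
    import javax.script.ScriptException;

    // Sketch of the idea only: if the engine itself implements Lock (as the Graal-based
    // wrappers do), take that lock around the evaluation; otherwise just evaluate.
    public final class SynchronizedEval {

        private SynchronizedEval() {
        }

        public static Object eval(ScriptEngine engine, String script) throws ScriptException {
            if (engine instanceof Lock lock) {
                lock.lock();
                try {
                    return engine.eval(script);
                } finally {
                    lock.unlock();
                }
            }
            return engine.eval(script);
        }
    }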

As long as you don't change the script, each script will get its own engine, which will be created and used until openHAB is stopped.
I do not think we could really run everything on one JSR223 engine, though I think the only "up-to-date" language that's well maintained and is not Graal is Ruby, so it's probably best to check whether that's possible there.

That step is done in JS; GraalJS provides utilities to the JS runtime to check whether something is Java or not, see openhab-js/src/cache.js at 3ee500aeb6d0deb7b3a8d2a553bfddaaa2ec8496 · openhab/openhab-js · GitHub.

And this gets even funnier when using languages that don't really know about threads (JS).

:+1:

I think the main reason to have it was to provide a common way of retrieving engines for the various languages, as it kept track of ScriptEngineFactories.

About the TimerImpl:
JS Scripting also uses the one provided by core when using createTimer; all we do is put locking around the timer callback.
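
Conceptually that wrapping is nothing more than (sketch, not the actual add-on code):

    import java.util.concurrent.locks.Lock;
    import java.util.concurrent.locks.ReentrantLock;

    // Conceptual sketch only: the callback handed to the core timer is wrapped so that
    // it takes the same lock that guards every other entry into the script's context.
    public class LockingCallbackWrapper {
        private final Lock contextLock = new ReentrantLock();

        public Runnable wrap(Runnable userCallback) {
            return () -> {
                contextLock.lock();
                try {
                    userCallback.run();
                } finally {
                    contextLock.unlock();
                }
            };
        }
    }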

..or better: Stop using many ScriptEngine instances altogether :wink: As long as transformations don’t need to “share data” between each other, and they shouldn’t as that would be a minefield, a fresh context could be provided for each run, and they could run in parallel all they want :wink:
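
With the polyglot API, that share-nothing idea could look roughly like this (sketch only; it ignores how openHAB's globals and presets would get injected into each fresh context):

    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.Engine;

    // Sketch only: one shared Engine (so parsed code can be cached) but a brand-new,
    // throwaway Context per evaluation, so runs share nothing and may happen in parallel.
    public class FreshContextPerRun {
        private final Engine engine = Engine.create();

        public String run(String jsSource) {
            try (Context context = Context.newBuilder("js").engine(engine).build()) {
                return context.eval("js", jsSource).toString(); // convert before the context closes
            }
        }
    }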

Yes, but this could be “copied” from the default context when the context was first created (and the extra stuff then added). I’m sure it’s possible to find a good way to handle that, since I’m pretty sure this is the way it’s intended to be used by the authors of both JSR223 and Graal.

Not according to graalvm.org:

Multithreading is supported when running JavaScript in the context of Java interoperability. The basic model of multithreaded execution supported by GraalVM is a “share-nothing” model that should be familiar to any JavaScript developer:

  1. An arbitrary number of JavaScript Contexts can be created, but they should be used by one thread at a time.
  2. Concurrent access to JavaScript objects is not allowed: any JavaScript object cannot be accessed by more than one thread at a time.
  3. Concurrent access to Java objects is allowed: any Java object can be accessed by any Java or JavaScript thread, concurrently.

A JavaScript Context cannot be accessed by two or more threads, concurrently, but it is possible to access the same Context from multiple threads using proper synchronization, to ensure that concurrent access never happens.

I think that the reason that you have “experienced” that Graal needs so many locks is because you have been using the default context. I could be wrong of course, but I’m reasonably confident that this is how it is.

I know that this “design” predates Graal - but that doesn’t convince me that this is the way it’s intended to be done. There are a lot of details to keep track of when trying to put together something like that (integrate scripting support into a Java program), and you can’t really expect that all aspects are done with 20/20 vision. Some details might have slipped past the original author(s).

To me, the java-scriptengine is important too; if I have to move from Rules DSL, this is where I most likely want to go.

I think you can run everything from one JSR223 engine, and I think performance will improve by doing that (as a result of less recycling/recreation).

My PR should “plug the last holes” in TimerImpl’s thread safety itself, but the callback/runnable is of course left alone and would still have to be wrapped by JS.

For traditional JSR223 yes, for Graal based stuff no, as Graal wants to be synchronised.

GraalJS ScriptEngine implementation is a wrapper around the Graal Polyglot Context, see graaljs/graal-js/src/com.oracle.truffle.js.scriptengine/src/com/oracle/truffle/js/scriptengine at 9ca3c580641ca94e00d1bbb5dacdaf6f225a0965 · oracle/graaljs · GitHub. There’s not much code there. And as written down in your post:

  1. An arbitrary number of JavaScript Contexts can be created, but they should be used by one thread at a time.

We create one JavaScript Context per file, per script action, per script condition, see openhab-addons/bundles/org.openhab.automation.jsscripting/src/main/java/org/openhab/automation/jsscripting/internal/OpenhabGraalJSScriptEngine.java at 9b5d188a4e631480ce9fc1b554f9ed0e1beaed4b · openhab/openhab-addons · GitHub. And as they should only be used by one thread at a time, we need to synchronise.
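
At the polyglot level, the pattern boils down to something like this (a sketch of the constraint, not the add-on code):

    import java.util.concurrent.locks.ReentrantLock;

    import org.graalvm.polyglot.Context;
    import org.graalvm.polyglot.Value;

    // Sketch of the pattern only: one Context per script, and every evaluation goes
    // through the same lock, because a JS Context must only be entered by one thread at a time.
    public class GuardedJsContext implements AutoCloseable {
        private final Context context = Context.newBuilder("js").build();
        private final ReentrantLock lock = new ReentrantLock();

        public Value eval(String source) {
            lock.lock();
            try {
                return context.eval("js", source);
            } finally {
                lock.unlock();
            }
        }

        @Override
        public void close() {
            context.close();
        }
    }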

Sorry, I only keep track of the official add-ons, that’s what I see when checking our website.

Your PR’s important when sharing timers across languages or scripts; within a single JS script everything behaves as if it were “thread-safe”, since we only have a single thread :wink:

This is how I see this, and that I’ve been trying to point out:

Once upon a time, JSR223 was taken into use in OH in a way where engine instances were kept separate. It might have been by mistake/lack of overview, or it might have been for a good reason that either doesn’t apply anymore, or that still does (although I have no idea what that reason could be):

Google says:

JSR-223 (Scripting for the Java™ Platform) enables multithreading by allowing a single engine instance to support concurrent evaluations by multiple threads, though the exact behavior depends on the engine’s implementation. JSR-223-compatible engines, such as Groovy in Apache JMeter, are designed to perform well under multithreaded conditions, contrasting with less efficient languages like BeanShell for high-load scenarios.

How JSR-223 Handles Multithreading

  1. Concurrent Execution: The JSR-223 specification allows a single script engine to be evaluated by multiple threads simultaneously.
  2. Engine Dependency: The specific level of multithreading support, such as thread-safe operations and caching, is an implementation detail of the particular script engine (e.g., Groovy, JavaScript).
  3. Performance: Languages like Groovy are generally more performant and have less overhead in multithreaded environments compared to languages like BeanShell, making them a better choice for high-load testing scenarios in tools like JMeter.

Later, when Graal came along, it was probably found to be too inefficient to keep recreating the engine, so the “hack” of mapping the “engine” to a context object was introduced.

This means that Graal-based scripting languages don’t suffer as much, but JSR 223 languages do.

But, this also complicates things, since in Core, both these two very different “concepts” must be treated using the same code, where under Graal, the context is “impersonating an engine instance”. The more functionality and performance you try to get into this solution, the harder it will get, because you’re actually dealing with two very different things, and one set of “rules” that works well for both doesn’t actually exist.

My suggestion is therefore that, assuming that we can verify that a JSR223 engine instance actually behaves as I believe, and as I find documented, the “hack” is removed, and engine means engine and context means context throughout their life in Core.

Whether this requires a new “layer of interfaces” or not, I’m not sure. If needed, there must be “parallel paths” for Graal- and JSR223-based languages; if not, they can keep coexisting “as JSR223”.

My claim is that this will “untie some knots” by itself: some things that are “tricky to balance” or lead to restrictions/things that can’t be done today will just solve themselves, because they are a result of the mixing of instance types/concepts. Further, I’m convinced that it will improve performance for JSR223, and that it will make it easier to avoid bugs in the future, because anyone touching the code doesn’t have to know all the details of what’s “actually happening” despite appearances.

I get that it might be a bit of work to do all the changes in the corresponding places, but I think that the longer it is postponed, the harder it gets, and the more complicated things become in the meanwhile.

edit: My confidence in the above just got a serious dent. I read some JavaDocs in the ScriptEngine (JSR223) interface:

    /**
     * Causes the immediate execution of the script whose source is the String
     * passed as the first argument.  The script may be reparsed or recompiled before
     * execution.  State left in the engine from previous executions, including
     * variable values and compiled procedures may be visible during this execution.
     *
     * @param script The script to be executed by the script engine.
     *
     * @param context A <code>ScriptContext</code> exposing sets of attributes in
     * different scopes.  The meanings of the scopes <code>ScriptContext.GLOBAL_SCOPE</code>,
     * and <code>ScriptContext.ENGINE_SCOPE</code> are defined in the specification.
     * <br><br>
     * The <code>ENGINE_SCOPE</code> <code>Bindings</code> of the <code>ScriptContext</code> contains the
     * bindings of scripting variables to application objects to be used during this
     * script execution.
     *
     *
     * @return The value returned from the execution of the script.
     *
     * @throws ScriptException if an error occurs in script. ScriptEngines should create and throw
     * <code>ScriptException</code> wrappers for checked Exceptions thrown by underlying scripting
     * implementations.
     * @throws NullPointerException if either argument is null.
     */
    public Object eval(String script, ScriptContext context) throws ScriptException;

I’m thinking of “State left in the engine from previous executions, including variable values and compiled procedures may be visible during this execution.” in particular. Also, the “engine global” context values:

    /**
     * Sets a key/value pair in the state of the ScriptEngine that may either create
     * a Java Language Binding to be used in the execution of scripts or be used in some
     * other way, depending on whether the key is reserved.  Must have the same effect as
     * <code>getBindings(ScriptContext.ENGINE_SCOPE).put</code>.
     *
     * @param key The name of named value to add
     * @param value The value of named value to add.
     *
     * @throws NullPointerException if key is null.
     * @throws IllegalArgumentException if key is empty.
     */
    public void put(String key, Object value);

Without proper isolation between executions, an engine can’t “safely” be reused for different purposes, obviously. I guess this explains a lot of the things I’ve found “strange”.
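
A tiny demo of what that JavaDoc means in practice (assuming any JSR223 engine, e.g. Groovy, is on the classpath):

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    // Illustration of the quoted JavaDoc: state left by one eval() is visible to the
    // next eval() on the same engine instance.
    public class EngineStateLeak {
        public static void main(String[] args) throws ScriptException {
            ScriptEngine engine = new ScriptEngineManager().getEngineByName("groovy");
            engine.eval("counter = 1");                 // creates an ENGINE_SCOPE binding
            Object leaked = engine.eval("counter + 1"); // sees the state of the first eval
            System.out.println(leaked);                 // prints 2
        }
    }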

I still think that there might be room for “script engine” and “context” concepts in OH that aligns more with Graal, that would make things simpler and “cleaner” to handle. Maybe a JSR223 script engine should “pose as a context” instead of the other way around..?

edit2: ScriptEngineFactory further confirms this:

    /**
     * Returns an instance of the <code>ScriptEngine</code> associated with this
     * <code>ScriptEngineFactory</code>. A new ScriptEngine is generally
     * returned, but implementations may pool, share or reuse engines.
     *
     * @return A new <code>ScriptEngine</code> instance.
     */
    public  ScriptEngine getScriptEngine();

So, I have probably “made noise” for no good reason. Let me just add that this was a really stupid design (by JSR223).

We revisited the whole architecture, so there is absolutely nothing bad about that “noise”. I think it would be good to extend the JavaDoc of the relevant core interfaces (like ScriptEngine) with the knowledge we gained, so it’s not forgotten (probably again) and we or someone else don’t end up asking the same questions again in a few years.


At least I think I figured out the origin of ScriptEngineManager. It “shadows” javax.script.ScriptEngineManager and performs a similar role.


I’ve always wondered why it uses the same name

I think I said further up that the shared cache implementation itself was thread-safe. It turns out that there are two, ValueCacheImpl and TrackingValueCacheImpl, and only TrackingValueCacheImpl is actually thread-safe :face_with_raised_eyebrow:

edit: I think I found the explanation for that:

    @Override
    public @Nullable Object get(String scriptIdentifier, String type) throws IllegalArgumentException {
        if (SHARED_CACHE_NAME.equals(type)) {
            return new TrackingValueCacheImpl(scriptIdentifier);
        } else if (PRIVATE_CACHE_NAME.equals(type)) {
            return privateCaches.computeIfAbsent(scriptIdentifier, ValueCacheImpl::new);
        }

        return null;
    }

TrackingValueCacheImpl is used for the shared cache, and ValueCacheImpl for the private caches.


At concept/rules: state that a rule cannot be executed in parallel by dilyanpalauzov · Pull Request #2573 · openhab/openhab-docs · GitHub I proposed spelling out that the execution of one rule cannot be started as long as the same rule is currently running.

But if you cancel a timer while it's executing I would expect an InterruptedException

If Thread.sleep() is in a rule file and the file is changed while the sleep is running, then there is an exception. E.g. DSL logs “Script execution of rule with UID ‘a-1’ failed: sleep interrupted in …” and from JS it is

 Failed to execute rule ABC-4110964a-e42b-4f68-afa7-ddd4ab70af93: java.lang.InterruptedException: sleep interrupted: sleep interrupted
    at java.lang.Thread.sleep0
    at java.lang.Thread.sleep (Thread.java:509:0)
    at execute (a.js:9:247-287)
    at doExecute (openhab.js:2:58571-58601)

as demonstrated at rules_overview.md: elaborate about concurrency support in rule engines by dilyanpalauzov · Pull Request #2565 · openhab/openhab-docs · GitHub
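
The underlying Java behaviour, independent of openHAB, is simply:

    // Plain-Java illustration of what the rule engine runs into: interrupting a thread
    // that is inside Thread.sleep() makes sleep() throw an InterruptedException.
    public class SleepInterruptDemo {
        public static void main(String[] args) throws Exception {
            Thread worker = new Thread(() -> {
                try {
                    Thread.sleep(10_000);
                } catch (InterruptedException e) {
                    System.out.println("sleep interrupted: " + e.getMessage());
                }
            });
            worker.start();
            Thread.sleep(100);   // let the worker reach sleep()
            worker.interrupt();  // this is what unloading the rule/file effectively does
            worker.join();
        }
    }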

A script action can return a value, that is then returned from ScriptActionHandler::execute and I think the RuleEngineImpl passes that value to the next module. If the next module is a script action, the value can be simply injected into the context if core wants to do that.

Is that value passed via inputs["result"] (a parameter of SimpleRule.execute(action, inputs)) to the next action, or by injection via ScriptEngine.put()?

If we want to allow for truly parallel execution of the same transformation, I could imagine switching from the one engine per transformation script mechanism to having a pool of ScriptEngines for each language and executing in ScriptEngines that are currently free. Basically like a thread pool.

I think it is sufficient to state that no global variables should be used. In the case of JS this probably means wrapping everything in { ... } or in an IIFE, (function() { ... })(). Writing transformations means that, unless absolutely necessary, no variables should be used that outlive the current script execution.

I think the only "up-to-date" language that's well maintained and is not Graal is Ruby

What are the problems with Groovy, except [Groovy] TriggerBuilder not in RuleSupport preset since OH 4.3 · Issue #18396 · openhab/openhab-addons · GitHub (importPreset not importing types)?

I don’t actually think this is a problem; I saw that InterruptedExceptions are also thrown by blocking rules when OH is shut down, for example. It is the intended way in Java to abort execution.

What I think could be changed is that InterruptedExceptions aren’t logged like any other exception, but instead simply logged as “x was interrupted and aborted execution” or something like that. It’s not an error per se, so it probably shouldn’t be logged as one.

I’ve been wrong on this so many times already, so I probably shouldn’t speak without checking thoroughly, but I think that in a way it does both. The return value probably ends up as outputs["result"], and if the “automatic mapping” connects this to the inputs of the next module, it would end up in its inputs. The inputs are then added to the script’s context before the script is executed, which effectively does the same as ScriptEngine.put() as far as I can understand.
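
In SimpleRule terms it would look roughly like this (import paths quoted from memory; whether the auto-mapping really delivers the previous module’s return value as inputs["result"] is exactly the part that needs verifying):

    import java.util.Map;

    import org.openhab.core.automation.Action;
    import org.openhab.core.automation.module.script.rulesupport.shared.simple.SimpleRule;

    // Sketch only (import paths from memory): IF the auto-mapping wires the previous
    // module's outputs["result"] into this module's inputs, the value can be read from
    // the inputs map here, or seen injected into the script context, which is
    // effectively the same as ScriptEngine.put().
    public class ConsumeResultRule extends SimpleRule {

        @Override
        public Object execute(Action module, Map<String, ?> inputs) {
            Object previousResult = inputs.get("result"); // to be verified
            return previousResult;
        }
    }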

Or use the cache. But there has to be a way to keep the cache variables separate, which usually means passing in some variable (using the URL arguments) so each instance uses unique keys. I do this for the units transformation I posted to the marketplace.

If the PR that passes the item name as well as the input gets merged, that shouldn’t be needed anymore and we could use that in most cases to keep the keys unique per use.

As for passing return values from scripts, where that would be most useful would be when using non-script actions and conditions. If one is using a script, they are mostly going to just have everything in that one script. Where it becomes most useful is in cases like calling some action (let’s say the Astro sun angle action) and using that value plus the triggering item’s state in the Send Notification action.

Doesn’t that mean that these cache values will never be cleaned up, and stay there “for eternity”? Maybe (shared) cache entries should have a TTL option?

I’d think that this was possible already, depending on how the “auto mapping” of inputs and outputs works. If you make a non-script action that runs first (I’m not entirely sure if the order in which they run is predictable, per action id or not), and then inspect/debug print the event object you receive in the second, scripted action, I wouldn’t be too surprised if it turns out to already be there.

TTL is not a good idea.

I haven’t been following this thread so I don’t know whether you already knew this: a shared cache entry will be removed when all the scripts accessing it have been unloaded (which also happens when they’re reloaded).

I’ve seen that there is a tracking mechanism, but I didn’t know that it deleted entries when the scripts were “unloaded” (whatever that means). For non-SimpleRule “scripts”, wouldn’t that mean that they would be removed as soon as the execution stopped?

Scripts don’t get unloaded when the execution stops, so the cache entry remains. They (the old version) will get unloaded when you save a new version of the script. If that script is the only one that ever accessed the cache entry, then the entry will be removed. If the new script still accesses the same cache key, a new value gets inserted into the cache (since obviously the previous entry was already deleted upon unload).

You can test this behaviour yourself very easily.

I think you’re still talking about SimpleRules. When you say “save a new version of it”, it sounds like you’re referring to script language rule creation scripts that create rules and where their “context” keeps on living until you either stop OH or change the file so that it’s reparsed/loaded.

What I mean is if you make a normal rule in the UI, using either JavaScript or Ruby or something else. If that action stores something in the cache when it runs, will it be around the next time the action runs?

I can probably test it, but not very easily. Graal doesn’t work when not running in Karaf; Ruby works but is very tricky to get running. It will usually throw a lot of exceptions and I’ll have to keep rebuilding both core and addons until suddenly it works. I don’t know what exactly happens, but there clearly is something that is “out of sync”.

Once I get Ruby to work, I have to figure out the syntax to use the cache from Ruby.

Jython and Groovy work pretty reliably, but documentation for them is very sparse. I have no idea how to even access the shared cache from them.

So, it’s certainly much easier if somebody that knows how it works can tell me. Other than that, I think I’ll prefer reading the cache code over getting a test scenario up and running.