Rules and concurrency

There has been a lot of discussion lately about how rules/scripts handle concurrency and parallelism in OH. Quite a lot of it ended up in a documentation PR, while the topic has also been covered in this marketplace topic, and I’m sure other places as well.

The idea is to continue this here, as a place where it’s easier to find, both now and in the future. As I see it, there are two main motivations behind the discussion: To help clarify how things actually work and take the guesswork out of it, and perhaps to find areas that can be improved or handled differently.

There are different terms in use: threads, concurrency, parallelism, thread-safety, locks, synchronization. It can be a handful just to keep track of what people mean by the different terms, and I’m not sure that everybody’s definitions are the same.

I’ll try to define some terms, using both what I can find of online definitions and perhaps my own interpretation where these differ.

  • Thread: Wikipedia: “In computer science, a thread of execution is the smallest sequence of programmed instructions that can be managed independently by a scheduler, which is typically a part of the operating system.[1] In many cases, a thread is a component of a process.”

    I’ll add that threads are OS entities, and that although there can be minor differences in their properties between OSes, they generally behave very similarly and can be thought of as the same concept. Threads generally belong to a process, and they all share access to the memory allocated to the process.
  • Process: Wikipedia: “In computing, a process is the instance of a computer program that is being executed by one or many threads. There are many different process models, some of which are light weight, but almost all processes (even entire virtual machines) are rooted in an operating system (OS) process which comprises the program code, assigned system resources, physical and logical access permissions, and data structures to initiate, control and coordinate execution activity. Depending on the OS, a process may be made up of multiple threads of execution that execute instructions concurrently.”

    I’ll add that like threads, processes are OS entities, and they generally don’t share memory with other processes (although it’s possible to do so under certain circumstances). Processes can’t communicate with each other unless they implement some way to do that (through network, pipes, sockets etc.).
  • Fork/forking: Wikipedia: “In computing, particularly in the context of the Unix operating system and its workalikes, fork is an operation whereby a process creates a copy of itself. It is an interface which is required for compliance with the POSIX and Single UNIX Specification standards. It is usually implemented as a C standard library wrapper to the fork, clone, or other system calls of the kernel. Fork is the primary method of process creation on Unix-like operating systems.”

    I’ll add that forking usually means to create a new process; unfortunately, I’ve also seen it used to describe operations with threads.
  • Concurrency: Wikipedia: “In computer science, concurrency refers to the ability of a system to execute multiple tasks through simultaneous execution or time-sharing (context switching), sharing resources and managing interactions. Concurrency improves responsiveness, throughput, and scalability in modern computing”

    I’ll add that I’ve seen some use concurrency about things that are scheduled to run one after the other, like in the event-loop in JavaScript. This doesn’t correspond with my understanding of the concept. In a single-CPU system, the OS will schedule different threads to run at different times, so none of the threads actually do anything simultaneously. The scheduling is handled by the OS kernel, and execution of one thread doesn’t prevent the execution of another; the scheduler will switch to the next one regardless of whether one thread is finished or not. This is very different from how e.g. the event-loop works. Since it runs in a single thread, any holdup/loop/bug can grind everything to a halt.

    Also essential, to me, is that you never know (nor care) if threads actually run simultaneously or not. You know that this depends on the hardware that runs the code, and if there are multiple cores available, multiple threads can (and probably will) run simultaneously. As a consequence, you take precautions as if they run simultaneously regardless. This is also very different from the event-loop situation, where you know that things will never happen simultaneously, so you don’t have to take precautions to prevent problems that can occur if multiple threads try to access the same memory at the same time.
  • Parallelism: Wikipedia: “In computer science, parallelism and concurrency are two different things: a parallel program uses multiple CPU cores, each core performing a task independently. On the other hand, concurrency enables a program to deal with multiple tasks even on a single CPU core; the core switches between tasks (i.e. threads) without necessarily completing each one. A program can have both, neither or a combination of parallelism and concurrency characteristics”

    I don’t have much history in using the term, so my definition is somewhat vague. I see that some use it to define when things happen simultaneously, while others seem to mean that it implies several running processes that “work towards a common goal”. I guess I’d say that if two threads don’t work with any shared objects, but run something external like a script engine that is only known by each individual thread, I could say that the “threads work in parallel”. That doesn’t fall under concurrency to me, because the core of the concurrency concept is the utilization of shared objects/memory.
  • Thread-safety: Wikipedia: “In multi-threaded computer programming, a function is thread-safe when it can be invoked or accessed concurrently by multiple threads without causing unexpected behavior, race conditions, or data corruption.[1][2] As in the multi-threaded context where a program executes several threads simultaneously in a shared address space and each of those threads has access to every other thread’s memory, thread-safe functions need to ensure that all those threads behave properly and fulfill their design specifications without unintended interaction.”

    I don’t have much to add, I’d say that thread-safe code means code that, through any available means, is made in such a way that it can be used by multiple threads at once without potentially behaving “unexpectedly”. Care has been taken to make sure that things can’t go wrong, not merely that they just most likely won’t go wrong.
  • Lock/Mutex and synchronization: Wikipedia: “In computer science, a lock or mutex (from mutual exclusion) is a synchronization primitive that prevents state from being modified or accessed by multiple threads of execution at once. Locks enforce mutual exclusion concurrency control policies, and with a variety of possible methods there exist multiple unique implementations for different applications.”

    “Lock” and “Mutex” are the same thing in principle, but one or the other word might be more commonly used in certain contexts. “Synchronization” is a wider term, but is often used to describe a lock/mutex, particularly among Java developers because Java has a keyword “synchronized” that activates an intrinsic lock. The keyword “synchronized” is by far the most common way to invoke a lock in Java, which is why the terms tend to be used more or less like synonyms.


    Generally though, “synchronization” only means that some mechanism is used to make sure that things happen in a certain order, usually serially (not concurrently). Locks/mutexes are one way to achieve this: a lock/mutex can be acquired by a thread, and any other thread that attempts to acquire the same lock/mutex must wait until the previous thread has released it before it can continue. The effect is thus that access to the code inside a lock/mutex is “synchronized”: threads have to run that code in turn. This is used to ensure that only one thread can access a certain part of memory at a time, if that part of memory can be changed (is mutable). Parts of memory that can’t change (are immutable) can be safely accessed by multiple threads concurrently without any form of synchronization.
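To make the lock mechanics concrete, here is a minimal Java sketch (generic example, not openHAB code) of two threads incrementing a shared counter through a synchronized block. Without the lock, increments could be lost, because `count++` is a read-modify-write sequence, not an atomic operation:

```java
public class SynchronizedCounter {
    private final Object lock = new Object();
    private int count = 0;

    public void increment() {
        synchronized (lock) { // only one thread at a time may enter
            count++;
        }
    }

    public int get() {
        synchronized (lock) {
            return count;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SynchronizedCounter c = new SynchronizedCounter();
        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                c.increment();
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println(c.get()); // always 200000 thanks to the lock
    }
}
```

With the synchronized blocks removed, the printed value would usually be less than 200000, and different on every run, which is exactly the kind of “random” failure discussed below.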

There is a lot more to be said about this topic, but I deem that this post is long enough as it is now.


@rlkoshak I’ll reply to this comment here, in an attempt to move the discussion to the forum.

I’ve found the implementation in Core, it’s called ValueCache. I can see that the cache itself is thread-safe (retrieval and storage are protected by locks), but the objects themselves aren’t (they aren’t cloned, and frankly, can’t easily be cloned), so that if one script changes the object, it will potentially change the object in the cache directly, without any concurrency protection. I don’t know the mechanism that sits between the scripting engines and this cache though, I kind of doubt that the Java objects are useful as they are. So, there probably is a conversion along the way, which would “break the link” to the actually cached object, and make it thread-safe because changing it from a script would then only change the converted object. If so, changes wouldn’t be reflected without putting it back to the cache after changing it. The only way this can be “unsafe” is if the object is used directly, as-is.
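To illustrate the distinction with a generic Java sketch (this is not the actual ValueCache code): a thread-safe map protects its own structure, but hands out live references to the stored objects, so mutating a retrieved object changes the “cached” object directly, bypassing the map’s locking entirely:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class LiveReferenceDemo {
    public static void main(String[] args) {
        // The map itself is thread-safe...
        Map<String, StringBuilder> cache = new ConcurrentHashMap<>();
        StringBuilder value = new StringBuilder("original");
        cache.put("key", value);

        // ...but get() returns the very same object, not a copy.
        StringBuilder retrieved = cache.get("key");
        System.out.println(retrieved == value); // true: same reference

        // Changing it changes the cached object directly, with no
        // concurrency protection from the map.
        retrieved.append(" - changed");
        System.out.println(cache.get("key")); // original - changed
    }
}
```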

I’m not familiar with that discussion, but it seems to me like such a cache could be very useful. It’s probably too late to change the existing cache in any drastic way now, but an additional cache could always be added, with different rules. I was thinking about for example serializing everything to JSON in the cache, and letting the individual language add-on transform the objects to/from something that is useful for that language. It could then become the ObjectCache :wink:

As stated above, I found it. I’d have to see the exact discussion where it was rejected to understand exactly what was rejected, I don’t understand how the ability to share language objects wouldn’t be useful.

ValueCache is backed by a Map<String, Object>.

I must have missed something then. I could only see one full rule quoted. But, I believe you that new script engines are created for everything.

It just didn’t make any sense to me, why make a ScriptEngineManager if you’re just going to produce new ones? But, given this explanation:

..it starts to make some sense to me. I’m speculating now of course, but what I see could have happened in that originally, ScriptEngineManager did manage script engine instances. But, because of the… “arrangement” with bridging Polyglot Engine and Context through JSR223 where a Polyglot Context becomes a ScriptEngine, this wouldn’t have worked. So, ScriptEngineManager has probably been “tweaked” into producing new engine instances, “because they are cheap in reality as they are just contexts”. “Real” JSR223 implementations will suffer of course, because they actually have to create a whole new engine each time.

Maybe this isn’t how it happened, but at least it’s a model where the pieces can start fitting together in my head.

To me, this seems like a suboptimal design, and I’m questioning if it wouldn’t be better to make something in Core that would allow scripting either through “Polylot” or “JSR223”. But, there might be lots of hurdles in the way, making it difficult.

The details and limitations of this conversion would be of interest to the user of any particular scripting language, and should probably be documented to some extent. I still can’t understand how such a conversion wouldn’t mean that a new object is created, breaking the link to the “live” cached object. But, the devil might be in the details, maybe GraalVM attempts to do some kind of “live” conversion, where the converted object would reflect changes in the original and vice versa. If so, concurrency would obviously be an issue.

The fact that the timers cause exceptions indicates that they aren’t converted, but are somehow forwarded as Java objects?

I’m not so sure that it means that there’s a reference from the timer to the context, I read somewhere that GraalVM explicitly forbids objects that might cause concurrency issues, so the exception might simply be the result of this: GraalVM deduces that this object can cause concurrency issues, so it isn’t “allowed” - throw exception instead.

I think that approach would open up potential to TOCTOU problems.

That would be disastrous for something like Timers. One reasonable use case is to create a timer from rule and cancel it from another rule.

Obviously that use case isn’t handled today for JS, but is supported for the other languages.

I’m wary of going overboard to accommodate a limitation in only one of the automation add-ons.

https://github.com/openhab/openhab-core/issues/4413

That’s because I can’t create a full YAML version of a Script in MainUI. Even Scenes have a code tab but Scripts do not.

I could have handwritten the YAML or pulled the Script from the JSONDB. But it’s just an empty rule consisting of just a single Script Action. But I don’t feel compelled to do that to compensate for a limitation in MainUI (which I think I opened an issue for many years ago) which doesn’t actually add any additional information.

I want to point out though that it is very unlikely that someone wants to pass an arbitrary Object created in one language to another. And theoretically, passing pure Java Objects should be OK, right? All the scripting languages support Java.

The problem with Timers is that they reference a function/lambda, and that function/lambda is created in the language in which the Timer was created. Often that function/lambda also accesses/uses variables that exist in that context (otherwise what’s the point?). So it may not be that complicated. It just fails when the timer function is executed by more than one thread at the same time.

I’m not sure we know if it’s a problem when the conversion takes place. I certainly have never done a test like that. I think it’s a problem with Timers because those are already Java Objects so there is nothing to convert. However, they carry a bit of the script’s context with them and it’s that which causes problems.

They are Java to begin with so there’s no need for a conversion. But like I said, they have a function passed into them which is not Java and which references variables which are not Java.

As I understand it, the automatic conversion stuff only occurs for primitives and standard collections of primitives (maps and arrays) and I think ZonedDateTime.

The exception doesn’t occur until two separate threads actually attempt to use anything from a single context at the same time. If only one thread is accessing the data, no problem occurs. You can put anything into the private cache because the rule locking ensures that only one thread can ever access that data at a time. GraalVM doesn’t do anything to try and predict this. It just prevents it from happening with an exception at runtime.

This is a bit complicated, but yes, you have a slight window, from when the object is retrieved from the cache until it is converted, where it is “unsafe”. I wouldn’t call it TOCTOU, because by the time you can actually check/evaluate it, it has already been converted. It’s fundamentally “unsafe” to hand out mutable objects the way ValueCache does today. That’s irrespective of the scripting language, or even Java itself, retrieving it.

To effectively protect a mutable object, there must be “an agreed-upon” lock that protects it. In Java, you can attempt to achieve this using JavaDocs etc., but you’re still trusting that everybody respects those rules. When shared with scripting languages that can’t even access these locks, it can’t be done safely. If, on the other hand, the object is either immutable (though that’s ultimately an illusion, since RAM is by definition writable, and it would be very hard to enforce across languages), or the object itself is thread-safe (it has the necessary internal protection so that all publicly available fields and methods can safely be accessed by multiple threads), it’s perfectly safe to use the cache as it is.

The problem is that since the cache accepts Object, it’s impossible to enforce either immutability or thread-safety. It will simply accept anything it’s asked to store.

If Cloneable had been chosen instead of Object, it would have been possible to make it safe. The cache could then have cloned all objects both on the way in and out. Wasteful, but not any worse than many other things that have become “common practice” today (copy-on-write, streams, most of the “functional programming” stuff etc. create a lot of unnecessary, short-lived objects). So, it would probably have been a completely acceptable compromise. However, it would limit what could be put into the cache: not everything in Java is Cloneable, and not everything can easily be made Cloneable.
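A hypothetical sketch of that copy-on-put/copy-on-get idea. Note that Java’s Cloneable doesn’t actually expose clone() publicly, so this sketch uses its own Copyable interface instead; all names below (Copyable, CopyingCache, MutablePoint) are invented for illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// A type that knows how to produce an independent copy of itself.
interface Copyable {
    Copyable copy();
}

class MutablePoint implements Copyable {
    int x;
    MutablePoint(int x) { this.x = x; }
    public Copyable copy() { return new MutablePoint(x); }
}

class CopyingCache {
    private final Map<String, Copyable> store = new ConcurrentHashMap<>();

    // Copy on the way in: the caller keeps no reference to the stored object.
    void put(String key, Copyable value) { store.put(key, value.copy()); }

    // Copy on the way out: mutating the result cannot affect the cache.
    Copyable get(String key) {
        Copyable stored = store.get(key);
        return stored == null ? null : stored.copy();
    }
}

public class CopyingCacheDemo {
    public static void main(String[] args) {
        CopyingCache cache = new CopyingCache();
        MutablePoint p = new MutablePoint(1);
        cache.put("p", p);
        p.x = 99; // changing the original does not touch the cached copy

        MutablePoint out = (MutablePoint) cache.get("p");
        System.out.println(out.x); // 1
    }
}
```

The “link” between the caller’s object and the cached one is broken on both put and get, which is exactly the property the current Object-based cache cannot guarantee.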

It wasn’t really a suggestion that this should be “solved” by creating a copy in a conversion process; it’s more how I imagine it already works for most types. Ultimately, I don’t know how this is handled, but the fact that primitives are accepted indicates that there is at least some conversion.

I haven’t looked into what the problem with the Timers really is, but I suspect that the problem is a different one than what it appears to be. There is something called org.openhab.automation.jsscripting.internal.threading.ThreadsafeTimers where you have methods like createTimer() and setInterval(). I haven’t checked that these are actually thread-safe, but if we assume that the name can be “trusted”, they should be. I would assume that these are the Timers made available to JS code. In any case, it shouldn’t really be difficult to make a thread-safe Timer, one whose methods can safely be called from multiple threads. A Timer doesn’t normally give access to the “task”/payload/runnable it carries, so I don’t think that aspect should be a problem. It would only need to make sure that cancel(), reschedule(), isActive() etc. were thread-safe, which should be trivial.
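As a generic sketch (making no assumptions about openHAB’s actual Timer implementation), such a wrapper could simply synchronize every method that touches the underlying scheduled task; the class and method names here are invented:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// The task itself is never exposed; only the control methods are,
// and those are all guarded by the instance's intrinsic lock.
public class SafeTimer {
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor();

    private ScheduledFuture<?> future;

    public SafeTimer(Runnable task, long delayMillis) {
        future = SCHEDULER.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    public synchronized boolean cancel() {
        return future.cancel(false);
    }

    public synchronized boolean isActive() {
        return !future.isDone();
    }

    public synchronized void reschedule(Runnable task, long delayMillis) {
        future.cancel(false);
        future = SCHEDULER.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) {
        SafeTimer timer = new SafeTimer(() -> System.out.println("fired"), 60_000);
        System.out.println(timer.isActive()); // true
        System.out.println(timer.cancel());   // true
        System.out.println(timer.isActive()); // false
        SCHEDULER.shutdownNow();
    }
}
```

Calling cancel() or isActive() from any thread is safe here, because all access to the mutable `future` field is serialized by the lock.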

More likely is it that when you create a Timer from JS using a function, that function embeds the context of the script in which it is created. When the timer then attempts to execute this function from another thread, GraalJS protests.

I think this could be solved, but I’m not sure if my idea could be implemented, so I’ll leave it at that for now.

That’s the problem with concurrency in general: You often aren’t bitten if you do “unsafe” things, because it all depends on timing and as such appears to be pretty random when it “bites”. And, when it does bite, it usually doesn’t present clear symptoms that make it obvious that this is the problem. Instead, you get a system where “strange things happen at times” without anybody being able to explain why, or to reproduce it (in many cases).

So, you can call it “supported for other languages” all you want, but it won’t change the fact that sharing mutable objects via the cache is “a game of Russian roulette”. I can’t promise that it won’t work, I can’t promise that it will work, all I can promise is that, if you do it enough times, some of them will fail, probably in a non-obvious way.

I get that there’s “no way back” for the current cache, it has been there too long and too much relies on it. But, I think it would be nice if safe alternatives could be found, so that those that wanted to protect against “random failures” could choose to use different techniques.

That wasn’t my idea at all. Maybe there’s no need for this, I don’t know, but I was thinking of a way you could share basically everything that could be serialized to JSON, in a thread-safe manner. It wouldn’t only benefit JS, but all scripting add-ons. It probably wouldn’t be very “cross-language capable”, it would be up to the individual scripting language add-on to decide if it could deserialize any given object, or just reject it as incomprehensible. I think most people would use it to share objects between scripts of the same language anyway. But, it would potentially have a “TOCTOU trap” in that if one script made decisions based on the instance it deserialized from the cache, another one could update it in the meantime, so that by the time the first script took action, it was no longer the right thing to do.

Ultimately, there is only one way to solve that kind of problem that I know of: Locks. When you need to make sure that the decision is made and the action initiated before anything else “interferes” with the factors the decision is based on, you must protect the entire “decision process” with a lock, so that other participants will see the situation either as it was before the decision process was started, or after the decision had been made. I simply don’t see how this kind of “transactional logic” can be achieved with the tools we have to work with here.
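In Java, such a protected “decision process” can be sketched like this (a generic check-then-act example, not openHAB code): the check and the action are performed under one lock, so no other thread can change the underlying state between them:

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class CheckThenAct {
    private final Lock lock = new ReentrantLock();
    private int balance = 100;

    public boolean withdraw(int amount) {
        lock.lock();
        try {
            // Check and act as one unit: no other thread can change
            // balance between the test and the subtraction.
            if (balance >= amount) {
                balance -= amount;
                return true;
            }
            return false;
        } finally {
            lock.unlock(); // always release, even if the body throws
        }
    }

    public static void main(String[] args) {
        CheckThenAct account = new CheckThenAct();
        System.out.println(account.withdraw(60)); // true
        System.out.println(account.withdraw(60)); // false: only 40 left
    }
}
```

Without the lock, two threads could both pass the `balance >= amount` check before either subtracts, which is precisely the TOCTOU situation described above.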

I understand now :+1: (but it took its time)

Got it, I’m betting on the context being the issue then.

I haven’t read through it all yet, but I already see some of the things I’ve been saying being mentioned:

IMO that’s wrong usage of the cache. You should store unmodifiable objects there, and instead of modifying an existing object create a modified copy and put that to the cache.

The problem is that people can all have their ideas about how something should be used, if it isn’t enforced, or at least strongly “recommended”, that won’t be what happens.

Further:

I think the issue really is that you try to use an object that itself does not support multi-thread access. It’s not an issue of the cache. Try something simple, put an integer to the cache, start with 0, increase after every read and put the new number to the cache. This should work perfectly fine.

Again, the same idea. As I’ve said, this cache is only safe under two circumstances: Immutable objects and thread-safe objects. But this isn’t enforced, or even properly warned about, so people will put mutable objects there. There is no way that can be done safely, and I think GraalJS is right to refuse to do it, instead of just going along with the “Russian roulette”.

This might indicate what I’d call a “serious blunder”, if this is actually recommended/documented:

Note that my original repro is based on an example from the documentation, which uses an object with a property that serves as a counter. This suggests that this is a totally normal thing to do and should not result in exceptions. See here: JavaScript Scripting - Automation | openHAB

You can’t use a shared cache without locking for “counting logic”, even with primitives; it won’t reliably do what you expect. You’d need to use something like AtomicInteger to do that with consistency, simply because two threads can be updating the counter at the same time, and one will “overwrite” the increment done by the other.
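A minimal Java sketch of why AtomicInteger is needed here: it performs the whole read-modify-write as a single atomic operation, so no increments are lost even when two threads update the counter concurrently:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCounterDemo {
    public static void main(String[] args) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger(0);

        Runnable task = () -> {
            for (int i = 0; i < 100_000; i++) {
                counter.incrementAndGet(); // atomic read-modify-write
            }
        };
        Thread t1 = new Thread(task);
        Thread t2 = new Thread(task);
        t1.start();
        t2.start();
        t1.join();
        t2.join();

        // A "get, add 1, put" sequence against a plain shared variable
        // would usually lose updates and print less than 200000;
        // AtomicInteger is always exact.
        System.out.println(counter.get());
    }
}
```

The cache-based “read, increment, put back” pattern from the quote is exactly the non-atomic version of this, which is why it can silently drop increments.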

This is actually an interesting case, because it has a built-in solution that nobody seems to use or know how to use:

I have a case where the only reason I’m using the shared cache is so the rule condition and action can share some data. It can never be the case that both will run at the same time because they are part of the same rule. Can the warning log statement be made smarter to detect a case like this where the shared cache is being used in a way that is safe even when using JS Objects? I think all it would take is looking to see which rules reference the entry and if it’s only one rule it’s not a problem. I don’t know if that info is available though.

There are inputs and outputs to/from modules (triggers, conditions, actions), that are meant to do exactly this - pass data between them. I don’t know exactly how to use it either, but I doubt that it’s rocket science. The question is how many things have been done to break this in the meanwhile, since people haven’t been aware that it’s there.

Regarding the issue (Using cache.shared in JavaScript rules can lead to unexpected java.lang.IllegalStateException: Multi threaded access requested by thread #4413) as a whole, I’d say that neither 1), 2) or 3) are viable. You simply can’t rely on GC, or any other “implied mechanism” to handle the locking and unlocking for you, it will be way too slow and lock the objects for far too long.

I do see one way this particular functionality could be done, but it would bring with it the risk that users could cause deadlocks by mishandling the locking in their scripts: You could create a (3rd?) shared cache with dedicated locking mechanisms, where you’d either have to “lock the key” before you could retrieve its value, or where retrieving it automatically locked it. But, the cache would then also have to have an unlock(<key>) method that you had to call using “try.. finally”, or a corresponding mechanism in the language in question, as soon as you were done dealing with that object.

It would allow doing “shared counters” and other things like that, but it would fail spectacularly when users didn’t understand the locking/unlocking, and I can’t see that it’s very easy to “warn” if they do it wrong. But, it’s the only way I can see that you could actually achieve what was asked for in a thread-safe, not “by luck”, manner.
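A hypothetical sketch of such a “lock the key” cache. All names here (LockingCache, lockAndGet, unlock) are invented for illustration; nothing like this exists in openHAB today:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

class LockingCache {
    private final Map<String, Object> values = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> locks = new ConcurrentHashMap<>();

    private ReentrantLock lockFor(String key) {
        return locks.computeIfAbsent(key, k -> new ReentrantLock());
    }

    // Retrieving a value automatically locks its key; other threads
    // asking for the same key block here until unlock() is called.
    public Object lockAndGet(String key) {
        lockFor(key).lock();
        return values.get(key);
    }

    public void put(String key, Object value) {
        values.put(key, value);
    }

    // Must be called in a finally block, or the key stays locked forever.
    public void unlock(String key) {
        lockFor(key).unlock();
    }
}

public class LockingCacheDemo {
    public static void main(String[] args) {
        LockingCache cache = new LockingCache();
        cache.put("counter", 0);

        // "Shared counter" usage: the whole read-modify-write sequence
        // happens while the key is locked.
        Object v = cache.lockAndGet("counter");
        try {
            cache.put("counter", (Integer) v + 1);
        } finally {
            cache.unlock("counter");
        }

        Object after = cache.lockAndGet("counter");
        try {
            System.out.println(after); // 1
        } finally {
            cache.unlock("counter");
        }
    }
}
```

The try/finally is the dangerous part: a script that forgets the unlock() call leaves the key locked forever, which is the deadlock risk mentioned above.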

Answering to your post Java223 Scripting - Script with Java on openHAB [5.0.1.0,6.0.0.0) - #75 by Nadahar in the old thread:

Yes, the synchronisation requirement is language specific. Partly, synchronisation is done inside the OpenhabGraalJSScriptEngine implementation and the add-on code, but in case of the Script***Handler, we need to synchronise in core to synchronise the full block of setting execution context, execute, resetting execution context. Please note that that locking mechanism in core is fully optional, you have to implement Lock with your ScriptEngine to enable it. If you don‘t want it, simply don‘t implement the Lock interface.

Yes. I originally meant that ScriptEngine and Polyglot Engine are different contexts, but yes, you also have the JSR223 and the GraalVM Polyglot Context. But the contexts correspond to each other/are the same from the point of view of what‘s (e.g. variable values) in the environment of the script.

Do they really? E.g. for NashornJS, you only get a ScriptEngine implementation from the JDK (or now, the additional package) and don‘t know about its internals. You only have the JSR223 context for Nashorn.
The main difference for Graal is, that you don‘t simply get a ScriptEngine instance (if Graal even provided such a wrapper for its Context), but you have access to the underlying mechanics, which are the Graal Polyglot Context and Graal Polyglot Engine.

I think directly handling Graal contexts in core would allow for getting rid of the ScriptEngine wrappers wrapping Graal contexts, but I don‘t think that would really change much. It would mainly allow us to get rid of wrappers like openhab-addons/bundles/org.openhab.automation.pythonscripting/src/main/java/org/openhab/automation/pythonscripting/internal/graal/GraalPythonScriptEngine.java at 4440af9d1ac4a43ffd0df4cd8a57a6eaea52dcff · openhab/openhab-addons · GitHub.

I think its existence might make more sense when checking where and how it is used in other places, but I am out when it comes to answering that question. That code likely is as old or even older than my openHAB user experience …

I can‘t recall such a change, and the Git history also included nothing obvious here: openhab-core/bundles/org.openhab.core.automation.module.script/src/main/java/org/openhab/core/automation/module/script/internal/ScriptEngineManagerImpl.java at 51110e96cb882f2fa0f35523561c254df7fa81de · openhab/openhab-core · GitHub & History for bundles/org.openhab.core.automation.module.script/src/main/java/org/openhab/core/automation/module/script/internal/handler/AbstractScriptModuleHandler.java - openhab/openhab-core · GitHub

I would expect that new engines have always been created for every script, otherwise you‘d need to manually save and restore the JSR223 context (variable states for example) when executing different scripts with a single engine, which probably wouldn‘t work as it would require handling native language objects in Java.

With that reasoning I don‘t think that JSR223 engines can be reused to execute different scripts, so it also doesn‘t make sense to split core handling for JSR223 ScriptEngine and Graal Polyglot Context. If my assumption is wrong, it could make sense to split, but TBH I don‘t want to go down that rabbit hole.

Due to a multi-threading limitation in GraalJS (the JavaScript engine used by JavaScript Scripting), it is not recommended to store JavaScript objects in the shared cache. Multi-threaded access to JavaScript objects will lead to script execution failure! You can work-around that limitation by either serialising and deserialising JS objects or by switching to their Java counterparts.

Timers as created by createTimer can be stored in the shared cache. The ids of timers and intervals as created by setTimeout and setInterval cannot be shared across scripts as these ids are local to the script where they were created.

(From JavaScript Scripting - Automation | openHAB)

It seems I recalled wrongly when stating that timers can‘t be put into the shared cache, sorry. (The above was written by me, so I think I had reasonably verified it back then and only my memory was not correct about it.)

The JS API to the cache prints a warning to the log when putting a JS object into the shared cache.

I agree. When wanting to pass a collection of key value pairs from JS to Python (for example), JS objects obviously aren‘t supported in Python, so I would very likely decide to use a Java Map, as both languages can use this. Then I will be fine. When going with the JS object, Graal will convert it to a Map, we will probably have multi-threading problems, and I still have to use a Map in Python — so I can simply use the Map in JS as well, as I already know how to use it.

Timers created in JS are fully synchronised, and FYI Graal already throws the famous multi-thread exception when simply trying to run anything in parallel, you don‘t need to explicitly access a var as you are already in the same context.

You are right. If we were to make the shared cache safe for that, we‘d need to clone objects, but then we‘d need another way of managing timers across multiple scripts …

Always open for that, but keep in mind users need to actively make a switch and given the amount of push back more minor changes often receive, I think that will be quite difficult.

@dpa-openhab Please continue the discussion in this thread.

I was thinking of this announcement:

https://blogs.oracle.com/java/post/detaching-graalvm-from-the-java-ecosystem-train

Reading it again, I realize that I might have misunderstood it. It might be GraalVM as a general JVM that is being discontinued, with a renewed focus on GraalJS and GraalPy. I’m not sufficiently informed about the “Graal ecosystem” to always know what is referred to and not.

Yes, I did mean “Core”. I’m not sure what you mean by “installed indirectly” - maybe I’m just using the wrong terms again, I’m talking about (I think) GraalJS and GraalPy - I’m not talking about the complete JVM. These terms just confuse me, because I don’t know where the lines are drawn.

If I still didn’t understand what you meant, please clarify :wink:

I didn’t mean to move any code into Core. What I tried to suggest was that if the concepts in JSR223 and “whatever we want to call the Graal thing, Polyglot?” mismatch, maybe it would be better to make “a second provider of scripting functionality” than trying to force everything through the “signatures” of JSR223.

JSR223 is still very useful, as it can provide support for many languages other than the two provided by Graal, so I’d hate for the needs of “the Graals” to diminish/limit the possibilities of actual JSR223 implementations, in an attempt “to make both fit” through “the same door”.

Regarding the languages server and the IDE bundles, I’m not sufficiently informed about those topics to understand what you mean.

In the openhab-cli console (Karaf), openhab> bundle:list -s | grep -i graa shows nothing until MQTT, JavaScript or Python are installed. That is, installing any of those three add-ons does download and install GraalVM in openHAB.

And as far as I am aware nothing actually implements this in a way that end users can use them.

Within the same rule. But what if I access the Timer from the shared cache from another rule, for example to cancel it while it’s running? I’d expect a multi-threaded exception in that case, right? Or are you saying that even that scenario would be handled? If so, that’s good to know.

That isn’t related to what I was talking about, I was talking about making a second “pathway for supporting scripting” that was tailored to Graal and not bound by JSR223, so that this could be two different, parallel concepts that didn’t interfere and limit each other.

I’m not sure how much is missing, I suspect that it’s only the UI that doesn’t “support” it. It seems to me that it’s present pretty much throughout Core. But, since I’ve never sat down and figured out exactly how to use it, I haven’t tested it and don’t know what obstacles might have appeared. But, it’s quite obvious to me that it did work once upon a time.

This PR does at least support it to the extent it is involved:

https://github.com/openhab/openhab-core/pull/4633


On the topic “openHAB and concurrency”: why can’t one and the same transformation, implemented by a rule engine, run in parallel? That is, the transformation has to finish before it is executed again. Preventing one and the same rule from running at the same time kind of makes sense, but why is the same restriction applied to transformations?

I’m probably not the correct person to answer this, but I don’t think this applies to transformations in general - only to script transformations. And, the reason is probably the same as we’ve discussed with rules: that the context (for JS and Python at least) must be used synchronized/non-concurrently.

I’m pretty sure none of the automation add-ons nor Rules DSL support it. That would have to happen before we even begin to worry about managed rules.

I too cannot speak from the code, but I can speak from observation. I did extensive testing in this area about a year and a half ago. And @florian-h05 has mentioned some reliable info as well.

Each transformation served in a .js file gets its own ScriptEngine, same as for rules.

There can only be one transformation defined per .js file.

No matter how many times you apply that transformation, you are running in the same ScriptEngine.

For JS you can’t do that except sequentially. But it’s a problem for the other languages too.

Let’s say my transformation is complicated enough to define some variables. One use of the transformation starts running and sets the variables. Before it can use those variables another use starts running and overwrites the values in those same variables. Now the first use of the transform will continue running with the wrong data.
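The interleaving described above can be forced deterministically in plain Java (this is a hypothetical stand-in for shared ScriptEngine state, not real transformation code; the latches just guarantee the unlucky timing that would otherwise be random):

```java
import java.util.concurrent.CountDownLatch;

public class TransformRace {
    // Shared by every invocation, like a top-level variable in the script.
    static volatile String value;

    public static void main(String[] args) throws Exception {
        CountDownLatch aSet = new CountDownLatch(1);
        CountDownLatch bSet = new CountDownLatch(1);

        Thread useA = new Thread(() -> {
            value = "A";                 // use A stores its input ...
            aSet.countDown();
            try { bSet.await(); } catch (InterruptedException ignored) {}
            // ... but by the time it reads the variable back, use B has run.
            System.out.println("use A returns: " + value);
        });

        Thread useB = new Thread(() -> {
            try { aSet.await(); } catch (InterruptedException ignored) {}
            value = "B";                 // use B overwrites it before A finishes
            bSet.countDown();
        });

        useA.start(); useB.start();
        useA.join(); useB.join();
    }
}
```

Use A ends up returning “B”: exactly the “continue running with the wrong data” failure mode, which is why the same transformation is not allowed to run concurrently against one engine.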

Adding the same transformation in two places does not create a new copy of the transformation. It’s closer to adding a new trigger to a rule.

Ways to achieve parallelism with script transformations include:

  • use inline transformations which each get their own ScriptEngine even if they are identical
  • create multiple copies of the transformation and only use each one in one place at a time

I cannot speak as to whether these are technical limitations or implementation choices. I can only speak to how it works right now.

I can’t answer this any further without digging in, all I can say is that there’s quite a lot of handling in place to make passing values between modules possible - so it’s hard to believe that this has never had any use.

For me a transformation is just like a function in any programming language. This function can define local variables, which are valid only during one function execution. It can also use variables, which outlive the function execution (global, static variables). The rule language, where the function/transformation is implemented, offers both local and global (static) variables. (openhab-core offers in addition sharedCache).

So when somebody implements a transformation, that person has to take care that the transformation (function) is multi-threading capable: it either uses only local variables (or does not use variables), or synchronizes the access to global variables.

openHAB should just permit parallel execution of the transformation, assuming that protection from race conditions is implemented in the transformation.
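A minimal sketch of that distinction (hypothetical transformation body, plain Java): local variables are private to each invocation and need no protection, while shared/global state must be accessed atomically or synchronized:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class SafeTransform {
    // Global state, made thread-safe by using an atomic type.
    static final AtomicLong invocations = new AtomicLong();

    static String transform(String input) {
        String local = input.toUpperCase(); // local variable: private to this call
        invocations.incrementAndGet();      // shared state: atomic update
        return local;
    }

    public static void main(String[] args) throws Exception {
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            Thread t = new Thread(() -> {
                for (int j = 0; j < 1000; j++) transform("on");
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) t.join();
        System.out.println(invocations.get()); // no lost updates
    }
}
```

With a plain `long` counter instead of the `AtomicLong`, increments would be lost under contention; written like this, the “function” is safe to run in parallel.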

For JS this will still be serial execution, but for other JSR223 addons, there will be parallelism.

I’ve been using OH since version 1.6 and I’ve never seen an instance where it’s been used outside of some very recent developments in UI rules which can call Thing Actions directly. And I’m not sure even these use this same mechanism or something else.

There has never been a way to return something from a rule. In managed rules (i.e. in the UI) there was talk of wanting to be able to get a return value from an action and pass it to the next. Apparently some work was done to support this, or it was a feature planned from the start that never got implemented.

As of today, there is no way for an end user to pass data and use them between actions and conditions using a return value.

It would be incredibly useful, but it’s never been finished.


This is some of what I’m trying to point a finger at. I don’t think that is the correct way to do it. You don’t need to “set the engine context”, then execute, then reset; that’s only necessary as long as you rely on the engine’s default context. I don’t think that’s how multithreading with script engines is supposed to be done.

Although it’s not exposed by AbstractScriptModuleHandler, the underlying ScriptEngine has:

    public Object eval(String script, ScriptContext context) throws ScriptException;

So, it doesn’t matter what the default engine context is. It should probably only be used in very simple scenarios where the engine only does one thing. Instead, the context should be kept together with the “task”, whether that is a transformation or a rule script, and then sent together with the “script” itself to eval().

That way you could drop a lot of locking, and use a single engine instance. What would need to be protected/synchronized/locked is access to the context object only.

I expect that the default context is only ever used if eval() is called without an explicit context, and that it is this whole process of setting the default/executing/unsetting that introduces the requirement for the engine lock in the first place.
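To illustrate, a minimal sketch using only the JDK’s javax.script types (no engine involved, and the class/variable names are invented): each task keeps its own ScriptContext, which could then be passed to eval(script, context) on a single shared engine, so nothing engine-wide needs to be mutated per call:

```java
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.SimpleBindings;
import javax.script.SimpleScriptContext;

public class PerTaskContexts {
    // Build an isolated context for one task (rule script or transformation).
    static ScriptContext contextFor(String taskName) {
        SimpleScriptContext ctx = new SimpleScriptContext();
        Bindings bindings = new SimpleBindings();
        bindings.put("task", taskName); // task-local state lives in these bindings
        ctx.setBindings(bindings, ScriptContext.ENGINE_SCOPE);
        return ctx;
    }

    public static void main(String[] args) {
        ScriptContext ruleCtx = contextFor("rule-1");
        ScriptContext transformCtx = contextFor("transform-1");

        // Each task sees only its own variables; the contexts don't collide:
        System.out.println(ruleCtx.getAttribute("task"));
        System.out.println(transformCtx.getAttribute("task"));

        // With a thread-safe engine, engine.eval(script, ruleCtx) and
        // engine.eval(script, transformCtx) could then run without ever
        // touching the engine's default context.
    }
}
```

Whether a given engine actually tolerates concurrent eval() calls depends on its THREADING attribute, so this only removes the default-context juggling, not the need to know the engine’s threading guarantees.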

Ok, I’m slightly confused about the relationships here now, because you said that one “Graal script engine instance” was in fact more like a context, and that they all shared the same script engine “under the hood”. Because of that, creating “new engines” constantly wasn’t so expensive, and because of this, they could share Source libraries as well.

But, if this is how it is done for Graal, it doesn’t mean that it’s the same for “real” JSR223 implementations, and there, creating an engine probably actually means creating an engine - which is likely to be expensive, and not something you should keep recreating and throwing away.

This is why I said that if this “must be like that” because of how Graal is organized, it might be better to deal with them as two different entities, so that you wouldn’t have to force JSR223 implementations to do the constant engine creation.

I can’t really answer if they do or not; I would have to dig. But conceptually, I would think that the same should apply whether it’s Nashorn or Graal: The engine is an expensive resource, and can handle multiple threads. It should ideally be created once only. The context belongs to the task/script, and can have many instances. They are what “isolates the environment” for each task/script, and they may or may not handle multiple threads depending on guest language and implementation.

The main benefit I was thinking of was to stop recreating engines for JSR223 languages. I can’t see this as anything other than inefficient and expensive. I bet keeping the engine could make JSR223 languages quite a lot more “responsive”.

Ok, it was just a hypothesis anyway. I just can’t get past the constant recycling of script engines :wink:

No, that is my point. Forget the default context :wink:

How do you tell the difference between a Java object and a JS object from Java?

The problem is really that you can put thread-safe objects in the cache without “being unsafe”, but users generally have no way to know what is and isn’t thread-safe.

This is a not-well-thought-through idea and a long-shot, but maybe if concurrency annotations were included in Core, it would be possible to use these annotations to somehow inform what objects are and aren’t thread-safe (and as a result “shared-cache safe”):

https://github.com/openhab/openhab-core/issues/5033

I don’t quite see why there would be pushback for additions - they don’t break anything, and nobody forces you to use them, but they give additional options for those that want them. The ObjectCache I was talking about would essentially do what you already suggest that users do “manually”: serialize an object and then put the string in the shared cache. Having the serialization and deserialization taken care of should make doing this smoother.
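The manual workaround, serialize on write and deserialize on read, can be sketched with plain JDK serialization (the helper names here are invented; a hypothetical ObjectCache would presumably wrap something similar):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class SerializeToCache {
    // Turn any Serializable into a String, so only immutable Strings
    // ever sit in the shared cache.
    static String serialize(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Rebuild an independent copy on the way out.
    static Object deserialize(String encoded) throws IOException, ClassNotFoundException {
        byte[] bytes = Base64.getDecoder().decode(encoded);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        String cached = serialize(new int[] {1, 2, 3}); // this String would go into the cache
        int[] copy = (int[]) deserialize(cached);       // each reader gets its own copy
        System.out.println(copy[0] + "," + copy[1] + "," + copy[2]);
    }
}
```

Since every reader gets a private copy, nothing mutable is ever shared between threads, which is what makes this pattern “shared-cache safe” at the cost of copying.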

edit: Regarding the origin of ScriptEngineManager, it seems that you’re right. It stems from here and originally looked like this:

I don’t quite get what it “manages”, but never mind that. I think it, or something else, should make sure that only a single instance of each ScriptEngine is created and used.

For just some of the code handling this that I mentioned, take a look at this:

That said, what I see implemented is not a connection from one execution of the rule to the next (as far as I can understand). It’s a connection between the different “modules” of the rule, and I expect that this applies “within the same run” only. Generally, only Conditions and Actions support inputs, but they can have an “arbitrary” number of inputs, limited of course by how many “source outputs” are available within the rule. Each module can only have one output, since the return value of “executing” the module “is” the output, and you can only return one entity. That said, it might be possible to “cheat” here by returning some kind of container that holds multiple values.

This limitation means that Conditions can’t have outputs, because they are mandated to return a Boolean that indicates whether the condition is fulfilled or not. This means that there’s “no room” for them to return anything else, and the Action has no use for this information anyway, because the Action would never run in the first place if the Condition didn’t evaluate to true.

So, it’s primarily a way to connect Trigger outputs/calculated values to Actions, and potentially information from one Action to another (although I’m not sure if Actions are executed in a predictable order if there are multiple, potentially limiting the usefulness of this).

With all these “restrictions”, basically only Trigger→Action and Trigger→Condition connections remain, and since triggers aren’t usually scripted and thus probably don’t return anything of interest, the value of this whole system might be somewhat marginal. I’m not sure; there must have been some intended use case that was envisioned.

What I’ve made strikethrough above was a misinterpretation on my part. After further study, it seems like any Module can have multiple Outputs, but that they are supposed to be statically defined in the ModuleType. So that, for example, an Item state trigger could output the Item and the state, I assume.

This would make it useless for script use, as the number of and types of outputs would have to be defined once for all scripts through whatever ConditionType and ActionType they would use.

I can’t find any trace of these actually being defined anywhere though, not even for “built in” triggers. So, I suspect that there’s a whole lot of code and fields here that don’t do anything other than complicate and confuse, and that I can’t quite imagine being utilized in a useful way.

Lots of strikethrough today, it seems I was wrong again. I haven’t managed to find where these outputs are defined, but it turns out that they are. Checking modules types in the REST API explorer is all it takes:

{
  "outputs": [
    {
      "name": "newStatus",
      "type": "org.openhab.core.thing.ThingStatus",
      "tags": [],
      "label": "New Status",
      "description": "The new status of the thing."
    },
    {
      "name": "oldStatus",
      "type": "org.openhab.core.thing.ThingStatus",
      "tags": [],
      "label": "Old Status",
      "description": "The old status of the thing."
    },
    {
      "name": "event",
      "type": "org.openhab.core.events.Event",
      "tags": [],
      "label": "Event",
      "description": "The event which was sent.",
      "reference": "event"
    }
  ],
  "uid": "core.ThingStatusChangeTrigger",
  "visibility": "VISIBLE",
  "tags": [],
  "label": "a thing status changes",
  "description": "This triggers the rule if a thing status has changed.",
  "configDescriptions": [
    {
      "context": "thing",
      "description": "The UID of the thing.",
      "label": "Thing",
      "name": "thingUID",
      "required": true,
      "type": "TEXT",
      "readOnly": false,
      "multiple": false,
      "advanced": false,
      "verify": false,
      "limitToOptions": true,
      "options": [],
      "filterCriteria": []
    },
    {
      "description": "The previous status of the thing.",
      "label": "Previous Status",
      "name": "previousStatus",
      "required": false,
      "type": "TEXT",
      "readOnly": false,
      "multiple": false,
      "advanced": false,
      "verify": false,
      "limitToOptions": true,
      "options": [
        {
          "label": "UNINITIALIZED",
          "value": "UNINITIALIZED"
        },
        {
          "label": "INITIALIZING",
          "value": "INITIALIZING"
        },
        {
          "label": "UNKNOWN",
          "value": "UNKNOWN"
        },
        {
          "label": "ONLINE",
          "value": "ONLINE"
        },
        {
          "label": "OFFLINE",
          "value": "OFFLINE"
        },
        {
          "label": "REMOVING",
          "value": "REMOVING"
        },
        {
          "label": "REMOVED",
          "value": "REMOVED"
        }
      ],
      "filterCriteria": []
    },
    {
      "description": "The status of the thing.",
      "label": "Status",
      "name": "status",
      "required": false,
      "type": "TEXT",
      "readOnly": false,
      "multiple": false,
      "advanced": false,
      "verify": false,
      "limitToOptions": true,
      "options": [
        {
          "label": "UNINITIALIZED",
          "value": "UNINITIALIZED"
        },
        {
          "label": "INITIALIZING",
          "value": "INITIALIZING"
        },
        {
          "label": "UNKNOWN",
          "value": "UNKNOWN"
        },
        {
          "label": "ONLINE",
          "value": "ONLINE"
        },
        {
          "label": "OFFLINE",
          "value": "OFFLINE"
        },
        {
          "label": "REMOVING",
          "value": "REMOVING"
        },
        {
          "label": "REMOVED",
          "value": "REMOVED"
        }
      ],
      "filterCriteria": []
    }
  ]
}

It also turns out that when the inputs aren’t explicitly defined, the rule engine “automatically maps” inputs and outputs. So, the question might rather be when it is useful to explicitly specify the mapping, but there’s probably a use case for that too.

I discovered that the “event object” that ends up in a SimpleRule Action/execute() is actually a map of the inputs to this Action. So, these values are already passed from the trigger to the action, and used by scripts :wink:


I just tested this scenario:

  • A JRuby script created a long-running timer (it loops 100 times, sleeping 1 minute in between iterations; note the timer itself is not looping/rescheduling, it runs only once). This timer is saved into the shared cache
  • A jsscripting script accesses this timer through shared cache and cancels it

It worked just fine.

Ruby script:

shared_cache["shared_timer1"] = after 1.second do |timer|
  1.upto(100) do |i|
    logger.info "loop #{i}"
    sleep 1.minute
    break if timer.cancelled?
  end
end

JS:

if (cache.shared.exists('shared_timer1')) {
  console.log('shared timer exists')
}

const timer = cache.shared.get('shared_timer1', () => 'none') 
if (timer === 'none') {
  console.log("no timer")
} else {
  console.log(`timer cancelled: ${timer.isCancelled()}`)
  timer.cancel()
  console.log(`timer cancelled: ${timer.isCancelled()}`)
}

Is the Timer created by Ruby thread-safe? The JavaScripting add-on has its own, thread-safe implementation, from what I understand. The standard Core TimerImpl is not thread-safe, even though it has some synchronized keywords in it. It should be easy to make it thread-safe though.