[OH 1.x and OH 2.x Rules DSL only] Why have my Rules stopped running? Why Thread::sleep is a bad idea

rlkoshak · January 20, 2019, 4:59pm

Normally I agree 100%, but with Rules DSL, for many users, particularly developers, what makes sense doesn’t work well in Rules DSL or is impossible. For example, a developer’s first inclination will be to break a problem up into functions. But Rules DSL’s support for functions is problematic (e.g. not thread safe, doesn’t handle errors well). A programmer would be inclined to build data structures, but Rules DSL has no real support for that except faking it using maps and lists.

This is why I always push developers to JSR223. What makes sense to them is either a bad idea or not possible in Rules DSL.

That doesn’t mean that Rules DSL isn’t capable. But it requires an approach to coding that doesn’t come naturally to developers.

tillykeats · January 14, 2020, 7:28pm

I code for embedded systems as well as open systems, and embedded work clearly requires robust, concrete design decisions because resources are limited and finite. This has allowed me to both experience and apply a number of techniques, mostly in the design and deployment of run-time systems for embedded applications: process management, watchdogs, interrupt handlers, message passing systems, distributed processing semaphores and so on.

I can’t help feeling that the root cause here is the DSL model, and I do not claim to be any expert in this. However, let us explore the synopsis provided above.

The DSL allows 5 rules to be executed “concurrently”. I use the quotes because, unless using a transputer architecture and coding in something like OCCAM, there’s no such thing as concurrency; rather, processes are time sliced either by software (governor/supervisor/kernel) or by hardware (interrupts).

So, accepting this paradigm, we observe that DSL is acting as a governor and not relying on hardware interrupts, which it probably could not use as it is unlikely to run as supervisor code, but rather in user mode. Notwithstanding, DSL appears to define an arbitrary “concurrent thread” count of 5 with a FIFO backing queue. There appears to be no prioritisation of these “threads” or inherent process attributes which can be configured and managed at a supervisor level. In other words, the above synopsis would seem to me to imply that Thread::sleep is a blocking operation. Is it really? Surely the operation would put a process to backing store (whatever that is in DSL/OH and the underlying hardware) and implement some method of waking the process. But from reading the above, perhaps incorrectly, it would seem this is not the case.

DSL, granted, may not have been designed with full-blown process management, I don’t know, but to me it seems that its weakness is that very same point, and that is driving design decisions in OH rules. Surely this is less than ideal. Users should be abstracted from underlying design layers and reasonably expect that process management (rules scheduling) is constrained not by the user design, but rather by the resources allocated to the user either directly through codification, or implicitly by a robust process manager capable of effective management of potentially unlimited processes (rules).

I repeat, I am no expert in DSL, but the very fact that this OP exists to guide users through “best practice” is in itself testament that the process engine isn’t the best. Could OH be improved by migrating to an alternative rules-based application?

I think the USA conjured up an apt and accurate paradigm named “duckduck” that refers to the simple fact that if a system possesses weaknesses then “ignorant” users will still find them and break the system, typically unwittingly.

just a thought

rlkoshak · January 14, 2020, 7:42pm

5 was chosen as a happy medium in terms of required resources and OH’s ability to handle a typical amount of events in a timely fashion.

And I don’t think the queue is FIFO, because if your events come in too fast the order of processing is not guaranteed.

Yes, Thread::sleep blocks the thread the Rule is running in, preventing that thread from being returned to the thread pool until the sleep completes and the Rule exits.

This is indeed not the case: a sleeping Rule is not moved to a backing store. Sleep is implemented by the underlying Java and Java Runtime Environment. There are constructs that do put the process “offline” but Thread::sleep isn’t one of them. It is possible to interrupt a sleeping thread, but you need to have a handle to that thread in the first place and we don’t have access to any of that stuff in the Rules.
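To make the blocking concrete, here is a minimal Python sketch (not openHAB code). It assumes a fixed pool of 5 worker threads standing in for the Rules DSL rule threads and uses a short `time.sleep` in place of `Thread::sleep`. The names `sleepy_rule` and `run_rules` are made up for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL_SIZE = 5        # stand-in for the 5 rule threads discussed above
SLEEP_SECONDS = 0.3  # stand-in for a long Thread::sleep inside a rule


def sleepy_rule(submitted_at):
    """Record how long this 'rule' waited for a free thread, then block it."""
    waited = time.monotonic() - submitted_at
    time.sleep(SLEEP_SECONDS)  # the worker thread is held for the whole sleep
    return waited


def run_rules(count):
    """Trigger `count` sleeping rules at once and return each one's queue wait."""
    with ThreadPoolExecutor(max_workers=POOL_SIZE) as pool:
        start = time.monotonic()
        futures = [pool.submit(sleepy_rule, start) for _ in range(count)]
        return [f.result() for f in futures]


# One more rule than there are threads: the last one has to queue.
waits = run_rules(POOL_SIZE + 1)
for number, waited in enumerate(waits, start=1):
    print(f"rule {number} waited {waited:.2f}s for a thread")
```

The first five rules start almost immediately, while the sixth cannot start until one of the sleeping rules finishes and hands its thread back to the pool.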

Rules DSL is to be deprecated in OH 3. The default will be Scripted Automation using Python. None of the limitations above apply to Scripted Automation. You can sleep all day long in a Rule and it will have no impact on any of your other Rules, though it will consume resources and eventually you may run out of RAM if you have hundreds or thousands of Rules sleeping at the same time.

A thought we’ve had for years and are finally able to implement. This is why I am actively pushing people to use Scripted Automation over Rules DSL (see the reply I just made to another one of your posts).

The problem is known. A solution is mostly implemented. The solution will become the default in about a year when OH 3 comes out. You don’t have to wait that long though. You can start using it now.

tillykeats · January 14, 2020, 8:07pm

ah…
Python, well a version of it running over the JVM? as per your other post to me Ritch.

I see

The penny is dropping bud, I see the light. I’ll look at joining the beta tester programme as I do have some years of Python experience, albeit hush hush from my university, who better not find out I’ve “joined the dark side”.

nelson.aponte · March 15, 2020, 3:23pm

I learned from somebody else about an alternative way to implement timers; however, I’m wondering which option is more efficient in terms of system performance, that alternative or Timers?

The alternative way works like this:

When Action 1 takes place (e.g., motion detected) a DateTime item is updated.
A recurring rule checks whether X seconds/minutes/hours/… have passed since Action 1 took place (i.e., since the time stored in the DateTime item).

rule "Motion detected"
when
    Member of group_Sensors_Motion_Triggered changed from OFF to ON
then
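    // Control Item is updated with the timestamp of the moment Action 1 took place.
    // postUpdate is used rather than assigning to .state directly, so the change goes through the event bus.
    MBR_Sensor_Motion_LastActive.postUpdate(MBR_Sensor_Motion_LastUpdate.state)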
end


rule "Check motion"
when
    Time cron "0 0/5 * ? * *"
then
    if (now.minusMinutes(3).isAfter((MBR_Sensor_Motion_LastActive.state as DateTimeType).zonedDateTime.toInstant.toEpochMilli)) {
        // Action 2 after desired time has elapsed
    }
end

The Check Motion rule above checks every 5 minutes whether more than 3 minutes have passed since the timestamp set on Item MBR_Sensor_Motion_LastActive.

Something I like about this approach is that it can be made resistant to system reboots by checking if the Item MBR_Sensor_Motion_LastActive is NULL and assigning the current time. That way, next time the periodic rule runs, it will work without problems and without having to take any manual actions.

However, there are 2 trade-offs:

A periodic rule is required
The action (2) to be taken after the desired time has elapsed will not be executed exactly 3 minutes after Action 1 took place. It will be executed between 3 and 8 (wanted time [3] + cron time [5]) minutes after Action 1 took place.
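The delay of the polling approach can be checked with a little arithmetic. The Python sketch below is only a model of the two rules above: it assumes polls fire at minutes 0, 5, 10, … and that the check is a strict "more than 3 minutes ago", like `isAfter`. The function name `polling_delay` is made up for this illustration.

```python
def polling_delay(event_minute, wait=3, period=5):
    """Minutes between an event and the first poll that sees `wait` minutes elapsed.

    Polls fire at minutes 0, period, 2*period, ... and the check is strict:
    the poll must be more than `wait` minutes after the event.
    """
    poll = 0
    while not (poll - wait > event_minute):
        poll += period
    return poll - event_minute


# Try events at every tenth of a minute across one polling period.
delays = [polling_delay(m / 10) for m in range(0, 50)]
print(f"shortest delay: {min(delays):.1f} min, longest: {max(delays):.1f} min")
```

The worst case is an event that lands exactly 3 minutes before a poll: that poll just misses it, and the action has to wait a full extra period.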

richaardvark · March 16, 2020, 10:12am

Is it possible to use something like:

createTimer(now.plusMilliseconds(1000)

or

createTimer(now.plusSeconds(.5)

?

I tried decimals and there were errors. And the API doesn’t mention anything about Milliseconds, so I would assume the answer is no. Sometimes I just need a quick pause between actions in a rule to allow hardware/processes to catch up, for example when I’m sending IR signals to turn down the TV volume a specific number of times. It takes too long/is weird when I have to wait a second in between stepping the volume levels up or down to a specific level automatically. But I can’t just send the signals out immediately, one after another, without it slipping up and missing the mark. I just need a very brief, < 1 second pause to allow things to catch up. Is there a code that will work here?

An alternative is for me to use the JSR223/Jython/ECMAScript/“Next-Generation” rule engine (why does everything have to be so complicated/have four names?? ), and to use this sleep/pause code:

java.lang.Thread.sleep(5000);

which does allow for milliseconds…but then I’m not able to make use of variables/other basic data transformations/conversions/manipulations in my rule, right?

rossko57 · March 16, 2020, 11:05am

To the first: yes, but the method is .plusMillis()

To the second: no, plusSeconds requires an integer argument.

It’s really worth using VSCode editor + openHAB extension for rules editing. Not only does it validate and highlight errors like this, it autosuggests appropriate methods as you type.

richaardvark · March 16, 2020, 11:34am

Thank you for this helpful info! I’m so excited about .plusMillis() !!

I’m using VSCode right now as I write this and believe I’ve mostly figured out how to use many of the helpful features, but not entirely. I’m familiar with/am a fan of the error high-lighting/little squiggly orange underlines and red circles next to rules, and I have kind of figured out the auto-suggest feature, but I wish there was like an index or a more helpful wizard or something. I know I can begin to type and it will load possible command snippets/variables which I can select with enter… and I think I can “query” it with ctrl-space, but I don’t feel like I’m seeing a list of all possible strings/actions. Like, I wish I could see in detail every possible API function/call/action/transformation/manipulation, etc. directly in VSCode… with a brief explanation + example/link to more details for each item, in a hover-caption or index or something. I also see my openHAB server folder structure/files + Items/Things and their status in the side-panel, and I know I can right-click items and make rules/follow the link to their Paper UI home which is cool, but it’s all still not quite completely intuitive for me yet. Wish there were more wizards/auto-generator tools/drag-and-drop type interfaces/etc.

Also, I think supposedly I’m able to run openhab-cli commands directly from VSCode, and also am supposedly able to see the log somewhere in VSCode, but I haven’t for the life of me been able to figure this out. Can I edit HABPanel in VSCode, or preview it live? I know I can edit CSS for HABPanel via VSCode, which has been helpful.

rlkoshak · March 16, 2020, 3:18pm

There is a “Terminal” tab at the bottom.

That gives you a command prompt on the machine on which VSCode is running. You can do anything there you can do in a command prompt in a terminal. If you don’t see that option, go to the “View” menu and choose “Terminal”.

richaardvark · March 16, 2020, 8:17pm

That’s pretty cool - thanks for pointing me in the right direction! I don’t mean to hijack this thread/can ask my question(s) in the proper channels, but is there any way to set this terminal to connect to the machine housing my openHAB server? I use VSCode on both a Windows machine and the Linux machine where openHAB resides - it would be so great to be able to run Linux commands remotely! I’ve tried to figure out the “Exec” binding for like six months now and have more or less given up at this point

rlkoshak · March 16, 2020, 8:27pm

Run ssh from that terminal to log into the remote machine. Search Google for how to set up ssh on a Windows machine.

noppes123 · March 16, 2020, 10:26pm

Do you run a desktop on the Linux system with VSCode, or are you using the Remote-SSH extension? If not, I recommend looking it up.

martiniman · April 25, 2020, 5:53pm

Hi! Please help me convert my rule the right way, using timers.
I need to flash my OLED display ON and OFF 4 times:

if (Alarm.state == ON) {
    (1..4).forEach[
        if (Alarm.state == ON) {
            ESP_Lamp_Inverse.sendCommand(ON)
            Thread::sleep(500)
        }
        ESP_Lamp_Inverse.sendCommand(OFF)
        if (Alarm.state == ON) {
            Thread::sleep(500)
        }
    ]
}

Udo_Hartmann · April 25, 2020, 7:42pm

As your code is not complete, I can only guess, but maybe this is what you want:

// define global vars outside the rule on top of file
var Timer tOled = null
var int iOled = 0
...


// inside the rule
...
if(Alarm.state == ON) {
    tOled?.cancel                                          // cancel any existing timer
    iOled = 0                                              // initialize counter
    tOled = createTimer(now.plusMillis(10), [ |            // initialize timer
        iOled ++                                           // count up
        // odd -> ON, even -> OFF
        ESP_Lamp_Inverse.sendCommand(
            if(ESP_Lamp_Inverse.state != ON) ON else OFF)  // toggle light
        if(iOled < 8)                                      // 2 times 4 
            tOled.reschedule(now.plusMillis(500))          // next step
        else 
            ESP_Lamp_Inverse.sendCommand(OFF)              // ensure light is OFF
    ])
} else {
    tOled?.cancel
    ESP_Lamp_Inverse.sendCommand(OFF)
}
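For anyone who wants to see the re-arming timer pattern outside of Rules DSL, here is a rough Python sketch of the same idea. It is not openHAB code: the class `Blinker` and its fields are invented for this illustration, `threading.Timer` stands in for `createTimer`/`reschedule`, and the interval is shortened so the demo finishes quickly.

```python
import threading


class Blinker:
    """Toggle a lamp a fixed number of times using a timer that re-arms itself."""

    def __init__(self, toggles=8, interval=0.05):
        self.toggles = toggles          # 2 toggles per flash, so 8 = 4 flashes
        self.interval = interval        # seconds between toggles (0.5 in the rule)
        self.count = 0
        self.lamp_on = False
        self.history = []               # lamp state after each toggle
        self.done = threading.Event()   # set once the sequence has finished
        self._timer = None

    def start(self):
        self._schedule(0.01)            # first step fires almost immediately

    def _schedule(self, delay):
        self._timer = threading.Timer(delay, self._step)
        self._timer.start()

    def _step(self):
        self.count += 1
        self.lamp_on = not self.lamp_on
        self.history.append(self.lamp_on)
        if self.count < self.toggles:
            self._schedule(self.interval)   # re-arm instead of sleeping
        else:
            self.lamp_on = False            # make sure the lamp ends OFF
            self.done.set()

    def cancel(self):
        if self._timer is not None:
            self._timer.cancel()


blinker = Blinker()
blinker.start()
blinker.done.wait(timeout=5)   # only the demo waits; no step ever sleeps
print(blinker.history)
```

Each step does a tiny amount of work and returns straight away, so no thread is tied up between toggles.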

martiniman · April 27, 2020, 5:15pm

Thank you!

Jagohu · May 4, 2020, 3:02pm

A little update on the “Aspirin Fix” for OH 2.5 - as I have been struggling to fix it in the past couple of days and it worked out, although it got a bit tricky.

I have added the following to /etc/openhab2/services/runtime.cfg

org.eclipse.smarthome.threadpool:thingHandler=50
org.eclipse.smarthome.threadpool:discovery=20
org.eclipse.smarthome.threadpool:safeCall=50
org.eclipse.smarthome.threadpool:ruleEngine=10

…but the number of threads did not actually increase, despite the change being reflected in the /var/lib/openhab2/config/org/eclipse/smarthome/threadpool.config file.

My /var/lib/openhab2/config/org/eclipse/smarthome/threadpool.config file contained the following lines by default as a fresh OH2.5 install (apart from the numbers):

:org.apache.felix.configadmin.revision:=L"13"
RuleEngine="10"
discovery="20"
safeCall="50"
service.pid="org.eclipse.smarthome.threadpool"
thingHandler="50"

…and it didn’t work, which was apparent both by general behaviour and by checking in the console (openhab-cli console) - it was only showing the same default 5 threads (and yes, it should show all of them even if they’re not in use, they just should be in the state TIMED_WAITING - it’s something else I didn’t know)
shell:threads --list |grep -i "ruleEngine"

Solution:
I had to change my /var/lib/openhab2/config/org/eclipse/smarthome/threadpool.config to this based on the suggestion from this GitHub thread - where the most important change is the change of the line from RuleEngine to ruleEngine:

:org.apache.felix.configadmin.revision:=L"17"
discovery="20"
org.quartz.threadPool.threadCount="20"
ruleEngine="10"
safeCall="50"
service.pid="org.eclipse.smarthome.threadpool"
thingHandler="50"

After startup the “Aspirin” now works perfectly, and additionally there’re 20 “openHAB-job-scheduler_Worker-” instances, which also helps (thanks to the line org.quartz.threadPool.threadCount=“20”).
I have also increased the thread count to 150 in the end in the ruleEngine parameter, my OpenHAB consumes 404 MB of memory this way according to htop.

Please shut down OpenHAB completely, make the changes and restart it to make it work.
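Since the whole problem came down to the letter case of one key, a quick check can save some head scratching. This is a small Python sketch, not an openHAB tool: the function `find_miscased_keys` is made up, and the expected key names are simply the four from the runtime.cfg lines above.

```python
# Key names taken from the runtime.cfg lines earlier in this post.
EXPECTED_KEYS = {"thingHandler", "discovery", "safeCall", "ruleEngine"}


def find_miscased_keys(config_text):
    """Return (found, expected) pairs for keys that differ only by letter case."""
    by_lower = {key.lower(): key for key in EXPECTED_KEYS}
    problems = []
    for line in config_text.splitlines():
        key, sep, _ = line.partition("=")
        key = key.strip()
        if not sep or not key:
            continue  # not a key=value line
        expected = by_lower.get(key.lower())
        if expected is not None and key != expected:
            problems.append((key, expected))
    return problems


# The threadpool.config contents that did not work, as posted above.
sample = '''\
:org.apache.felix.configadmin.revision:=L"13"
RuleEngine="10"
discovery="20"
safeCall="50"
service.pid="org.eclipse.smarthome.threadpool"
thingHandler="50"
'''
print(find_miscased_keys(sample))
```

To check a real file, read its contents into a string and pass that to the function instead of `sample`.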

On the same note - in order to avoid Threads being overloaded on startup, try to make use of the “SystemStarting” switch as described here.
Additionally, if you try to avoid rules triggered purely by “changed” or “received update”, it helps a lot on startup as well. Instead, wherever possible, use

Itemname changed to 1

or

Itemname received update "true"

To me these are the things which worked - I’m running a system on a 4 GB RPi 4 with 1213 items, 480 rules and 60 things with MySQL as persistence. Before my system took about 30 minutes to start up fully and now it takes 10 and there’re no error messages, no need to move/rename rules.

A big thanks to all the people who helped me to get to this solution and good luck to anyone who stumbles upon this topic in the future!

rlkoshak · May 4, 2020, 3:22pm

This is still way too long on an RPi 4. Do you have lots of primitives or unnecessarily defined types in your Rules? The Rules parser has a really hard time with primitives and a somewhat harder time when you specify type unnecessarily. My theory is that when you do so the parser needs to do way more work at load time in order to ensure that that type is allowed in that context. At one point we were able to come up with one line of code that took several minutes to load on a fast Intel processor with plenty of RAM. When you avoid specifying type, those checks will wait for runtime, vastly reducing your OH boot time.

As for editing the config file directly, I believe you can do this by creating /etc/openhab2/services/threadpool.cfg and populating it with those settings (don’t leave any out). That should cause OH to see the changes and overwrite threadpool.config with the changes.

The danger with editing the config file directly is that it’s an automatically generated file, so it will be overwritten on the next upgrade of OH and you’ll have to make the change again.

I don’t know for certain that this will work but it’s worth a try. I’ll try it myself when I can get to my system.

JimT · March 28, 2021, 3:59am

With openHAB 3.0’s removal of thread pooling, would it still be a bad idea to do a long (e.g. minutes) sleep or processing in a rule, assuming the system has a lot of memory (>= 16GB)?

rossko57 · March 28, 2021, 10:29am

It depends on what you’re doing. During the long period, the rule cannot run again - but you can get more triggers queued up so that it executes again immediately after it’s finally finished. You can’t control that. Nor have you any way to cancel or abort the long wait.
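The behaviour described above can be modelled roughly in Python. This sketch only mirrors the description in this post and is not based on openHAB internals: a single worker stands in for "one run of a rule at a time", and the names `rule_worker` and `long_rule` are invented for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

LONG_WAIT = 0.2  # stand-in for a sleep of several minutes

# A single worker models "the rule cannot run again while it is running".
rule_worker = ThreadPoolExecutor(max_workers=1)
runs = []  # (trigger number, start time, end time) for each execution


def long_rule(trigger):
    """One execution of the rule: note the start, wait, note the end."""
    started = time.monotonic()
    time.sleep(LONG_WAIT)
    runs.append((trigger, started, time.monotonic()))


# Three triggers arrive almost at once while the first run is still waiting.
pending = [rule_worker.submit(long_rule, n) for n in range(1, 4)]
for p in pending:
    p.result()
rule_worker.shutdown()

first_start = runs[0][1]
for trigger, started, ended in runs:
    print(f"trigger {trigger}: ran from {started - first_start:.2f}s to {ended - first_start:.2f}s")
```

The second and third triggers are not lost. They queue up and run back to back as soon as the previous run ends, which is rarely what you want after a wait of several minutes.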

rlkoshak · March 29, 2021, 6:55pm

To give a tl;dr to rossko57’s spot on reply, it’s less bad in OH 3. But it’s still not a great approach.

And there are rumblings of bringing back the thread pool so it would be best to avoid long sleeps where feasible.