[OH 1.x and OH 2.x Rules DSL only] Why have my Rules stopped running? Why Thread::sleep is a bad idea

rlkoshak · July 12, 2018, 2:00pm

The call to createTimer completes almost instantly and the rest of the Rule will run immediately without waiting. That is sort of the point.

Any code in your Rule that depends on the results from executeCommandLine must be put inside the timer’s body.
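
As a minimal, untested sketch of that idea: the Item name, script path, and logger name below are made up for illustration, and the 5000 is just an example timeout in milliseconds.

```
rule "Run a slow script without blocking the Rule"
when
    Item MyTrigger received command // hypothetical Item
then
    // createTimer returns right away, so the Rule itself finishes almost instantly
    createTimer(now, [ |
        // the slow call runs in the Timer, waiting up to 5000 ms for the script
        val results = executeCommandLine("/path/to/script.sh", 5000)
        // anything that needs the results has to live here as well
        logInfo("exec", "Script returned: " + results)
    ])
end
```

Code placed after the createTimer call would run before the script has finished, so it cannot see results.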

morph166955 · July 12, 2018, 6:39pm

So given that everything ultimately needs to live on a thread somewhere, what is the logic of how the timer gets around this issue? Does it just push it to one of the other pools? If so, which pool? Also, what impact does that have on whatever else was supposed to live in that other pool? The last time I checked, the other 3 pools only have 5 or 10 threads each as well so aren’t we just shifting the problem around?

rlkoshak · July 12, 2018, 7:52pm

I know that Timers get executed by Quartz, but I don’t know whether it uses a thread from the Quartz thread pool or creates a new one. I suspect it creates a new thread because the default Quartz pool size is 2 (see /usr/share/openhab2/runtime/etc/quartz.properties) and I’m certain I’ve had more than two Timers running at the same time.

The only things that live in that pool are Timers (if it does use threads from that pool) and cron triggered Rules. Sometime this weekend I’ll try to set up an experiment to verify this. I’m just assuming based on the behavior I’ve seen in my system so far.

We are not just shifting the problem around; we are shifting it to a place where it has less of an impact. Assuming the Timers do run in a thread from the Quartz thread pool, if we run out of those threads then only Timers and cron triggered Rules stop running, rather than ALL Rules.

We are also lessening the amount of time that the Threads are tied up because unlike a Thread::sleep that uses up a thread doing nothing, a Timer doesn’t use any Thread until it is actually running. So, if you look at the complex example, the while loop uses a Rule thread non-stop from the first movement detection until morning. With the Timer implementation it is only using a Quartz thread (assuming that it does use a thread from this pool) for a handful of milliseconds every minute, freeing up that thread for use the rest of the time.

So in many cases we are decreasing the likelihood that we will run out of threads because we are not tying one up doing nothing but waiting around, and we are lessening the impact should we run out of threads.

I’d be really interested in hearing if anyone encounters problems with Timers that may be caused by running out of Threads in the Quartz pool. If that does occur then we should file an issue to address that problem.

dan12345 · July 12, 2018, 8:01pm

Rich, this is all really helpful. Is there an easy way to see how many threads are running at any given time?

morph166955 · July 12, 2018, 8:11pm

This is an interesting approach. I’ve definitely run into this issue as I have rules that take a long time to run (mostly HTTP queries) as well as rules that try to run simultaneously (e.g. audio notifications for things like doorbells that go off in multiple rooms). It’s more noticeable with the simultaneous rules because you can hear the notification play at different times (in some cases as long as 45 seconds later) across the house. I dialed my rules thread pool much higher and it resolved my issues. I would be curious to know if there is a real/noticeable difference between increasing the number of threads that the rules can use versus pushing the work over to the Quartz pool. I’m not overly worried about hardware resources; the computer my OH runs on has plenty to spare. If there is a difference I’ll retool my rules to push those over.

rlkoshak · July 13, 2018, 4:39am

There is a way to hook up a development tool called a profiler to OH and it will tell you how many threads are active. Beyond that I don’t think there is. Threads aren’t usually something users have to worry about.

There might be a way to get some logs from Quartz that might be informative.

But beyond knowing these things are possible I’m not much help. I haven’t used a profiler in a decade.

Karaf may have something built in.

michaeljoos · July 13, 2018, 7:41am

Hi Rich

Thanks for this, really helpful! One question…

Is there a difference between this rule:

rule "My sleeping rule"
when
    // an event
then
    // do some stuff
    createTimer(now.plusSeconds(1)),  [ |
        // do some more stuff
    ]
end

and this (different round brackets):

rule "My sleeping rule"
when
    // an event
then
    // do some stuff
    createTimer(now.plusSeconds(1),  [ |
        // do some more stuff
    ])
end

I mean for both cases the “// do some more stuff” is executed after one second, right?

Thanks
Michael

vzorglub · July 13, 2018, 8:16am

The second one is the recommended one in the docs.
See: https://www.openhab.org/docs/configuration/actions.html#timers

I know the first one works but then we have a lambda declaration floating in the code after a comma. Doesn’t make sense to me.
I prefer the second one, the syntax makes more sense.

rlkoshak · July 14, 2018, 3:18am

Both are technically equivalent. As far as the Rules DSL is concerned they are identical.

The underlying Xtend language provides a little “syntactic sugar”. For method calls that take a lambda as their final argument, you can put the lambda definition outside the parens.

I don’t like this and recommend against it in the OH context because most people don’t realize they are creating a lambda Object and passing it to the createTimer method in the former case, whereas in the latter case that is made just a little more explicit since you are putting the lambda definition inside the parens of the method call.

In short, like Vincent says, the latter example makes more sense to more people as it is more consistent with the rest of the Rules DSL.

5iver · July 14, 2018, 7:08am

You can use jconsole to view the threads, in real-time…

I find this easier to get some quick results…

shell:threads --list |grep "RuleEngine" |wc -l
shell:threads --list |grep "safeCall" |wc -l
shell:threads --list |grep "discovery" |wc -l
shell:threads --list |grep "thingHandler" |wc -l

Just remember the grep will show in the results, so subtract 1.

dan12345 · July 14, 2018, 11:54am

thanks - what exactly do these show? My results are 1, 6, 11 and 6…

Dan

5iver · July 14, 2018, 11:58am

Look at just the output of shell:threads --list. It will show you all threads. Adding the grep “thingHandler” will show just the lines containing thingHandler. The wc -l provides a line count (number of threads). The counts change quickly… or should! In theory, the ruleTimer should be the important one for rules.

dan12345 · July 14, 2018, 12:00pm

I have 337 items in shell:threads. Surely that’s not right…

5iver · July 14, 2018, 12:03pm

That sounds about right. I have 382. There’s a lot going on back there!

Master79 · July 15, 2018, 10:08am

Hi Rich.
I want to ban the Thread::sleeps from my rules.
If I have more than one sleep, do I use a timer inside a timer?

Situation:

Rollershutter up needs 20 seconds -> then
TTS notification for 13 seconds -> then
TTS notification for 10 seconds.

rule "My sleeping rule"
when
    // an event
then
    // rollershutter up
    createTimer(now.plusSeconds(20),  [ |
        // TTS notification1
      createTimer(now.plusSeconds(13),  [ |
          // TTS notification2
      ])
    ])
end

I have the same problem when switching 10 lights on. To prevent an overflow at the lights hub, I use a sleep of 25 milliseconds after every switch of a light.

And why is this okay too, without the “|”?

var Timer timer = null

rule "my rule"
when
    Member of MyGroup changes
then
    if(timer === null) {   // only create a Timer when one is not already running
        timer = createTimer(now.plusSeconds(1), [   // no need of "|" ?
            //Find the lowest and do what ever you do
            timer = null
        ])
    }
end

Greetings,
Markus

rlkoshak · July 15, 2018, 10:15pm

That is probably what I would do. There is a limited number of threads available to the Timers as well (maybe) so you don’t want to use one of them up doing nothing.

Though they don’t necessarily have to be nested.

createTimer(now.plusSeconds(20),  [ |
    // TTS notification1, after the 20 seconds the rollershutter needs
])
createTimer(now.plusSeconds(33),  [ |
    // TTS notification2, 13 seconds after notification1 (20 + 13 = 33)
])

See Design Pattern: Gate Keeper and look at the second to last section above. It is not always practical to remove thread sleeps. In those cases you just have to be more careful.
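
For the lights case, where it is practical to drop the sleeps, one option might be to stagger the commands with Timers. This is a rough, untested sketch and not the Gate Keeper pattern itself. MyLights and AllLightsSwitch are hypothetical names, the 25 ms spacing is taken from the question, and because the members of a Group are not ordered the lights will not switch in any particular sequence.

```
rule "Switch the lights ON one at a time"
when
    Item AllLightsSwitch received command ON // hypothetical trigger Item
then
    // i is the position of each member in the iteration: 0, 1, 2, ...
    MyLights.members.forEach[ light, i |
        // the first light is commanded right away, each later one 25 ms after the previous
        createTimer(now.plusMillis(i * 25), [ |
            light.sendCommand(ON)
        ])
    ]
end
```

Each Timer only occupies a thread for the moment it takes to send the command, but ten lights still means ten Timers, so whether this beats a handful of 25 ms sleeps depends on your setup.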

The example without the | works because, if you don’t have any arguments to pass to the lambda, the | is optional. I always include it to be consistent. It’s the same reason I always put the lambda inside the parens.

Pedals2Paddles · July 17, 2018, 10:27am

Thanks for this. I had a lot of Thread::sleep calls all over multiple rule files, including in the startup rules. All ranging from 2 to 60 seconds and some within while loops. I knew it was probably not a great thing, but didn’t realize how harmful it could be until stumbling across this thread. I’ve since removed all the Thread::sleep instances and reworked everything as described here. And the while loops are now timer loops. The only thing I still need to do is put some kind of sanity limit on the loops, otherwise an edge case could make them loop for hours.
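
A sanity limit like that could be sketched roughly as follows. This is untested, and MySensor, MAX_LOOPS, and the one minute interval are placeholders rather than anything from a real setup.

```
var Timer loopTimer = null
var int loopCount = 0
val int MAX_LOOPS = 60 // hypothetical limit: give up after 60 passes

rule "Timer loop with a sanity limit"
when
    Item MySensor changed to ON // hypothetical Item
then
    // only start a new loop if one is not already running
    if(loopTimer === null) {
        loopCount = 0
        loopTimer = createTimer(now.plusSeconds(1), [ |
            loopCount = loopCount + 1
            if(MySensor.state == ON && loopCount < MAX_LOOPS) {
                // do the work of one loop pass here, then run again in a minute
                loopTimer.reschedule(now.plusMinutes(1))
            }
            else {
                // condition cleared or limit reached: end the loop
                loopTimer = null
            }
        ])
    }
end
```

The counter caps the loop at MAX_LOOPS passes even if the condition never clears, and setting loopTimer back to null lets the next event start a fresh loop.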

morph166955 · July 18, 2018, 12:35pm

Are there any benefits or differences between having rules all in one file versus spread across many?

Pedals2Paddles · July 18, 2018, 1:03pm

Organization and troubleshooting. A problem in one rule file will not necessarily affect all the others. For example, all the rules and items for my Z-Wave garage door openers are in their own item and rule files. They are picky and complicated devices and the rule file is quite long, so it’s best to keep them siloed. The Weather Underground item file is huge and that’s in its own file for the same reason.

rlkoshak · July 18, 2018, 4:24pm

At runtime there is no difference between the two. Once OH loads them they are all in memory and behave the same.

So the only place where it matters to OH is when it loads the file. Here the main practical difference is that System started Rules fire when a .rules file loads, so if you have everything in one .rules file then all of your System started Rules will fire, whereas if you have them split into multiple files only those in the file being loaded will fire.

Beyond that I don’t think there are any practical differences as far as OH loading of files is concerned.

So the biggest practical difference between having multiple .rules files versus one big file is that global val/vars are only global to that one .rules file. So if you have a variable that needs to be used by multiple Rules, they all need to be in the same file.

But for you the human, working with files that are thousands of lines long (not an unheard of size for lots of OH systems) is awkward. So it makes sense to split up the files. There are several strategies people use. I prefer and recommend splitting both your .items and your .rules files up by function (e.g. lighting, weather, hvac, etc.). This provides a logical organization for the files and decreases the likelihood that you will need to deal with the scope of global variables issue mentioned previously. But ultimately, you are the human who has to deal with all this stuff. Do what makes the most sense for you.