I had no idea this was possible. Hurray! I learned something today!
Thanks for posting that. Another tool added to the toolbox.
I would expect the amount of RAM needed by OH to go up a little bit, and if you have a deadlock somewhere in your rules you might not discover it as easily or quickly, but other than that I can’t think of anything negative about upping it.
Truth be told, this was a new learning for me as well (hence the reason why I’m currently testing with a higher number of threads).
@Alex2016 Also note that when you are editing rules, whenever you save a rule file, other rule executions are blocked for the time it takes openHAB to reload it (sometimes several seconds, depending on the size of the file and the power of the hardware). For example, I have one rule file that takes about 12 seconds to reload, so there are situations where it can take longer than you might think.
Good news! I’m relieved to hear this resolved the issue. I wasn’t so sure this would do it.
Gives us more confidence that tweaking the rule engine threads using org.eclipse.smarthome.threadpool:RuleEngine can be another tool to resolve rule performance issues.
@Alex2016 Can you let us know what value you ended up using for RuleEngine threads? It would be good to get another data point besides the value I’m using.
If you want to know more about what RuleEngine etc. means:
The safeCall thread pool mainly controls the concurrency for handling commands and state updates.
Every ThingHandler#handleCommand and ThingHandler#handleUpdate is run through the SafeCaller in its own Thread.
The ruleEngine thread pool controls the concurrency for rule execution, basically how many rules can be executed at the same time.
For tasks which are scheduled from a ThingHandler (like script execution from the Exec binding or periodic polling) there is a separate thread pool thingHandler which can be configured.
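For reference, these pool sizes can be set in `conf/services/runtime.cfg`. The key names below follow the `org.eclipse.smarthome.threadpool:<poolName>` pattern described above, and the sizes are just placeholders — double-check the exact names and sensible values against your own openHAB version:

```
# conf/services/runtime.cfg
# Pool names follow the description above; values here are examples only.
org.eclipse.smarthome.threadpool:safeCall=10
org.eclipse.smarthome.threadpool:ruleEngine=10
org.eclipse.smarthome.threadpool:thingHandler=10
```

A restart (or at least a config refresh) may be needed for the new pool sizes to take effect.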
This seems to be the most experienced thread about delayed rule execution, even if it is a bit old. I recently updated my openHAB 1.8 to the new version 2.3 and experienced quite a lot of problems. Therefore 1.8 is still the active one, and 2.3 is only running as a shadow until all the problems are solved. I configured a really simple system, but the Homematic binding brings lots of Things and Items. For me, after many changes, it looks like disabling rrd4j persistence (which writes all values every minute — 682 files in rrd4j) solves the problem. Any idea why there seems to be a relationship? I still have InfluxDB on with the everyChange rule. Changing any of the thread pools did not help.
I had a similar problem with many seconds of delay that ended up being traced back to rrd4j. I don’t have an answer for you. I limited what I wanted rrd4j to persist and started using mapdb for persisting everything (*) with restoreOnStartup.
I found out that my rrd4j files are on a Synology NAS (quite a recent one). On the disk I saw high load, 99% of which was caused by rrd4j. After putting it on a single disk (no redundancy) with ext4 instead of Btrfs, the load is much lower, and the delay goes down too. Without rrd4j everything is fine, also on Btrfs. By the way, the InfluxDB is on the ext4 partition.
Mine was on a Synology as well. I moved it for a few months to a NUC with just a regular HDD, thinking something on the Syno was causing the issue, but the issue moved with it. After a lot of watching processes and logs I narrowed it down to rrd4j. Once I stopped trying to persist everything the problem went away, and I recently moved it all back to the Syno. The NUC was running Ubuntu with ext4 (where the problem also existed).
Every time my rules engine stalls out I increase the thread pool a little. I’m up to 100 threads now and it’s still happening. This tells me I have a rule or rules that are not releasing their threads properly. I really need a tool that will let me see which rules are holding threads open.
tl;dr look for Thread::sleeps, long running Actions like executeCommandLine and sendHttp*Request, locks, and inadvertent feedback loops.
If you are running out of threads, you have at least one Rule that takes longer to run than the interval at which it is being invoked, or you have at least one Rule that is failing to exit.
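As an illustration of the first item in that list (Rules DSL, with hypothetical Item names — this is a sketch, not anyone’s actual rule): a `Thread::sleep` occupies a ruleEngine thread for its whole duration, while a Timer hands the delayed work to the scheduler and frees the thread immediately.

```
// Holds a ruleEngine thread for the entire 30 seconds
rule "Light off after motion - blocking"
when
    Item MotionSensor changed to OFF
then
    Thread::sleep(30000)   // thread is tied up the whole time
    Light.sendCommand(OFF)
end

// Releases the thread right away; the body runs later on a scheduler thread
rule "Light off after motion - timer"
when
    Item MotionSensor changed to OFF
then
    createTimer(now.plusSeconds(30)) [ | Light.sendCommand(OFF) ]
end
```

With only a handful of ruleEngine threads, a few concurrent sleeps of the first kind are enough to stall every other rule, which is why the Timer form is usually preferred for anything longer than a few hundred milliseconds.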
I have already extensively audited my rules for the issues mentioned, and that did solve the problem for a while; but now I’m at the stage where staring at code and noodling with things isn’t helping any more, and I need some more visibility on the issue. The highest frequency rules run every 5 seconds and none of those should take any longer than some milliseconds to run, so I don’t think it’s an execution speed issue; I suspect some rules are not exiting properly.
My experience with Java is extremely limited, but with Python it’s quite easy to maintain a register of which threads are open and for what purpose. I would like to see that for the rules engine, with a command I can run in karaf to display the current rules engine thread table. It’s really quite poor design to have a major function of the system (rules engine) just silently stop working, at the very least we should be seeing warnings and errors in the logs about it.
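In the meantime, a JVM-level thread dump gives at least some of that visibility. If I’m not mistaken, the ESH pool threads carry the pool name in the thread name (something like `ESH-ruleEngine-1` — check your own dump, as this naming is an assumption), so filtering a dump on that name shows what each rule thread is currently stuck on:

```
# From the OS shell: dump all JVM threads and inspect the rule engine pool
jstack $(pgrep -f openhab) > threads.txt
grep -A 20 'ruleEngine' threads.txt
```

The Karaf console also has a `threads` command that lists JVM threads from within openHAB, which can be a quicker first look before reaching for `jstack`.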