I had no idea this was possible. Hurray! I learned something today!
Thanks for posting that. Another tool added to the toolbox.
I would expect the amount of RAM needed by OH to go up a little bit, and if you have a deadlock somewhere in your rules you might not discover it as easily or quickly, but other than that I can’t think of anything negative about upping it.
Truth be told, this was a new learning for me as well (hence the reason why I’m currently testing with a higher number of threads).
@Alex2016 Also note that when you are editing rules, whenever you save a rule file, other rule executions are blocked for the time it takes openHAB to reload it (sometimes several seconds, depending on the size of the file and the power of the hardware). For example, I have one rule file that takes about 12 seconds to reload, so there are situations where it can take longer than you might think.
Good news! I’m relieved to hear this resolved the issue. I wasn’t so sure this would do it.
Gives us more confidence that tweaking the rule engine threads using org.eclipse.smarthome.threadpool:RuleEngine can be another tool to resolve rule performance issues.
@Alex2016 Can you let us know what value you ended up using for RuleEngine threads? It would be good to get another data point besides the value I’m using.
If you want to know more about what RuleEngine etc. means:
The safeCall thread pool mainly controls the concurrency for handling commands and state updates.
Every ThingHandler#handleCommand and ThingHandler#handleUpdate is run through the SafeCaller in its own Thread.
The ruleEngine thread pool controls the concurrency for rule execution, basically how many rules can be executed at the same time.
For tasks which are scheduled from a ThingHandler (like script execution from the Exec binding or periodic polling) there is a separate thread pool thingHandler which can be configured.
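For reference, these pool sizes can be set in `conf/services/runtime.cfg`. The key names below follow the `org.eclipse.smarthome.threadpool:<poolName>` pattern described above, and the sizes are just placeholders — double-check the exact names and sensible values against your own openHAB version:

```
# conf/services/runtime.cfg
# Pool names follow the description above; values here are examples only.
org.eclipse.smarthome.threadpool:safeCall=10
org.eclipse.smarthome.threadpool:ruleEngine=10
org.eclipse.smarthome.threadpool:thingHandler=10
```

A restart (or at least a config refresh) may be needed for the new pool sizes to take effect.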
This seems to be the most experienced thread about delayed rule execution, even if it is a bit old. I recently updated my openHAB 1.8 to the new version 2.3 and experienced quite a lot of problems. Therefore 1.8 is still the active one, and 2.3 is only running as a shadow until all the problems are solved. I configured a really simple system, but the Homematic binding brings lots of Things and Items. For me, after many changes, it looks like disabling rrd4j persistence (which writes all values every minute — 682 files in rrd4j) solves the problem. Any idea why there seems to be a relationship? I still have InfluxDB on with the everyChange rule. Changing any of the thread pools did not help.
I had a similar problem with many seconds of delay that ended up being traced back to rrd4j. I don’t have an answer for you. I limited what I wanted rrd4j to persist and started using mapdb for persisting everything (*) with restoreOnStartup.
I found out that my rrd4j files are on a Synology NAS (quite a recent one). On the disk I saw high load, 99% of which was caused by rrd4j. After putting it on a single disk (no redundancy) with ext4 instead of Btrfs, the load is much lower, and the delay goes down too. Without rrd4j everything is fine, also on Btrfs. By the way, the InfluxDB is on the ext4 partition.
Mine was on a Synology as well. I moved it for a few months to a NUC with just a regular HDD, thinking something on the Syno was causing the issue, but the issue moved with it. After a lot of watching processes and logs I narrowed it down to rrd4j. Once I stopped trying to persist everything the problem went away, and I recently moved it all back to the Syno. The NUC was running Ubuntu with ext4 (where the problem also existed).
Every time my rules engine stalls out I increase the thread pool a little. I’m up to 100 threads now and it’s still happening. This tells me I have a rule or rules that are not releasing their threads properly. I really need a tool that will let me see which rules are holding threads open.
tl;dr look for Thread::sleeps, long running Actions like executeCommandLine and sendHttp*Request, locks, and inadvertent feedback loops.
If you are running out of threads, you have at least one Rule that takes longer to run than the interval at which it is being invoked, or you have at least one Rule that is failing to exit.
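As an illustration of the first item in that list (Rules DSL, with hypothetical Item names — this is a sketch, not anyone’s actual rule): a `Thread::sleep` occupies a ruleEngine thread for its whole duration, while a Timer hands the delayed work to the scheduler and frees the thread immediately.

```
// Holds a ruleEngine thread for the entire 30 seconds
rule "Light off after motion - blocking"
when
    Item MotionSensor changed to OFF
then
    Thread::sleep(30000)   // thread is tied up the whole time
    Light.sendCommand(OFF)
end

// Releases the thread right away; the body runs later on a scheduler thread
rule "Light off after motion - timer"
when
    Item MotionSensor changed to OFF
then
    createTimer(now.plusSeconds(30)) [ | Light.sendCommand(OFF) ]
end
```

With only a handful of ruleEngine threads, a few concurrent sleeps of the first kind are enough to stall every other rule, which is why the Timer form is usually preferred for anything longer than a few hundred milliseconds.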
I have already extensively audited my rules for the issues mentioned, and that did solve the problem for a while; but now I’m at the stage where staring at code and noodling with things isn’t helping any more, and I need some more visibility on the issue. The highest frequency rules run every 5 seconds and none of those should take any longer than some milliseconds to run, so I don’t think it’s an execution speed issue; I suspect some rules are not exiting properly.
My experience with Java is extremely limited, but with Python it’s quite easy to maintain a register of which threads are open and for what purpose. I would like to see that for the rules engine, with a command I can run in karaf to display the current rules engine thread table. It’s really quite poor design to have a major function of the system (rules engine) just silently stop working, at the very least we should be seeing warnings and errors in the logs about it.
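In the meantime, a JVM-level thread dump gives at least some of that visibility. If I’m not mistaken, the ESH pool threads carry the pool name in the thread name (something like `ESH-ruleEngine-1` — check your own dump, as this naming is an assumption), so filtering a dump on that name shows what each rule thread is currently stuck on:

```
# From the OS shell: dump all JVM threads and inspect the rule engine pool
jstack $(pgrep -f openhab) > threads.txt
grep -A 20 'ruleEngine' threads.txt
```

The Karaf console also has a `threads` command that lists JVM threads from within openHAB, which can be a quicker first look before reaching for `jstack`.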