tl;dr: look for Thread::sleeps, long-running Actions like executeCommandLine and sendHttp*Request, locks, and inadvertent feedback loops.
If you are running out of threads, you have at least one Rule that takes longer to run than the interval between its invocations, or you have at least one Rule that is failing to exit.
I have already extensively audited my rules for the issues mentioned, and that did solve the problem for a while; but now I’m at the stage where staring at code and noodling with things isn’t helping any more, and I need more visibility into the issue. The highest-frequency rules run every 5 seconds, and none of those should take longer than a few milliseconds to run, so I don’t think it’s an execution speed issue; I suspect some rules are not exiting properly.
My experience with Java is extremely limited, but with Python it’s quite easy to maintain a register of which threads are open and for what purpose. I would like to see that for the rules engine, with a command I can run in Karaf to display the current rules engine thread table. It’s really quite poor design to have a major function of the system (the rules engine) just silently stop working; at the very least we should be seeing warnings and errors in the logs about it.
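For what it’s worth, the JVM can already produce something close to that thread table via `Thread.getAllStackTraces()`. Below is a minimal sketch of the idea, not anything the rules engine ships: the class name `ThreadTable` and the `RuleEngine-` name prefix are made up for illustration, and I don’t know what the rules engine actually names its pool threads, so the prefix would need adjusting.

```java
import java.util.Map;
import java.util.TreeMap;

public class ThreadTable {

    // Builds a name-sorted table of live threads whose name starts with namePrefix.
    // Each entry maps "name#id" to "STATE at topStackFrame".
    public static Map<String, String> snapshot(String namePrefix) {
        Map<String, String> table = new TreeMap<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            if (!t.getName().startsWith(namePrefix)) {
                continue;
            }
            StackTraceElement[] stack = e.getValue();
            String top = stack.length > 0 ? stack[0].toString() : "(no stack)";
            table.put(t.getName() + "#" + t.getId(), t.getState() + " at " + top);
        }
        return table;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulate a rule that never exits: a thread stuck in a long sleep.
        // "RuleEngine-demo-1" is a hypothetical name, not a real pool thread name.
        Thread stuck = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException ignored) {
                // demo only
            }
        }, "RuleEngine-demo-1");
        stuck.setDaemon(true);
        stuck.start();

        Thread.sleep(200); // give the demo thread time to reach its sleep

        snapshot("RuleEngine-").forEach((name, info) -> System.out.println(name + "  " + info));
    }
}
```

A thread that shows up in the same sleeping or waiting state at the same stack frame across several snapshots is a good candidate for a rule that is not exiting. Running `jstack <pid>` from the JDK against the running process gives the same raw information without writing any code.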