Thanks Rich,
I have already audited my rules extensively for the issues mentioned, and that did solve the problem for a while. But I’m now at the stage where staring at code and noodling with things isn’t helping any more, and I need more visibility into the issue. The highest-frequency rules run every 5 seconds, and none of them should take more than a few milliseconds to run, so I don’t think it’s an execution speed issue; I suspect some rules are not exiting properly.
My experience with Java is extremely limited, but in Python it’s quite easy to maintain a register of which threads are open and for what purpose. I would like to see the same for the rules engine, with a command I can run in Karaf to display the current rules engine thread table. It’s really quite poor design to have a major function of the system (the rules engine) just silently stop working; at the very least we should be seeing warnings and errors in the logs about it.
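To illustrate the kind of thread register I mean on the Python side, here is a minimal sketch. The names `start_tracked` and `thread_table` are made up for this example, and none of it reflects how the rules engine is actually implemented; it only shows the idea of recording each thread's purpose and dropping the entry when the thread exits.

```python
import threading
import time

# Hypothetical register: maps each live worker thread to (purpose, start time).
_registry = {}
_lock = threading.Lock()


def start_tracked(purpose, target, *args):
    """Start a thread and keep it in the register until its target returns."""

    def runner():
        try:
            target(*args)
        finally:
            # Remove the entry even if the target raised, so only
            # threads that are genuinely still running remain listed.
            with _lock:
                _registry.pop(threading.current_thread(), None)

    thread = threading.Thread(target=runner, name=purpose, daemon=True)
    with _lock:
        _registry[thread] = (purpose, time.monotonic())
    thread.start()
    return thread


def thread_table():
    """Return (purpose, age in seconds) for every thread still registered."""
    now = time.monotonic()
    with _lock:
        return [(purpose, round(now - started, 1))
                for purpose, started in _registry.values()]
```

With something like this, a rule that never exits shows up as an entry whose age keeps growing, which is exactly the visibility I'm missing at the moment.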