So first of all, I am back to business:
I still need to re-incorporate 2-3 rules and run some experiments.
So far my take-aways are:
- upgrade => clear cache
- High CPU => `top -H -p $(pgrep -x java)` to dig into the specific thread that goes rogue
- issues with rules => move them all to a `disabled` folder and bring them back in small batches until it breaks again
- do monitor your cpu/load/temp to get an early alert if a “bad” rule is introduced
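
For the "small batches" step, here is a minimal sketch of how the move could be scripted. The function name `restore_batch` and the folder paths in the usage line are hypothetical placeholders, not something from my actual setup; adapt them to wherever your rules live.

```shell
# Move up to $3 files from the disabled folder ($1) back into the
# live rules folder ($2), printing each file that gets re-enabled.
# restore_batch is a hypothetical helper name used for illustration.
restore_batch() {
    disabled_dir=$1
    rules_dir=$2
    batch_size=$3
    moved=0
    for f in "$disabled_dir"/*; do
        [ -f "$f" ] || continue                  # skip if the glob matched nothing
        [ "$moved" -lt "$batch_size" ] || break  # stop once the batch is full
        mv -- "$f" "$rules_dir"/
        echo "re-enabled: $(basename "$f")"
        moved=$((moved + 1))
    done
}
```

Something like `restore_batch ./disabled ./rules 3` would bring back three files at a time; then watch the load for a while before running it again.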
On the last point, I am using Git to manage changes, which is very useful for keeping track of “what has changed recently”. I realize, however, that I almost never spot newly introduced issues right away unless the changes have a big impact (such as increasing the load significantly, as shown on the graph above).
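
One way to tie the two together is to check the load against a threshold and, when it is exceeded, list what Git says changed recently. This is only a rough sketch: it assumes Linux (`/proc/loadavg`), the function names `load_exceeds` and `check_load` are made up for illustration, and the two-day window is an arbitrary choice.

```shell
# Succeed (exit 0) if load average $1 is strictly above threshold $2.
# awk is used because plain sh cannot compare decimal numbers.
load_exceeds() {
    awk -v load="$1" -v max="$2" 'BEGIN { exit !(load + 0 > max + 0) }'
}

# Hypothetical check: on high load, list files touched by recent commits.
# Assumes Linux (/proc/loadavg) and that $1 is a Git working copy.
check_load() {
    repo=$1
    threshold=$2
    read -r load1 rest < /proc/loadavg   # first field: 1-minute load average
    if load_exceeds "$load1" "$threshold"; then
        echo "load $load1 is above $threshold; files changed in the last 2 days:"
        git -C "$repo" log --since="2 days ago" --name-only --pretty=format: \
            | sort -u | sed '/^$/d'
    fi
}
```

Run from cron with something like `check_load /path/to/config 2.0` (path and threshold are placeholders), this would at least point at the recent commits whenever the load climbs.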