I’ve looked over most of what is on here, but there are a few more things for me to go through. Thank you to everyone who provided info. As it sits, I have a few theories, and I think I have come up with a way to test all of this so we can try to get some answers.
As posted above, I created openhab-core Pull Request #2182, “[automation] Make rule engine threading configurable”, in an attempt to resolve this. There are three jar files to be replaced to test it. I would GREATLY appreciate it if anyone with this issue could apply these jars and the associated configs listed below. A few things could happen:
- Enabling threadpools totally resolves this (unlikely, but possible). I believe the threadpools will definitely help this situation on lower memory systems. I’m concerned that the Pi4/4GB models are having this issue, since I would have expected the memory on those platforms to seriously help.
- Additional debugs added to the code may help identify where the memory leak is happening (more likely).
- Absolutely nothing is resolved and the debugs are worthless (highly unlikely). Even then, it will at least tell us that the issue is not with the thread creation or with the addition of openhab-core Pull Request #2057, “[automation] Cache parsed script in order to improve performance” by kaikreuzer, in 3.0.1.
- Your system becomes completely unstable and you have to revert. I HIGHLY doubt this will happen. I’m running these jars on my system and have been for days with no issues. In the event this happens, just clear the cache, which should revert the jars and restore stability.
So, how to test.
Step 1: Restart OH to make sure you have as stable a system as possible. I REALLY don’t want anyone doing this on a system that’s been running for a while, in case the updates push it over the edge. Let’s give this the best chance of success.
Step 2: Download the 3 jars from the “Automation ThreadPool Test - Rev15” release in the morph166955/openhab-core repository on GitHub.
Step 3: Log in to your Karaf console.
$ ssh -p 8101 openhab@localhost
openhab> bundle:list -s | grep org.openhab.core.automation
132 │ Active │ 80 │ 3.1.0.202102071659 │ org.openhab.core.automation
openhab> bundle:list -s | grep org.openhab.core.model.script.runtime
177 │ Active │ 80 │ 3.1.0.202102082247 │ org.openhab.core.model.script.runtime
openhab> bundle:list -s | grep org.openhab.core
128 │ Active │ 80 │ 3.1.0.202102061843 │ org.openhab.core
openhab> bundle:update 177 file:///home/openhab/org.openhab.core.model.script.runtime-3.1.0-SNAPSHOT-threadpool15.jar
openhab> bundle:update 132 file:///home/openhab/org.openhab.core.automation-3.1.0-SNAPSHOT-threadpool15.jar
openhab> bundle:update 128 file:///home/openhab/org.openhab.core-3.1.0-SNAPSHOT-threadpool15.jar
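The first three commands look up the bundle IDs, and the last three replace the three bundles in sequence. Your ID numbers may be different from mine, so please be careful to update the specific bundles to match. I only show the relevant line of each listing above: grepping for org.openhab.core also matches every other org.openhab.core.* bundle, so use the line whose name is exactly org.openhab.core. Note that this (specifically the last update) WILL cause your OH to completely lose its mind, so as soon as the updates are done, shut down the OH service (do not restart; we have to make some config changes before you start it again).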
If you get a completely unstable system, you can run “sudo openhab-cli clean-cache” and it will clear all these jars out and load the stock/stable files. Again, we’re not making super major changes here so I doubt you will get a completely unstable system if you update.
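Since the bundle IDs differ between systems, here is a minimal sketch of a helper that pulls the ID for an exact bundle name out of a `bundle:list -s` listing, so that org.openhab.core is not confused with org.openhab.core.automation. The function name `bundle_id` is made up for this example, and it assumes the listing uses the same `│`-separated columns shown above.

```shell
# bundle_id NAME: read a "bundle:list -s" listing on stdin and print the ID
# of the bundle whose symbolic name is exactly NAME (hypothetical helper).
bundle_id() {
    awk -F'│' -v want="$1" '
        {
            # The symbolic name is the last column; trim surrounding blanks.
            name = $NF
            gsub(/^[ \t]+|[ \t]+$/, "", name)
        }
        name == want {
            # The bundle ID is the first column.
            id = $1
            gsub(/[ \t]/, "", id)
            print id
        }
    '
}
```

One way to feed it would be `ssh -p 8101 openhab@localhost 'bundle:list -s' | bundle_id org.openhab.core`, assuming your console accepts a command passed over ssh like that. Otherwise, paste the listing into a file and redirect it in.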
Step 4: Shut down OH if you haven’t…
Step 5: Make the following config changes:
In /etc/openhab/services/runtime.cfg, add “org.openhab.threadpool:automation=30” to the end of the file.
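If you would rather script the runtime.cfg change than edit the file by hand, something along these lines should work. It is only a sketch: `append_once` is a name I made up, and it simply skips the write when the exact line is already in the file, so running it twice does not duplicate the setting.

```shell
# append_once FILE LINE: append LINE to FILE unless an identical line exists.
# (Hypothetical helper, not part of openHAB.)
append_once() {
    file=$1
    line=$2
    # -q quiet, -x whole-line match, -F fixed string (no regex interpretation)
    if grep -qxF -- "$line" "$file" 2>/dev/null; then
        echo "already present: $line"
        return 0
    fi
    # If the file is non-empty and does not end in a newline, add one first
    # so the new setting does not get glued onto the previous line.
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$line" >> "$file"
    echo "added: $line"
}
```

Run as a user that can write to the file, the call would be `append_once /etc/openhab/services/runtime.cfg 'org.openhab.threadpool:automation=30'`.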
In /var/lib/openhab/etc/log4j2.xml, add the following appenders (they belong inside the <Appenders> section):
<RollingRandomAccessFile fileName="${sys:openhab.logdir}/threadpoolmanager.log" filePattern="${sys:openhab.logdir}/threadpoolmanager.log.%i" name="THREADPOOLMANAGER">
    <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} [%-5.5p] [%-36.36c] - %m%n"/>
    <Policies>
        <OnStartupTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="16 MB"/>
    </Policies>
</RollingRandomAccessFile>
<RollingRandomAccessFile fileName="${sys:openhab.logdir}/threadpoolqueue.log" filePattern="${sys:openhab.logdir}/threadpoolqueue.log.%i" name="THREADPOOLQUEUE">
    <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} [%-5.5p] [%-36.36c] - %m%n"/>
    <Policies>
        <OnStartupTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="16 MB"/>
    </Policies>
</RollingRandomAccessFile>
<RollingRandomAccessFile fileName="${sys:openhab.logdir}/automation.log" filePattern="${sys:openhab.logdir}/automation.log.%i" name="AUTOMATION">
    <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} [%-5.5p] [%-36.36c] - %m%n"/>
    <Policies>
        <OnStartupTriggeringPolicy/>
        <SizeBasedTriggeringPolicy size="16 MB"/>
    </Policies>
</RollingRandomAccessFile>
And then further down, in the <Loggers> section:
<Logger additivity="false" level="TRACE" name="org.openhab.core.common.ThreadPoolManager">
    <AppenderRef ref="THREADPOOLMANAGER"/>
</Logger>
<Logger additivity="false" level="TRACE" name="org.openhab.core.common.QueueingThreadPoolExecutor">
    <AppenderRef ref="THREADPOOLQUEUE"/>
</Logger>
<Logger additivity="false" level="DEBUG" name="org.openhab.core.automation">
    <AppenderRef ref="AUTOMATION"/>
</Logger>
<Logger additivity="false" level="TRACE" name="org.openhab.core.model.script.runtime">
    <AppenderRef ref="AUTOMATION"/>
</Logger>
This will hopefully let us capture any failures or memory/thread issues.
Step 6: Start OH. You may (likely will) have to restart once or twice if your system is acting funny. It will need to recompile the majority of the libraries, and this can cause some odd things until everything is cached. You can confirm that the new jars are in place by logging back into Karaf and checking that the version timestamps match these:
openhab> bundle:list -s | grep org.openhab.core
128 │ Active │ 80 │ 3.1.0.202102141801 │ org.openhab.core
openhab> bundle:list -s | grep org.openhab.core.automation
132 │ Active │ 80 │ 3.1.0.202102141805 │ org.openhab.core.automation
openhab> bundle:list -s | grep org.openhab.core.model.script.runtime
177 │ Active │ 80 │ 3.1.0.202102141807 │ org.openhab.core.model.script.runtime
Step 7: Monitor! If this works, please let us know. If this fails, please let us know as well. In either case, I would like to see the three log files created above, to validate that the code is working and (if there is a failure) to see what is breaking.
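To make sharing the logs easier, here is a rough sketch that bundles the three log files (plus any rolled-over copies such as automation.log.1) into a single archive. The function name `collect_logs` and the output file name are my own choices, and /var/log/openhab in the usage line below is an assumption about where ${sys:openhab.logdir} points on a package install, so adjust it to your setup.

```shell
# collect_logs LOGDIR OUTFILE: pack the three test logs (and rolled copies)
# found in LOGDIR into the gzip-compressed tar archive OUTFILE.
# (Hypothetical helper, not part of openHAB.)
collect_logs() {
    logdir=$1
    out=$2
    set --   # reuse the function's positional parameters as the file list
    for name in threadpoolmanager.log threadpoolqueue.log automation.log; do
        # The second pattern picks up rolled files like automation.log.1
        for f in "$logdir/$name" "$logdir/$name".*; do
            if [ -f "$f" ]; then
                set -- "$@" "$(basename "$f")"
            fi
        done
    done
    if [ "$#" -eq 0 ]; then
        echo "no matching log files in $logdir" >&2
        return 1
    fi
    # -C keeps the archive paths relative to the log directory
    tar -czf "$out" -C "$logdir" "$@"
}
```

For example: `collect_logs /var/log/openhab ~/oh-threadpool-logs.tgz`. Have a quick look through the logs before posting them in case they contain anything you would rather not share.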