Memory Leak in OH 4.1.1 and/or InfluxDB Persistence?

Platform information: Raspberry Pi 4
Hardware: Arm / 4GB RAM / 64GB SD
OS: Raspbian 11
Java Runtime Environment: openjdk version “17.0.9” 2023-10-17
openHAB version: 4.1.1 on openHABian

Since I started running a (fresh) OH 4.1.1 openHABian installation, the system seems to leak memory.
The only thing I added to the new installation is InfluxDB Persistence, which I use to store a few values in an external InfluxDB.

It worked for weeks, up to a point where InfluxDB no longer received any values. Checking the openHAB log, I saw lots of “Out of Memory” errors related to InfluxDB, as shown below.

Checking openHAB’s memory, consumption was up to ~1.5 GB (still lots of free memory available??).
Restarting InfluxDB did not fix the issue, so I decided to restart OH.
After the restart last Saturday, memory consumption was at 450 MB. I decided to monitor memory consumption in an OH graph, and I can see it going up and up.
Two days later, consumption is at almost 700 MB from the initial 450 MB, and still growing. → see attached screenshot

I wonder if there’s a memory leak related to the InfluxDB Persistence Service?

I don’t have a lot of bundles loaded, and I did not have this issue with the old installation, which ran without InfluxDB persistence.

openhab> bundle:list | grep Add-ons
249 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: Automation :: JavaScript Scripting
250 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: Persistence Service :: RRD4j
251 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: Exec Binding
253 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: NTP Binding
254 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: Network UPS Tools Binding
255 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: iCalendar Binding
258 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: KNX Binding
282 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: MQTT Broker Binding
283 x Active x  81 x 4.1.1                  x openHAB Add-ons :: Bundles :: MQTT EspMilightHub
284 x Active x  81 x 4.1.1                  x openHAB Add-ons :: Bundles :: MQTT Things and Channels
285 x Active x  82 x 4.1.1                  x openHAB Add-ons :: Bundles :: MQTT HomeAssistant Convention
286 x Active x  82 x 4.1.1                  x openHAB Add-ons :: Bundles :: MQTT Homie Convention
287 x Active x  82 x 4.1.1                  x openHAB Add-ons :: Bundles :: MQTT Ruuvi Gateway
290 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: Netatmo Binding
291 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: Network Binding
292 x Active x  75 x 4.1.1                  x openHAB Add-ons :: Bundles :: Transformation Service :: Map
300 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: IO :: openHAB Cloud Connector
301 x Active x  80 x 4.1.1                  x openHAB Add-ons :: Bundles :: Persistence Service :: InfluxDB
openhab>

openhab.log error output

2024-02-10 10:16:49.754 [ERROR] [.influxdb.InfluxDBPersistenceService] - bundle org.openhab.persistence.influxdb:4.1.1 (301)[org.openhab.persistence.influxdb.InfluxDBPersistenceService] : Cannot register component
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
	at java.lang.Thread.start0(Native Method) ~[?:?]
	at java.lang.Thread.start(Thread.java:809) ~[?:?]
	at java.util.Timer.<init>(Timer.java:188) ~[?:?]
	at org.apache.felix.scr.impl.ComponentRegistry.updateChangeCount(ComponentRegistry.java:735) ~[bundleFile:?]
	at org.apache.felix.scr.impl.ComponentRegistry.registerComponentHolder(ComponentRegistry.java:295) ~[bundleFile:?]
	at org.apache.felix.scr.impl.BundleComponentActivator.validateAndRegister(BundleComponentActivator.java:455) [bundleFile:?]
	at org.apache.felix.scr.impl.BundleComponentActivator.initialize(BundleComponentActivator.java:244) [bundleFile:?]
	at org.apache.felix.scr.impl.BundleComponentActivator.<init>(BundleComponentActivator.java:218) [bundleFile:?]
	at org.apache.felix.scr.impl.Activator.loadComponents(Activator.java:592) [bundleFile:?]
	at org.apache.felix.scr.impl.Activator.access$200(Activator.java:74) [bundleFile:?]
	at org.apache.felix.scr.impl.Activator$ScrExtension.start(Activator.java:460) [bundleFile:?]
	at org.apache.felix.scr.impl.AbstractExtender.createExtension(AbstractExtender.java:196) [bundleFile:?]
	at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:169) [bundleFile:?]
	at org.apache.felix.scr.impl.AbstractExtender.modifiedBundle(AbstractExtender.java:49) [bundleFile:?]
	at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:488) [osgi.core-8.0.0.jar:?]
	at org.osgi.util.tracker.BundleTracker$Tracked.customizerModified(BundleTracker.java:420) [osgi.core-8.0.0.jar:?]
	at org.osgi.util.tracker.AbstractTracked.track(AbstractTracked.java:232) [osgi.core-8.0.0.jar:?]
	at org.osgi.util.tracker.BundleTracker$Tracked.bundleChanged(BundleTracker.java:450) [osgi.core-8.0.0.jar:?]
	at org.eclipse.osgi.internal.framework.BundleContextImpl.dispatchEvent(BundleContextImpl.java:949) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:234) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.framework.eventmgr.ListenerQueue.dispatchEventSynchronous(ListenerQueue.java:151) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEventPrivileged(EquinoxEventPublisher.java:229) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:138) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxEventPublisher.publishBundleEvent(EquinoxEventPublisher.java:130) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxContainerAdaptor.publishModuleEvent(EquinoxContainerAdaptor.java:217) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.container.Module.publishEvent(Module.java:499) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.container.Module.start(Module.java:486) [org.eclipse.osgi-3.18.0.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxBundle.start(EquinoxBundle.java:445) [org.eclipse.osgi-3.18.0.jar:?]
	at org.apache.karaf.bundle.command.Restart.doExecute(Restart.java:51) [bundleFile:?]
	at org.apache.karaf.bundle.command.BundlesCommand.execute(BundlesCommand.java:55) [bundleFile:?]
	at org.apache.karaf.shell.impl.action.command.ActionCommand.execute(ActionCommand.java:84) [bundleFile:4.4.4]
	at org.apache.karaf.shell.impl.console.osgi.secured.SecuredCommand.execute(SecuredCommand.java:68) [bundleFile:4.4.4]
	at org.apache.karaf.shell.impl.console.osgi.secured.SecuredCommand.execute(SecuredCommand.java:86) [bundleFile:4.4.4]
	at org.apache.felix.gogo.runtime.Closure.executeCmd(Closure.java:599) [bundleFile:4.4.4]
	at org.apache.felix.gogo.runtime.Closure.executeStatement(Closure.java:526) [bundleFile:4.4.4]
	at org.apache.felix.gogo.runtime.Closure.execute(Closure.java:415) [bundleFile:4.4.4]
	at org.apache.felix.gogo.runtime.Pipe.doCall(Pipe.java:416) [bundleFile:4.4.4]
	at org.apache.felix.gogo.runtime.Pipe.call(Pipe.java:229) [bundleFile:4.4.4]
	at org.apache.felix.gogo.runtime.Pipe.call(Pipe.java:59) [bundleFile:4.4.4]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
	at java.lang.Thread.run(Thread.java:840) [?:?]

When you see OOM errors, the part of the code complaining is not always the root cause of the problem. If Object A is leaking, you’ll get an error when Object B tries to acquire some memory. Consequently, no conclusion can be drawn about the source of the memory leak based on where the OOM errors occur.

Java works as follows (simplified for clarity; it’s actually a bit more complicated in practice). It acquires a certain amount of memory when it starts. As it runs, it continues to acquire more memory until a maximum amount is reached. At that point it performs “garbage collection”, where it internally frees up memory to be reused elsewhere in the program. But it does not give that memory back to the OS.

So the typical memory usage chart for a Java program is going to show it start up with a minimum amount of memory, grow to the maximum, and then stay there.

OOM errors occur when the max memory is reached but there is nothing left to garbage collect.

You can control the minimum and maximum through command line arguments to the java command that starts OH, e.g. “-Xms400m -Xmx650m”.
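On an openHABian install these are set via EXTRA_JAVA_OPTS in /etc/default/openhab; the values below are only an illustration, not a recommendation for your hardware:

# /etc/default/openhab -- example heap settings, adjust to your system
EXTRA_JAVA_OPTS="-Xms400m -Xmx650m"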

That’s why there is lots of free memory available. A Java program doesn’t grow to consume all available memory; it only grows up to a certain point and then works with what it has. You can still get OOM errors even if there is tons of free memory.
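If you want to check what the JVM itself thinks about its heap (as opposed to what top/htop report for the whole process), something along these lines should work with the stock OpenJDK 17; <PID> is just a placeholder for the openHAB Java process id:

# jcmd normally has to be run as the same user that owns the JVM process
sudo -u openhab jcmd <PID> GC.heap_info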

Could be, but I doubt it, as there are lots of users of this binding and I’d expect more problems to be reported if that were the source. Though you might have hit some edge case that no one has seen yet, so it can’t be ruled out.

It’s easy enough to test though. Disable or remove it and see if the memory leak goes away. If not, you know that’s not it and you can move on to the next binding. If so, you’ve found the culprit, narrowed down the source, and can open an issue.
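A quick, reversible way to try that from the openHAB console would be something like this (301 is the bundle id of the InfluxDB persistence add-on in your bundle:list output above; uninstalling the add-on via the UI is the more thorough test, in case a stopped bundle doesn’t release all of its threads):

openhab> bundle:stop 301
(watch memory consumption for a few days)
openhab> bundle:start 301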


openHAB, like all Java processes, uses the JVM heap rather than consuming RAM the way a normal program does.

You can graph this usage with the usedHeapPercent channel of the systeminfo binding.
Because of garbage collection, you need to sample it very often to get a true picture of what it is doing, which places an extra load on your system, so it’s best not to leave it running full time. I do, but only so I can catch bugs in my bindings more quickly.

Hi @rlkoshak,
thanks for the good explanation, really appreciate it.
I kept monitoring and I’m still “losing” memory every day. I did some further investigation and figured out that the anonymous memory mappings (anon) under the main openHAB Java process are increasing day by day, taking up all the memory.
Interestingly, the RES memory in top/htop is meanwhile higher than the Xmx in the /etc/default/openhab file, which is set to 768 MB:

EXTRA_JAVA_OPTS="-Xms192m -Xmx768m -XX:+ExitOnOutOfMemoryError"

As of today, htop shows 785M RES and 1796M VIRT, both increasing day by day.

Using pmap <OH java process id>, I see the “anon” mappings increasing every day. After a week or more, I count ~5000 anon mappings.
Example:

b9f10000     12K -----   [ anon ]
b9f13000    308K rw---   [ anon ]
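For anyone who wants to reproduce the check, something along these lines should work (<PID> is just a placeholder for the openHAB Java process id):

# number of anonymous mappings
pmap <PID> | grep -c anon
# resident size, virtual size (in KB) and thread count of the process
ps -o rss=,vsz=,nlwp= -p <PID>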

I’m not an expert in that area, but from searching on Google I understand this is somehow related to a threading problem?!
Any idea what could be wrong? My OH installation is really simple: it’s openHABian based, with very few bindings (as per the bundle:list output above) and ~5 very simple rules.
Thanks.

You are facing a thread leak. Over time your OH installation creates new threads without terminating the old ones, until it loses the ability to allocate new threads and ends up with this error:

This is an out-of-memory error, but the message clearly indicates that the process cannot create any more threads. To make things harder, it doesn’t mean the InfluxDB add-on is necessarily the guilty one. It is the one affected because it attempts to create a new thread, but the error could just as well occur in any other place that tries to start a new thread.
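To find out which threads are actually piling up, a thread dump is usually enough. A rough sketch, assuming the JDK tools are available and you run them as the user that owns the openHAB process (<PID> is a placeholder):

# dump all JVM threads to a file
sudo -u openhab jcmd <PID> Thread.print > /tmp/oh-threads.txt
# count threads grouped by name (numeric suffixes stripped) to spot the growing pool
grep '^"' /tmp/oh-threads.txt | cut -d'"' -f2 | sed 's/[0-9]*$//' | sort | uniq -c | sort -rn | head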

One thing that puzzles me: do you periodically restart any of these add-ons? The stack trace you posted tells me an add-on was restarted.

Hi @splatch, no, I don’t. Usually OH runs for months without any intervention. When InfluxDB stopped receiving records, the first thing I tried was restarting the add-on; that’s probably why you see this in the log. But since that did not fix the issue, I finally restarted OH.

Any idea what could cause this behavior? This is a completely new OH 4.1 installation. I did not have the issue with the old installation, which had been running for a long time. The reason for the new install was that the old instance was based on an older OS, so I decided to set up a completely new instance.

What is different between the two? Is it a pure reload of the backup into the new OS, or did you change other stuff? You need to work out what has changed, then eliminate each change one by one until you find the cause. If it really is only InfluxDB, stop using it to see if the problem goes away; once it is narrowed down and doubly confirmed, someone can look into why.

I’ve been using Influx with OH 4.1.1 since I moved from 3.4.5 and have seen no symptoms of a memory leak.

I run only OH on my Pi and run Influx on a separate machine.