Network Binding Causing OutOfMemoryError

I’ve been having issues with an OutOfMemoryError coming up a couple of times a day on my openHAB server, and I’m hoping someone can give me some idea of where and how to look for the root cause.

Here is what I know and what I’ve tried so far:

  • The error is always the same (see below): it’s failing to spawn a new thread for the Network binding’s presence detection
  • Almost every time it happens, but not always, openHAB crashes and restarts automatically
  • I have tried turning on DEBUG level logging for the Network binding with log:set DEBUG org.openhab.binding.network, but no entries from the Network binding appear in the log
  • I have tried doing a manual heap dump from the console, but the output does not include a .hprof file that I can open in Eclipse MAT
  • I have also looked through the output of the manual dump and found nothing out of the ordinary, although I was only able to generate the dump after openHAB had restarted, since there is no warning when the error is about to happen
  • I have tried adding EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError" to my /etc/default/openhab2 file to get automatic dumps when the error happens, but I have seen no output in /var/lib/openhab2, which, as far as I can tell, is where it should appear (the exact commands and options are sketched below)
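
For reference, a sketch of the commands behind the list above; paths assume the standard apt-based openHAB 2 install, and the HeapDumpPath option is my addition to pin where the dump lands:

# In the Karaf console (openhab> prompt):
openhab> log:set DEBUG org.openhab.binding.network
openhab> dev:dump-create

# In /etc/default/openhab2, picked up on the next service restart;
# HeapDumpPath is an added assumption, not part of my original attempt:
EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/openhab2"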

I realize this is a complex issue to resolve, and I am not looking for someone to do it for me, just looking for ideas from people with more experience with Java or just a fresh perspective that might see something I’ve missed.

Thanks
Mike

system info:
openHAB 2.3.0-1
Debian Linux 4.9.0-4-amd64
quad-core Intel, 3 GB RAM
Java VM: Java HotSpot(TM) 64-Bit Server VM 25.191-b12
vendor: Oracle Corporation
version: 1.8.0_191
2019-01-03 08:07:29.110 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception
java.lang.OutOfMemoryError: unable to create new native thread
	at java.lang.Thread.start0(Native Method) ~[?:?]
	at java.lang.Thread.start(Thread.java:717) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1367) ~[?:?]
	at org.openhab.binding.network.internal.PresenceDetection.performPresenceDetection(PresenceDetection.java:266) ~[?:?]
	at org.openhab.binding.network.internal.PresenceDetection.lambda$4(PresenceDetection.java:479) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:?]
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:?]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:?]
	at java.lang.Thread.run(Thread.java:748) [?:?]
  1. Upgrade to 2.4.0 which fixes some memory leaks.
  2. If that doesn’t help, please do get a heap dump because logging won’t help. See How to find the cause of a OutOfMemoryError in OH2? (a jmap alternative is also sketched below).
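
If the console dump keeps coming out without the .hprof, one alternative is to take the dump with the JDK tools directly. A minimal sketch, assuming Oracle JDK 8 with jmap on the PATH, that openHAB runs as the openhab user, and that pgrep finds the right process (adjust the pattern to your setup):

PID=$(pgrep -f openhab)                     # find the openHAB Java process
sudo -u openhab jmap -dump:live,format=b,file=/tmp/openhab-heap.hprof "$PID"

# Since the error is "unable to create new native thread", the live thread
# count is also worth watching; a steadily climbing number points at the leak.
ps -o nlwp= -p "$PID"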

The network binding is leaking, that is true, but the source has not yet been found.
In another recent thread a developer discovered tons of suppressed InterruptedExceptions.
We are still on it.

But OH 2.4 is definitely a better choice than 2.3 for the binding.


@wborn

  1. I will look into upgrading, thanks for the tip
  2. I have already looked through that thread and am stuck at not being able to get a heap dump (see above). Maybe I’ve missed something?

@David_Graeff
That’s reassuring. Sorry, I forgot to indicate that I am running OH 2.3.0-1.
If needed, I can provide logs or dumps to help diagnose, though I am having a bit of trouble getting a dump to generate when the error happens.

In 2.4.0 it will again add the heap dump to the ZIP file when creating a dump on the console:

openhab> dev:dump-create 
Created dump zip: 2019-01-04_111208.zip

The ZIP file is created in the userdata directory (e.g. /var/lib/openhab2/). If you extract that ZIP file, it will contain the heap dump as a heapdump.hprof file.
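
For completeness, pulling just the heap dump out of that ZIP (file name taken from the example above) could look like:

cd /var/lib/openhab2
unzip 2019-01-04_111208.zip heapdump.hprof   # extract only the .hprof, then open it in Eclipse MAT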

Sorry, I should have been clearer. I am able to get a dump ZIP using that command, but it does not contain a heapdump.hprof file; maybe that is normal on 2.3?

I am hoping to have time to upgrade to 2.4 this weekend. The changes to the MQTT binding mean I have a lot of work to do, though. I will report back (hopefully with a heap dump in hand) if I am still having this issue on 2.4.

Thank you

Yes, it’s a known issue with openHAB 2.3.0 (Karaf 4.1.5) and newer Java 1.8.0 update versions; it is fixed in 2.4.0 (Karaf 4.2.1), see KARAF-5796.

This is still an issue in August 2020.

openHAB version 2.5.7

Having the Network binding installed with only the following Things configured causes it to rapidly exceed my Docker memory limit (countless restarts throughout the day).
For now I removed the (very valuable) Network binding, and things went back to normal.

Thing network:pingdevice:redacted [ hostname="redacted", retry=10, timeout=20000 ]
Thing network:pingdevice:redacted [ hostname="redacted", retry=10, timeout=20000 ]
Thing network:pingdevice:redacted [ hostname="redacted", retry=10, timeout=20000 ]
Thing network:pingdevice:redacted [ hostname="redacted", retry=10, timeout=20000 ]
Thing network:pingdevice:redacted [ hostname="redacted", retry=10, timeout=20000 ]
Thing network:pingdevice:redacted [ hostname="redacted", retry=10, timeout=20000 ]
Thing network:speedtest:local [ refreshInterval=3600, uploadSize=1000000, url="http://fra36-speedtest-1.tele2.net/", fileName="1GB.zip" ]

A fairly obvious step would be to eliminate only the speedtest Thing, which seems to involve gigabyte-sized downloads, and see how it behaves with just the ping Things enabled, as sketched below.
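
As a sketch of that isolation step (assuming the Things above live in a single .things file; the file name here is hypothetical), commenting out just the speedtest Thing would look like:

// network.things - keep the ping Things, disable only the speedtest Thing for now
Thing network:pingdevice:redacted [ hostname="redacted", retry=10, timeout=20000 ]
// Thing network:speedtest:local [ refreshInterval=3600, uploadSize=1000000, url="http://fra36-speedtest-1.tele2.net/", fileName="1GB.zip" ]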

What is the host OS here?