How to find the cause of an OutOfMemoryError in OH2?

One more rookie question.

Would it help to remove the bindings one by one to see if OH is coming back to life?
(I still get log entries in openhab.log every now and then)

If your whole OH doesn’t work at all, I think the OutOfMemoryError is just a side effect of some other issue.

If OH works fine for a couple of days and then crashes due to the OutOfMemoryError, you can remove some bindings to find which binding is causing this. This will only work if the memory leak is caused by a binding. It could be anywhere, e.g. Eclipse SmartHome or your rules. Without analyzing the heap dump you are basically searching for a needle in a haystack.
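
If you want a heap dump to analyze, you can either let the JVM write one automatically on the next OutOfMemoryError (see below) or trigger one manually. A rough sketch, assuming the JDK tools (jps/jmap) are installed on the box and using /tmp as an example output path:

# find the openHAB (Karaf) Java process, then dump its heap to a file
jps -l | grep karaf
jmap -dump:live,format=b,file=/tmp/openhab-heap.hprof <PID>

The resulting .hprof file can then be opened in a tool like Eclipse MAT.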

Alright, thanks.
I thought it might be worth playing around, but to activate the heap dump parameter in /etc/default/openhab2 I need to restart anyway…
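
For reference, the line in question typically looks something like this (a sketch; the dump path is just an example and must be writable by the openhab user):

EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/openhab2/"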

Good! And always try updating to the latest SNAPSHOT release which should have all known memory leaks fixed. With the SNAPSHOT release you’ll only experience new and unknown memory leaks. :wink:

:joy:

I just got another memory leak and a java… file has been created.
So what do I do with it?
To me it looks like gibberish :wink:

JAVA PROFILE 1.0.1^@^@^@^@^D^@^@^Af¦/e^Z^A^@^@^@^@^@^@^@-`I<83>^X(II)Ljava/time/temporal/TemporalAdjuster;^A^@^@^

EDIT:
Checked with Eclipse MAT:

But how to proceed from here?

It looks like there is some kind of memory leak in the sitemap events mechanism.

If Eclipse MAT shows the same “Problem Suspect” with recent openHAB 2.4.0 builds, we should create an issue for this so it is solved in Eclipse SmartHome.

hmmm… sitemap… I don’t use any sitemap since I switched to HABpanel…!?

HABpanel uses the same Server Sent Events (SSE) mechanism as Basic UI and Paper UI to get notified of updates. According to the heap dump analysis it was consuming more than half of all the memory used by openHAB. Was that heap dump made with openHAB 2.4.0? Do you have a lot of devices running HABpanel?
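
If you want to see that traffic for yourself, the event stream is plain HTTP; assuming the default port 8080 and your openHAB host, something like this prints the raw stream that every UI client keeps open:

curl -N http://IPADDRESS:8080/rest/events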

Yes, according to my notes this was with M4 - although I did have the heap dump issue with 2.3 stable as well.

I am running HABpanel just on my cell phone and a wall-mounted tablet (different panel configs).

Thanks for pointing this out. I will try to eliminate HABpanel access from both devices to nail down the root cause.


Hello,

Info: openHAB running on a Pi 3+

Recently, I updated my openHAB from 2.3 to 2.4. I also added the Spotify binding (+ market), adjusted stuff in HABpanel and the sitemap for the new binding, and wrote some more rules. After a couple of days I noticed that HABPanel wouldn’t load anymore. I checked the log and saw the famous Java heap problem:

 2019-10-13 09:58:02.015 [INFO ] [ui.habmin.internal.servlet.HABminApp] - Started HABmin servlet at /habmin
2019-10-13 09:58:08.463 [INFO ] [.transport.mqtt.MqttBrokerConnection] - Starting MQTT broker connection to '**IPADDRESS**' with clientid paho79431135232 and file store '/var/lib/openhab2/mqtt/**IPADDRESS**'
2019-10-13 09:58:25.412 [INFO ] [mmandclass.ZWaveDoorLockCommandClass] - NODE 6: Door-Lock config report - timeoutEnabled=false timeoutMinutes=254, timeoutSeconds=254
2019-10-13 17:56:39.655 [ERROR] [o.client.mqttv3.internal.ClientState] - paho79431135232: Timed out as no write activity, keepAlive=60,000 lastOutboundActivity=1,571,003,534,821 lastInboundActivity=1,571,003,670,914 time=1,571,003,703,024 lastPing=1,571,003,534,821
2019-10-13 18:13:44.897 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@2d4e16 in QueuedThreadPool[HttpClient@e2509f]@1bad41e{STARTED,8<=8<=200,i=5,q=0}[ReservedThreadExecutor@12d80b{s=0/4,p=0}]
2019-10-13 18:25:36.709 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@8bcebf in QueuedThreadPool[HttpClient@1c2769d]@1c01699{STARTED,8<=8<=200,i=5,q=0}[ReservedThreadExecutor@b450f6{s=0/4,p=0}]
2019-10-13 18:25:28.228 [WARN ] [me.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.eclipse.smarthome.core.internal.items.ItemUpdater@16525e8' takes more than 5000ms.
2019-10-13 18:25:21.105 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1c5c303 in QueuedThreadPool[HttpClient@936858]@c4d222{STARTED,8<=8<=200,i=5,q=0}[ReservedThreadExecutor@1fa50a3{s=0/4,p=0}]
2019-10-13 18:26:49.040 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1a57005 in QueuedThreadPool[HttpClient@152e1d0]@436c3d{STARTED,8<=8<=200,i=5,q=0}[ReservedThreadExecutor@1629801{s=0/4,p=0}]
2019-10-14 03:00:45.261 [WARN ] [me.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.eclipse.smarthome.io.monitor.internal.EventLogger@b22679' takes more than 5000ms.
2019-10-14 03:08:12.668 [ERROR] [WaveSerialHandler$ZWaveReceiveThread] - Exception during ZWave thread. 
java.lang.IllegalStateException: Queue full
	at java.util.AbstractQueue.add(AbstractQueue.java:98) ~[?:?]
	at java.util.concurrent.ArrayBlockingQueue.add(ArrayBlockingQueue.java:312) ~[?:?]
	at org.openhab.binding.zwave.internal.protocol.ZWaveTransactionManager.processReceiveMessage(ZWaveTransactionManager.java:404) ~[240:org.openhab.binding.zwave:2.4.0]
	at org.openhab.binding.zwave.internal.protocol.ZWaveController.incomingPacket(ZWaveController.java:1062) ~[240:org.openhab.binding.zwave:2.4.0]
	at org.openhab.binding.zwave.handler.ZWaveControllerHandler.incomingMessage(ZWaveControllerHandler.java:643) ~[240:org.openhab.binding.zwave:2.4.0]
	at org.openhab.binding.zwave.handler.ZWaveSerialHandler$ZWaveReceiveThread.run(ZWaveSerialHandler.java:316) [240:org.openhab.binding.zwave:2.4.0]
2019-10-14 17:27:40.840 [WARN ] [org.eclipse.jetty.server.HttpChannel] - /habpanel/app/widgets/frame/frame.widget.js
java.lang.OutOfMemoryError: Java heap space
2019-10-14 17:27:40.840 [WARN ] [org.eclipse.jetty.server.HttpChannel] - /habpanel/app/widgets/chart/chart.widget.js
java.lang.OutOfMemoryError: Java heap space
2019-10-14 17:28:06.315 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - 
java.lang.OutOfMemoryError: Java heap space
//and so on...

It’s talking about Z-Wave in this example, but in others it’s MQTT, and in another it’s the execution of a rule that normally works fine. I think it’s only a coincidence which component happens to call something once the Java heap space is already full.

Since the first error, I’ve uninstalled the openHAB Cloud, market, Spotify, Network and Astro bindings. I’ve commented out all the code referring to those, even in the sitemap. With no result at all.

I’ve doubled the max heap size and activated creation of a .hprof file in /etc/default/openhab2:

EXTRA_JAVA_OPTS="-Xms250m -Xmx700m"
EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError"

The heap size change hasn’t done anything; it still gives me the error, usually a couple of hours after a reboot. Here’s the result of the .hprof analyser (I analysed 3 different .hprof files with the same result):

But I can’t really make much sense of this, so if someone could help me out that would be great.

There is also this error file I found that gives some guidance that I’ll try to follow next, but I don’t have much faith in it…

hs_err_pid30247.log (16.3 KB)

I’m at a point where I would just wipe the Pi and do it all over. It wouldn’t be that much work, but I fear the problem would come back after a while…

So if someone could shed some light on this, I think it would help a lot of people.

Thanks

There are reports of memory leaks in several threads, but no one has yet been able to pinpoint the cause. We are working under the assumption at this point that the source of the leak is not in OH core, or else far more users would be reporting it, so definitely focus your attention on the add-ons. I agree, the add-on that first encounters the OutOfMemoryError is not likely to be the source of the leak.

You have somewhat of an advantage, as you run out of memory a couple of hours after reboot instead of once a month. So you are in a better position than others to systematically uninstall add-ons (don’t forget persistence, actions, and transformations) and see if one of them makes a difference in how quickly it runs out of memory. Given how fast it runs out, I would not be surprised if you have more than one leak.
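
As a starting point, you can list everything that is actually installed from the openHAB (Karaf) console; a rough sketch, assuming the default console port 8101 and default credentials:

ssh -p 8101 openhab@localhost
feature:list -i | grep openhab
bundle:list -s | grep -i binding

That gives you a checklist to work through as you remove add-ons one by one.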

Ok then let’s do it.

Won’t be long; this is all that is left:

# A comma-separated list of bindings to install (e.g. "binding = sonos,knx,zwave")
binding = zwave,kodi,nest,mqtt,exec

# A comma-separated list of UIs to install (e.g. "ui = basic,paper")
ui = paper,habmin,habpanel,basic,classic

# A comma-separated list of persistence services to install (e.g. "persistence = rrd4j,jpa")
#persistence =

# A comma-separated list of actions to install (e.g. "action = mail,pushover")
action = mail

# A comma-separated list of transformation services to install (e.g. "transformation = map,jsonpath")
#transformation =

# A comma-separated list of voice services to install (e.g. "voice = marytts,freetts")
#voice =

# A comma-separated list of miscellaneous services to install (e.g. "misc = myopenhab")
misc =

Tonight is a first: I wasn’t even able to open a share or get into the Pi with SSH. No mention of Java since my last reboot, though:

2019-10-15 07:08:05.346 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - 
java.lang.OutOfMemoryError: Java heap space
2019-10-15 21:26:26.686 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
2019-10-15 21:26:27.213 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Time zone set to 'SOMEWHERE'.
2019-10-15 21:26:27.229 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Location set to 'SOMEWHERE'.
2019-10-15 21:26:27.238 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Locale set to 'SOMETHING'.
2019-10-15 21:26:46.883 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'nest.items'
2019-10-15 21:26:47.162 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'kodiSalon.items'
2019-10-15 21:26:47.319 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'spotify.items'
2019-10-15 21:26:47.354 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'astro.items'
2019-10-15 21:26:47.442 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'network.items'
2019-10-15 21:26:47.476 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'zwave.items'
2019-10-15 21:26:47.566 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'kodiSousSol.items'
2019-10-15 21:26:47.693 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'arduino.items'
2019-10-15 21:26:47.732 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'exec.items'
2019-10-15 21:26:58.002 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'kodiSalon.rules'
2019-10-15 21:26:58.919 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'notifications.rules'
2019-10-15 21:26:59.635 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'maison.rules'
2019-10-15 21:27:01.502 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'laveVaisselle.rules'
2019-10-15 21:27:01.612 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'spotify.rules'
2019-10-15 21:27:04.207 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'kodiSS.rules'
2019-10-15 21:27:04.391 [WARN ] [el.core.internal.ModelRepositoryImpl] - Configuration model 'spotify.rules' is either empty or cannot be parsed correctly!
2019-10-15 21:27:04.674 [INFO ] [thome.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
2019-10-15 21:27:05.349 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'devSitemap.sitemap'
2019-10-15 21:27:07.308 [INFO ] [ing.zwave.handler.ZWaveSerialHandler] - Connecting to serial port '/dev/ttyACM0'
2019-10-15 21:27:08.577 [INFO ] [ing.zwave.handler.ZWaveSerialHandler] - Serial port is initialized
2019-10-15 21:27:08.653 [INFO ] [ve.internal.protocol.ZWaveController] - Starting ZWave controller
2019-10-15 21:27:08.655 [INFO ] [ve.internal.protocol.ZWaveController] - ZWave timeout is set to 5000ms. Soft reset is false.
2019-10-15 21:27:10.191 [INFO ] [.dashboard.internal.DashboardService] - Started Dashboard at http://IPADDRESS:8080
2019-10-15 21:27:10.195 [INFO ] [.dashboard.internal.DashboardService] - Started Dashboard at https://IPADDRESS:8443
2019-10-15 21:27:11.691 [INFO ] [arthome.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui
2019-10-15 21:27:11.870 [INFO ] [ui.habmin.internal.servlet.HABminApp] - Started HABmin servlet at /habmin
2019-10-15 21:27:19.382 [INFO ] [.transport.mqtt.MqttBrokerConnection] - Starting MQTT broker connection to 'IPADDRESS' with clientid ID and file store '/var/lib/openhab2/mqtt/IPADDRESS'
2019-10-15 21:27:23.220 [INFO ] [mmandclass.ZWaveDoorLockCommandClass] - NODE 6: DEVICE config report - timeoutEnabled=false timeoutMinutes=254, timeoutSeconds=254
2019-10-16 02:27:07.636 [WARN ] [.serialmessage.ZWaveCommandProcessor] - SerialMessage class null is not implemented!
2019-10-16 02:27:07.639 [WARN ] [.serialmessage.ZWaveCommandProcessor] - SerialMessage class null is not implemented!
2019-10-16 02:27:07.641 [WARN ] [ve.internal.protocol.ZWaveController] - TODO: Implement processing of Request Message = -- (0x0)
2019-10-16 05:24:31.780 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
2019-10-16 05:24:31.815 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Time zone set to 'SOMEWHERE'.
2019-10-16 05:24:31.849 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Location set to 'SOMEWHERE'.
2019-10-16 05:24:31.858 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Locale set to 'SOMETHING'.
2019-10-16 18:29:32.951 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'nest.items'

Yesterday, after the post, I reduced the heap size to:
EXTRA_JAVA_OPTS="-Xms64m -Xmx128m"

By default, it was:
EXTRA_JAVA_OPTS="-Xms250m -Xmx300m"

And I tried, without success:
EXTRA_JAVA_OPTS="-Xms250m -Xmx700m"

I’ll let it crash again to see if the Java error comes back, then I’ll begin to uninstall add-ons.

That points to something else that might be going on on the machine itself, which may be independent of, and in addition to, a memory leak in OH. Though setting the max heap size to 700 MB might be way too high for an RPi, which doesn’t have that much RAM to begin with. Assuming an RPi 3 with 1 GB, that only leaves around 300 MB of RAM to run everything else on the machine.
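
A quick way to sanity-check that on the Pi itself (a sketch; ps options can vary between distributions) is to compare total and available memory with the Java process’s resident size:

free -h
ps -C java -o pid,rss,cmd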

OK, so the Java error is back:

2019-10-16 18:29:32.951 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'nest.items'
     *** some startup event ***
2019-10-17 02:39:00.574 [ERROR] [o.client.mqttv3.internal.ClientState] - paho8400789633: Timed out as no write activity, keepAlive=60,000 lastOutboundActivity=1,571,293,516,522 lastInboundActivity=1,571,293,657,254 time=1,571,293,980,058 lastPing=1,571,293,516,522
2019-10-17 02:39:05.710 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@124bd01 in QueuedThreadPool[HttpClient@7c76b]@59107{STARTED,8<=8<=200,i=5,q=0}[ReservedThreadExecutor@11b29fd{s=0/4,p=0}]
2019-10-17 02:39:04.032 [ERROR] [ntime.internal.engine.RuleEngineImpl] - Rule 'Lave Vaisselle PhotoValue String to Float': An error occurred during the script execution: Java heap space
2019-10-17 03:11:40.073 [ERROR] [org.apache.felix.fileinstall        ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
2019-10-17 03:11:36.322 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - 
java.lang.OutOfMemoryError: Java heap space

So it rebooted at 18:29 and the bug happened at 2:39 = it took around 8 hours to bug out.

I’ll remove the mail action first; I would be surprised if that were causing it.

*** edit 2019-10-18***

2019-10-17 19:44:43.651 [INFO ] [arthome.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui
*** some startup event ***
2019-10-18 04:11:28.386 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
java.lang.OutOfMemoryError: Java heap space

19:44 to 04:11 = 8:30 to bug

Now I’ve removed exec.

*** edit 2019-10-19***

2019-10-18 11:24:09.949 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
*** some startup event ***
2019-10-18 19:52:03.405 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
java.lang.OutOfMemoryError: Java heap space

11:24 to 19:52 = 8:30 to bug

next mqtt

*** edit 2019-10-19***

2019-10-19 08:42:46.013 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
    *** some startup event ***
2019-10-19 17:22:27.350 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1715177 in QueuedThreadPool[HttpClient@9aed1c]@fede15{STARTED,8<=8<=200,i=5,q=0}[ReservedThreadExecutor@14ed701{s=0/4,p=0}]

8:42 to 17:22 = 8:40

next nest


For info, it has been 12 hours without the Nest binding and openHAB is still running without problems. It’s been a long time since I saw that!

If it’s still running by Monday night, I’ll begin to reinstall the other bindings. Do you think that’s a good idea?

Well, it’s still autumn. Ask again in December :cold_face:

Instead of installing my bindings back, I’ve decided to set up some monitoring using another Pi and the TIG stack (Telegraf - InfluxDB - Grafana). I was hoping to monitor Java, but in the end it seems a bit over my head. I know I can do it with Jolokia and Telegraf, but I don’t have any idea how to install that Jolokia thing. I think you have to attach it to the Java VM or app, but I’m not a developer… If someone is able to implement that in openHAB somehow, I would love to monitor that Java stuff.

https://jolokia.org/index.html
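
From the Jolokia docs it looks like you attach its JVM agent jar to the openHAB process and then point Telegraf’s jolokia2_agent input at the HTTP port it opens. Roughly something like this (the jar path and port are just examples), but this is where it gets over my head:

# added to the existing EXTRA_JAVA_OPTS line in /etc/default/openhab2, then restart openHAB
EXTRA_JAVA_OPTS="... -javaagent:/opt/jolokia/jolokia-jvm-agent.jar=port=8778,host=0.0.0.0"

# quick check that the agent answers, e.g. current heap usage
curl http://IPADDRESS:8778/jolokia/read/java.lang:type=Memory/HeapMemoryUsage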

Anyway, here is what the monitoring looks like in Grafana. At least I’ve got the memory of the Pi:


All credit to Grafana dashboard #1375.

As for the Java problem, it’s been over a week without issues. Over the next few days I’ll reinstall my bindings and run without Nest for a while. I probably won’t reinstall Nest, but we’ll see.

I’ll come back with the results

If you go with Prometheus instead of Telegraf, there is a binding that gives you all sorts of information about OH internals. See New Add-on bundle for Prometheus health Metrics.

I saw Prometheus later, after some research on SNMP for my QNAP. I’ll give it a try, thanks.