OH runs out of memory

Hi,
since switching to a containerized version of OH with Docker, I encounter frequent out-of-memory situations.
While this might be related to JavaScript memory/stability issues - #2 by J-N-K, the fix by florian-h05 should have resolved that memory leak in M6.

  • Platform information:
    • Hardware: Raspi 4, 2GB
    • OS: Debian GNU/Linux 11 (bullseye)
    • Java Runtime Environment: Docker container
    • openHAB version: 3.4.0.M6 (Build #3227)

Add-ons:

150 │ Active │  80 │ 3.4.0.202212150739     │ openHAB Core :: Bundles :: Marketplace Add-on Services
151 │ Active │  80 │ 3.4.0.202212150740     │ openHAB Core :: Bundles :: Community Marketplace Add-on Service :: Karaf
251 │ Active │  80 │ 3.4.0.202212160328     │ openHAB Add-ons :: Bundles :: Automation :: JavaScript Scripting
252 │ Active │  80 │ 3.4.0.202212160331     │ openHAB Add-ons :: Bundles :: Astro Binding
253 │ Active │  80 │ 3.4.0.202212160333     │ openHAB Add-ons :: Bundles :: Chromecast Binding
254 │ Active │  80 │ 3.4.0.202212160334     │ openHAB Add-ons :: Bundles :: Daikin Binding
255 │ Active │  80 │ 3.4.0.202212160335     │ openHAB Add-ons :: Bundles :: Dresden Elektronik deCONZ Binding
256 │ Active │  80 │ 3.4.0.202212160336     │ openHAB Add-ons :: Bundles :: Doorbird Binding
258 │ Active │  80 │ 3.4.0.202212160343     │ openHAB Add-ons :: Bundles :: Homematic Binding
259 │ Active │  80 │ 3.4.0.202212160343     │ openHAB Add-ons :: Bundles :: HTTP Binding
261 │ Active │  80 │ 3.4.0.202212160351     │ openHAB Add-ons :: Bundles :: Mail Binding
263 │ Active │  80 │ 3.4.0.202212160356     │ openHAB Add-ons :: Bundles :: Netatmo Binding
264 │ Active │  80 │ 3.4.0.202212160356     │ openHAB Add-ons :: Bundles :: Network Binding
265 │ Active │  80 │ 3.4.0.202212160356     │ openHAB Add-ons :: Bundles :: Network UPS Tools Binding
266 │ Active │  80 │ 3.4.0.202212160359     │ openHAB Add-ons :: Bundles :: OpenWeatherMap Binding
267 │ Active │  80 │ 3.4.0.202212160408     │ openHAB Add-ons :: Bundles :: Sonos Binding
268 │ Active │  80 │ 3.4.0.202212160409     │ openHAB Add-ons :: Bundles :: Telegram Binding
271 │ Active │  80 │ 3.4.0.202212160417     │ openHAB Add-ons :: Bundles :: IO :: openHAB Cloud Connector
272 │ Active │  80 │ 3.4.0.202212160418     │ openHAB Add-ons :: Bundles :: Persistence Service :: MapDB
273 │ Active │  80 │ 3.4.0.202212160418     │ openHAB Add-ons :: Bundles :: Persistence Service :: RRD4j
274 │ Active │  75 │ 3.4.0.202212160418     │ openHAB Add-ons :: Bundles :: Transformation Service :: JavaScript
275 │ Active │  75 │ 3.4.0.202212160419     │ openHAB Add-ons :: Bundles :: Transformation Service :: JSonPath
276 │ Active │  75 │ 3.4.0.202212160340     │ openHAB Add-ons :: Bundles :: Transformation Service :: Map
277 │ Active │  75 │ 3.4.0.202212160419     │ openHAB Add-ons :: Bundles :: Transformation Service :: RegEx
280 │ Active │  80 │ 3.4.0.202212160421     │ openHAB Add-ons :: Bundles :: Voice :: VoiceRSS Text-to-Speech

Logs:
Below are some log messages that appear several times before the out-of-memory state is reached and the OH instance stops doing anything:

HTTP binding encounters timeouts:

2022-12-18 17:09:16.161 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://pizero/api/0C894FF2A2/lights/9' (method='GET', content='null') failed: java.util.concurrent.TimeoutException: Total timeout 5000 ms elapsed
2022-12-18 17:09:28.246 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://pizero/api/0C894FF2A2/lights/12' (method='GET', content='null') failed: java.util.concurrent.TimeoutException: Total timeout 5000 ms elapsed

Cloud connection breaks (logged about every 3 minutes):

2022-12-18 17:10:11.236 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = db...50, base URL = http://localhost:8080)
2022-12-18 17:11:16.132 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = db...50, base URL = http://localhost:8080)
2022-12-18 17:11:35.124 [WARN ] [io.openhabcloud.internal.CloudClient] - Error during communication: EngineIOException xhr poll error
2022-12-18 17:11:37.173 [WARN ] [io.openhabcloud.internal.CloudClient] - Socket.IO disconnected: transport error
2022-12-18 17:11:37.229 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = db...50, base URL = http://localhost:8080)

This indicates the beginning of the end:

2022-12-18 17:13:48.425 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@72739a31' takes more than 5000ms.
2022-12-18 17:14:01.240 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@72739a31' takes more than 5000ms.

Not sure about this one:

2022-12-18 17:24:04.525 [ERROR] [nternal.DiscoveryServiceRegistryImpl] - Cannot notify the DiscoveryListener 'org.openhab.core.config.discovery.internal.PersistentInbox' on Thing discovered event!

Timers from rules fail (unfortunately I cannot see which rule); this is logged quite frequently:

.
        at java.util.Timer.sched(Timer.java:398) ~[?:?]
        at java.util.Timer.schedule(Timer.java:194) ~[?:?]
        at org.openhab.core.storage.json.internal.JsonStorage.deferredCommit(JsonStorage.java:414) ~[?:?]
        at org.openhab.core.storage.json.internal.JsonStorage.put(JsonStorage.java:160) ~[?:?]
        at org.openhab.core.config.discovery.internal.PersistentInbox.internalAdd(PersistentInbox.java:290) ~[bundleFile:?]
        at org.openhab.core.config.discovery.internal.PersistentInbox.add(PersistentInbox.java:241) ~[bundleFile:?]
        at org.openhab.core.config.discovery.internal.PersistentInbox.thingDiscovered(PersistentInbox.java:422) ~[bundleFile:?]
        at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl$1.run(DiscoveryServiceRegistryImpl.java:260) ~[bundleFile:?]
        at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl$1.run(DiscoveryServiceRegistryImpl.java:1) ~[bundleFile:?]
        at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
        at org.openhab.core.config.discovery.internal.DiscoveryServiceRegistryImpl.thingDiscovered(DiscoveryServiceRegistryImpl.java:257) [bundleFile:?]
        at org.openhab.core.config.discovery.AbstractDiscoveryService.thingDiscovered(AbstractDiscoveryService.java:251) [bundleFile:?]
        at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.createDiscoveryResult(MDNSDiscoveryService.java:227) [bundleFile:?]
        at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.considerService(MDNSDiscoveryService.java:214) [bundleFile:?]
        at org.openhab.core.config.discovery.mdns.internal.MDNSDiscoveryService.serviceResolved(MDNSDiscoveryService.java:207) [bundleFile:?]
        at javax.jmdns.impl.ListenerStatus$ServiceListenerStatus.serviceResolved(ListenerStatus.java:106) [bundleFile:3.5.8]
        at javax.jmdns.impl.JmDNSImpl$1.run(JmDNSImpl.java:911) [bundleFile:3.5.8]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]

First time the heap space is mentioned:

2022-12-18 17:28:25.711 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
java.lang.OutOfMemoryError: Java heap space

What follows are various errors caused by heap space exhaustion, logged very frequently towards the end:

2022-12-18 18:03:02.091 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'airConditioners-5' failed: An error occurred during the script execution: Java heap space in airConditioners
...
2022-12-18 18:13:15.958 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
java.lang.OutOfMemoryError: Java heap space
...
2022-12-18 18:24:24.624 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'deconz-1' failed: Java heap space in deconz

Then nothing happens anymore: no logs are written, no item states are updated (neither via automation nor via HabPanel or the mobile app).

Any help is greatly appreciated, as it drives me crazy that I have to restart OH every time. Sometimes even the whole Raspi becomes unresponsive and I cannot SSH into it anymore, so I have to pull its plug to get it working again.

It’s nearly impossible to see from the log what the problem is. A heap dump with VisualVM might give a better hint, but it’s hard to explain what to look for.
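For what it’s worth, one way to obtain such a heap dump is jmap from the JDK. The sketch below is an assumption on my part: the process match pattern, the output path, and the presence of jmap all depend on your setup, and on a 2 GB Pi the dump itself can be several hundred MB.

```shell
# Sketch: capture a heap dump of the running openHAB JVM so it can be
# opened in VisualVM or Eclipse MAT. Run as the user owning the process
# (typically "openhab"); pattern and paths are illustrative assumptions.
PID=$(pgrep -f 'openhab.*java|java.*openhab' | head -n1)
if [ -n "$PID" ] && command -v jmap >/dev/null 2>&1; then
  jmap -dump:live,format=b,file=/tmp/openhab-heap.hprof "$PID"
  RESULT="dump written for PID $PID"
else
  RESULT="openHAB JVM or jmap not found; nothing dumped"
fi
echo "$RESULT"
```

VisualVM can then open the .hprof file offline; sorting the class histogram by retained size usually points at the component holding the memory.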

If moving to Docker is all you changed, the OH version cannot be the reason. The cause will be in your Docker and OS setup.
Docker means overhead; on a severely resource-restricted system like any Raspi that ain’t a clever thing to do, let alone all the rest of the OS setup, where there’s much room for mistakes. Why don’t you use openHABian instead?


I read that OH requires about 800 MB of memory, so I thought this might be sufficient. I came from openHABian, but that was also not very stable and updates caused a lot of trouble; that’s why I decided to go with Docker. Based on the overall experience so far, I feel that’s the better way to go for me (e.g. easy version switching).
So would you assume that with stronger hardware I would not encounter these issues? While it seems obvious that more RAM would mean better performance, a memory leak or some misconfiguration might still cause issues imo.

I cannot comment on your old setup other than that openHABian is known to be a stable basis serving many happy openHAB users. Also, I haven’t seen any post of yours on this, so your motivation and attempts at analyzing and fixing your issues with it (if it was broken at all) can’t have been all that comprehensive.

Memory usage isn’t 800 MB but variable and a complex topic of its own.
And openHABian even has a menu offering to switch versions.
Sorry for being frank, but your knowledge here looks too superficial for rolling your own system to be a good idea. This is why I recommended going with openHABian.
Good luck if you still think it’s the better way to go.


Ok, I guess there’s not much for me to achieve here.
Although one last question remains: why is openHABian considered the better choice (besides the missing Docker overhead)? You make it sound like openHABian is for beginners and Docker for experts, but I cannot see why.

openHABian is the right choice for everyone including expert users.
It offers a comprehensive system setup that contains everything you need to jumpstart your installation, provides many features, and it’s tested and in widespread use.
Going Docker means you have to implement (and debug) everything yourself. That approach is mostly of benefit only to some developers.


Hi,
I took some time and moved OH back to a dedicated host, a Raspberry Pi 4 (2 GB RAM) with openHABian. While I did not notice any logs containing a “Java heap space” message, I can see that the service restarts on its own without any announcement. This is an improvement, since the service is now up most of the time, but it’s still quite annoying to have it unavailable at random times.

I would love to receive a hint on how to debug this in order to see what causes this weird behavior.

Do the log files provide some information?
It might be that frontail doesn’t display enough log lines to see anything there, so consider downloading openhab.log and looking through it.
I don’t know how experienced you are, but theoretically you could attach something like VisualVM to check heap usage etc.
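To look through more than the frontail window shows, the full set of rotated logs can be scanned in one go. A sketch, assuming the standard openHABian log location (adjust the path for your install):

```shell
# Sketch: pull WARN/ERROR lines from openhab.log, including the rotated
# .gz files. The path below is the usual openHABian default and is an
# assumption; zgrep transparently handles both plain and gzipped files.
LOGDIR=${LOGDIR:-/var/log/openhab}
if [ -d "$LOGDIR" ]; then
  HITS=$(zgrep -h -E '\[(WARN |ERROR)\]' "$LOGDIR"/openhab.log* 2>/dev/null | tail -n 40)
else
  HITS="log directory $LOGDIR not found"
fi
echo "$HITS"
```

The last WARN/ERROR lines before a gap in the timestamps are usually the most telling part.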

Last lines of the old log:

2022-12-27 20:41:58.813 [INFO ] [ort.loader.AbstractScriptFileWatcher] - Loading script '/etc/openhab/automation/js/environment.js'
2022-12-27 20:42:15.427 [INFO ] [.openhab.automation.openhab-js.rules] - Adding rule: Environment: Set dark outside (event)
2022-12-27 20:42:15.442 [INFO ] [.openhab.automation.openhab-js.rules] - Adding rule: Environment: Set dark outside / Night (restart)
2022-12-27 20:42:15.454 [INFO ] [.openhab.automation.openhab-js.rules] - Adding rule: Environment: Set not dark outside (event)
2022-12-27 20:42:15.464 [INFO ] [.openhab.automation.openhab-js.rules] - Adding rule: Environment: Set night
2022-12-27 20:42:15.474 [INFO ] [.openhab.automation.openhab-js.rules] - Adding rule: Environment: Set day
2022-12-27 21:03:17.482 [INFO ] [ort.loader.AbstractScriptFileWatcher] - Loading script '/etc/openhab/automation/js/ccu3.js'

First lines of the new one:

2022-12-27 21:04:10.550 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Time zone set to 'Europe/Berlin'.
2022-12-27 21:04:10.632 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Location set to '51.40595735299591,6.649402924638057'.
2022-12-27 21:04:10.635 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Locale set to 'de_DE'.
2022-12-27 21:04:28.344 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'general.test.items'
2022-12-27 21:04:28.997 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'remotes.items'
2022-12-27 21:04:29.125 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'room.garage.items'
2022-12-27 21:04:29.164 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'radio.items'
2022-12-27 21:04:29.210 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'motion.items'
2022-12-27 21:04:29.444 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'environment.items'
2022-12-27 21:04:29.508 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'network.items'
2022-12-27 21:04:29.540 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'googleAssistant.items'
2022-12-27 21:04:29.719 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'mycroft.items'
2022-12-27 21:04:29.806 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'ccu3.items'

As you can see, there is nothing indicating the restart. I just noticed that the tail (not frontail but the actual tail -f command) stopped printing new lines.

Unfortunately I do not have any deeper knowledge when it comes to Java, but I will have a look at VisualVM.

What is in your ccu3.js??

I removed the ccu3.js file, but the issue persists.

-rw-r--r-- 1 openhab openhab     32K Dec 28 11:47 openhab.log
-rw-r--r-- 1 openhab openhab    5.4K Dec 28 09:53 openhab.log.1.gz
-rw-r--r-- 1 openhab openhab    6.8K Dec 28 10:10 openhab.log.2.gz
-rw-r--r-- 1 openhab openhab    5.5K Dec 28 10:24 openhab.log.3.gz
-rw-r--r-- 1 openhab openhab    5.7K Dec 28 10:42 openhab.log.4.gz
-rw-r--r-- 1 openhab openhab    5.8K Dec 28 11:01 openhab.log.5.gz
-rw-r--r-- 1 openhab openhab    7.6K Dec 28 11:21 openhab.log.6.gz
-rw-r--r-- 1 openhab openhab    5.0K Dec 28 11:42 openhab.log.7.gz

You can see it restarts multiple times per hour now and creates a new log file.
The only thing I’m doing is migrating .rules files to .js. Sure, there are some errors in the log now and then, but even when there aren’t, it still restarts.
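Since nothing reaches openhab.log before a restart, the systemd journal and the kernel log are worth a look (this is a suggestion of mine, not something confirmed here): systemd records every start/stop of the unit, and a kernel OOM kill would show up in journalctl -k.

```shell
# Sketch: look for unannounced service restarts and kernel OOM-killer
# activity. The unit name "openhab" is the openHABian default and an
# assumption; run with sufficient privileges to read the journal.
if command -v journalctl >/dev/null 2>&1; then
  # last journal lines for the unit: systemd logs each start/stop here
  journalctl -u openhab.service --no-pager -n 40 2>/dev/null | tail -n 40
  # count kernel lines mentioning the OOM killer
  OOMS=$(journalctl -k --no-pager 2>/dev/null | grep -icE 'oom|killed process' || true)
  SUMMARY="kernel OOM-related lines: ${OOMS:-0}"
else
  SUMMARY="journalctl not available on this system"
fi
echo "$SUMMARY"
```

A nonzero OOM count would mean the OS, not openHAB itself, is killing the process.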

I am starting to think it is more related to the HTTP binding, because I very often see:

2022-12-28 13:01:27.980 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://192.168.178.29:81/admin/api.php' (method='GET', content='null') failed: java.util.concurrent.TimeoutException: Total timeout 3000 ms elapsed
2022-12-28 13:01:45.605 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://pizero/api/<XXX>/lights/6' (method='GET', content='null') failed: java.util.concurrent.TimeoutException: Total timeout 3000 ms elapsed
2022-12-28 13:01:46.624 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@8896c4' takes more than 5000ms.
2022-12-28 13:01:59.602 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://pizero/api/<XXX>/lights/8' (method='GET', content='null') failed: java.util.concurrent.TimeoutException: Total timeout 3000 ms elapsed
2022-12-28 13:02:10.589 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.metrics.EventCountMetric@8958fe' takes more than 5000ms.
2022-12-28 13:02:12.744 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://pizero/api/<XXX>/lights/5' (method='GET', content='null') failed: java.util.concurrent.TimeoutException: Total timeout 3000 ms elapsed
2022-12-28 13:02:28.848 [WARN ] [p.internal.http.HttpResponseListener] - Requesting 'http://pizero/api/<XXX>/lights/10' (method='GET', content='null') failed: java.util.concurrent.TimeoutException: Total timeout 3000 ms elapsed
2022-12-28 13:02:32.123 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@8896c4' takes more than 5000ms.

A quick test would be to disable the HTTP binding temporarily to see if it continues. Also, I see in your logs that it ends on script executions. What are you calling in those scripts? Are you saving to any databases?
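If the add-ons are managed via the config file rather than through the UI, the HTTP binding can be disabled by dropping http from the binding list. A sketch; the other binding IDs are guesses based on the bundle list above and may need adjusting:

```properties
# /etc/openhab/services/addons.cfg (illustrative fragment, not a drop-in)
# Removing "http" from this list uninstalls the HTTP binding.
binding = astro,chromecast,daikin,deconz,doorbird,homematic,mail,netatmo,network,networkupstools,openweathermap,sonos,telegram
```

After the change, watch openhab.log: if the timeouts and the >5000 ms dispatch warnings stop, the unreachable HTTP endpoints (the pizero host in particular) are a likely trigger.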

I will try.
Meanwhile I found the following in /etc/default/openhab:

EXTRA_JAVA_OPTS="-Xms192m -Xmx320m -XX:+ExitOnOutOfMemoryError"

Isn’t that drastically low? Since I now have a dedicated host, I should be able to allocate most of the 2 GB available, shouldn’t I? Why is the default that low?

With -Xmx320m (i.e. a maximum heap of 320 MB) I am not surprised that it is unstable.
openHAB requires a fair amount of heap.

I increased it to reasonable values and at least for the past hours everything works fine.
But the question remains why these low values were present at all. This was a fresh openHABian installation from a few days ago.
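For reference, a sketch of what I ended up with for a dedicated 2 GB host. The numbers are my judgment call, not an official recommendation, and a few hundred MB should stay free for the OS and the JVM’s off-heap memory:

```shell
# /etc/default/openhab (sketch for a dedicated 2 GB machine)
# -Xms/-Xmx set initial/maximum heap. HeapDumpOnOutOfMemoryError asks
# the JVM to write a .hprof file on OOM for post-mortem analysis
# (whether it fires before ExitOnOutOfMemoryError depends on the JDK).
EXTRA_JAVA_OPTS="-Xms192m -Xmx768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/log/openhab -XX:+ExitOnOutOfMemoryError"
```

The service has to be restarted for the new options to take effect.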


Of what version/image? Done on what hardware?
Current openHABian install code will only add -Xmx320m if the machine you install on has less than 1.5 GB.

Out of interest: is there any reason not to allow a little bit more?

Yes: memory shortage will make systems with that little RAM hang rather than restart quickly. It’s already more than the default.
