Please help diagnose regular openHAB crashes - java.lang.OutOfMemoryError: Java heap space

  • Platform information:
    • Hardware: Raspberry Pi 3b+
    • OS: openhabian_v1.6-755(0290f04)
    • Java Runtime Environment: openjdk version “1.8.0_252”
    • openHAB version: 2.5.6-2
  • Issue: openHAB is crashing unpredictably, but seemingly more and more often, such that I cannot access the sitemap from the iPhone app or any browser. Occasionally the rules keep firing in the background, but eventually they grind to a halt as well, and access to the console also ceases. The information below points to a Java heap space issue. I have searched the forums, and while there is a lot of information, I feel a bit out of my depth diagnosing this. One of the suggestions was to configure the JVM to generate an hprof file when openHAB crashes. I did that (roughly as sketched after this list) and the resulting analysis is listed below, but I'm not sure what to make of it or whether it is even relevant.
    I have a number of bindings installed, including Astro, Australian Bureau of Meteorology (from the Marketplace), Expire, HTTP, iCloud, Kodi, Log Reader, Network, NTP, Onkyo, OpenSprinkler, Samsung TV, Serial, TCP/UDP, Systeminfo, Wake on LAN and Z-Wave. I have Node-RED, Grafana and InfluxDB installed via the openHABian menu options, and the InfluxDB, MapDB and rrd4j persistence services installed.
    I haven't offloaded the logs and I am not using zram. I'm not sure what other information may be relevant, but I am happy to provide it.
    I know that I can increase the allocated heap space, but rather than take a scattergun approach and hope something works, I thought I would try a more informed one. I have also read that increasing the heap space is likely to delay the issue rather than fix it. It has also been suggested to remove the bindings and then add them back in slowly to see where memory is being used, however I am not sure how to actually track that. I tried to install VisualVM but that was a bit over my head.
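For reference, the heap-dump setting I added was roughly the following - a sketch only, assuming the openHABian layout where the JVM options live in /etc/default/openhab2 (your file and paths may differ):

    # /etc/default/openhab2  (restart afterwards: sudo systemctl restart openhab2)
    # Write the heap to an hprof file when the JVM throws OutOfMemoryError,
    # so it can be opened later in the Eclipse Memory Analyzer (MAT).
    EXTRA_JAVA_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/var/lib/openhab2"

Increasing the heap would mean adding something like -Xmx512m to the same variable, but as I said, I'd rather understand the leak first.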

Any help in working out what is going on is greatly appreciated.

 openhab2.service - openHAB 2 - empowering the smart home
   Loaded: loaded (/usr/lib/systemd/system/openhab2.service; enabled; vendor preset: enabled)
   Active: active (running) since Sat 2020-08-01 23:27:52 AEST; 23h ago
     Docs: https://www.openhab.org/docs/
           https://community.openhab.org
 Main PID: 558 (java)
    Tasks: 205 (limit: 2319)
   Memory: 440.9M
   CGroup: /system.slice/openhab2.service
           └─558 /usr/bin/java -Dopenhab.home=/usr/share/openhab2 -Dopenhab.conf=/etc/openhab2 -Dopenhab.runtime=/usr/share/openhab2/runtime -Dopenhab.userdata=/var/lib/openhab2

Aug 02 22:13:49 openhab karaf[558]: Exception in thread "Active Thread: Equinox Container: a099b0d0-8329-4576-8f55-1a038d4bbfae" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:13:52 openhab karaf[558]: Exception in thread "OH-OSGiEventManager" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:13:55 openhab karaf[558]: Exception in thread "OkHttp Dispatcher" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:13:55 openhab karaf[558]: Exception in thread "JmDNS(JmDNS-/192.168.1.3).Timer" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:14:00 openhab karaf[558]: Exception in thread "OH-discovery-2355" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:14:03 openhab karaf[558]: Exception in thread "pool-12-thread-1" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:14:03 openhab karaf[558]: Exception in thread "OH-common-1598" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:14:10 openhab karaf[558]: Exception in thread "OkHttp Dispatcher" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:15:17 openhab karaf[558]: Exception in thread "OH-discovery-2361" java.lang.OutOfMemoryError: Java heap space
Aug 02 22:36:40 openhab karaf[558]: Exception in thread "OkHttp Dispatcher" java.lang.OutOfMemoryError: Java heap space

Eclipse Memory Analyzer (MAT) output:
This was generated from the heap dump taken during the crash prior to the status message posted above (about 24 hours earlier). Sadly I didn't think to keep a copy of the status message from that crash, and strangely, when openHAB crashed this evening a new hprof wasn't generated (I'm not sure whether I needed to delete the old one first, but I have done so now in anticipation of another crash).

12,226 instances of "org.quartz.simpl.TriggerWrapper", loaded by "org.openhab.core.scheduler" occupy 116,927,904 (47.94%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by "<system class loader>"

Keywords
java.lang.Object[]
org.openhab.core.scheduler
org.quartz.simpl.TriggerWrapper

 Common Path To the Accumulation Point 

Class Name    Ref. Objects    Shallow Heap    Ref. Shallow Heap    Retained Heap
org.quartz.core.QuartzSchedulerThread @ 0x6ba666b8 openHAB-job-scheduler_QuartzSchedulerThread Thread    9,999    168    239,976    1,048
\ <Java Local> org.quartz.simpl.RAMJobStore @ 0x6baaab78    9,999    72    239,976    4,666,792
.\ triggers java.util.ArrayList @ 0x6bab5b00    9,999    24    239,976    68,368
..\ elementData java.lang.Object[17083] @ 0x68e7f8d0    9,999    68,344    239,976    68,344
...+ [4832] org.quartz.simpl.TriggerWrapper @ 0x73c2fa40    1    24    24    9,568
...+ [4834] org.quartz.simpl.TriggerWrapper @ 0x73c2fa70    1    24    24    9,568
...+ [4835] org.quartz.simpl.TriggerWrapper @ 0x73c2fa88    1    24    24    9,568
...+ [8987] org.quartz.simpl.TriggerWrapper @ 0x67693d08    1    24    24    9,568
...+ [6909] org.quartz.simpl.TriggerWrapper @ 0x661e31c0    1    24    24    9,568
...+ [4838] org.quartz.simpl.TriggerWrapper @ 0x73c2fad0    1    24    24    9,568
...+ [4839] org.quartz.simpl.TriggerWrapper @ 0x73c2fae8    1    24    24    9,568
...+ [11065] org.quartz.simpl.TriggerWrapper @ 0x68b448e8    1    24    24    9,568
...+ [4841] org.quartz.simpl.TriggerWrapper @ 0x73c2fb18    1    24    24    9,568
...+ [4842] org.quartz.simpl.TriggerWrapper @ 0x73c2fb30    1    24    24    9,568
...+ [4843] org.quartz.simpl.TriggerWrapper @ 0x73c2fb48    1    24    24    9,568
...+ [4846] org.quartz.simpl.TriggerWrapper @ 0x73c2fb90    1    24    24    9,568
...+ [10091] org.quartz.simpl.TriggerWrapper @ 0x68191d18    1    24    24    9,568
...+ [11604] org.quartz.simpl.TriggerWrapper @ 0x690b9388    1    24    24    9,568
...+ [6000] org.quartz.simpl.TriggerWrapper @ 0x65895e70    1    24    24    9,568
...+ [8013] org.quartz.simpl.TriggerWrapper @ 0x66ce11e8    1    24    24    9,568
...+ [6974] org.quartz.simpl.TriggerWrapper @ 0x66288c88    1    24    24    9,568
...+ [12088] org.quartz.simpl.TriggerWrapper @ 0x695b6b68    1    24    24    9,568
...+ [11584] org.quartz.simpl.TriggerWrapper @ 0x69081990    1    24    24    9,568
...+ [6065] org.quartz.simpl.TriggerWrapper @ 0x6593b948    1    24    24    9,568
...+ [10156] org.quartz.simpl.TriggerWrapper @ 0x68237830    1    24    24    9,568
...+ [8078] org.quartz.simpl.TriggerWrapper @ 0x66d86d10    1    24    24    9,568
...+ [7039] org.quartz.simpl.TriggerWrapper @ 0x6632e760    1    24    24    9,568
...+ [5061] org.quartz.simpl.TriggerWrapper @ 0x742f5178    1    24    24    9,568
...+ [9117] org.quartz.simpl.TriggerWrapper @ 0x677df368    1    24    24    9,568
...\ Total: 25 of 9,999 entries; 9,974 more    9,999    239,976    239,976

org.quartz.simpl.SimpleThreadPool$WorkerThread @ 0x6ba67390 openHAB-job-scheduler_Worker-3 Thread    1    144    24    20,056
\ <Java Local> org.quartz.simpl.TriggerWrapper @ 0x70e3bcd0    1    24    24    9,568

Total: 2 entries    10,000    312    240,000

Are you using an SD card, and if so, how long has it been in use? Do you have power outages, and is the RPi's power supply good? I'm asking because you may have a corrupted card; they do have a limited number of write cycles. If you have a spare card and a good backup, try a fresh install and restore the backup onto it.

You mentioned not using zram, but if OH is running on an SD card I would recommend enabling it.

Just some observations here… that is quite a shopping list of bindings and services, especially running on a Pi 3B.
Just as an example, my system runs on a desktop PC dedicated to openHAB. A couple of months ago I got a Pi 4 to play with. When I restart the PC, the openHAB dashboard is available as soon as I can manage to get the browser open. On the Pi, I have to wait a few minutes.
As far as the memory leak goes… something is causing a problem, and as you have read in other threads, folks who experience this generally have to do a bunch of testing and experimenting to figure out the culprit. If it is a production system, unloading all the bindings and trying them one by one may not be an option. Some of the bindings you use are well known and most likely not the problem. Maybe work backwards and unload suspect bindings until the problem ceases (a console sketch for doing that follows below).
Also, as mentioned, back everything up and get another SD card. Usually SD wear causes odd behaviour rather than memory leaks, but who knows. Better safe than sorry: get a backup SD card if that is what the Pi uses.
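If you do go the unload-one-at-a-time route, the Karaf console makes it fairly painless. A rough sketch (the bundle IDs will be different on your system, and a stopped binding may be re-activated after a restart, so also remove it from addons.cfg or Paper UI for a longer test):

    ssh -p 8101 openhab@localhost            # open the openHAB (Karaf) console
    openhab> bundle:list | grep -i binding   # note the ID of a suspect binding
    openhab> bundle:stop 123                 # 123 is a placeholder ID - temporarily unload it
    openhab> bundle:start 123                # bring it back when you are done testing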

Memory problems are always hard to find. Looking at the heap analysis, it looks like a lot of scheduler triggers are being created. Do you have a rule that schedules something? Maybe look into that area.


Just would like to check: do you also use the amazonechocontrol binding?
A few weeks ago this binding was the root cause of out-of-memory problems in OH installations.
The problem was fixed in 2.5.7.


Thanks for the replies. In answer:

  • I'll get a new SD card and try that - refreshing one before the old one fails isn't a bad idea. I'll probably enable zram as well, as it has been on my to-do list for a while.
  • I am definitely not running the Amazon Echo binding.
  • I do have a number of timers running for scheduling heating and irrigation - possibly five timers running at any one time. These have been operational for more than a year, and last year I think I had seven timers running with no problems. Do you think these may be causing the problems? I also have a few rules with cron-based triggers.
  • Yes, it's a production system, but nothing critical to the operation of my house. I am happy to remove bindings etc., but I'm not sure how to monitor the success or otherwise of that approach.

This is not at all unreasonable - well-behaved timers essentially consume no resources while waiting for their appointment.

It is possible to write badly-behaved, resource-gobbling code for timers to execute, just as it is possible to write badly-behaved rules or bindings; only the consequences are a little different.
So treat them with the same suspicion as everything else for the time being - there is no obvious clue here.
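For what it's worth, the usual badly-behaved pattern is creating a new timer on every rule trigger without cancelling the previous one, so pending jobs can pile up in the scheduler. A minimal Rules DSL sketch of the safer pattern, with made-up item names:

    var Timer boostTimer = null

    rule "Heating boost auto-off"
    when
        Item HeatingBoost changed to ON
    then
        // cancel any timer still pending before scheduling a new one,
        // otherwise each trigger can leave another job queued
        boostTimer?.cancel
        boostTimer = createTimer(now.plusMinutes(30), [ |
            HeatingBoost.sendCommand(OFF)
            boostTimer = null
        ])
    end

Whether that is what the TriggerWrapper build-up in your MAT output reflects I can't say from here, but rules that schedule timers are the easiest place to look first.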

Looking at recent amazon-echo threads should show you the techniques people used for monitoring memory usage.
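A lower-tech option, assuming the JDK's diagnostic tools (jmap) are present in your Java install (they may not be on every image), is to take a class histogram every hour or so and watch which counts keep climbing:

    # the main PID of the openHAB service (558 in your status output; it changes after restarts)
    systemctl show -p MainPID openhab2.service

    # top 25 classes by instance count and heap usage; run as the same user as the JVM
    sudo -u openhab jmap -histo <pid> | head -25

    # repeat periodically and compare the output; a count that only ever grows
    # (e.g. org.quartz.simpl.TriggerWrapper) points at the leak

That also gives you a way to tell whether removing a particular binding, or changing a rule, actually stops the growth.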