Unwanted reboot due to insufficient memory - All things gone

NCO · June 6, 2021, 6:24am

Platform information:
Hardware: RPi 4 8 GB and 32 GB micro SD + zwave: Aeotec Z-Stick Gen5 (ZW090) - UZB
OS: which comes with openhabian 1.6
Java Runtime Environment: Zulu 11 (standard openhabian)
openHAB version: 3.0.2 (stable)

Tonight OH3.0.2 was restarted automatically due to insufficient memory:
syslog:

Jun  6 00:20:56 homer influxd[685]: [httpd] 127.0.0.1 - openhab [06/Jun/2021:00:20:56 +0200] "GET /query?db=openhab_db&epoch=ms&q=SELECT+value+FROM+autogen.ActTime+WHERE+time+%3C%3D+%272021-06-05T21%3A20%3A56.091018Z%27+ORDER+BY+time+DESC+LIMIT+1%3B HTTP/1.1" 200 $
Jun  6 00:21:00 homer influxd[685]: [httpd] 127.0.0.1 - openhab [06/Jun/2021:00:21:00 +0200] "POST /write?db=openhab_db&rp=autogen&precision=n&consistency=one HTTP/1.1" 204 0 "-" "okhttp/3.14.4" 4e7a7b2f-c64c-11eb-893d-000000000000 111765
Jun  6 00:21:11 homer influxd[685]: [httpd] 127.0.0.1 - openhab [06/Jun/2021:00:21:11 +0200] "POST /write?db=openhab_db&rp=autogen&precision=n&consistency=one HTTP/1.1" 204 0 "-" "okhttp/3.14.4" 55373e92-c64c-11eb-893e-000000000000 69557
Jun  6 00:21:13 homer karaf[890]: #
Jun  6 00:21:13 homer karaf[890]: # There is insufficient memory for the Java Runtime Environment to continue.
Jun  6 00:21:13 homer karaf[890]: # Native memory allocation (malloc) failed to allocate 19472 bytes for AllocateHeap
Jun  6 00:21:13 homer karaf[890]: # An error report file with more information is saved as:
Jun  6 00:21:13 homer karaf[890]: # /var/lib/openhab/hs_err_pid890.log
Jun  6 00:21:16 homer kernel: [121933.655823] device eth0 left promiscuous mode
Jun  6 00:21:16 homer systemd[1]: openhab.service: Main process exited, code=exited, status=1/FAILURE
Jun  6 00:21:16 homer systemd[1]: openhab.service: Failed with result 'exit-code'.
Jun  6 00:21:21 homer systemd[1]: openhab.service: Service RestartSec=5s expired, scheduling restart.
Jun  6 00:21:21 homer systemd[1]: openhab.service: Scheduled restart job, restart counter is at 1.
Jun  6 00:21:21 homer systemd[1]: Stopping Frontail openHAB instance, reachable at http://homer:9001...
Jun  6 00:21:22 homer systemd[1]: frontail.service: Succeeded.
Jun  6 00:21:22 homer systemd[1]: Stopped Frontail openHAB instance, reachable at http://homer:9001.
Jun  6 00:21:22 homer systemd[1]: Stopped openHAB instance, reachable at http://homer:8080.
Jun  6 00:21:22 homer systemd[1]: Starting openHAB instance, reachable at http://homer:8080...

In the past I experienced java heap space issues (EOM exceptions), so I added

EXTRA_JAVA_OPTS="-Xms1024m -Xmx1024m"

to

/etc/default/openhab

the openhab.log looks like this:

2021-06-06 00:20:09.366 [INFO ] [penhab.core.model.script.+++ HEATING] - Sync for Heating started...
2021-06-06 00:20:09.372 [INFO ] [penhab.core.model.script.+++ HEATING] - Heat / WWater switch sync done
2021-06-06 00:20:45.395 [INFO ] [penhab.core.model.script.+++ HEATING] - Warm enough outside - heating skipped.
2021-06-06 00:20:55.231 [INFO ] [openhab.core.model.script.+++ SYSTEM] - Gardena Binding offline since more than 60 min! - Restart of binding done.
2021-06-06 00:20:55.643 [WARN ] [ternal.handler.GardenaAccountHandler] - Error 429 null, {"errors":[{"code":"too.many.failed.login.attempts","status":429,"title":"Too many failed login attempts for this account, please try again later","detail":"Too many failed login attempts for user xxx"}]}
2021-06-06 00:20:56.069 [INFO ] [penhab.core.model.script.+++ DEBUG: ] - IRRIGATION TIMER CHECK - LOW RAIN
2021-06-06 00:20:56.078 [INFO ] [penhab.core.model.script.+++ DEBUG: ] - IRRIGATION TIMER CHECK - Gard_Sens_Moist: 65.0
2021-06-06 00:20:56.085 [INFO ] [hab.core.model.script.+++ IRRIGATION] - Irrigation timer updated to: 0
2021-06-06 00:20:56.136 [INFO ] [openhab.core.model.script.+++ SYSTEM] - Cron Check -> cron OK - ActTime: 06.06./00:20
2021-06-06 00:21:05.897 [WARN ] [ternal.handler.GardenaAccountHandler] - Error 429 null, {"errors":[{"code":"too.many.failed.login.attempts","status":429,"title":"Too many failed login attempts for this account, please try again later","detail":"Too many failed login attempts for user xxx"}]}$
2021-06-06 00:21:11.195 [INFO ] [penhab.core.model.script.+++ WEATHER] - FlowerCare's timestamp updated by change of FlCare_SunL to 195

my events.log:

2021-06-06 00:20:53.952 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:water_control:HubAccount:xxx' changed from INITIALIZING to UNINITIALIZED
2021-06-06 00:20:53.972 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:water_control:HubAccount:xxx' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2021-06-06 00:20:53.975 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:sensor:HubAccount:xxx' changed from INITIALIZING to UNINITIALIZED
2021-06-06 00:20:53.993 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:sensor:HubAccount:xxx' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2021-06-06 00:20:53.995 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:account:HubAccount' changed from OFFLINE (COMMUNICATION_ERROR): Error 429 null, {"errors":[{"code":"too.many.failed.login.attempts","status":429,"title":"Too many failed login attempts for user xxx"}]}
2021-06-06 00:20:54.057 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:account:HubAccount' changed from UNINITIALIZED to UNINITIALIZED (HANDLER_MISSING_ERROR)
2021-06-06 00:20:55.357 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:sensor:HubAccount:xxx' changed from UNINITIALIZED (HANDLER_MISSING_ERROR) to UNINITIALIZED (BRIDGE_UNINITIALIZED)
2021-06-06 00:20:55.362 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:water_control:HubAccount:xxx' changed from UNINITIALIZED (HANDLER_MISSING_ERROR) to UNINITIALIZED (BRIDGE_UNINITIALIZED)
2021-06-06 00:20:55.376 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:account:HubAccount' changed from UNINITIALIZED (HANDLER_MISSING_ERROR) to INITIALIZING
2021-06-06 00:20:55.614 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:account:HubAccount' changed from INITIALIZING to OFFLINE (COMMUNICATION_ERROR): Error 429 null, {"errors":[{"code":"too.many.failed.login.attempts","status":429,"title":"Too many failed login attempts for user xxx"}]}
2021-06-06 00:20:55.650 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:sensor:HubAccount:xxx' changed from UNINITIALIZED (BRIDGE_UNINITIALIZED) to INITIALIZING
2021-06-06 00:20:55.681 [INFO ] [ab.event.ThingStatusInfoChangedEvent] - Thing 'gardena:water_control:HubAccount:xxx' changed from UNINITIALIZED (BRIDGE_UNINITIALIZED) to INITIALIZING
2021-06-06 00:20:58.706 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Sqrl_SnpSh' changed from raw type (image/jpeg): 55333 bytes to raw type (image/jpeg): 55270 bytes
2021-06-06 00:21:00.032 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'ActTime' changed from 06.06./00:20 to 06.06./00:21
2021-06-06 00:21:03.767 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Sqrl_SnpSh' changed from raw type (image/jpeg): 55270 bytes to raw type (image/jpeg): 55166 bytes
2021-06-06 00:21:08.728 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Sqrl_SnpSh' changed from raw type (image/jpeg): 55166 bytes to raw type (image/jpeg): 55075 bytes
2021-06-06 00:21:11.187 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FlCare_SunL' changed from 87 to 195
2021-06-06 00:21:11.201 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'FlCare_TimeSt' changed from 2021-06-06T00:16:09.216268+0200 to 2021-06-06T00:21:11.192084+0200
2021-06-06 00:21:11.500 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Heat_HzSupT' changed from 44.4 °C to 44.3 °C
2021-06-06 00:21:13.751 [INFO ] [openhab.event.ItemStateChangedEvent ] - Item 'Sqrl_SnpSh' changed from raw type (image/jpeg): 55075 bytes to raw type (image/jpeg): 55214 bytes

The beginning of the mentioned hr_err files looks like this (which does not help me):

#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 19472 bytes for AllocateHeap
# Possible reasons:
#   The system is out of physical RAM or swap space
#   In 32 bit mode, the process size limit was hit
# Possible solutions:
#   Reduce memory load on the system
#   Increase physical memory or swap space
#   Check if swap backing store is full
#   Use 64 bit Java on a 64 bit OS
#   Decrease Java heap size (-Xmx/-Xms)
#   Decrease number of Java threads
#   Decrease Java thread stack sizes (-Xss)
#   Set larger code cache with -XX:ReservedCodeCacheSize=
# This output file may be truncated or incomplete.
#
#  Out of Memory Error (allocation.cpp:46), pid=890, tid=1450
#
# JRE version: OpenJDK Runtime Environment Zulu11.45+27-CA (11.0.10+9) (build 11.0.10+9-LTS)
# Java VM: OpenJDK Client VM Zulu11.45+27-CA (11.0.10+9-LTS, mixed mode, serial gc, linux-arm)
# No core dump will be written. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#

---------------  S U M M A R Y ------------

Command Line: -XX:-UsePerfData -Dopenhab.home=/usr/share/openhab -Dopenhab.conf=/etc/openhab -Dopenhab.runtime=/usr/share/openhab/runtime -Dopenhab.userdata=/var/lib/openhab -Dopenhab.logdir=/var/log/openhab -Dfelix.cm.dir=/var/lib/openhab/config -Djava.library.pa$

Host: rev 3 (v7l), 4 cores, 7G, Raspbian GNU/Linux 10 (buster)
Time: Sun Jun  6 00:21:13 2021 CEST elapsed time: 121901.507148 seconds (1d 9h 51m 41s)

---------------  T H R E A D  ---------------

Current thread (0x7117d800):  JavaThread "DBus Worker Thread-3" [_thread_in_vm, id=1450, stack(0x67b10000,0x67b60000)]

Stack: [0x67b10000,0x67b60000],  sp=0x67b5e3b8,  free space=312k
Native frames: (J=compiled Java code, A=aot compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0x6d8584]  VMError::report(outputStream*, bool)+0x2ba4
V  [libjvm.so+0x6d9198]  VMError::report_and_die(int, char const*, char const*, std::__va_list, Thread*, unsigned char*, void*, void*, char const*, int, unsigned int)+0x21c
V  [libjvm.so+0x6d9fcc]  VMError::report_and_die(Thread*, char const*, int, unsigned int, VMErrorType, char const*, std::__va_list)+0x38
V  [libjvm.so+0x2331d0]  report_vm_out_of_memory(char const*, int, unsigned int, VMErrorType, char const*, ...)+0x78
V  [libjvm.so+0x6405c]  AllocateHeap(unsigned int, MemoryType, AllocFailStrategy::AllocFailEnum)+0x98
V  [libjvm.so+0x698ffc]  ThreadsSMRSupport::add_thread(JavaThread*)+0x64
V  [libjvm.so+0x68b044]  JavaThread::prepare(_jobject*, ThreadPriority)+0x12c
V  [libjvm.so+0x41e668]  JVM_StartThread+0x11c
J 14109  java.lang.Thread.start0()V java.base@11.0.10 (0 bytes) @ 0xb5a8e4a8 [0xb5a8e440+0x00000068]
J 14107 c1 java.lang.Thread.start()V java.base@11.0.10 (71 bytes) @ 0xb5a8f304 [0xb5a8f160+0x000001a4]
J 8100 c1 java.util.concurrent.ThreadPoolExecutor.addWorker(Ljava/lang/Runnable;Z)Z java.base@11.0.10 (289 bytes) @ 0xb4f71e18 [0xb4f71a40+0x000003d8]
J 7073 c1 java.util.concurrent.ThreadPoolExecutor.ensurePrestart()V java.base@11.0.10 (41 bytes) @ 0xb4dc10b0 [0xb4dc1020+0x00000090]
J 7101 c1 java.util.concurrent.ScheduledThreadPoolExecutor.delayedExecute(Ljava/util/concurrent/RunnableScheduledFuture;)V java.base@11.0.10 (58 bytes) @ 0xb4ddacf4 [0xb4ddac20+0x000000d4]
J 4483 c1 java.util.concurrent.ScheduledThreadPoolExecutor.schedule(Ljava/lang/Runnable;JLjava/util/concurrent/TimeUnit;)Ljava/util/concurrent/ScheduledFuture; java.base@11.0.10 (56 bytes) @ 0xb49770dc [0xb4976e60+0x0000027c]
j  java.util.concurrent.ScheduledThreadPoolExecutor.execute(Ljava/lang/Runnable;)V+6 java.base@11.0.10
j  org.openhab.binding.bluetooth.bluez.internal.BlueZPropertiesChangedHandler.handle(Lorg/freedesktop/dbus/interfaces/Properties$PropertiesChanged;)V+56
j  org.openhab.binding.bluetooth.bluez.internal.BlueZPropertiesChangedHandler.handle(Lorg/freedesktop/dbus/messages/DBusSignal;)V+5
j  org.freedesktop.dbus.connections.AbstractConnection$3.run()V+55
J 8459% c1 java.util.concurrent.ThreadPoolExecutor.runWorker(Ljava/util/concurrent/ThreadPoolExecutor$Worker;)V java.base@11.0.10 (187 bytes) @ 0xb5007a0c [0xb5007820+0x000001ec]
j  java.util.concurrent.ThreadPoolExecutor$Worker.run()V+5 java.base@11.0.10
j  java.lang.Thread.run()V+11 java.base@11.0.10
v  ~StubRoutines::call_stub
V  [libjvm.so+0x371d50]  JavaCalls::call_helper(JavaValue*, methodHandle const&, JavaCallArguments*, Thread*)+0x2b4
V  [libjvm.so+0x37054c]  JavaCalls::call_virtual(JavaValue*, Handle, Klass*, Symbol*, Symbol*, Thread*)+0x140
V  [libjvm.so+0x415fc0]  thread_entry(JavaThread*, Thread*)+0xa0
V  [libjvm.so+0x6914f8]  JavaThread::thread_main_inner()+0xf0
V  [libjvm.so+0x68e660]  Thread::call_run()+0x158
V  [libjvm.so+0x57d5b8]  thread_native_entry(Thread*)+0x108
C  [libpthread.so.0+0x6494]  start_thread+0xcc

So there was no message like in the past Java heap space
After the reboot no things are left in MainUI.
Another restart did not help to recover

With 8GB I assume to have more than enough, but how to make use of it to avoid the issue above?
Any help would be greatly appreciated.

J-N-K · June 6, 2021, 7:59am

If you run out of memory, there is a bug. Increasing the memory for the JVM will not solve that but only postpone the crash. You need to find the memory leak (probably in a binding).

Lost things: Look in the backup folder of the json-database and restore a backup of the things file from there.

NCO · June 6, 2021, 8:34am

Thanks, Jan.

That’s a valid point.

The question is how?

I remember, that I had issues with long running rules in the past (too many sleeps within rules triggered too often).
After switching to timers, this issue was (most likely) gone.

I assume I should start to check the forum for memory issues on the bindings I use.
That’s the only idea I have.

I actuall restored a previous image of the SD-card

J-N-K · June 6, 2021, 9:11am

Since you posted a log showing that: Is the gardena binding restarting very often? Maybe some resources are not properly disposed when restarting.

NCO · June 6, 2021, 9:15am

Right. Good point.
I deactivated the restart routine and will just send a Telegram if the thing goes offline for a start.