Can someone run a stack trace of the system in a broken state and upload it, please? I'm curious as to which specific threads are causing the issue. There were some major changes to the rules engine between 2.5 and 3.0 (and in the more recent snapshots).
You may want to add instructions or a link that explains how to do this, because some of the folks experiencing this issue may not know how.
Thanks
Valid point!
jstack -l PID > my-dump.txt
PID is the Java process ID
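If you're not sure of the PID, something like this usually finds it on a standard apt/openHABian install (just a sketch; the service name may differ on your setup, and jstack generally has to run as the same user as the JVM):
pgrep -f openhab
# or ask systemd for the main PID of the service:
systemctl show --property MainPID openhab.service
# then, for example:
sudo -u openhab jstack -l PID > my-dump.txt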
this is for Linux I'm guessing?
Yes. Several people above mentioned Raspberry Pi, so this should work without issue there.
I'll do this as soon as it happens again.
In my case though, the system can run stable for days, even weeks, before it suddenly gets into this state. It would be very interesting to know what triggers it…
Here's the stack trace of one process that was running during the freeze:
my-dump301.txt (146.0 KB)
I had the same problem in the past, even with Snapshot 3.1 (<#2165).
I upgraded to #2165 a week ago. Before the upgrade I changed this:
File: /etc/default/openhab
EXTRA_JAVA_OPTS="-Xms1024m -Xmx2048m"
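To check that the larger heap actually took effect after a restart, something like this should show the flags on the running process (a quick sketch, assuming the apt/openHABian service name):
sudo systemctl restart openhab.service
ps -ef | grep openhab | grep -oE 'Xm[sx][0-9]+[mg]'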
I am running on a Pi 4 with 8 GB and still had the problem… I upgraded a week ago, did not do any more upgrades, just let the system run and do normal setup work in rules/widgets/items, so the system was not in "pause mode".
And it has run for a week now without any problems. Current status:
Maybe it works for others as well (it is from an older entry here in the forum)
Hello,
I have similar problems, but I don't see anything strange in the openHAB logs.
The behaviour is that after some hours or days (randomly) openHAB stops working, the free RAM drops below 30% (at startup more than 55% is free), and SSH is not available anymore.
The Raspberry Pi needs to be hard-restarted.
Here is my system configuration. Can I perform some tests to better understand whether the problem is linked to Java or to the textual .rules files?
runtimeInfo:
version: 3.0.1
buildString: Release Build
locale: it_IT
systemInfo:
configFolder: /etc/openhab
userdataFolder: /var/lib/openhab
logFolder: /var/log/openhab
javaVersion: 11.0.9
javaVendor: Azul Systems, Inc.
javaVendorVersion: Zulu11.43+88-CA
osName: Linux
osVersion: 4.9.35-v7+
osArchitecture: arm
availableProcessors: 4
freeMemory: 40478264
totalMemory: 153206784
I'm not sure if this is related or not, but the RRD4j service seems to be using an enormous amount of CPU. I compared to my OH3 instance (which is also running RRD4j) and I don't see any of these threads under load. I've also tuned my RRD4j down to only persist a small number of items instead of everything, as the default currently does. As a test, can you disable/remove the RRD4j service and see if this problem calms down?
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-1]" #250 daemon prio=5 os_prio=0 cpu=26610.80ms elapsed=141491.92s tid=0xe109b400 nid=0x265 waiting on condition [0xd7ebd000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-2]" #251 daemon prio=5 os_prio=0 cpu=28106.98ms elapsed=141491.89s tid=0xe35af400 nid=0x266 waiting on condition [0xd7e6d000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-3]" #252 daemon prio=5 os_prio=0 cpu=27348.65ms elapsed=141491.88s tid=0xe229c800 nid=0x267 waiting on condition [0xd7e1d000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-4]" #253 daemon prio=5 os_prio=0 cpu=26975.74ms elapsed=141491.88s tid=0xe229e400 nid=0x268 waiting on condition [0xd7dcd000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-5]" #254 daemon prio=5 os_prio=0 cpu=27351.98ms elapsed=141491.87s tid=0xe35bf000 nid=0x26a waiting on condition [0xd7d7d000]
"RRD4J Sync-ThreadPool for org.rrd4j.core.RrdSyncThreadPool@1357647 [Thread-6]" #255 daemon prio=5 os_prio=0 cpu=27822.25ms elapsed=141491.86s tid=0xe0e2e800 nid=0x26b waiting on condition [0xd7d2d000]
"OH-RRD4j-1" #259 prio=5 os_prio=0 cpu=90183.52ms elapsed=141490.48s tid=0x002c8800 nid=0x26f waiting on condition [0xd7c24000]
"OH-RRD4j-2" #261 prio=5 os_prio=0 cpu=90188.39ms elapsed=141490.42s tid=0xda74a400 nid=0x272 waiting on condition [0xd99be000]
"OH-RRD4j-3" #263 prio=5 os_prio=0 cpu=90222.19ms elapsed=141488.81s tid=0xe45be000 nid=0x275 waiting on condition [0xd7bd4000]
Just for what it's worth: I am not (and have never been) using RRD4j; I'm using jdbc persistence.
Also, for me Java is currently still using less than 1% CPU two days after the last restart of OH. So for me this is something that obviously happens very suddenly when it happens, not something that builds up over time.
Now I'm experiencing the heap space exception too… it seems related to the scheduler:
2021-02-04 05:04:37.680 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@166d93e' takes more than 5000ms.
2021-02-04 05:05:04.064 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
My openHAB is stable version 3.0.1-2 on a Raspberry Pi 4 -_-
RRD4j is on by default on OH3. It must be disabled if you don't want it.
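In case it helps anyone testing this: the add-on can be removed through the Main UI (under Settings → Other Add-ons, if I remember correctly), or, if you manage add-ons via text files, by leaving rrd4j out of the persistence line in $OPENHAB_CONF/services/addons.cfg, for example:
# list only the persistence services you actually want installed; omitting rrd4j removes it
persistence = jdbc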
I'm not entirely sure exactly what happens during an upgrade, but at least in my case the jdbc addon that was installed in 2.5 was still installed after upgrading to 3.0, and no other persistence addons have ever been installed.
This is not necessarily the case. This is just where the memory happened to run out.
Java will spend more and more time garbage collecting when it runs low on memory, which explains the high load on the system.
I had the same problems, even with OH 2.5, and this was the reason I migrated to Groovy rules and started to develop a metrics addon.
Using metrics I could see that the memory would build up over a couple of days.
With an RPi 4 (at least the 4 GB+ versions) the problem isn't as noticeable any more, but with less memory I saw this a lot.
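If anyone wants to confirm the garbage-collection theory, one option on Java 11 is to add GC logging to the EXTRA_JAVA_OPTS in /etc/default/openhab and watch whether pause times climb right before a freeze (a sketch only; the heap sizes are just the values quoted earlier in this thread):
EXTRA_JAVA_OPTS="-Xms1024m -Xmx2048m -Xlog:gc*:file=/var/log/openhab/gc.log:time,uptime"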
Different causes can lead to the same symptoms here, so we should not mix everything together. I think removing all DSL rules and seeing if this still occurs is the first important step.
In openHAB 3 the default strategy is to store the state of all items in an rrd4j database. As this is done by a scheduled process, it can explain the high number of scheduler processes. I don't know whether this causes the out-of-memory problem.
For more on the default persistence strategy see: rrd4j - Persistence Services | openHAB
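A quick way to see how much the default is actually writing is to count the databases rrd4j has created, one .rrd file per item, under the userdata folder shown in the configuration above (paths assume the apt install):
ls /var/lib/openhab/persistence/rrd4j | wc -l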
I think you are on to something. I have a test cluster with one RPi 3 that is not running ZRAM. It runs out of memory, and I can see a lot of rrd4j activity. I bet disabling rrd4j will make things a lot better.
Same here. I was running 3.0 with file-based rules without problems. Now with 3.0.1 stable I am running into a Java heap space exception every one or two days. My first suspicion was the Modbus binding: after I added it and set up some pollers for my Pluggit ventilation, the problem occurred. But now I have disabled all Modbus Things, and it still happens.
From openhab.log
2021-02-04 17:15:48.748 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:16:05.475 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:16:39.171 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:17:57.072 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:18:32.636 [INFO ] [era.internal.handler.IpCameraHandler] - The alarm stream was not running for camera 192.168.75.32, re-starting it now
2021-02-04 17:20:49.560 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@85831e' takes more than 5000ms.
2021-02-04 17:21:14.167 [WARN ] [com.zaxxer.hikari.pool.HikariPool ] - 1m51s816ms328μs101ns - Thread starvation or clock leap detected (housekeeper delta=yank-default).
2021-02-04 17:23:28.837 [WARN ] [hannel.AbstractChannelHandlerContext] - An exception 'java.lang.OutOfMemoryError: Java heap space' [enable DEBUG level for full stacktrace] was thrown by a user handler's exceptionCaught() method while handling the following exception:
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:29.785 [WARN ] [netty.util.concurrent.DefaultPromise] - An exception was thrown by org.openhab.binding.ipcamera.internal.handler.IpCameraHandler$2.operationComplete()
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:29.809 [WARN ] [org.eclipse.jetty.io.ManagedSelector] - java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:14.074 [ERROR] [se.californium.elements.UDPConnector] - Exception in network stage thread [UDP-Receiver-0.0.0.0/0.0.0.0:5683[0]]:
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:22:45.863 [ERROR] [org.apache.felix.fileinstall ] - In main loop, we have serious trouble
java.lang.OutOfMemoryError: Java heap space
2021-02-04 17:23:30.836 [ERROR] [io.openhabcloud.internal.CloudClient] - Error connecting to the openHAB Cloud instance
2021-02-04 17:23:32.098 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 710d4355-00d2-412a-a507-adca5dec9cce, base URL = http://localhost:8080)
2021-02-04 17:23:32.112 [INFO ] [y.internal.handler.ShellyBaseHandler] - shellyswitch25-10a5fa: Thing goes OFFLINE: message.offline.status-error-watchdog
2021-02-04 17:23:33.210 [INFO ] [y.internal.handler.ShellyBaseHandler] - shellyswitch25-f3fb9b: Thing goes OFFLINE: message.offline.status-error-watchdog
2021-02-04 17:23:44.767 [INFO ] [.reconnect.PeriodicReconnectStrategy] - Try to restore connection to '127.0.0.1'. Next attempt in 60000ms
2021-02-04 17:23:47.925 [INFO ] [.transport.mqtt.MqttBrokerConnection] - Starting MQTT broker connection to '127.0.0.1' with clientid 2e54a7bd-7dc3-44f0-9075-1fc7940ef0d5
2021-02-04 17:23:59.092 [WARN ] [era.internal.handler.IpCameraHandler] - !!!! Camera possibly closed the channel on the binding, cause reported is: Java heap space
2021-02-04 17:24:02.184 [WARN ] [com.zaxxer.hikari.pool.HikariPool ] - 3m41s182ms62μs223ns - Thread starvation or clock leap detected (housekeeper delta=yank-default).
2021-02-04 17:26:26.303 [WARN ] [com.zaxxer.hikari.pool.HikariPool ] - 1m37s459ms62μs834ns - Thread starvation or clock leap detected (housekeeper delta=yank-default).
2021-02-04 17:25:19.174 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = 710d4355-00d2-412a-a507-adca5dec9cce, base URL = http://localhost:8080)
2021-02-04 17:29:26.470 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.lang.OutOfMemoryError: Java heap space
And so on…
Quick poll. Of the people having this issue, are you on Pi2/Pi3 type systems or systems with more memory/cpu?
My system is a Proxmox VM with Buster. I gave it 2 CPUs (Celeron) and 2 GB memory.