Docker Container Crashes with my new Hardware

portboy · January 20, 2025, 6:38pm

Hi … I’m desperate.

Since upgrading my hardware, I’ve encountered a persistent issue with one of my Docker containers.
Sometimes the container crashes directly after start, sometimes after hours …
I can’t identify any function in openHAB that triggers the crash.
The openHAB container keeps crashing with the following error:

# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x000014ad623f1b25, pid=198, tid=1495
#
# JRE version: OpenJDK Runtime Environment (17.0.12+7) (build 17.0.12+7-Debian-2deb11u1)
# Java VM: OpenJDK 64-Bit Server VM (17.0.12+7-Debian-2deb11u1, mixed mode, tiered, compressed oops, compressed class ptrs, g1 gc, linux-amd64)
# Problematic frame:
# V  [libjvm.so+0x7f1b25]  PhaseChaitin::interfere_with_live(unsigned int, IndexSet*) [clone .part.0]+0xd5

Here’s what I’ve tried so far to resolve the issue:

Removed all plugins from UNRAID.
Reinstalled Docker (deleted the IMAGE disk).
Tested various versions of the Docker container (openHAB 4.2.2 and 4.3).
Ran the Docker container from scratch, without any prior configuration.
Verified and reset permissions on the config directory.

Interestingly, all other containers on my server are functioning flawlessly. Before the hardware upgrade, Openhab Docker was rock-solid and never crashed.

But what else can I do? I’m truly at my wit’s end.

Here are two crash logs:
hs_err_pid198.log (247.4 KB)
hs_err_pid25.log (186.4 KB)

Platform information: UNRAID 7.x
ASRock Industrial IMB-X1714 , Version To Be Filled By O.E.M.
American Megatrends International, LLC., Version P1.70
13th Gen Intel® Core™ i9-13900K @ 3000 MHz
32 GB RAM

apas_csc · January 21, 2025, 7:26am

This should never happen. No Java application bug is able to crash the JVM, unless there is the unlikely case of a bug in the JVM.

Is it always crashing in PhaseChaitin? That is part of Java’s C2 HotSpot compiler, which compiles bytecode to optimized native code. When it kicks in is not deterministic and depends on the workload.

If the C2 compiler is the issue you could disable it and probably won’t see much of a performance difference. Use these JVM parameters:

-XX:+TieredCompilation (to enable tiered compilation, which includes C1)
-XX:TieredStopAtLevel=1 (to stop at C1 and thereby disable C2)
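
For the Docker setup in this thread, one way to pass these flags is the `EXTRA_JAVA_OPTS` environment variable of the official openHAB image. The following is only a minimal sketch: the container name `openhab` and the image tag are assumptions, and the volume and port options of a real setup are left out. On UNRAID the same variable would presumably be added in the container template instead of on the command line.

```shell
# Sketch: start openHAB with C2 disabled via EXTRA_JAVA_OPTS.
# Assumptions: container name "openhab" and tag 4.3.2 are illustrative;
# add your usual -v / -p / --net options for a real deployment.
docker run -d \
  --name openhab \
  -e EXTRA_JAVA_OPTS="-XX:+TieredCompilation -XX:TieredStopAtLevel=1" \
  openhab/openhab:4.3.2
```

If the flags are active, the JVM should no longer start any "C2 CompilerThread" threads, so a later crash log naming one would indicate the options were not picked up.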

You could also try updating OpenJDK. 17.0.13 has been available since November 2024.

My first guess would be some kind of hardware error, e.g. memory; if openHAB happens to be the biggest or most active process, it would be the most likely to run into it. But then you would expect all kinds of different errors, sometimes in other containers or processes as well.

DrRSatzteil · January 21, 2025, 7:41am

Since my upgrade to 4.3.2 I have also had lots of VM crashes. It happened before as well, but last week I had VM crashes three days in a row.

I’m not on Docker, though; I run OH on Proxmox as an LXC container. I switched from OpenJDK 17 to Zulu JDK 21 a couple of days ago and haven’t had a crash since. It’s too early to say that this really solved the problem, but it’s looking good right now. Fingers crossed.

portboy · January 21, 2025, 8:20am

@apas_csc, it’s not always PhaseChaitin …
I can also see relocInfo, PhiNode, PhaseLive … but the crashing thread is always “C2 CompilerThread0”.
I’ve attached my latest error logs.
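
A quick way to compare several crash logs like this is to pull the crashing thread and the problematic frame out of each one. The helper below is a rough sketch; the function name `summarize_hs_err` is made up for illustration, and it assumes the logs follow the usual hs_err layout with a "Current thread" line and a "# Problematic frame:" header followed by the frame itself.

```shell
# Sketch: print the crashing thread and frame for each HotSpot crash log.
# Usage (hypothetical helper): summarize_hs_err hs_err_pid*.log
summarize_hs_err() {
    for log in "$@"; do
        printf '%s\n' "== $log"
        # The thread that was running when the JVM crashed.
        grep -m1 'Current thread' "$log"
        # The line right after the header names the crashing frame.
        grep -m1 -A1 '^# Problematic frame:' "$log" | tail -n 1
    done
}
```

If every log shows a C2 compiler thread but a different frame, that supports the observation that the crashes are tied to C2 compilation rather than to one specific function.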
hs_err_pid25.log (249.4 KB)
hs_err_pid198.log (247.4 KB)
hs_err_pid199.log (182.8 KB)
hs_err_pid23.log (232.3 KB)
hs_err_pid24.log (216.5 KB)

I will try your suggestion. What I did yesterday was limit the container to 4 CPU cores …
So far it’s stable …

apas_csc · January 21, 2025, 10:44am

This really looks like a bug in your OpenJDK version, which may be hardware/environment dependent. If the issue persists even with the latest version of Java 17, you’d have to file a bug with Debian, since you are using their OpenJDK distribution.

Otherwise, switching to Azul Zulu, which seems to be the recommended distribution for openHAB, could be an alternative. I use Adoptium Temurin and Bellsoft Liberica without issues.

portboy · January 21, 2025, 10:55am

How can I change the JVM within the container? UNRAID doesn’t use Java itself.

DrRSatzteil · January 21, 2025, 11:23am

I would not generally recommend doing this within the container; rather, create your own image based on the official one so that you can reproduce the same setup every time (see the docker build command).

However, for a test run you could connect to your container with "docker exec -it <name_of_openhab_container> /bin/bash", uninstall Java there (something like apt purge default-jdk or openjdk… something) and reinstall following the Zulu installation documentation. Restart the OH service after that. Make sure not to remove your modified container (e.g. with docker compose down); your changes will be lost when you spin up a new container afterwards.

rlkoshak · January 21, 2025, 6:27pm

Before going too far down the rabbit hole, have you tried the Alpine flavor of the official image? (I think I saw you mention you are running the Debian flavor.) Running on a different base image might be just enough to avoid the problem.

portboy · January 21, 2025, 10:26pm

Thanks for all the advice
But since I limited the container to using only 4 CPU cores (instead of no limit), it seems to be stable … after many reboots … stable …
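
The thread does not say how the limit was set. As a hedged sketch, one common way to restrict a container to four logical CPUs with plain Docker is `--cpuset-cpus`; the container name `openhab`, the image tag, and the core range `0-3` are assumptions chosen for illustration. UNRAID's CPU pinning option in the container settings is presumably the equivalent there.

```shell
# Sketch: pin a new container to logical CPUs 0-3 (range is an arbitrary example).
docker run -d --name openhab --cpuset-cpus="0-3" openhab/openhab:4.3.2

# Or change the pinning of an existing container without recreating it.
docker update --cpuset-cpus="0-3" openhab
```

Note that `--cpuset-cpus` restricts which cores the container may run on, whereas `--cpus=4` only caps total CPU time across all cores, so the two are not interchangeable for this kind of test.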

DrRSatzteil · January 24, 2025, 3:55pm

Apparently I have had two VM crashes since switching to Zulu: one yesterday and one two days before that. So it’s definitely not the solution … However, my container is already limited to two cores.

portboy · January 24, 2025, 4:07pm

From my point of view, the reduction to 4 cores was the solution, as the container no longer crashes. But it is still under observation.

DrRSatzteil · January 24, 2025, 4:12pm

Maybe I should increase to four cores, then.