Erratic Java SIGSEGV crashes since moving to an Odroid HC2

  • Platform information:
    • Hardware: Odroid HC2 (Samsung Exynos5422 octa-core: 4× Cortex-A15 at 2 GHz plus 4× Cortex-A7, 2GB RAM, boots from SD card, root partition on 2.7TB HDD)
    • OS: Armbian 20.05.2
    • Java Runtime Environment: openjdk version “1.8.0_252” (Zulu 8.46.0.225-CA-linux_aarch32hf) (build 1.8.0_252-b225) / openjdk version “1.8.0_252” (build 1.8.0_252-8u252-b09-1~deb9u1-b09)
    • openHAB version: 2.5.5-1
  • Issue of the topic: Java crashes regularly with SIGSEGV

I’ve been running openHAB (via the openHABian SD card image) on a Raspberry Pi 3B+ for a few months now without many problems. The same Raspberry Pi also runs Mosquitto, plus dnsmasq to provide DHCP/DNS for my LAN.

I’ve migrated the entire setup to an Odroid HC2 with a 2.7TB HDD to stop relying on the SD card. I installed Armbian as the base, then installed openHABian from git following the instructions at https://www.openhab.org/docs/installation/openhabian.html#other-linux-systems-add-openhabian-just-like-any-other-software. Mosquitto runs on the same HC2, and dnsmasq moved over as well.

However, openHAB is unstable on the new setup. It crashes regularly with a SIGSEGV in Java:

Jun 06 18:19:20 house.ow.sono systemd[1]: Started openHAB 2 - empowering the smart home.                                                                                     
Jun 06 18:52:20 house.ow.sono karaf[24085]: Exception in thread "items-4" java.lang.IncompatibleClassChangeError: vtable stub                                                
Jun 06 18:52:20 house.ow.sono karaf[24085]:         at org.eclipse.smarthome.core.items.GroupItem.collectStateMembers(GroupItem.java:409)                                    
Jun 06 18:52:20 house.ow.sono karaf[24085]:         at org.eclipse.smarthome.core.items.GroupItem.getStateMembers(GroupItem.java:402)                                        
Jun 06 18:52:20 house.ow.sono karaf[24085]:         at org.eclipse.smarthome.core.items.GroupItem.stateUpdated(GroupItem.java:372)                                           
Jun 06 18:52:20 house.ow.sono karaf[24085]:         at org.eclipse.smarthome.core.items.GenericItem$1.run(GenericItem.java:259)                                              
Jun 06 18:52:20 house.ow.sono karaf[24085]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)                                       
Jun 06 18:52:20 house.ow.sono karaf[24085]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)                                       
Jun 06 18:52:20 house.ow.sono karaf[24085]:         at java.lang.Thread.run(Thread.java:748)
Jun 06 20:59:19 house.ow.sono karaf[24085]: #
Jun 06 20:59:19 house.ow.sono karaf[24085]: # A fatal error has been detected by the Java Runtime Environment:
Jun 06 20:59:19 house.ow.sono karaf[24085]: #
Jun 06 20:59:19 house.ow.sono karaf[24085]: #  SIGSEGV (0xb) at pc=0x00000000, pid=24085, tid=0x9b23f470
Jun 06 20:59:19 house.ow.sono karaf[24085]: #
Jun 06 20:59:19 house.ow.sono karaf[24085]: # JRE version: OpenJDK Runtime Environment (8.0_252-b225) (build 1.8.0_252-b225)
Jun 06 20:59:19 house.ow.sono karaf[24085]: # Java VM: OpenJDK Client VM (25.252-b225 mixed mode, Evaluation linux-aarch32 )
Jun 06 20:59:19 house.ow.sono karaf[24085]: # Problematic frame:
Jun 06 20:59:19 house.ow.sono karaf[24085]: # C  0x00000000
Jun 06 20:59:19 house.ow.sono karaf[24085]: #
Jun 06 20:59:19 house.ow.sono karaf[24085]: # Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Jav
Jun 06 20:59:19 house.ow.sono karaf[24085]: #
Jun 06 20:59:19 house.ow.sono karaf[24085]: # An error report file with more information is saved as:
Jun 06 20:59:19 house.ow.sono karaf[24085]: # /var/lib/openhab2/hs_err_pid24085.log
Jun 06 20:59:19 house.ow.sono karaf[24085]: #
Jun 06 20:59:19 house.ow.sono systemd[1]: openhab2.service: Main process exited, code=killed, status=6/ABRT
Jun 06 20:59:20 house.ow.sono karaf[2349]: Can't connect to the container. The container is not running.
Jun 06 20:59:20 house.ow.sono systemd[1]: openhab2.service: Control process exited, code=exited status=1
Jun 06 20:59:20 house.ow.sono systemd[1]: openhab2.service: Unit entered failed state.
Jun 06 20:59:20 house.ow.sono systemd[1]: openhab2.service: Failed with result 'signal'.
Jun 06 20:59:25 house.ow.sono systemd[1]: openhab2.service: Service hold-off time over, scheduling restart.
Jun 06 20:59:25 house.ow.sono systemd[1]: Stopped openHAB 2 - empowering the smart home.

/var/lib/openhab2/hs_err_pid24085.log
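The JVM banner above reports that core dumps are disabled. A `ulimit -c unlimited` typed in an interactive shell does not carry over to a service that systemd starts, so it can help to check the limit the running JVM actually has, which Linux exposes in `/proc/<pid>/limits`. A minimal sketch, with a hypothetical helper name `core_limit`, that prints the soft and hard core-file-size limits from that file:

```shell
#!/bin/sh
# core_limit (hypothetical helper): read a /proc/<pid>/limits file on stdin and
# print the soft and hard "Max core file size" values.
# A soft limit of 0 means the kernel writes no core dump for that process.
core_limit() {
  awk '/^Max core file size/ { print "soft=" $5, "hard=" $6; found = 1 }
       END { if (!found) exit 1 }'
}
```

For example `core_limit < /proc/$(pidof java)/limits`, assuming only one `java` process is running. If the soft limit is `0`, a systemd drop-in for `openhab2.service` with `LimitCORE=infinity` is one way to raise it; where the core file is then written depends on the kernel's `core_pattern` setting.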

I’ve compared several of these hs_err files and they are all over the place: sometimes the stack trace is empty, sometimes it has a bunch of entries, but it is never the same twice. In every case, though, the crash seems to be at pc=0x00000000.
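To compare the hs_err files more systematically, something like the following sketch pulls the signal line and the "Problematic frame" entry out of each report and prints one line per crash. The helper name `summarize_hs_err` is hypothetical, and it assumes each report starts with the same `#`-prefixed banner that appears in the journal above:

```shell
#!/bin/sh
# summarize_hs_err (hypothetical helper): for each HotSpot error report given
# as an argument, print the file name, the signal line and the
# "Problematic frame" entry on one line, so crashes can be compared side by side.
summarize_hs_err() {
  for f in "$@"; do
    [ -r "$f" ] || continue   # skip unreadable paths, e.g. an unmatched glob
    # First banner line of the form "#  SIGSEGV (0xb) at pc=..."
    sig=$(awk '/^# +SIG[A-Z]+ \(/ { sub(/^# +/, ""); print; exit }' "$f")
    # The line directly after "# Problematic frame:"
    frame=$(awk '/^# Problematic frame:/ { getline; sub(/^# +/, ""); print; exit }' "$f")
    printf '%s: %s | %s\n' "$(basename "$f")" "$sig" "$frame"
  done
}
```

Run it as `summarize_hs_err /var/lib/openhab2/hs_err_pid*.log` to get one summary line per saved report.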

I have a few bindings running:

openhab> bundle:list | grep Binding
136 │ Active │  80 │ 2.5.0                   │ openHAB Core :: Bundles :: Binding XML
227 │ Active │  80 │ 2.5.5                   │ openHAB Add-ons :: Bundles :: Astro Binding
228 │ Active │  80 │ 2.5.5                   │ openHAB Add-ons :: Bundles :: Chromecast Binding
229 │ Active │  80 │ 2.5.5                   │ openHAB Add-ons :: Bundles :: MQTT Broker Binding
233 │ Active │  80 │ 2.5.5                   │ openHAB Add-ons :: Bundles :: Network Binding
234 │ Active │  80 │ 2.5.5                   │ openHAB Add-ons :: Bundles :: Onkyo Binding
235 │ Active │  80 │ 1.14.0                  │ openHAB PanasonicTV Binding
236 │ Active │  80 │ 2.5.5                   │ openHAB Add-ons :: Bundles :: Volvo On Call Binding
openhab>

Sometimes it runs fine for a few hours and then crashes; sometimes it crashes during start-up or within seconds of it. I have yet to find a pattern.
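To look for a pattern in crash timing, the JVM's fatal-error banner in the journal can be turned into a list of timestamps. A sketch with a hypothetical helper `crash_times`, assuming journalctl's default short output format (month, day, time, host, then `karaf[pid]:`) as in the log above; a different format would shift the field positions:

```shell
#!/bin/sh
# crash_times (hypothetical helper): read journal output for the openHAB
# service on stdin and print the timestamp and karaf PID of every JVM
# fatal-error banner, one line per crash.
crash_times() {
  awk '/A fatal error has been detected by the Java Runtime Environment/ {
         pid = $5
         gsub(/[^0-9]/, "", pid)   # "karaf[24085]:" -> "24085"
         print $1, $2, $3, "pid=" pid
       }'
}
```

`journalctl -u openhab2 --no-pager | crash_times` then lists one line per crash, which can be lined up against other events on the box.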

I tried disabling all bindings and then enabling them one by one. At first it crashed immediately after I enabled the Astro binding, but after two such crashes, re-enabling the Astro binding no longer caused a crash.

I’ve tried replacing the openHABian-provided JDK (Zulu) with Armbian’s stock OpenJDK from apt. No difference.

I’m now trying to pin the Java process to the ‘big’ cores (the Exynos 5422 in the HC2 has four Cortex-A15 cores and four Cortex-A7 cores) as described at https://www.j-dimension.com/java-process-crashing-on-odroid-hc1-xu4/ (the XU4 is essentially the same hardware as the HC2). It has been running for about 15 minutes now, so the jury is still out.
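One way to pin the service without wrapping its start command is a systemd drop-in with `CPUAffinity=`. This is a sketch, not necessarily what the linked article or this setup uses; it assumes CPUs 4 to 7 are the Cortex-A15 cores on the Exynos 5422 and uses a hypothetical file name `cpu-affinity.conf`:

```shell
#!/bin/sh
# write_affinity_dropin (hypothetical helper): write a systemd drop-in that
# restricts openhab2.service to CPUs 4-7.
# Assumption: on the Exynos 5422 these are the Cortex-A15 ("big") cores.
# The file name cpu-affinity.conf is arbitrary; any *.conf in the .d directory works.
write_affinity_dropin() {
  dir=${1:-/etc/systemd/system/openhab2.service.d}
  mkdir -p "$dir" || return 1
  cat > "$dir/cpu-affinity.conf" <<'EOF'
[Service]
CPUAffinity=4 5 6 7
EOF
}
```

Run it as root, then `systemctl daemon-reload` and `systemctl restart openhab2`; `taskset -p $(pidof java)` should then report the mask `f0`.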

Has anyone experienced this before? I’ve searched the forum here, and there seem to be similar cases:

There are more, but they seem to point to specific, reproducible crashes, or are so old that they are no longer relevant.

Anyway, I don’t expect anyone to provide a magic solution off the cuff; I’m just posting in the hope that someone somewhere has had this exact issue and found a solution, or has a hint on how to debug it further. I’d like to keep using openHAB, since migrating to something else would be a big pain. But if I can’t solve this, the setup is not usable.

See:

@wborn, thanks for getting back to me. I did see that topic (I actually linked it above). Sadly, none of the solutions there apply or work:

  • there’s a suggestion that it is the KNX binding, but I don’t have that binding installed
  • there’s a suggestion to pin Java to the big cores; I did that, and it is still crashing repeatedly:
[11:05:37] root@house:/etc/systemd/system/openhab2.service.d# taskset -p `pidof java`
pid 31465's current affinity mask: f0
[11:05:38] root@house:/etc/systemd/system/openhab2.service.d#
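The mask `f0` above is hexadecimal: bits 4 to 7 are set, so the process may only run on CPUs 4, 5, 6 and 7, which (assuming the usual Exynos 5422 numbering) are the Cortex-A15 cores. On 32-bit ARM kernels `/proc/cpuinfo` shows a `CPU part` of `0xc0f` for Cortex-A15 and `0xc07` for Cortex-A7, which is one way to confirm that numbering. A small sketch to decode such masks, with a hypothetical function name `mask_to_cpus`:

```shell
#!/bin/sh
# mask_to_cpus (hypothetical helper): turn a hexadecimal affinity mask as
# printed by `taskset -p` (e.g. "f0") into the CPU numbers it allows.
mask_to_cpus() {
  mask=$((0x$1))
  cpu=0
  out=""
  while [ "$mask" -gt 0 ]; do
    # The lowest remaining bit corresponds to the current CPU number
    if [ $((mask & 1)) -eq 1 ]; then
      out="$out $cpu"
    fi
    mask=$((mask >> 1))
    cpu=$((cpu + 1))
  done
  echo "${out# }"   # drop the leading space
}
```

For example, `mask_to_cpus "$(taskset -p "$(pidof java)" | awk '{print $NF}')"` decodes the mask of the running JVM.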

Oh, I forgot to mention in the original post: I also tried cleaning out the tmp and cache directories a few times, to no avail.

After a clean-cache, it’s very likely that there will be tons of errors when openHAB starts; it should only take a few restarts for those to go away. But it seems that the Odroid has a severe problem with some parts of the software. Whether this is true only for your board or for all of them, I don’t know. :slightly_smiling_face:

Did you know that it’s possible to boot a Raspberry Pi 3 without any SD card? Another option would be to boot from the SD card only and put everything else on an SSD.
openHABian also provides ZRAM to reduce SD card wear.

@Udo_Hartmann, given the rate at which it is crashing, I have had at least 10 restarts since clearing tmp and cache.

Anyway, it does seem that openHAB does not run well on the Odroid XU4 / HC2 family. I’m not sure the problem is with the Odroid itself, though: it ran my Nextcloud instance for a year before I repurposed it for openHAB, and that was rock-solid. Never crashed, always performed.

As for the Raspberry Pi, openHAB is slow enough on it without adding a USB HDD to the mix, which is the other reason I’m moving to the Odroid. Rules often take 5 to 10 seconds to fire after the triggering event, and a 5 to 10 second wait between pressing a button on a remote and the intended action happening is not a good user experience.

Anyway, for now I’ve decided to give Home Assistant a go. So far it’s running well on the Odroid: no crashes, and very responsive. I still need to port over all the rules, but debugging JVM issues is not how I want to spend my free time, so getting rid of the Java dependency feels like a good step. I’m not particularly fond of YAML, but I’m happy to endure it for a better user experience. I’ve had JVM trouble before on the RPi 3B+ too, with crashes after updating or just out of the blue; clearing cache and tmp usually solved those.

I’m also a big fan of Odroid and had a similar setup.

I had openHAB running on an Odroid HC2 for more than a year (KNX + HomeKit + Gardena + some smaller bindings), on Ubuntu with the Oracle JDK. It was very stable: maybe one crash a month, and I was also deploying unstable binding versions on it.

Then I decided to turn the HC2 into a NAS: I added an HDD and installed OMV (OpenMediaVault), Grafana, InfluxDB and the Zulu JDK. openHAB started to crash several times a day. Moving from Zulu to the Oracle JDK reduced the number of crashes, but it was still less stable and slower than before. Probably it was too much for the HC2.

I have now moved openHAB to a dedicated Odroid N2: not a single crash in a month. The HC2 still runs the NAS, Grafana, InfluxDB and a log server, and is very stable.