openHAB Periodically Unresponsive on Ports 8080 and 22

  • Platform information:
    • Hardware: Raspberry Pi 3 Model B Rev 1.2 / 1GB RAM, 2GB swap / running off HDD
    • OS: Raspbian GNU/Linux 11 (bullseye)
    • Java Runtime Environment: openjdk version "11.0.16"
    • openHAB version: 3.4.1 (originally an openHABian build)

This morning around 02:23, my openHAB server stopped responding on port 8080. I could ping the server but could not SSH into it, and telnet could connect to ports 8080 and 22 but got no response. I used to see this about once a month, but after switching to OpenJDK and openHAB 3.4.1 the problem became less frequent; the previous hang was on April 10.

I power-cycled the server and checked kern.log.1, messages.1, and syslog.1, but found nothing useful. In fact, the last entries were from May 20 or 21, yet the server was still working the lights last night. The openHAB events log shows normal activity up to 01:42 this morning. I run a Cloud Connector check within openHAB every fifteen minutes; the one scheduled for 01:45 did not happen. However, Apache appears to have responded on port 8080 at 01:53 (my external monitor checks every 15 minutes and alerts after two failures).
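
For what it is worth, the external check amounts to a timed HTTP probe; something along these lines (the hostname and timeout are illustrative, my actual monitor is a separate tool) reproduces what it sees during a hang:

# Hypothetical stand-in for the external monitor: probe the UI with a timeout
curl --max-time 10 -s -o /dev/null -w '%{http_code}\n' http://openhabian:8080/ \
    || echo "port 8080 not answering"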

I have been capturing hourly memory and other status information (a simple cron job; sketch further down), but nothing jumps out at me. The capture should have run at 02:00 this morning, but the last data, from 01:00, shows:

Tue 23 May 2023 01:00:01 AM EDT
               total        used        free      shared  buff/cache   available
Mem:          845576      432412       50968         504      362196      349612
Swap:        2097148      267432     1829716
MemTotal:         845576 kB
MemFree:           50968 kB
MemAvailable:     349612 kB
Buffers:           78912 kB
Cached:           243580 kB
SwapCached:        28300 kB
Active:           334340 kB
Inactive:         358664 kB
Active(anon):     181196 kB
Inactive(anon):   187960 kB
Active(file):     153144 kB
Inactive(file):   170704 kB
Unevictable:          16 kB
Mlocked:              16 kB
SwapTotal:       2097148 kB
SwapFree:        1829716 kB
Dirty:               296 kB
Writeback:             0 kB
AnonPages:        353644 kB
Mapped:            31600 kB
Shmem:               504 kB
KReclaimable:      39704 kB
Slab:              71080 kB
SReclaimable:      39704 kB
SUnreclaim:        31376 kB
KernelStack:        5472 kB
PageTables:         3976 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2519936 kB
Committed_AS:     772368 kB
VmallocTotal:   262930368 kB
VmallocUsed:       12868 kB
VmallocChunk:          0 kB
Percpu:              768 kB
CmaTotal:         262144 kB
CmaFree:            1148 kB

After the reboot, the values are:

[09:28:04] root@openhabian:/home/openhabian# free
               total        used        free      shared  buff/cache   available
Mem:          845576      457936       53840        3708      333800      327456
Swap:        2097148        2064     2095084
[09:28:15] root@openhabian:/home/openhabian# cat /proc/meminfo
MemTotal:         845576 kB
MemFree:           51332 kB
MemAvailable:     325816 kB
Buffers:            7460 kB
Cached:           302436 kB
SwapCached:          116 kB
Active:           178744 kB
Inactive:         531636 kB
Active(anon):       4412 kB
Inactive(anon):   398788 kB
Active(file):     174332 kB
Inactive(file):   132848 kB
Unevictable:          16 kB
Mlocked:              16 kB
SwapTotal:       2097148 kB
SwapFree:        2095084 kB
Dirty:                28 kB
Writeback:             0 kB
AnonPages:        400488 kB
Mapped:            84984 kB
Shmem:              3708 kB
KReclaimable:      24716 kB
Slab:              53488 kB
SReclaimable:      24716 kB
SUnreclaim:        28772 kB
KernelStack:        5248 kB
PageTables:         3588 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:     2519936 kB
Committed_AS:     701984 kB
VmallocTotal:   262930368 kB
VmallocUsed:       12736 kB
VmallocChunk:          0 kB
Percpu:              768 kB
CmaTotal:         262144 kB
CmaFree:           25588 kB

CmaFree at 1148 kB stuck out, but it had been lower earlier.
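
For reference, the hourly capture mentioned above is just a root cron entry along these lines (the log path is illustrative):

# /etc/cron.d/memstats - hypothetical version of my hourly capture
0 * * * * root { date; free; cat /proc/meminfo; } >> /var/log/memstats.log 2>&1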

Does anyone see something that might explain the hang? As a workaround, I can reboot the server every two weeks or so.
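
If I do fall back to that, the scheduled reboot would just be another cron entry, something like:

# /etc/cron.d/periodic-reboot - reboot at 04:00 on the 1st and 15th (schedule is arbitrary)
0 4 1,15 * * root /sbin/shutdown -r now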

What about the more recent files, the ones not ending in .1?

My bad - I have been spending too much time looking at openHAB logs, which rotate when openHAB restarts, and vi/vim was jumping partway into the log files on open. There is nothing unusual in kern.log or messages, and the boot sequence looks fine. The last syslog entries:

May 23 01:04:16 openhabian rngd[566]: stats: bits received from HRNG source: 30760064
May 23 01:04:16 openhabian rngd[566]: stats: bits sent to kernel pool: 30698944
May 23 01:04:16 openhabian rngd[566]: stats: entropy added to kernel pool: 30698944
May 23 01:04:16 openhabian rngd[566]: stats: FIPS 140-2 successes: 1537
May 23 01:04:16 openhabian rngd[566]: stats: FIPS 140-2 failures: 1
May 23 01:04:16 openhabian rngd[566]: stats: FIPS 140-2(2001-10-10) Monobit: 0
May 23 01:04:16 openhabian rngd[566]: stats: FIPS 140-2(2001-10-10) Poker: 0
May 23 01:04:16 openhabian rngd[566]: stats: FIPS 140-2(2001-10-10) Runs: 0
May 23 01:04:16 openhabian rngd[566]: stats: FIPS 140-2(2001-10-10) Long run: 1
May 23 01:04:16 openhabian rngd[566]: stats: FIPS 140-2(2001-10-10) Continuous run: 0
May 23 01:04:16 openhabian rngd[566]: stats: HRNG source speed: (min=623.922; avg=974.819; max=988.874)Kibits/s
May 23 01:04:16 openhabian rngd[566]: stats: FIPS tests speed: (min=19.889; avg=26.632; max=51.971)Mibits/s
May 23 01:04:16 openhabian rngd[566]: stats: Lowest ready-buffers level: 2
May 23 01:04:16 openhabian rngd[566]: stats: Entropy starvations: 0
May 23 01:04:16 openhabian rngd[566]: stats: Time spent starving for entropy: (min=0; avg=0.000; max=0)us
May 23 01:05:01 openhabian CRON[27188]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 23 01:15:01 openhabian CRON[27510]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 23 01:17:01 openhabian CRON[27580]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
May 23 01:25:01 openhabian CRON[27848]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 23 01:35:01 openhabian CRON[28175]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
May 23 01:42:05 openhabian systemd[1]: Starting Daily apt download activities...

The FIPS failures started on May 19, and “Daily apt download activities” has been running without incident throughout the day, so neither looks like the trigger.

So far, the issue always happens in the early morning. In the earlier hangs in January and last December, the last syslog entry was also a “debian-sa1” line, but about an hour earlier (standard time vs. daylight saving time?). Once, the last message was “Starting Flush Journal to Persistent Storage…”.
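
One upside of debian-sa1 running every ten minutes is that sysstat keeps a per-day history, so the figures from just before a hang should still be on disk; something like this should pull them up (the /var/log/sysstat path is the Debian default, adjust if yours differs):

# sysstat writes one file per day (saDD); -r = memory, -q = load/run queue
sar -r -f /var/log/sysstat/sa23
sar -q -f /var/log/sysstat/sa23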

I am not seeing this on another Pi 3B running standard Raspbian, so I suspected some sort of Java memory/heap issue. Last December I updated /etc/default/openhab as shown below, but it did not help:

# updated based on https://community.openhab.org/t/solved-increasing-ram-for-openhab/123957/6
#EXTRA_JAVA_OPTS="-Xms16m -Xmx256m"
EXTRA_JAVA_OPTS="-Xms250m -Xmx350m"
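
To confirm that the new limits are actually picked up after a restart, something along these lines should show the effective heap flags (assuming the openHAB JVM runs as the openhab user):

# jcmd must run as the same user that owns the Java process
PID=$(pgrep -u openhab java | head -n 1)
sudo -u openhab jcmd "$PID" VM.flags | tr ' ' '\n' | grep -E 'InitialHeapSize|MaxHeapSize'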

I have made no progress figuring out why my openHABian server hangs periodically, so I ended up enabling the hardware watchdog. This morning at 00:18 the server rebooted itself without any intervention on my part (the previous reboot was June 17).

Enabling the watchdog turned out to be quite easy. Everything needed was already installed; I just had to edit /etc/systemd/system.conf and change:

#RuntimeWatchdogSec=0

to:

RuntimeWatchdogSec=15
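
The new value only takes effect once systemd re-reads system.conf; a reboot does it, and re-executing the manager should work as well:

# Re-read /etc/systemd/system.conf without a full reboot
sudo systemctl daemon-reexec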

It is hard to tell whether the watchdog is active - wdctl fails with:

[08:45:01] root@openhabian:~# wdctl
wdctl: cannot read information about /dev/watchdog0: No such file or directory

However, dmesg shows that the driver is loaded and that systemd set the hardware watchdog to the configured RuntimeWatchdogSec value:

[08:41:59] root@openhabian:~# dmesg -T | grep watchdog
[Sun Jul 30 00:19:01 2023] bcm2835-wdt bcm2835-wdt: Broadcom BCM2835 watchdog timer
[Sun Jul 30 00:20:25 2023] systemd[1]: Using hardware watchdog 'Broadcom BCM2835 Watchdog timer', version 0, device /dev/watchdog
[Sun Jul 30 00:20:25 2023] systemd[1]: Set hardware watchdog to 15s.
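
Another way to check, without poking at /dev/watchdog (see below), is to ask the systemd manager directly; this should report the configured timeout:

# Ask PID 1 which watchdog timeout it is actually using
systemctl show --property=RuntimeWatchdogUSec
# expected output along the lines of: RuntimeWatchdogUSec=15s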

Before setting up the watchdog, I checked for the existence of /dev/watchdog*, which promptly caused a reboot. That might have been a coincidence, or perhaps I triggered the underlying watchdog code before I had all the pieces configured.