OH3: stops working; really?!

  • openHABian 3.3.0 rPi4 4GB

Well, I recently migrated from OH2 to OH3…
A new rPi4, new OH3 install eight days ago.
One rule change two days ago; since then it had been pottering along.

Today OH3 simply stopped working!

The terminal session was still responsive.

Checking the service (systemctl status openhab):

openhab.service - openHAB - empowering the smart home
     Loaded: loaded (/lib/systemd/system/openhab.service; enabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/openhab.service.d
             └─override.conf
     Active: active (running) since Sun 2022-11-27 22:44:18 AEST; 1 weeks 2 days ago
       Docs: https://www.openhab.org/docs/
             https://community.openhab.org
   Main PID: 774 (java)
      Tasks: 270 (limit: 4915)
        CPU: 1d 9h 30min 2.896s
     CGroup: /system.slice/openhab.service
             └─774 /usr/bin/java -XX:-UsePerfData -Dopenhab.home=/usr/share/openhab -Dopenhab.conf=/etc/openhab -Dopenhab.runtime=/usr/share/openhab/runtime -Dopenhab.userdata=/var/lib/openhab -Dopenhab.logdir=/var/log/openhab -Dfelix.cm.dir=/var/lib/openhab/config -Djava.library.path=/var/lib/openhab/tmp/lib -Djett>

Dec 07 01:05:32 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to copy file /var/log/openhab/events.log.1.gz to /var/log/openhab/events.log.0.gz: java.nio.file.FileSystemException /var/log/openhab/events.log.0.gz: Read-only file system
Dec 07 01:05:33 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to delete 1, /var/log/openhab/events.log.1.gz: Read-only file system
Dec 07 01:05:33 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to copy file /var/log/openhab/events.log.1.gz to /var/log/openhab/events.log.0.gz: java.nio.file.FileSystemException /var/log/openhab/events.log.0.gz: Read-only file system
Dec 07 01:05:33 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to delete 1, /var/log/openhab/events.log.1.gz: Read-only file system
Dec 07 01:05:33 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to copy file /var/log/openhab/events.log.1.gz to /var/log/openhab/events.log.0.gz: java.nio.file.FileSystemException /var/log/openhab/events.log.0.gz: Read-only file system
Dec 07 01:05:34 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to delete 1, /var/log/openhab/events.log.1.gz: Read-only file system
Dec 07 01:05:34 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to copy file /var/log/openhab/events.log.1.gz to /var/log/openhab/events.log.0.gz: java.nio.file.FileSystemException /var/log/openhab/events.log.0.gz: Read-only file system
Dec 07 01:05:35 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to delete 1, /var/log/openhab/events.log.1.gz: Read-only file system
Dec 07 01:05:35 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to copy file /var/log/openhab/events.log.1.gz to /var/log/openhab/events.log.0.gz: java.nio.file.FileSystemException /var/log/openhab/events.log.0.gz: Read-only file system
Dec 07 01:05:35 openhabian karaf[774]: org.ops4j.pax.logging.pax-logging-api [log4j2] ERROR : Unable to delete 1, /var/log/openhab/events.log.1.gz: Read-only file system

… showed a read-only file system.

Checking the log directory:

# [2022-12-07 14:35] openhabian@openhabian /var/log/openhab $ 
la
total 38M
drwxrwxr-x 1 openhab openhabian 4.0K Dec  6 14:10 ./
drwxr-xr-x 1 root    root       4.0K Dec  4 00:00 ../
-rw-rw-r-- 1 openhab openhab       0 Nov 18 16:03 audit.log
-rw-r--r-- 1 openhab openhab     28M Dec  7 08:28 events.log
-rw-r--r-- 1 openhab openhab    1.3M Dec  4 03:55 events.log.1.gz
-rw-r--r-- 1 openhab openhab    1.5M Dec  4 13:01 events.log.2.gz
-rw-r--r-- 1 openhab openhab    1.4M Dec  4 23:03 events.log.3.gz
-rw-r--r-- 1 openhab openhab    1.3M Dec  5 09:34 events.log.4.gz
-rw-r--r-- 1 openhab openhab    1.5M Dec  5 18:03 events.log.5.gz
-rw-r--r-- 1 openhab openhab    1.3M Dec  6 05:00 events.log.6.gz
-rw-r--r-- 1 openhab openhab    1.4M Dec  6 14:10 events.log.7.gz
-rw-r--r-- 1 openhab openhab    780K Dec  7 09:50 openhab.log
-rw-r--r-- 1 openhab openhab     317 Nov 18 16:07 openhab.log.1.gz
-rw-r--r-- 1 openhab openhab     522 Nov 27 22:44 openhab.log.2.gz
-rwxrwxr-x 1 openhab openhab       0 Jun 27 16:19 Readme.txt*

… shows that is not the case.

We did not run out of space:

# [2022-12-07 14:33] openhabian@openhabian ~ $ 
df -h
Filesystem                                Size  Used Avail Use% Mounted on
/dev/root                                  15G  6.8G  7.0G  50% /
devtmpfs                                  1.9G     0  1.9G   0% /dev
tmpfs                                     1.9G     0  1.9G   0% /dev/shm
tmpfs                                     778M   11M  768M   2% /run
tmpfs                                     5.0M     0  5.0M   0% /run/lock
/dev/mmcblk0p1                            253M   50M  203M  20% /boot
/dev/zram1                                721M  150M  519M  23% /opt/zram/zram1
overlay1                                  721M  150M  519M  23% /var/lib/openhab/persistence
/dev/zram2                                323M   36M  263M  12% /opt/zram/zram2
overlay2                                  323M   36M  263M  12% /var/lib/influxdb
/dev/zram3                                974M  767M  140M  85% /opt/zram/zram3
overlay3                                  974M  767M  140M  85% /var/log
192.168.1.127:/volume1/backup/openhabian   21T   11T   11T  49% /media/SynologyBackup
tmpfs                                     389M     0  389M   0% /run/user/1000


# [2022-12-07 14:33] openhabian@openhabian ~ $ 
free
               total        used        free      shared  buff/cache   available
Mem:         3982300     1606436      464180       10456     1911684     2336288
Swap:        3145720           0     3145720

None the wiser, I rebooted the rPi4.

This certainly does not constitute a “stable” system.

Any ideas what I need to do to prevent this from happening?

Did you check the system log / dmesg?
It seems the overlay fs was full.
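
If it happens again, the kernel log should show the moment the fs was flipped to read-only. Something along these lines would be a starting point (the grep pattern is only a suggestion):

# kernel messages from the current boot, with human-readable timestamps
dmesg -T | grep -iE 'remount|read-only|zram'
# same for the previous boot (requires persistent journald storage)
journalctl -k -b -1 | grep -iE 'remount|read-only|zram'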

Hmm, the df -h output shown above seems to show enough free space on the overlay filesystems.

Unfortunately I did not look, and only have the dmesg from the current boot.

# [2022-12-07 17:27] openhabian@openhabian ~ $ 
journalctl --list-boots 
 0 b57c252df1c148a6aa3f1bca117b395d Wed 2022-12-07 15:31:14 AEST—Wed 2022-12-07 17:27:58 AEST

So, if the overlay fs is full, why is it? What fills it? And what to do about it?

In my experience this happens mostly when the SD card is about to fade out.
Export your config and/or move to a new SD card, and keep an eye on it.

ls doesn’t tell you about the file system status. You should rather check with mount whether the fs is mounted read-only.
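
For example (findmnt ships with util-linux, so it is already on openHABian):

# "ro" in OPTIONS means the kernel (re)mounted the fs read-only
findmnt -o TARGET,SOURCE,OPTIONS /var/log
# or the classic way:
mount | grep ' /var/log '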

I’d second that guess.

And that is exactly how I lost an OctoPrint installation as well. Back up, restore, and dump that SD card…

Thank you for your input.

While this is a brand-new SanDisk card from the shop, bought as blister-ware to make sure I got an original (actually four of those), I also ran a full write test on them to ensure all is good.

I never liked SD cards in SBCs, and prefer SSDs instead.

What is then the tenor in how to move forward?
Should I:
a) just burn the SD card onto another SD card?
b) burn the SD to an SSD?
c) rebuild the machine, but this time with apt, rather than the openHABian image
d) any other ideas?

There’s no need for an SSD or for leaving openHABian over this, in my experience. I always went with option a). There’s a menu in openHABian for that (if you’re able to access the SD card, that is). Otherwise make sure to export your config (backup option in openHABian, or via the CLI as sketched below), and ideally have an Amanda backup in place, to be sure not to lose anything if the SD card suddenly goes blank.
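
For the CLI route, something like this works (the target path is only an example, borrowing the NFS mount from the df output above):

# a full backup includes userdata, e.g. the persistence files
sudo openhab-cli backup --full /media/SynologyBackup/openhab3-backup.zip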

On the other hand: if the SD card is really brand-new (I even had openHAB instances running on cheap no-names for years), it could also have been some OS-related mishap which made the filesystem RO. I can’t rule out openHABian/openHAB-related issues in your specific use case, but that seems very unlikely.

Is it openHABian only, or do you have anything else installed/configured on the same Raspberry?

openHABian, with InfluxDB/Grafana installed via the config tool but not being used yet.

This new rPi4 is supposed to be the new OH3 machine.
It runs in parallel to the current OH2 system…
But since I installed it, it has:
a) stopped logging, and needed a rebuild as I could not fix it
b) shown random script errors
c) and today stopped working
… none of which I experienced with OH2 on an SSD in over 4 years.

My experience is completely the opposite. It’s not a question of “if” but “when” the SD card will die. Based on that, do you honestly want to carry that in the back of your mind?
Whenever you have any kind of problem with openHAB, you’ll always be wondering “is the SD card working correctly?”. And just think of all the weird errors you start getting when the file system starts dying. How many hours will you put into it before you decide “nope, gotta be the SD card”?
Like… for openHAB you don’t even need an SSD. Just get a cheapo 2.5″ laptop HDD. It works fine, and you can cross that future issue off the list.

Exactly my concern!

I am currently checking whether I really have a genuine card. Really?! All stuff that is not required when sticking a brand-name SSD in.

Well I am copying the SD card as we speak.
But will explore other options.

The SSD uses less oomph than the HDD.

Oh? What do you mean?

Edit: power draw?? I’m pretty sure we’re discussing peanuts here, right? An SSD might use 50% less energy, but 50% less of 5 watts is still as insignificant as 5 watts anyway…

Perhaps I’m lucky?
My main openHAB2 ran approx. 4 years with the same SD card (>1000 items, most of them changing at least every minute, some every 10 or 1-2 seconds), with logfiles, RRD4j persistence, etc. on that SD card.
My main openHAB3 has now run almost two years on the same SD card; same story.

What did run into problems was my openHAB2 instance which only had my smart meter and Zehnder ComfoAir (those with “physical” RS232-to-USB connections). That one had only a bunch of items, not much to do, and every now and then an OH configuration change or OS updates…
Likewise my Pi for magicmirror.builders.
My remote OH2 (updated to OH3) instance ran off a laptop SSD, but I reverted back to an SD card two years ago. And that Pi is in an extreme situation: in winter the room gets down to -20°, and if someone is there the place heats up to 30° and more.

You can also use zram for logging and RRD4j persistence; it’s in the openHABian menu as well.

And is it really that big of a problem? Amanda runs daily, so I don’t lose anything at all. And if it should happen, I just plug in another SD card, which I either copy within the same Pi (you could set up “clone SD” every week or so?) or on another machine. 10 minutes tops, plus like 10 bucks for a new SD card. OH (on openHABian) doesn’t need tons of GB to run.

But:
I never let Grafana or any other I/O-heavy application run on the same Pi as openHAB. That runs on my Synology.

OK everyone, let’s take a step back.

If @Max_G is using a default install of openHABian, and there is nothing to indicate otherwise, the file system in question (where OH logs) is in zram. That means this has nothing to do with the SD card itself. None of the zram stuff is written to the SD card except on a restart of the zram service or a reboot of the whole machine.

Typically, a zram file system will only become read-only under these three conditions:

  1. the amount of space allocated to that zram file system ran out
  2. the amount of RAM on the machine ran out
  3. and of course there could be something weird in the OS or the zram service to mess things up

Moving to another SD card is unlikely to change the root cause. Moving to an SSD or HDD might solve this only insofar as upon doing so there’s no really good reason to run zram, so zram should be disabled, thereby disabling the part that is causing the problem.

We’ve seen that the file system didn’t run out of space (assuming that df command was run while the file system was set to read-only).

However, the free command reports memory usage in KB. 464 KB free is really low. So I’m going to guess the root cause of the problem here is you’ve run out of RAM. That’s surprising given what you are running on this machine so that’s worth exploration. What’s consuming 4 GB of RAM?
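
A quick way to check with standard tools:

# memory in human-readable units
free -h
# header line plus the ten biggest memory consumers
ps aux --sort=-%mem | head -n 11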

Well, yes and no.
The most often encountered reason for read-only is that either of the limits defined in /etc/ztab is hit. It doesn’t necessarily mean the disk is full. There’s also a limit on compressed data size that might be inadequate, especially if your data doesn’t compress well or at all.
zramctl --output-all will tell you. Given the ‘maximum’ use case with InfluxDB active, that’s not all that unlikely. I vaguely remember talks about an excessive number of items, too.
You can increase the figures in ztab if they’re too small. But pay attention to a slow, controlled increase in multiple steps.
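
For reference, the procedure would look roughly like this. The exact column layout of /etc/ztab depends on the zram-config version, and the size values below are purely illustrative (mem_limit caps the compressed size, disk_size the uncompressed size), so check the comment header in your own file. The service should be called zram-config on openHABian; adjust if yours differs.

# stop the zram service before editing (it reads ztab on start
# and syncs the zram contents back to disk on stop)
sudo systemctl stop zram-config
sudo nano /etc/ztab
sudo systemctl start zram-config

Illustrative /etc/ztab entries:

# type  alg   mem_limit  disk_size  target_dir                    bind_dir
dir     zstd  300M       750M       /var/lib/openhab/persistence  /persistence.bind
dir     zstd  150M       350M       /var/lib/influxdb             /influxdb.bind
log     zstd  400M       1G         /var/log                      /log.bind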

InfluxDB isn’t really a great idea to run. rrd4j is the default for good reasons. One not-so-well-known of these reasons is that it uses a fixed amount of storage per item, no matter how long you run it. On a resource-limited SBC, that’s definitely more clever than running a full-scale database.

464180 KiB equals ~450 MiB though, and that plus the ~1.8 GiB of cached data looks like a sufficient reserve to me.

Yep, I did the conversion in my head wrong.

I have taken an openhab-cli backup --full (just in case).

Based on this thread, I have put the SD card back in.
I have then uninstalled Grafana and InfluxDB. However, openHAB complains vigorously and endlessly:

2022-12-08 06:55:02.736 [ERROR] [org.influxdb.impl.BatchProcessor    ] - Batch could not be sent. Data will be lost
org.influxdb.InfluxDBIOException: java.net.ConnectException: Failed to connect to localhost/0:0:0:0:0:0:0:1:8086

This is despite stopping the grafana and influxdb services and then uninstalling both, as outlined here: How to uninstall influxdb from Ubuntu

Any hints appreciated.

Thanks; I did not know this command.
At present it shows:

# [2022-12-08 07:23] openhabian@openhabian ~ $ 
zramctl --output-all
NAME       DISKSIZE   DATA COMPR ALGORITHM STREAMS ZERO-PAGES TOTAL MEM-LIMIT MEM-USED MIGRATED MOUNTPOINT
/dev/zram3       1G    82M  9.1M zstd            4       5841 29.7M      400M    29.7M       0B /opt/zram/zram3
/dev/zram2     350M   312K  3.6K zstd            4          0  156K      150M     156K       0B /opt/zram/zram2
/dev/zram1     750M 171.7M  3.2M zstd            4          8 85.9M      300M    85.9M       0B /opt/zram/zram1
/dev/zram0       1G     4K   87B lzo-rle         4          0    4K      400M       4K       0B [SWAP]

… which is just a few minutes after an openHABian reboot.
All I notice is that zram2 is fully used.

It also still shows the InfluxDB mount in there too:

# [2022-12-08 07:25] openhabian@openhabian ~ $ 
df -h
Filesystem                                Size  Used Avail Use% Mounted on
/dev/root                                  15G  6.4G  7.4G  47% /
devtmpfs                                  1.9G     0  1.9G   0% /dev
tmpfs                                     1.9G     0  1.9G   0% /dev/shm
tmpfs                                     778M  1.7M  777M   1% /run
tmpfs                                     5.0M     0  5.0M   0% /run/lock
/dev/mmcblk0p1                            253M   50M  203M  20% /boot
/dev/zram1                                721M  144M  525M  22% /opt/zram/zram1
overlay1                                  721M  144M  525M  22% /var/lib/openhab/persistence
/dev/zram2                                323M   36K  299M   1% /opt/zram/zram2
overlay2                                  323M   36K  299M   1% /var/lib/influxdb
/dev/zram3                                974M   62M  845M   7% /opt/zram/zram3
overlay3                                  974M   62M  845M   7% /var/log
192.168.1.127:/volume1/backup/openhabian   21T   11T   11T  49% /media/SynologyBackup
tmpfs                                     389M     0  389M   0% /run/user/1000

Nope, it’s obviously the opposite.

Nope. It shows the zram filesystem for InfluxDB is still there. Remove it from /etc/ztab.
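
That is, stop zram, take the InfluxDB entry out, and start it again (illustrative; the exact line depends on your ztab):

sudo systemctl stop zram-config
# in /etc/ztab, delete or comment out the entry for /var/lib/influxdb, e.g.:
#   dir  zstd  150M  350M  /var/lib/influxdb  /influxdb.bind
sudo systemctl start zram-config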

Interesting; I thought MEM-USED shows what is being used, in the context of zram being full… also given that zram1 and zram3 use less than allocated.

Anyway, I have removed influx from the ztab file. Thank you.