Karaf creates exceptions filling the disk, destroying parts of openhab definitions

Hi!

This is quite urgent to me because I cannot open any rollershutter anymore.

The exceptions all look like (see below). Lots of them per second.

This is openhab 3.2.0-1 on Raspbian buster.

So I

  • killed karaf
  • made space
  • apt update && apt ugrade (should not have done that)
  • rebooted
  • all things are gone
  • rebooting created a new openhab.log but the old one disappeared. openhab.log.1.gz is old. I did not find openhab in /etc/logrotate.d, who rotates them?
  • checked all still existing logs. I cannot see anything relevant prior to the exceptions.
  • srv/openhab-userdata/jsondb/org.openhab.core.thing.Thing.json had length 0, I restored it. Last modification time was in 2020
  • reboot, now I got a java error message (see below) about missing org.eclipse.smarthome.core.thing.internal.ThingImpl
  • apt install --reinstall openhab openhab-addons
  • still the same error

I compared a working backup with current state but did not yet find another relevant difference. Next I will continue with that search.

/var/log/kern.log.1 contains just one line saying

May 21 22:02:51 openhab kernel: [477561.583573] cgroup: fork rejected by pids controller in /system.slice/openhab.service

Might be relevant?

May 21 21:57:48 openhab karaf[579]: Exception in thread "EventAdminAsyncThread #11" Exception in thread "EventAdminAsyncThread #12" java.lang.NullPointerException
May 21 21:57:48 openhab karaf[579]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:217)
May 21 21:57:48 openhab karaf[579]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
May 21 21:57:48 openhab karaf[579]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
May 21 21:57:48 openhab karaf[579]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
May 21 21:57:48 openhab karaf[579]: #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
May 21 21:57:48 openhab karaf[579]: #011at java.base/java.lang.Thread.run(Thread.java:829)

2022-05-22 09:42:43.223 [ERROR] [re.storage.json.internal.JsonStorage] - Couldn't deserialize value 'org.openhab.core.storage.json.internal.StorageEntry@1a70599'. Root cause is: org.eclipse.smarthome.core.thing.internal.ThingImpl cannot be found by org.openhab.core.thing_3.2.0

I just found this one:

https://community.openhab.org/t/syslog-overwhelmed-with-karaf-npes/136057

He has is no answer yet. He has the same problem, only his disk did not fill up.

  • rotation is defined in / done by log4j not the system’s logrotate
  • org.eclipse.smarthome.core.thing.internal.ThingImpl doesn’t look like it would be/belong to OH 3.X

org.eclipse.smarthome namespace was changed in OH3.

What bindings are installed? Is this an upgraded installation? If so: from which version did you upgrade and how did you do that?

Bindings: Astro, Kodi, Network, Spotify, AVM Fritz, LG webOS, Network UPS, Exec, Mail, OpenHAB ZigBee, Yama Receiver, Homematic, MQTT, OpenWeatherMap, Z-Wave
Automation: Jython

Nothing new in the last few months.

I believe I never downloaded an addon outside of debian openhab-addons.

This system was originally 2.5 but I upgraded to OH3 when it became available. I probably did that as recommended, but I really do not remember.
I did not change anything except editing some rules in the last few month.

@Wolfgang_S: Will look at the log problem later. I find a lot of files with that string. Is my installation really that broken? It always worked mostly fine. See attached output of grep -arl org.eclipse.smarthome
grep.txt (3.5 KB)

I did only check at the smartphone: are these files all in the backup folder? Than don‘t care about it.

Also I see that you have two userdata folders. Which one is in use? Do you see exceptions at startup?

with

you refer to srv/openhab-userdata/ ?
I think that is a shared folder / mount for the samba export ( duplicated view of /var/lib/openhab )

From symptoms you get, it might be a system configuration error or some kind of thread leak. Given number of addons you have its hard to determine.
To determine which is faulty - disable everything. Start with plain distro and observe behavior. My personal bet is automation as this thing is custom and most likely requires adjustment.

1 Like

Meanwhile I just reloaded a 2 days old backup which fixed the problem. When it happens again I will do a backup right away before running apt upgrade, so there should be fewer differences to a working system.

Since I did not change anything about addons since a long time, I did not disable any for now. But I will install some kind of monitoring software which tells me when a log file suddenly starts growing fast. So I will hopefully be able to better find the cause.

With that in mind, I will also migrate from Raspberry to a virtual guest on my home server, so I can backup/restore the entire image. Then I can simply copy the runaway image and debug it separately.

It might be interesting to find common factors with the user I mentioned having had the same problem.

About the files with org.eclipse.smarthome: After ignoring backup and /srv, what remains is:

var/lib/openhab/config/org/eclipse/smarthome/audio.config
var/lib/openhab/config/org/eclipse/smarthome/voice.config

apt-file search for /audio.config and /voice.config finds nothing, so I deleted them and the now empty directories above.

The question why such code was executed at all remains open. I must admit I know next to nothing about Java, and I do not want to - especially since there is also kotlin.

I also deleted some empty openhab2 directories like /var/lib/openhab2, /var/log/openhab2. No idea why they were still there.

And since I regularly do full backups with restic (the very best backup solution IMHO) I see no need for openhab to create its own backups. Is there a config option to disable that (which I did not find) or can I just delete them with cron?

1 Like

About the empty jsondb/org.openhab.core.thing.Thing.json:

I found

2022-05-22 00:07:11.687 [ERROR] [re.storage.json.internal.JsonStorage] - Error writing JsonDB to /var/lib/openhab/jsondb/org.openhab.core.thing.Thing.json. Cause No space left on device.
2022-05-22 00:07:11.733 [ERROR] [re.storage.json.internal.JsonStorage] - Error writing JsonDB to /var/lib/openhab/jsondb/backup/1653170831692--org.openhab.core.thing.Thing.json. Cause No space left on device.

and I believe that this resulted in the empty file.

Maybe the saving code does not correctly handle full disk.

But as I said - this is Java, I do not even know how to look at the source code in a .jar file and I do not want to know.

My backups say

root@speicher:/mnt/snapshots# ls -lnrt 2*/openhab/var/lib/openhab/jsondb/org.openhab.core.thing.Thing.json | tail -5
-rw-rw-r-- 1 110 115 1446007 20. Mai 01:00 2022-05-20T01:00:22+02:00/openhab/var/lib/openhab/jsondb/org.openhab.core.thing.Thing.json
-rw-rw-r-- 1 110 115 1446027 21. Mai 01:00 2022-05-21T01:00:22+02:00/openhab/var/lib/openhab/jsondb/org.openhab.core.thing.Thing.json
-rw-rw-r-- 1 110 115       0 22. Mai 00:51 2022-05-22T01:00:24+02:00/openhab/var/lib/openhab/jsondb/org.openhab.core.thing.Thing.json
-rw-rw-r-- 1 110 115       0 22. Mai 07:32 2022-05-22T08:40:18+02:00/openhab/var/lib/openhab/jsondb/org.openhab.core.thing.Thing.json
-rw-rw-r-- 1 110 115 1446019 22. Mai 17:25 2022-05-22T17:24:17+02:00/openhab/var/lib/openhab/jsondb/org.openhab.core.thing.Thing.json
  • login to the UI as admin
  • go to Admin - settings
  • go to JSONStorage
  • there you find the the definition for the number of backups to be taken
  • try to set it to 0

Thank you!