High IO Wait / mapdb?

Hi,

I’ve been using openHAB 2 on a Pi and switched to a fresh machine running the latest openHAB 2 snapshot available on Cloudbees (20161009). I reinstalled everything from scratch.

I discovered that the IO wait on the machine is huge, but there is no swapping. iotop shows high disk activity:

  316 be/3 root        0.00 B/s    0.00 B/s  0.00 % 76.67 % [jbd2/sda3-8]
18847 be/4 openhab     0.00 B/s  335.90 K/s  0.00 %  0.96 % java -Dopenhab.home=/usr/share/openhab2 -Dopenhab.conf=/etc/openhab2 -Dop~runtime/karaf/lib/boot/org.osgi.core-6.0.0.jar org.apache.karaf.main.Mai...

As soon as I stop openHAB 2, the iowait decreases.
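
In case it helps anyone reproducing this, these are roughly the commands I used to spot it (just a sketch; iotop needs root and may be packaged separately on your distribution):

# iowait and blocked processes, refreshed every 2 seconds
vmstat 2
# only processes that are currently doing IO (the listing above came from a view like this)
sudo iotop -o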

After googling for tools that can show the culprit, I tried fatrace, and my terminal is full of:

java(18599): R /var/lib/openhab2/mapdb/storage.mapdb
java(18599): R /var/lib/openhab2/mapdb/storage.mapdb
java(18599): R /var/lib/openhab2/mapdb/storage.mapdb
java(18599): R /var/lib/openhab2/mapdb/storage.mapdb
(… the same read line repeated dozens of times …)
java(18599): R /var/lib/openhab2/mapdb/storage.mapdb.p
java(18599): W /var/lib/openhab2/mapdb/storage.mapdb.t
java(18599): W /var/lib/openhab2/mapdb/storage.mapdb.t
java(18599): W /var/lib/openhab2/mapdb/storage.mapdb.t
java(18599): W /var/lib/openhab2/mapdb/storage.mapdb.t
java(18599): R /var/lib/openhab2/mapdb/storage.mapdb.t
java(18599): R /var/lib/openhab2/mapdb/storage.mapdb.t
java(18599): W /var/lib/openhab2/mapdb/storage.mapdb
java(18599): R /var/lib/openhab2/mapdb/storage.mapdb.t
java(18599): W /var/lib/openhab2/mapdb/storage.mapdb.p
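
For reference, this is roughly how I ran fatrace to narrow it down to the mapdb files (a sketch; exact flags may depend on your fatrace version):

# add timestamps and keep only the lines touching the mapdb files
sudo fatrace -t | grep --line-buffered mapdb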

Any ideas? Why so much activity on the internal mapdb storage?

Thanks and regards

Is this while the system is idle, or only during startup?

This is while openHAB is running and after startup has finished.

Hmm, interesting: if I stop several bundles (through the console), everything runs fine.
When I re-enable the bundles, the IO wait is still fine.
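
For anyone wanting to try the same thing, the bundle juggling is just the standard Karaf console commands (a sketch; the bundle id will differ on your system):

openhab> bundle:list              # list bundles with their ids and states
openhab> bundle:stop <id>         # stop a suspect bundle
openhab> bundle:start <id>        # re-enable it afterwards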

I’ll continue investigating and report back with more details.

Thanks @Kai

Interesting - so please let us know as soon as you have found the culprit!

Something seems wrong with the Freebox binding.

The debug log doesn’t seem to report any interesting messages…

I changed my OH server about 250 days ago and just noticed that the new SSD drive has already seen 12 TB of writes at the hardware level. The server is dedicated to OH, so WTF? I started digging into this and realized that mapdb is constantly writing to its transaction file.

I have a pretty busy system with about 1000 items, many of which are updated every couple of seconds. That’s why I increased the mapdb commit interval to 60 seconds (many years ago already). I was “hoping” that the mapdb persistence service would then write the file once per minute, but that’s not the case. MapDB actually writes every item change to the transaction file and only commits the changes to the “real” db file every minute. I don’t know how mapdb handles the transaction file internally (it seems to be some kind of write-ahead log), but the transaction file keeps growing the whole time.

I restarted the mapdb bundle, and after one hour the transaction file size just before a commit was around 2 MB. The file grows from 0 to 2 MB and then, after the one-minute commit interval, starts again from 0. So mapdb writes at least 2 MB every minute, which means ~3 GB per day and ~1 TB per year. The actual mapdb file where the latest values are stored is only ~30 kB, so if only that file were written every minute, it would be ~16 GB per year. That is a huge difference.
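
To make the numbers concrete, this is roughly how I watched the file and did the back-of-the-envelope maths (a sketch; the path matches my installation):

# watch the transaction file grow between commits
watch -n 5 'ls -l /var/lib/openhab2/mapdb/storage.mapdb*'
# rough write volume if ~2 MB is written per minute
echo "$(( 2 * 60 * 24 )) MB per day"                  # ≈ 2880 MB ≈ 3 GB
echo "$(( 2 * 60 * 24 * 365 / 1024 )) GB per year"    # ≈ 1 TB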

Is this how mapdb persistence is intended to work, or is there some kind of bug?

Were there any updates to this? I have just moved my openHAB 2 instance to a Raspberry Pi 4B (4 GB), and for some reason this kicked in on mine as well (and I don’t use mapdb for persistence myself, I use mariadb).

fatrace doesn’t (currently) work on my Pi 4 (I’m looking into that now), so I can’t see what it’s doing, but I don’t believe it’s anything in cache or tmp (as I have those on tmpfs). It pegs at least 2 cores (sometimes all 4) at 99.9% wait IO, and of course everything becomes very sluggish at that point as there’s no disk IO bandwidth left for anything else.
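
Until fatrace works again, I’m falling back on per-process and per-device IO stats (a sketch; pidstat and iostat come from the sysstat package):

# per-process disk read/write rates, every 5 seconds
pidstat -d 5
# per-device utilisation and wait times, to see which device is saturated
iostat -x 5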

(screenshot: per-core CPU usage showing the cores pegged in wait IO)

Oh… it’s not consistent either. Just after snapping the above, it returned to 98% idle on all 4 cores again. It spends a while like that, then lots of wait IO again for a bit… then idle… and so on.