[Docker] chown in startup takes forever

Running the official OH2 Docker image (2.1.0) on RPi. There seems to be a wrapper script around the OH2 runtime, setting up users and groups.

I am not sure why this is not done during the image creation instead of at every startup. In particular,

chown -R openhab:openhab /openhab

seems to take forever (several minutes). Is there a better way of running OH2 from Docker? I want downtime to be as short as possible after reboot/upgrade. Right now, the startup script takes several minutes to complete.

This is odd behavior and I would look more into it. There is likely something odd with your setup that is causing the chown to take so long.

This isn’t done during image creation because the volumes are not mounted into the container during image creation. It is only once you run a container that the mounted volumes are there to perform the chown on.

On a typical install the chown is performed almost instantly so I would look into why this standard file system operation is taking a long time.
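For context, the image’s entrypoint does roughly the following at every container start (a simplified sketch of the idea, not the exact script shipped in the image):

    # Create the openhab user/group, honouring the UID/GID passed in via
    # environment variables (9001 is the default).
    NEW_USER_ID=${USER_ID:-9001}
    NEW_GROUP_ID=${GROUP_ID:-9001}
    groupadd -g "$NEW_GROUP_ID" openhab 2>/dev/null || true
    useradd -u "$NEW_USER_ID" -g openhab openhab 2>/dev/null || true

    # This is the step that can take minutes on slow media with many files.
    chown -R openhab:openhab /openhab

    # Drop privileges and hand over to the real start script.
    exec gosu openhab ./start.sh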

Is there some migration case to consider? Or why did they get the wrong owner in the first place? At least it would be nice to be able to opt out of this.

Slow RPi (2) on slow media, and lots of files. I don’t know what “normal” is, but just doing “find .” in the directory takes time (maybe 20 secs after a cold boot). The file system and media look healthy as far as I can see. The volume is mounted as ./data through Compose.
The rest of openHAB boots in less than a minute, which is of course not fast, but reasonable given the performance of the RPi 2.

OH’s start script fixes the permissions of all of its main files (i.e. files in conf and userdata) so that the openhab user can read and write to them. This was implemented to mitigate the hundreds of file permission problems people used to inundate the forum with because they created a conf file using their regular user and didn’t give the openhab user permission to read it. This has been the case since version 1.7 or 1.8.

You can modify the start script to skip the chown step. You would need to create a copy of start.sh and then mount that into the container over the existing start.sh. If you do so, you need to be very diligent about keeping the files in your conf and userdata folders owned and readable by the openhab user inside the container (i.e. user id 9001, group id 9001 by default, unless you pass a different UID/GID to the container using environment variables).
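A rough sketch of what that could look like with docker run; the host paths and image tag are examples, and the in-container location of start.sh is an assumption, so check against the actual image:

    # Keep the host copies owned by the in-container openhab user
    # (UID/GID 9001 by default, or whatever you pass in below).
    sudo chown -R 9001:9001 ./conf ./userdata

    # Mount a modified start.sh (with the chown step removed) over the one in the image.
    docker run --name openhab \
      -e USER_ID=9001 -e GROUP_ID=9001 \
      -v "$PWD/conf:/openhab/conf" \
      -v "$PWD/userdata:/openhab/userdata" \
      -v "$PWD/start.sh:/openhab/start.sh" \
      openhab/openhab:2.1.0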

It should still happen almost instantly. The only thing I can think of that would make it take longer than a few seconds, even on an RPi 2 with the slowest media, is the media itself failing.

A chown should take no longer than that. But the fact that it takes 20 secs on just that directory is also anomalous. I’d expect 20 secs for the entire /, not just the OH folder.

Booting the rest of openHAB in less than a minute, on the other hand, is a reasonable amount of time.

For comparison, on my RPi 1 B it takes less than 20 seconds to run a find on the entire file system (72,484 files), and I think I’m running a cheap class 4 SD card (i.e. not fast). On my RPi 0W it takes 22 seconds, and that machine has more than 10k more files.

I investigated some more and it seems the performance hit is only visible when I am inside the container, and only the first time. Once chown or find has been run, subsequent commands are instant. Rebooting the Pi and running find/chown outside of the container takes about a second.
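One way to see that difference is to time the same traversal on the host and inside the container (the container name and paths are examples):

    # On the host, right after a reboot.
    time find ./data -type f > /dev/null

    # Inside the container: the first run is slow, a second run is instant
    # once the caches are warm.
    docker exec openhab bash -c 'time find /openhab -type f > /dev/null'
    docker exec openhab bash -c 'time find /openhab -type f > /dev/null'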

In addition to the volumes, I also added tmpfs mounts (e.g. /openhab/userdata/tmp:exec,mode=777), but I don’t see how that could affect things.
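In Compose terms, the combination looks roughly like this (a simplified sketch, not my exact file; the service name, image tag and host paths are just examples):

    version: '2.1'          # example schema version
    services:
      openhab:
        image: openhab/openhab:2.1.0
        volumes:
          # conf and userdata live under ./data on the host
          - ./data/conf:/openhab/conf
          - ./data/userdata:/openhab/userdata
        tmpfs:
          # tmpfs mount layered over the userdata bind mount
          - /openhab/userdata/tmp:exec,mode=777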

I need to investigate a bit more it seems.

It is interesting that the problem only appears in Docker. I don’t know what extra overhead is caused by the magic behind the scenes that Docker has to go through when mounting a volume into the container. Maybe an RPi 2 isn’t powerful enough to run OH inside Docker. I know it is plenty powerful enough to run OH natively, though; tons of people do it.

Very odd and interesting problem.

I wouldn’t expect the tmpfs mounts to make much of a difference… unless you are running out of RAM and your folder is being sent down to swap space. Maybe the cost of bringing the files from swap back into RAM and then piping them through Docker’s file system magic is the cause. Just idle speculation, really.

For the record, this is a problem with the Docker storage driver (overlay2, based on overlayfs) and its poor performance on recursive updates and similar operations. Read about it here. There is no cure in sight atm.
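You can confirm which storage driver your Docker daemon is using with:

    docker info --format '{{.Driver}}'
    # typically prints "overlay2" on current installs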


Just wanted to bring this one up too. I’m testing the Docker container on a NUC with an SSD, where the persistence layer is an NFS export from my Synology NAS mounted onto a local folder on the NUC, and I experience exactly the same issue.
If I use the directly attached SSD instead, the container starts much faster.
What I noticed, though, is that the chown takes the same huge amount of time outside the container as well, which in my opinion means it’s not the Docker issue mentioned above.
I guess it’s caused by the tmp and cache folders and the many files inside them. I will investigate this further and try to create something like a mixed setup, i.e. maybe point the cache and tmp to local folders and keep the rest on NFS.
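A sketch of what such a mixed setup could look like in Compose, assuming an NFS-backed named volume for userdata with local bind mounts layered over cache and tmp (the NAS address, export path, image tag and local paths are placeholders):

    services:
      openhab:
        image: openhab/openhab:2.1.0
        volumes:
          - openhab_userdata:/openhab/userdata
          # more specific paths are mounted on top, so cache and tmp stay on local disk
          - ./local/cache:/openhab/userdata/cache
          - ./local/tmp:/openhab/userdata/tmp
    volumes:
      openhab_userdata:
        driver: local
        driver_opts:
          type: nfs
          o: addr=192.168.1.10,rw
          device: ":/volume1/openhab/userdata"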

Cheers,
K.

Hi,
I’ve found the root cause. It is the miio binding: the maps retrieved from the vacuum cleaner create lots of files, which later makes the chown very slow.
More info shared here: Xiaomi Robot Vacuum Binding - #1840 by polychronov
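For anyone hunting a similar culprit, one quick way to spot directories with an unusually large number of files is to count them per directory on the host (paths are just an example):

    # Count files per second-level directory under userdata, largest counts last.
    find userdata -type f | cut -d/ -f1-2 | sort | uniq -c | sort -n | tail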

Cheers,
Konstantin