openHAB in Docker won't start all of a sudden

I’m sick and jet lagged so not exactly thinking my best today. I’m hoping someone can see what I’ve done or messed up.

My before state:

  • Ubuntu 16.04 host
  • Latest Docker
  • Official Docker image from Docker Hub

I noticed that the base OS had a bunch of security updates so I ran my update Ansible playbook which updates the host and all of my Docker images. The pull of the openHAB image failed because the tags have changed, unsurprising.

So I changed my tag to use the 2.1.0-snapshot tag. OH would not restart. Rather than try to figure out what was wrong, I just wiped out my userdata folder and tried to let it recreate it. That didn’t work, and it did not attempt to recreate the userdata contents.

So I changed to the 2.0.0 tag and restored the userdata folder. Still wouldn’t start. Tried the delete of the userdata again and it still doesn’t start.

OH doesn’t start up enough to generate anything into the logs or connect to the console.

The logs I see when the userdata folder has been wiped (I have Docker configured to send stdout to syslog):

Feb  3 12:09:51 chimera kernel: [835898.429815] aufs au_opts_verify:1597:dockerd[1688]: dirperm1 breaks the protection by the permission bits on the lower branch
Feb  3 12:09:51 chimera dockerd[1221]: time="2017-02-03T12:09:51.282544498-07:00" level=warning msg="Your kernel does not support swap memory limit."
Feb  3 12:09:51 chimera dockerd[1221]: time="2017-02-03T12:09:51.282629113-07:00" level=warning msg="Your kernel does not support cgroup rt period"
Feb  3 12:09:51 chimera dockerd[1221]: time="2017-02-03T12:09:51.282654112-07:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Feb  3 12:09:51 chimera 04ce4fc366e1[1221]: Launching the openHAB runtime...
Feb  3 12:09:51 chimera 04ce4fc366e1[1221]: karaf: KARAF_ETC is not valid: /openhab/userdata/etc
Feb  3 12:09:51 chimera dockerd[1221]: time="2017-02-03T12:09:51.586181097-07:00" level=error msg="containerd: deleting container" error="exit status 1: \"container 04ce4fc366e19d60789aabbc4e63ec5b61d882d41c292677b3109617d23bf308 does not exist\\none or more of the container deletions failed\\n\""

When my old and previously working userdata is present I see the following in the logs:

Feb  3 12:12:23 chimera kernel: [836050.310213] aufs au_opts_verify:1597:dockerd[6814]: dirperm1 breaks the protection by the permission bits on the lower branch
Feb  3 12:12:23 chimera dockerd[1221]: time="2017-02-03T12:12:23.141370527-07:00" level=warning msg="Your kernel does not support swap memory limit."
Feb  3 12:12:23 chimera dockerd[1221]: time="2017-02-03T12:12:23.141816366-07:00" level=warning msg="Your kernel does not support cgroup rt period"
Feb  3 12:12:23 chimera dockerd[1221]: time="2017-02-03T12:12:23.141848277-07:00" level=warning msg="Your kernel does not support cgroup rt runtime"
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: Launching the openHAB runtime...
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: WARN: file:/openhab/userdata/etc/config.properties is not found, so not loaded
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: null
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: WARN: file:/openhab/userdata/etc/config.properties is not found, so not loaded
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: Error occurred shutting down framework: java.lang.NumberFormatException: null
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: java.lang.NumberFormatException: null
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: #011at java.lang.Integer.parseInt(Integer.java:542)
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: #011at java.lang.Integer.parseInt(Integer.java:615)
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: #011at org.apache.karaf.main.ConfigProperties.<init>(ConfigProperties.java:208)
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: #011at org.apache.karaf.main.Main.updateInstancePidAfterShutdown(Main.java:226)
Feb  3 12:12:23 chimera 04ce4fc366e1[1221]: #011at org.apache.karaf.main.Main.main(Main.java:191)
Feb  3 12:12:23 chimera dockerd[1221]: time="2017-02-03T12:12:23.87504043-07:00" level=error msg="containerd: deleting container" error="exit status 1: \"container 04ce4fc366e19d60789aabbc4e63ec5b61d882d41c292677b3109617d23bf308 does not exist\\none or more of the container deletions failed\\n\""

Note the warning about userdata/etc/config.properties not existing. This file does in fact exist. The NumberFormatException appears to be happening after some other non-reported error which causes karaf to fail to start or decide to shut down.

Unfortunately I can’t get the image to come up and run long enough to attach to the container and explore the file system to see if something else weird is going on.

I’d really like to keep running OH in Docker as it will make my future plans for backup, deployment, and eventual migration from an old laptop to a home lab type configuration easier. For the short term I’m going back to apt-get though as right now nothing works at all.

I welcome any insights.

If the userdata folder is not being recreated, then I would first check to make sure that your volumes are correctly mapped.

If you’re sure that the userdata/etc/config.properties exists (and I assume you have checked your volume mapping in the docker run command), I would run a bash shell in the container (whilst it is running), and then check that the file is actually visible from inside the container, for the logged in user.
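As a rough sketch of that check, something like the following could be pasted into the shell inside the container. The function name check_visible is made up for illustration; the path in the usage comment is the one from the log output above. Note that a file sitting in a directory the current user cannot traverse shows up as "missing" here, which would match a "not found" warning for a file that does exist on the host.

```shell
# Report whether a path exists and is readable by the user running this shell.
# check_visible is a hypothetical helper name, not part of openHAB or Docker.
check_visible() {
    path="$1"
    if [ ! -e "$path" ]; then
        # Also reached when a parent directory is not searchable by this user.
        echo "missing: $path"
        return 1
    elif [ ! -r "$path" ]; then
        echo "not readable by uid $(id -u): $path"
        return 2
    fi
    echo "ok: $path"
}

# Example use inside the container:
#   check_visible /openhab/userdata/etc/config.properties
```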

Other than that, it may be worth building your own image using the Dockerfile from git. That way, you’ll at least eliminate any issues with the downloaded image. If that doesn’t work, you could modify the Dockerfile to run /bin/bash instead of the openHAB launcher. Once in the shell, you can manually try to run openHAB and figure out where it’s falling over.

The folders that get passed in as volumes have not changed and the script that I run to start the container hasn’t changed, so I’m as certain as I can be, without the ability to check from inside the container, that the volumes are mapped correctly.

The problem is I can’t keep it running long enough to check. It just goes into a restart loop until it finally gives up. When I try I get “the container is currently restarting, try again when it is started” or the like.

Building my own image is the next thing I’ll have to try, but I need to get something running before I play around with that.

I’m going from memory, but you should be able to override the entrypoint in your docker run command, e.g. docker run -v /opt/openhab:/openhab -it --entrypoint=/bin/bash openhab/openhab etc

This will run a container without starting openHAB, just giving you a shell prompt. Your container will then not constantly restart as openHAB is not being launched.

OK, now I think I’m getting somewhere. Your hint about overriding the entrypoint worked like a charm. Clearly I’m only at an intermediate skill level with Docker right now.

OK, so I think there has been a change to the image recently. The openhab user inside the container used to be UID 1000. That wasn’t a problem as user 1000 often maps to the first “real” user on the system and indeed did on my system. I have not yet gone through the effort to force it to run as my created openhab user.

The change is now the openhab user inside the container is running as UID 9001 and no longer has permissions on the userdata and config folders largely because there is no user 9001 on my host system I think. At least that is my going position as I did actually give full permissions to those folders so any user should be able to read and create files there. I saw an error like this on another thread and was confused then but didn’t have time to look into it more.
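For anyone wanting to confirm the same mismatch on their own host, here is a minimal sketch that compares the numeric owner of a mounted folder with the UID the container uses. It assumes GNU stat (as on Ubuntu); the helper names and the /opt/openhab/userdata path in the usage comment are only illustrative.

```shell
# Print the numeric UID that owns a file or directory (GNU coreutils stat).
owner_uid() {
    stat -c '%u' "$1"
}

# Succeed only if the path is owned by the expected numeric UID.
owned_by() {
    [ "$(owner_uid "$1")" = "$2" ]
}

# Example use on the host (the path is an assumption, adjust to your mapping):
#   owned_by /opt/openhab/userdata 9001 || echo "userdata is not owned by UID 9001"
```

Running id -u in the container shell shows the other half of the comparison, the UID the process actually has.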

I just tried to run with the --user option but that doesn’t appear to override the user. It is still using 9001 inside the container. It won’t be that difficult for me to change the UID of my existing openhab user, but it seems like I should be able to override that and make it run using my UID instead. Of course, it could be a failure of Ansible to pass the user ID like it should (though it works for my other containers).

I do notice that there have been a lot of updates to the documentation as well (libpcap support now, woohoo!).

I do wish this sort of breaking change to the container would be announced here.

I’ll map my openhab user to 9001 for now and look into why I can’t use an arbitrary uid later. Or maybe I’ll just run as root so I can get libpcap support and maybe dhcplisten support on the network binding as well.

Thanks for the help! I have a way forward now.

EDIT:

I did a little looking into it and I see no reason that the -u argument in the Docker command would not be working. The Docker docs clearly state that the -u option on the command line will override the user specified in the Dockerfile. Why it is not is a mystery to me.

But mapping my openhab:openhab user on the host to 9001 fixed things and now it boots as it should.

Glad to hear that you’ve figured out the issue. I have come across this user permissions issue before, and overriding with --user=root didn’t work. I didn’t spend much time on it at the time, but I recall that there were some explicit folder permissions being set for the openhab user in the openHAB launcher shell script that was copied into the image, i.e. they were ignoring the user that Docker was running under.

Because of this, and also a few customisation needs (i.e. I have support for various scripts I use built into my image) I always build my own openHAB image. I keep a ‘base’ image with all my custom bits, and then roll my openHAB image based on this base, which just installs the latest openHAB (downloading via wget, so I can control exactly what is happening). Takes seconds to rebuild a new image.

Apparently there is an environment variable you have to pass through as well as using the -u option. I’m about to test this, but that is what the response was to my issue on GitHub. More to follow.

I know I could create my own image but given that I’ve written and maintain the Docker installation docs I feel like I should use the official image so that the documents match what users will see.

More to follow when I figure out the proper way of doing things.

How do you map openhab:openhab on your host to 9001?
I have exactly the same problem. My openhab id on host is 999 and on the image 9001.

usermod -u <NEWUID> <LOGIN>    
groupmod -g <NEWGID> <GROUP>
find /mnt/config/openhab -user <OLDUID> -exec chown -h <NEWUID> {} \;
find /mnt/config/openhab -group <OLDGID> -exec chgrp -h <NEWGID> {} \;
usermod -g <NEWGID> <LOGIN>

Map is probably not the right word. I up and changed the ID on my host to 9001.
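After changing the IDs it is worth confirming that nothing under the mounted folder still belongs to the old UID. A minimal sketch, with a made-up helper name, that counts what is left:

```shell
# Count entries under a directory that are still owned by a given numeric UID.
# count_owned_by is a hypothetical helper; find accepts a numeric UID for -user.
count_owned_by() {
    find "$1" -user "$2" | wc -l | tr -d ' '
}

# Example use on the host, with 999 as the old UID from the question above:
#   count_owned_by /mnt/config/openhab 999
# A result of 0 means the chown pass caught everything.
```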

I should add that you should be able to pass USER_ID=999 in as an environment variable, in addition to the -u openhab:openhab option, to force it to use your UID for openhab. Since I moved my user I haven’t been able to test this yet.

Seems like passing in -e USER_ID=999 doesn’t solve the problem, I still get the same error.

Same for me here …
Docker doesn’t start …

Log says:

Launching the openHAB runtime…
stdout
11:28:59
karaf: KARAF_ETC is not valid: /openhab/userdata/etc

docker start then loops with status “restarting…”

Additional note:
When I try to start the image with the “entrypoint” hint from above, it shows me:

docker run -v /volume1/Synology-Infrastructure/openHAB2/conf/:/openhab -it --entrypoint=/bin/bash openhab/openhab:2.0.0-amd64
bash: /openhab/.bashrc: Permission denied

You need to make sure that all the files and folders mapped into the container are readable and writable by UID 9001. All folders need to be executable by UID 9001 as well.
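One way to look for offenders, sketched under the assumption that the mapped tree is already owned by UID 9001, so only the owner permission bits matter. The helper name is made up, and the path in the usage comment is taken from the docker run command above.

```shell
# List directories missing owner rwx and regular files missing owner rw.
# perm_problems is a hypothetical helper; it checks owner bits only, so it is
# only meaningful once the tree is owned by the UID the container runs as.
perm_problems() {
    find "$1" \( -type d ! -perm -u=rwx \) -o \( -type f ! -perm -u=rw \)
}

# Example use on the host:
#   perm_problems /volume1/Synology-Infrastructure/openHAB2/conf
# No output means every folder and file passes the owner-bit check.
```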

And here’s the problem … I’m running the thing on a Synology NAS where I can’t create a user with a specific UID …
and the USER_ID variable for the Docker container doesn’t work :(

You can take the Dockerfile and change the uid that gets built into the image and build your own I guess. Not sure what else can be done.

When I use the -u option, openHAB begins to start up, but after 2 seconds this appears :(

docker run -u 1035 --name openhab --net=host -v /etc/localtime:/etc/localtime:ro -v /volume1/Synology-Infrastructure/openHAB2/addons:/openhab/addons -v /volume1/Synology-Infrastructure/openHAB2/conf:/openhab/conf -v /volume1/Synology-Infrastructure/openHAB2/userdata:/openhab/userdata 5ad8888d9abb
Launching the openHAB runtime…
Unable to update instance pid: Unable to create directory /openhab/runtime/instances

That is still a permissions issue. Everything in /volume1/Synology-Infrastructure/openHAB2/userdata must be read/write for user 1035 and all folders must have execute for user 1035.
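A minimal sketch of how those bits could be granted in one pass, assuming the tree is already owned by the UID passed to -u (changing the owner itself needs root and chown). The helper name is made up. The capital X in the mode adds execute only to directories and to files that already have an execute bit, so plain config files do not become executable.

```shell
# Give the owner read/write on everything and execute (search) on folders.
# grant_owner_access is a hypothetical helper wrapping a standard chmod call.
grant_owner_access() {
    chmod -R u+rwX "$1"
}

# Example use on the host:
#   grant_owner_access /volume1/Synology-Infrastructure/openHAB2/userdata
```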