openHABian config with root FS on a NAS

Hello everyone,

I seem to be unable to upgrade from 3.1.0~M3-1… I have been trying for months, at least 5 or 6 times, booking several hours each time to tackle the problem, and still no luck. Since I can’t spend days on this, I give up and hope that the next version will be better… no luck so far…

3.1.0~M3-1 has been running absolutely fine and flawlessly for months, but I can’t stay like this forever.

The symptom of a failed upgrade is a huge list of Java errors like the one below. It is not always this exact one; usually I spend hours trying to address the cause of an error I can understand, only to run into another one if by chance I managed to fix the first…

Of course, I always restart openHAB several times (up to 5, with 5 to 10 minutes of waiting time between restarts)…

What are the ways out? I’d like to avoid reinstalling everything… What are the ways to investigate what goes wrong? Reading the forums, it seems that for most people the upgrades are seamless :-(

Thanks in advance for any help !

2022-07-15 15:14:28.186 [ERROR] [Events.Framework                    ] - FrameworkEvent ERROR

org.osgi.framework.BundleException: Exception in org.apache.karaf.jaas.modules.impl.Activator.stop() of bundle org.apache.karaf.jaas.modules.
	at org.eclipse.osgi.internal.framework.BundleContextImpl.stop(BundleContextImpl.java:890) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxBundle.stopWorker0(EquinoxBundle.java:1046) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxBundle$EquinoxModule.stopWorker(EquinoxBundle.java:376) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.Module.doStop(Module.java:660) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.Module.stop(Module.java:521) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.decStartLevel(ModuleContainer.java:1888) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1763) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.SystemModule.stopWorker(SystemModule.java:275) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxBundle$SystemBundle$EquinoxSystemModule.stopWorker(EquinoxBundle.java:208) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.Module.doStop(Module.java:660) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.Module.stop(Module.java:521) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.container.SystemModule.stop(SystemModule.java:207) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.framework.EquinoxBundle$SystemBundle$EquinoxSystemModule$1.run(EquinoxBundle.java:226) ~[org.eclipse.osgi-3.17.200.jar:?]
	at java.lang.Thread.run(Thread.java:834) ~[?:?]
Caused by: java.lang.NoClassDefFoundError: org/apache/karaf/jaas/modules/ldap/LDAPCache
	at org.apache.karaf.jaas.modules.impl.Activator.doStop(Activator.java:82) ~[?:?]
	at org.apache.karaf.util.tracker.BaseActivator.stop(BaseActivator.java:108) ~[?:?]
	at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:870) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:1) ~[org.eclipse.osgi-3.17.200.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
	at org.eclipse.osgi.internal.framework.BundleContextImpl.stop(BundleContextImpl.java:862) ~[org.eclipse.osgi-3.17.200.jar:?]
	... 13 more
Caused by: java.lang.ClassNotFoundException: org.apache.karaf.jaas.modules.ldap.LDAPCache cannot be found by org.apache.karaf.jaas.modules_4.3.7
	at org.eclipse.osgi.internal.loader.BundleLoader.generateException(BundleLoader.java:529) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.loader.BundleLoader.findClass0(BundleLoader.java:524) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.loader.BundleLoader.findClass(BundleLoader.java:416) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.loader.ModuleClassLoader.loadClass(ModuleClassLoader.java:168) ~[org.eclipse.osgi-3.17.200.jar:?]
	at java.lang.ClassLoader.loadClass(ClassLoader.java:522) ~[?:?]
	at org.apache.karaf.jaas.modules.impl.Activator.doStop(Activator.java:82) ~[?:?]
	at org.apache.karaf.util.tracker.BaseActivator.stop(BaseActivator.java:108) ~[?:?]
	at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:870) ~[org.eclipse.osgi-3.17.200.jar:?]
	at org.eclipse.osgi.internal.framework.BundleContextImpl$3.run(BundleContextImpl.java:1) ~[org.eclipse.osgi-3.17.200.jar:?]
	at java.security.AccessController.doPrivileged(Native Method) ~[?:?]
	at org.eclipse.osgi.internal.framework.BundleContextImpl.stop(BundleContextImpl.java:862) ~[org.eclipse.osgi-3.17.200.jar:?]
	... 13 more

How did you install openHAB?

How are you attempting to perform the upgrade?

What type of machine are you running it on?

All of those errors are coming from Karaf which is one layer below openHAB itself. Have you modified any of the config files in $OH_USERDATA/etc by chance?

Thanks Rich for jumping in.

I installed openHAB with openHABian on a Raspberry Pi 4. I have tried upgrading both via APT and via the openhabian-config menus, and either way I end up in the same situation.
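For reference, the APT route I follow is essentially the standard package upgrade; these commands are from memory, so the exact package set may differ (openhab-addons only if it is installed):

sudo systemctl stop openhab
sudo apt update
sudo apt install --only-upgrade openhab openhab-addons
sudo systemctl start openhab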

As for the config files in $OH_USERDATA/etc, I looked at them and I don’t remember having touched any of them… Should I try to do a diff with a reference set of files?
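For example, something along these lines might work, assuming I keep the etc folder from a pristine install at /tmp/etc-reference (an illustrative path; $OH_USERDATA is /var/lib/openhab on an openHABian install):

diff -ru /tmp/etc-reference /var/lib/openhab/etc | less    # show what differs from the pristine copy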

One more piece of info, in case it helps.

I almost succeeded once in upgrading (but I did so many things out of desperation to try to fix it that I did not adopt it as my production environment; I still have it running as a test environment) - at that time the latest version was 3.3.0.M2.

I have the exact same problem on that version, i.e. I can’t upgrade it to anything anymore. I get the same undecipherable bunch of errors, which differ depending on the number of restarts I attempt, etc…

How long do you wait after that first start after the upgrade? Maybe it needs to download some things and you are not waiting long enough.

How old is the SD card this is running on? Weird behaviors like this often point to a corrupted file system or a failing SD card.

I wait usually 5 minutes. I’ll try to wait more, I’m pretty sure I did it at some point, but I’ll try again.

I don’t use SD cards other than for the boot partition; the system is installed on a partition mounted from a NAS.

Well, that’s somewhat of an unusual configuration and not directly supported by openHABian. Or put another way, you aren’t really running openHABian, you are running openHABian with some additional stuff you’ve changed. Given that, our ability to help is hampered because we don’t know what else might be different and we can’t make the assumptions we otherwise would be able to make. To a large extent, we can’t know whether the problem is something related to OH or something you introduced by deviating from openHABian.

I assume you are not running with ZRAM enabled. There’s no point in doing so if you’re running off of a remotely mounted file system.

Are you sure everything that needs to be mounted is mounted and read/writable?
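A quick sanity check could be something like this (adjust the path; /var/lib/openhab is the usual openHABian location for userdata):

findmnt --real                                    # what is actually mounted, and from where
sudo -u openhab sh -c 'touch /var/lib/openhab/.rw-test && rm /var/lib/openhab/.rw-test' && echo "userdata is writable"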

If possible, what I would do is run an experiment. First create a fresh install of 3.1 (no configs) and run an upgrade. If that fails we know there is nothing related to your configs causing the problem and it’s definitely a problem caused by things you’ve done to openHABian. We don’t know what you’ve done so if that’s the case :woman_shrugging:

If that does work, again start from a clean and fresh install of 3.1, and this time restore a backup and make sure all is well. Then run the upgrade. If only now it doesn’t work, then you know the problem is somewhere in your configs and we can maybe narrow those down until we find the problem.
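In terms of concrete commands, the openHAB-provided tooling should be enough for the backup/restore part; roughly (an outline, not a tested recipe):

sudo openhab-cli backup /tmp/openhab-backup.zip       # on your current system
sudo openhab-cli restore /tmp/openhab-backup.zip      # on the fresh 3.1 install, before the upgrade attempt
sudo apt update && sudo apt install --only-upgrade openhab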

I understand.

The reason I say it is an openHABian config is that I first install on an SD card - stricto sensu openHABian -, then I rsync the system partition to the NAS and reboot from this new file system. No changes, nothing other than copying and repointing to the copy…
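Roughly, what I do looks like this (the NAS mount point and the boot repointing details are specific to my setup and shown only to illustrate the principle):

sudo rsync -PHax --numeric-ids / /mnt/nasroot/        # /mnt/nasroot = NAS target, illustrative path
# then point the Pi at the copy, e.g. via the root= entry in /boot/cmdline.txt
# and the / line in the copied etc/fstab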

Everything else on the Raspberry Pis has been working and upgrading with no apparent problems for some years already.

I’ll look at the tests you propose and will see where to go from there.

What I find painful about reinstalling from scratch is everything that is not covered by openHAB backup and restore (Grafana, InfluxDB, my ser2net config to access the Z-Wave dongle on a different Raspberry Pi, and a few other things like that…). But I will probably have to go for it…

Thank you for the time you took to look at my case.

Do the experiments on a new SD card. Once you figure out what works and what doesn’t, you can decide whether you can salvage the non-working SD card/config or need to start over from scratch.

Maybe you already know this, and I do not know if it is relevant: there are some hard links in the openHABian image that need to be handled properly by the rsync.

Hi, thanks a lot for chiming in… do you know what those hard links are (or at least an example of them)? I’ll check whether that could be the issue…
I’m doing rsync -Phax --numeric-ids to perform the copies…

/srv holds bind mounts, and then there are the ZRAM directories, but I am not an expert.
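You can see them on a running openHABian system with something like this (just standard tools, nothing openHABian-specific):

findmnt | grep /srv      # bind mounts currently active below /srv
grep srv /etc/fstab      # how they are declared, assuming they are set up in fstab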

Ok, I don’t use ZRAM, so that should be good.
Do you know what /srv is used for? I don’t use Samba either… and I never go into this /srv folder.

So, on both my production environment (3.1.0~M3-1) and my test environment (3.3.0.M2), I started with a full purge of openHAB, checked that all relevant directories had been cleaned, and reinstalled openHAB 3.3.0 fresh.
I did the basic username config to check that it was basically OK and then restored my backup.
After looking like it was going to work, both configs ended up with the same kind of random Java errors.
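For the record, the purge/reinstall was roughly this (from memory, so the exact package names and paths may differ slightly):

sudo apt purge openhab openhab-addons
sudo rm -rf /var/lib/openhab /var/log/openhab /etc/openhab    # only after taking a backup
sudo apt update && sudo apt install openhab openhab-addons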

Tomorrow I will check your exact steps, Rich.

You may be interested in taking a look at this Thread.

Ok, sorry for getting back to this so late. Last weekend I rebuilt a full 3.3.0 configuration on an SD card; I was able to load my openHAB backup config and to recover my InfluxDB database backup…
And it was successful.
Then I moved the / directory to the NAS (using rsync as shown above), and all was good, including installation of the bindings, etc…
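The InfluxDB part was done with the portable backup/restore tooling, something like the following (this assumes the InfluxDB 1.x that openHABian installs; the database name and backup path are just my examples):

influxd backup -portable -db openhab_db /mnt/backup/influx     # on the old install
influxd restore -portable /mnt/backup/influx                   # on the rebuilt one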

But I think there is still a problem: for nearly all bindings I try to install after having moved to the network system partition, I get a significant bunch of Java errors, and ultimately I need to uninstall them (and then I get back a properly working openHAB).

I’m nearly sure that when I upgrade to whatever the next version of openHAB is, I’ll have the same problems… At this stage, my conclusion is: when I move the openHABian system to the NAS, something breaks openHAB - not to the point that it stops working, since I have had it working like this for a long time, but to the point that it is no longer able to add bindings, to upgrade, etc…
:frowning:

So I need to study this more. @rlkoshak I know what I’m trying to do is not exactly what openHABian supports, so I’ll understand if you can’t help. But I’m hoping someone has a similar config and can share some experience. This can’t be that different from the SSD-based configs, can it?

@Lionello_Marrelli, thanks for the link you shared. I don’t think what is mentioned there applies to me, as in my case the system works very well after I move / entirely to the NAS. Do you think I am missing something?

What is strange is that, e.g., neither Grafana, nor InfluxDB, nor anything else on Raspberry Pi OS seems to have the same problem; they all update without issue and work just fine over time, even with / on the NAS.
I rsync / using the -H option to preserve hard links, so I would hope that this works well.
One thing I noticed is that the problems shown in the log are very often related to “bundleFiles” in the cache. I don’t know if this helps locate the possible cause?
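For what it’s worth, the cache those errors refer to lives under the openHAB userdata folder, and clearing it is a common first-aid step after upgrades; nobody suggested it above, but it is something I may try:

sudo systemctl stop openhab
sudo openhab-cli clean-cache       # wipes the cache and tmp folders under /var/lib/openhab so Karaf re-resolves its bundles
sudo systemctl start openhab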

Thanks a lot in advance for any help…

Given how permissions are handled over NFS and Samba, yes, it is significantly different from an SSD-based config. But unless the errors are related to file permissions :person_shrugging:. It’s not an approach I’ve ever used for openHAB. When I’ve tried to run something from an NFS-mounted folder in the past (e.g. PostgreSQL), I had nothing but trouble.

Keep in mind that you can’t do hard links across different file systems. When you move to the NAS you have moved to a different file system so definitely pay attention to those. You might have one that is pointing somewhere that isn’t being rsynced properly. That’s just a guess though.
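If you want to see which files are even affected, something like this run against the SD card install should list them (just a generic way to enumerate multiply-linked files, nothing openHAB-specific):

sudo find / -xdev -type f -links +1 -printf '%n %i %p\n' | sort -k2n    # link count, inode, path; sorted so linked files group together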

Thank you Rich, as usual, for your inputs.
For the sake of clarity, I should add that I move the entire system folder starting from /, and I use rsync to perform the move.
Then I forget about the file system on the SD card; I don’t keep any kind of sync between the NFS-mounted file system and the SD card.
And I have no hybrid config: all system files are on the NAS.
Also, I don’t use Samba; I edit the files I need to edit directly on the system using vi, or using VS Code via SSH.

One thing I’m thinking about testing is installing openHAB straight on a Raspberry Pi OS config, without openHABian, to see whether the same problems occur.

I also noticed that the openHABian script to move the FS to SSD uses the rsync -avx options.

I’ll try rebuilding my setup with this instead of the rsync -PHax I have been using so far, and we’ll see if there is an improvement.
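For my own notes, the difference between the two option sets, as I read the rsync man page (SRC/ and DST/ are just placeholders):

rsync -avx  SRC/ DST/    # -a archive, -v verbose, -x stay on one file system; no -H, so hard-linked files end up as independent copies
rsync -PHax SRC/ DST/    # -P progress + keep partial files, -H preserve hard links, -a archive, -x one file system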

And I changed the subject of the thread to better reflect the real topic.

No. openHABian does not support moving any of its partitions to SSD or NFS.
What you saw is not “the” script to move to SSD but part of the SD mirroring feature.

No one - including myself - knows all the places one would have to look at and change for this to work, so no one will have this in mind when developing or testing.
Not to mention the amount of work you (and we on the forum) are wasting trying this.
That’s the reason these modifications aren’t supported. That’s why you shouldn’t be doing this.
So why are you so insistent on making it work with your NAS?
Even if you manage to get it to work, you will remain at risk that things may change at any time with an openHABian or Linux package update, breaking your installation.

Your current setup is already less resilient than a proper new openHABian install would be, one reason being that it depends on the NAS being available in addition to your RPi (and I bet you do not have a spare NAS, do you?).
NAS mounting has a number of further drawbacks. I know this well; I used such a setup myself in the early days, and openHABian as it is today is an evolutionary improvement over it.

I suggest you reinstall from scratch to a new SD card and use the built-in mechanisms.
That is: set up SD mirroring.

You could change the directory your persistence database runs in to a NAS-mounted one (remember to also take it off ZRAM in /etc/ztab, if applicable), but please do not touch the root partition or swap.
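Schematically, the idea would be to mount the NAS share over the persistence directory and drop the corresponding ZRAM entry; the NFS export path below is purely an example:

# example /etc/fstab entry
nas.local:/export/oh-persistence  /var/lib/openhab/persistence  nfs  defaults,noatime  0  0
# then comment out or remove the /etc/ztab line covering the persistence directory (if present) and reboot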