OH 4.1.1 startup takes more than an hour

Hi,
after updating from OH 3.4 to OH 4.1.1, the startup takes more than an hour:

2024-01-23 14:19:44.855 [INFO ] [org.openhab.core.Activator          ] - Starting openHAB 4.1.1 (build Release Build)
2024-01-23 14:19:46.193 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Time zone set to 'Europe/Berlin'.
2024-01-23 14:19:46.219 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Location set to '51xxx,6.yyy'.
2024-01-23 14:19:46.223 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Locale set to 'de_DE'.
2024-01-23 14:19:46.226 [INFO ] [.core.internal.i18n.I18nProviderImpl] - Measurement system set to 'SI'.
2024-01-23 14:20:16.313 [INFO ] [.core.model.lsp.internal.ModelServer] - Started Language Server Protocol (LSP) service on port 5007
2024-01-23 14:20:31.227 [WARN ] [ty.util.ssl.SslContextFactory.config] - Trusting all certificates configured for Client@34567[provider=null,keyStore=null,trustStore=null]
2024-01-23 14:20:31.230 [WARN ] [ty.util.ssl.SslContextFactory.config] - No Client EndPointIdentificationAlgorithm configured for Client@34567[provider=null,keyStore=null,trustStore=null]
2024-01-23 14:20:33.943 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = 5xxxx1, base URL = http://localhost:8080)
2024-01-23 14:20:35.417 [INFO ] [ab.ui.habpanel.internal.HABPanelTile] - Started HABPanel at /habpanel
2024-01-23 14:20:47.542 [WARN ] [marytts.TextToMaryXML               ] - Locale is null, overriding with de_DE
2024-01-23 14:20:47.545 [WARN ] [marytts.TextToMaryXML               ] - Locale is null, overriding with de_DE

<<< what happens here?????

2024-01-23 15:36:34.668 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'home.sitemap'
2024-01-23 15:36:37.921 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'network.things'
2024-01-23 15:36:38.554 [INFO ] [ternal.dhcp.DHCPPacketListenerServer] - DHCP request packet listener online
2024-01-23 15:36:39.032 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'exec.things'
2024-01-23 15:36:39.072 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'mail.things'

Any ideas on how to debug this?
I thought cleaning cache and multiple reboots would help, but no success yet.

You did not provide all needed information:
Hardware
OS
Java version
Installation/Upgrade method

Ups, you are right:
OpenHabian on Raspbian Linux 11 (bullseye)
Raspberry Pi 4, 4GB
Java 17
Upgrade via Openhabian

Any old openHAB 3 bindings running (addons folder) ?

No, only openhab-addons-4.1.1.kar in this folder.

Impossible to tell. I’d guess you’re stuck in some kind of race condition.
Is it reproducible behavior ? Forcefully restart OH to see. Clean the cache, eventually.
Enable debug level logging on org.apache and org.openhab so there’s an extensive log you might have a chance of spotting some cause or turning point.

Yes, it is reproducible - since the upgrade, I have restarted the system (sometimes only openHAB service) 5…7 times, always same delay.
I have cleaned the cache only once, so I will try again.
Plus I will try to enable debug level logging.

Okay, stopped the service, cleaned cache and restarted with both log levels set to DEBUG.
Startup is in progress for >35mins now.
Here is the log:
openhab.log (88.7 KB)
Any ideas of what could be wrong?

Just as an idea: deactivate all 3 Suggestion Finders in Add-On Management under settings

Tried restart with deactivated suggestion finders, but no difference, still 1h startup…

Not sure if this is the root cause but at least a starting point. I see in the logs that a new hue:bridge thing was added and the server is constantly updating discovery results.
Try disabling the hue binding.
If this does not help try disabling other bindings.

Agree remove all bindings and then re add them one by one if that fixes the delay. My system is fully restarted in under sixty seconds for which I am Thankful for at the moment as doing development on Actions means a restart everytime I trial a new jar. I do not use a PI.

Addons can be more then bindings like the Mary text to speech.

Hit pause on any rules and use the pause buttons on things as well if you narrow it down to a binding. The pause button is very handy for this kind of thing.

Start listing all installed bindings and we could tell you which ones are the most suspicious. As an example example, no need to uninstall astro or ntp.

Start by uninstalling all non official bindings, even more if you use versions not built with your current version of OH.

Okay, will go this way.
Might take some time with 21 (official) bindings, plus automation, UI, persistence, transformation, voice&speech…

My rough guess on that is that you probably have a thing which is “flickering” and does not reach online status. Loading of rules etc. is one of operation after which start level indicator goes up. Only things are making it conditional, so all of them have to get online (if binding supporting them is present).

OH4 is more resource intensive and as you mentioned 21 bindings a Pi4, 4MB might be too low.
Ho much heap memory did you allocate?

No, 4GB should still be sufficient, if there is nothing else running on the machine.

Are you able to reach the karaf console when it is doing this? you may be able to do a bundle:list and see what has started and possibly it may list what is starting next. Not sure if it would be useful as never had this happen.

Wow, many hints, thanks a lot!
Yesterday, I removed the Hue and Mary Text to speech bindings, but startup still took over an hour.

There is nothing else running on the 4GB PI4B, so this should be fine.
My system information shows 1.5% CPU load, 1054MB used, 2770MB free, 477 CPU threads.

I will test if I can reach the karaf console and see what bundle:list shows.
I will also go on disabling bindings, but with a startup time of 1h, this is time consuming…

In that kind of situation :

  1. Start from scratch > remove everything, start, increase number of plugins, re-start, …
  2. Use “dychotomy methodology” (you have 21 plugins, list them alphabetically and add them in batch, so go through (for example) : 0 plugin (working) > 10 plugins failed) > 5 plugins (working) > 7 plugins (failed) > 6 (culprit)