Jetty update, Karaf 4.1.3 upgrade and full LSP support

From what I read here, The “B” and quoted versions are a specific format of Felix Config Admin - which is the service that maintains our userdata/config folder, so it is absolutely correct that files in there have such a format.

Nonetheless, everything in userdata/etc are *.cfg files, which must not use this format as they are “plain OSGi” and thus normal Java property files. I would hence say that it is a bug in Karaf or FileInstall that the Felix ConfigAdmin format is used when files are written to etc. I didn’t figure out yet, why this happens though :frowning:

Ideally, I would want Karaf not writing to etc at all, but only push changes to ConfigAdmin so that they end up in the config folder. I had a discussion with Karaf people a while ago and it turned out that this isn’t possible - Karaf will always write configurations that are handled through it to the cfg files.

I’ve now spent many many hours trying to get that working better, but so far without much success. I’d love to see anyone from the community helping on such things, but it seems that such technical internas are not that thrilling for anybody to dive into… So just as a word of warning: As long as I am alone with such stuff and have to spend complete days on it, I won’t find time to drive anything else like feature, PR reviews, new releases, etc…

1 Like

Quick note to everyone who is suffering from the “Blacklisting ServiceReference” issue - we have a new distro #1103 available that should fully solve this issue (no need for workarounds anymore, all kudos go to @sjka!).

This build also has an improved add-ons install mechanism, which hopefully reduces the situations where people see empty dashboards.

Furthermore I have changed the way the packaging configs are created, so with a bit of luck this might avoid the “additional quotes” bug in the cfg files - let’s touch wood :deciduous_tree: (or better :christmas_tree:).

Regards,
Kai

8 Likes

So it should be safe to upgrade now? I’m still on 1073, which seems to have the broken quotes but still works fine with it :slight_smile:

BackUp 1073 and try :wink:

I just upgraded to #1103.

I must say it was a quite cumbersome experience. After upgrading, OpenHAB was extremely slow (using all my CPU) and wouldn’t complete the installation of the UI’s. I rebooted and cleared the cache several times. It didn’t work. Then I downgraded back to the previous version (apt-get install openhab2=2.2.0~20171201130444-1), that worked. Then I upgraded again, and I just gave OH 10-15 or so to ‘settle down’. That seemed to work. So I’m finally up and running. :slight_smile:

I must admit, I still have that workaround in place (org.apache.felix.eventadmin.Timeout=0), but I had my portion of OH issues for this evening (WAF dropped below zero). I’ll try to switch it off tomorrow and see what happens. I’ll share the outcome.

Upgraded an hour ago (coming from #1084), no workarounds needed anymore.
But: I had to reinstall all PaperUI configured (I am not usind the addons.cfg) bindings, actions, …

Works flawlessly, no endless spinning wheels anymore, no need to reload the page. Just a simple “binding installed”. Nice work, thanks.

It took a little while until everything settled and I too had to delete cache and tmp in between, but that was it.

Also big kudos to @sjka, thx for all your hard work.

I had to reinstall all PaperUI configured (I am not usind the addons.cfg) bindings, actions, …

Didn’t it automatically re-install those? Do you still have a copy of your original userdata/etc/org.openhab.addongs.cfg file?

I too had to delete cache and tmp in between

Are you using the official upgrade script (or are you using apt)? The update script cleans the cache&tmp folders automatically and one time should definitely suffice. What effects did you have after the update that made you clean everything again?

Did not wait long enough I guess.

Yepp, backup from last night:

remote = true
package = minimal

(Although I don’t use the /service/addons.cfg since month). Timestamp org.openhab.addongs.cfg is from 20. Nov. 2017

Nope, I’m using a manual install and this time I deleted everything excluding the conf folder. So maybe the reason I had to reinstall all bindings is the fact that I deleted (on purpose) my userdata folder just to make sure I don’t keep any old files.

Ok, this explains - please use the update script in future as this has quite some logic build in on purpose. :slight_smile: And if you use it, you can help us identify potential problems in it - if you invent your own way, you are on your own :wink:

1 Like

Maybe I will try, but since migrating from 1.x to 2.x this script works well and I know what the script is doing (that is not the case for the implemented backup/restore script).

It also deletes cache and tmp during upgrading and once in a while I delete the userdata folder as well to remove any old files, especially this time where we had problems beginning with #1083 and lasting until today.

Thx, again, #1103 is still working fine.

I think the biggest issue is that the backup/restore script is kept up to date with the changing needs of the OH. As the versions change, some files move around in userdata, some files need to be preserved, some files need to be replaced. The backup/restore scripts manages all of this and essentially will do the same things to your manually installed OH that an apt-get/yum install will do to upgrade and installed OH.

3 Likes

Ok, last night, right before bedtime I decided to turn off the workaround (I commented out org.apache.felix.eventadmin.Timeout=0).

Approximately one hour later, this appeared in my logs:

2017-12-02 01:05:15.916 [WARN ] [org.apache.karaf.services.eventadmin] - EventAdmin: Blacklisting ServiceReference [{org.osgi.service.event.EventHandler, org.eclipse.smarthome.core.events.EventPublisher}={event.topics=smarthome, component.name=org.eclipse.smarthome.core.internal.events.OSGiEventManager, component.id=51, service.id=150, service.bundleid=111, service.scope=bundle} | Bundle(org.eclipse.smarthome.core_0.9.0.201712011551 [111])] due to timeout!
2017-12-02 01:05:16.705 [WARN ] [org.apache.karaf.services.eventadmin] - EventAdmin: Blacklisting ServiceReference [{org.osgi.service.event.EventHandler, org.eclipse.smarthome.core.events.EventPublisher}={event.topics=smarthome, component.name=org.eclipse.smarthome.core.internal.events.OSGiEventManager, component.id=51, service.id=150, service.bundleid=111, service.scope=bundle} | Bundle(org.eclipse.smarthome.core_0.9.0.201712011551 [111])] due to timeout!

Which froze the entire system.

I’m on build #1103

I’m going to put back the workaround.

@Kai: if you need more info to troubleshoot this, let me know what you need.

[EDIT]
I think the cause of the blacklisting is my LG TV binding (Connect SDK). Right before blacklisting happened, I saw this in my events.log:

2017-12-02 01:05:15.853 [hingStatusInfoChangedEvent] - 'connectsdk:WebOSTV:192_168_3_20' changed from ONLINE: Device Ready to OFFLINE
2017-12-02 01:05:15.853 [vent.ItemStateChangedEvent] - LGTV_GV_Living_Power changed from ON to OFF

I would need to do additional checks to see if it is a coincidence.
Anyway, one binding should not be able to trigger a blacklist of the OH code.
[/EDIT]

I can confirm that my setup with Build #1103 is working without the workaround! No issues anymore :slight_smile: Thx!!

Ok, I upgraded now, from 1073 to 1104. When I first started after the upgrade I got error message like

2017-12-02 10:28:13.328 [ERROR] [core.karaf.internal.FeatureInstaller] - Failed installing 'openhab-package-"standard"': No matching features for openhab-package-"standard"/0.0.0

in my log. So I stopped, manually removed the quotation marks from my org.openhag.addons.cfg and started again and after that things seem to work just fine!

Still struggling with something strange happening to some of my rules (I think it’s complaining about “now” being null), guess I’ll get back to that in a separate thread if I don’t solve it :slight_smile:

Anyway, one binding should not be able to trigger a blacklist of the OH code.

Fully ACK! That’s why the whole event processing now works asynchronously.

For that very reason your case puzzles me! As I don’t know which way of installing OH your are using: Are you sure the tmp & cache folder were deleted while upgrading? The log at least shows the correct version numbers, but I just wanna be sure it’s actually also running the correct code.

I would need to do additional checks to see if it is a coincidence.

That would be awesome, thanks for your help! Anything towards tracking down the mechanism how it happened is highly appreciated, as currently I can’t make any sense of it.

I can also confirm that the workaround is not needed anymore, even though the hue binding is still creating warnings at startup:

2017-12-03 12:05:23.441 [WARN ] [ome.core.thing.internal.ThingManager] - Initializing handler for thing 'hue:0210:0017884a3074:5' takes more than 5000ms.
2017-12-03 12:05:23.452 [WARN ] [nal.common.AbstractInvocationHandler] - Timeout of 5000ms exceeded while calling method 'ThingHandler.initialize()' on 'org.eclipse.smarthome.binding.hue.handler.HueLightHandler@189a57'. Thread 'safeCall-3' (101) is in state 'WAITING'
2017-12-03 12:05:23.460 [WARN ] [nal.common.AbstractInvocationHandler] - Timeout of 5000ms exceeded while calling method 'ThingHandler.initialize()' on 'org.eclipse.smarthome.binding.hue.handler.HueLightHandler@771cc0'. The task was still queued.
2017-12-03 12:05:23.461 [WARN ] [nal.common.AbstractInvocationHandler] - Timeout of 5000ms exceeded while calling method 'ThingHandler.initialize()' on 'org.eclipse.smarthome.binding.hue.handler.HueLightHandler@1ca466d'. Thread 'safeCall-2' (99) is in state 'WAITING'

Furthermore I also face the issue that after each upgrade via apt-get my bindings from the eclipse market place are gone.

I opened an issue yesterday about the hue initialization taking more than 5s.

I think this could be due to a bottleneck in the thread pool size, which currently only has 3 threads - I have created this PR in order to increase the pool.
If the HueLightHandler is causing a delay, 3 of them suffice to exhaust the pool, which would explain the effect.

@Dries and everybody else who still run into the blacklisting (not individual handler timeouts) issue: could you please switch the log level of org.eclipse.smarthome.core.internal.common to TRACE, i.e.

log:set TRACE org.eclipse.smarthome.core.internal.common

It would be interesting to see what happens there during the approx. 30 seconds before the blacklisting log entry occurs.

And no, an exhausted thread-pool should not cause this issue anymore. Although it for sure can’t hurt to increase its size anyway because it would result in a more responsive system at peak times on most hardware.

Hi Simon,

My upgrade from #1098 to #1103 was done via apt-get. So /tmp and /cache should be deleted automatically (I didn’t check).

I switched the workaround off this morning after upgrading to #1106. I just set the log level to what you suggested. I’ll keep an eye on it. My log is filling at a pretty high pace now. :slight_smile: BTW, how to I set the level back to normal?

So far (10h later), I didn’t encounter the Blacklisting issue.

Although my Chromecast binding is sometimes generating this:

2017-12-03 13:59:50.441 [WARN ] [nal.common.AbstractInvocationHandler] - Timeout of 5000ms exceeded while calling method 'StateProfile.onCommandFromItem()' on 'org.eclipse.smarthome.core.thing.internal.profiles.SystemDefaultProfile@34e9c15d'. Thread 'safeCall-13' (414) is in state 'TIMED_WAITING'
	at java.lang.Object.wait(Native Method)
	at su.litvak.chromecast.api.v2.Channel$ResultProcessor.get(Channel.java:239)
	at su.litvak.chromecast.api.v2.Channel.send(Channel.java:367)
	at su.litvak.chromecast.api.v2.Channel.sendStandard(Channel.java:336)
	at su.litvak.chromecast.api.v2.Channel.getMediaStatus(Channel.java:508)
	at su.litvak.chromecast.api.v2.ChromeCast.getMediaStatus(ChromeCast.java:295)
	at org.openhab.binding.chromecast.internal.ChromecastCommander.handleControl(ChromecastCommander.java:151)
	at org.openhab.binding.chromecast.internal.ChromecastCommander.handleCommand(ChromecastCommander.java:61)
	at org.openhab.binding.chromecast.handler.ChromecastHandler.handleCommand(ChromecastHandler.java:147)
	at sun.reflect.GeneratedMethodAccessor188.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.eclipse.smarthome.core.internal.common.AbstractInvocationHandler.invokeDirect(AbstractInvocationHandler.java:157)
	at org.eclipse.smarthome.core.internal.common.InvocationHandlerSync.invoke(InvocationHandlerSync.java:56)
	at com.sun.proxy.$Proxy144.handleCommand(Unknown Source)
	...

I’ll post an update if I have new findings.

[EDIT]
Okay, logs are filling REALLY fast. I just generated 8Mb of data on openhab.log, and that includes just 3 minutes of TRACE. That will be about 160Mb/hour. Last time the system blacklisted something was during the night. Since OH just keeps a limited number of log-data, I’m not sure I’ll be able to see the trace info.