Jetty update, Karaf 4.1.3 upgrade and full LSP support

Tags: #<Tag:0x00007f433e18f708> #<Tag:0x00007f433e18f618> #<Tag:0x00007f433e18f438>

Just for the record: This workaround completely breaks the Tellstick binding. I’m starting to understand why @Kai wouldn’t recommend it :slight_smile:

Yes, I think this is what I have seen in testing too.

I cannot imagine that this could be related. Do you have any proof?

I’m starting to understand why @Kai wouldn’t recommend it :slight_smile:

I was just about to recommend it - no not recommend it but to set this as a default. I did some tests with openHAB 2.1 and saw that this never ever did blacklisting of event handlers, no matter how long they block. So with the Karaf upgrade, this seems to be a new behaviour (and the problem of exceeded timeouts had been present before) - by adding this option, we would be back at full 2.1 compatibility. I have created a PR for it, so if you think this is a bad idea, you should veto asap :slight_smile:

Sounds as if Felix FileInstall isn’t our friend either :frowning: Not just the KAR handling is broken, also normal jars cause issues. Can anyone confirm that this is only since the Karaf upgrade? Or were there problems on Karaf 4.0.8 already as well?

To speed up things I’ve set the zwave binding to DEBUG, makes a lot of logging data :grinning:

In about 14 hours of log files (30 MB of data) both timeouts

2017-12-06 08:47:49.578 [WARN ] [nal.common.AbstractInvocationHandler] - Timeout of 5000ms exceeded while calling method 'ThingHandler.initialize()' on 'org.eclipse.smarthome.binding.hue.handler.HueLightHandler@7aec67'. Thread 'safeCall-18' (444) is in state 'WAITING'
2017-12-06 08:50:03.906 [WARN ] [nal.common.AbstractInvocationHandler] - Timeout of 5000ms exceeded while calling method 'EventSubscriber.receive()' on 'org.eclipse.smarthome.io.monitor.internal.EventLogger@1b389ce'. Thread 'safeCall-18' (444) is in state 'RUNNABLE'

did not show up anymore.
So what should be my lessons learned? I guess this is not an openHAB related problem, but a problem with my hardware?
Btw, I’m using a RPi3 with 128 GB USB 2.0 SSD Harddisk attached, so there is no sd card playing around.
I need to upgrade from Jessie to Stretch anyway, so maybe it’s time to do a new OS install …

Not more than I’m describing in this issue. I did everything I could to try to get the binding going (restarted telldus-core, restarted the binding, restarted entire openhab, fiddled around with the library path setting) and nothing helped. As a desperate measure I removed the org.apache.felix.eventadmin.Timeout=0 and restarted OH, and suddenly everything was working.

edit: Ok, I’m starting to give up my efforts to understand what’s happening here. I now put the Timeout=0 back and restarted OH and everything seems to be working. I HATE problems that I don’t understand!

edit 2: Nope. Too happy too soon. Another restart of the binding and it’s broken again…

Havin’ problems, too. Haven’t seen this before the Karaf upgrade, and it started some time after I recently added another jar. OH somehow stopped and after that it refused to do anything after its startup sequence. Tried multiple restarts, removed/readded jars until I noticed my ZWave controller thing was gone, in fact ALL of my things were gone.
Had to recreate the controller manually, now it’s working again (I think … still checking).
On one restart occasion, I noticed it keeps loading .items file multiple times ??
FWIW, felix.eventadmin.Timeout=0 was in place before and all the time.
This was with #1099 as well as now with latest #1115.
Confused, too…

Well, it hints at the fact that indeed sometimes the logging framework (log4j2) takes long for writing a single logging message to disk - and requiring more than 5 seconds to execute a logging line in the code is something rather unexpected to all developers… As I have removed the event handler timeout in the distro now, I would prefer to keep the logging configuration the way it is (i.e. to always flush it to disk), because it is rather annoying to see the latest log entries only after minutes, when the logger decides that stuff should be written to the file. We should keep it in the back of our minds though that extensive logging might cause an issue on some platforms and that changing it to use a memory buffer can resolve this.

1 Like

I have changed the Jetty configuration in the latest build #1117. Could you update to this and see if the warnings are gone?

Sorry for my late reply, I was traveling.

This is probably a stupid question, but what do mean exactly with “log rollover”? I guess this is the event where openhab.log becomes openhab.log.X and a new (empty) openhab.log is created. Right?

If I look at the timing of the first Blacklist event, this might be possible:

  • 2017-12-03 22:00:59.962 => log rollover;
    In between: 67.600 entries in the openhab.log are written;
  • 2017-12-03 22:01:24.384 => Blacklist event occurs;
    In between: 28.200 entries in the openhab.log are written;
  • 2017-12-03 22:01:30.098 => next rollover event occurs;

On the other hand, I’m not sure if this Blacklist event can be considered as a reliable/realistic case. As soon as I switched on the trace for org.eclipse.smarthome.core.internal.common, OpenHAB was mostly occupied with writing data to the log (it needed just 30 seconds to create 16Mb of log data).

My second Blacklist event (2017-12-03 22:51:18.769) happened approx. 50 minutes after a log rollover (and after 23000 lines were written in the logfile). And nothing significant happens right before that event. The last event in openhab.log was 9 minutes earlier:

2017-12-03 22:42:52.895 [INFO ] [lipse.smarthome.model.script.Spotify] - Spotify client ID and Secret set: spotify_client_id (...)
2017-12-03 22:51:18.769 [WARN ] [org.apache.karaf.services.eventadmin] - EventAdmin: Blacklisting ServiceReference [{org.osgi.service.event.EventHandler, org.eclipse.smarthome.core.events.EventPublisher}={event.topics=smarthome, component.name=org.eclipse.smarthome.core.internal.events.OSGiEventManager, component.id=68, service.id=157, service.bundleid=111, service.scope=bundle} | Bundle(org.eclipse.smarthome.core_0.9.0.201712011551 [111])] due to timeout!

I saw you included the workaround as default in the latest snapshot build. I’m not sure if that means you are still investigating the issue or not. My experience shows at least no disadvantages with this workaround.

Kind regards,
Dries

After moving to #1119, OH reproducibly fails to load all of its bindings as defined in services/addons.cfg.
I’ve got 3 jars in the addons folder that do load. When I remove them, addons-defined bundle processing on (re)start is fine.
Any hint how to force reprocessing of addons.cfg, any bundle I would need to restart for that to happen ?

I’ll follow up with another maybe stupid question: Is the openhab logs supposed to be rotated? Mine has never done any rotating, they just fill up forever. I’m using the package from snapshot release (Debian). Should a logrotate config file be included in the package?

The OH java builtin logger takes care of rotation. Yes that’s what is meant by “log rollover”.

Hey guys, just installed the 1120

While restarting OH i got this Error.

2017-12-09 13:23:40.647 [ERROR] [lipse.smarthome.model.script.runtime] - FrameworkEvent ERROR - org.eclipse.smarthome.model.script.runtime
org.osgi.framework.BundleException: Unable to acquire the state change lock for the module: osgi.identity; type="osgi.bundle"; version:Version="0.10.0.201712081218"; osgi.identity="org.eclipse.smarthome.model.script.runtime"; singleton:="true" [id=147] STARTED [STARTED]
        at org.eclipse.osgi.container.Module.lockStateChange(Module.java:337) [?:?]
        at org.eclipse.osgi.container.Module.start(Module.java:401) [?:?]
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1620) [?:?]
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.incStartLevel(ModuleContainer.java:1600) [?:?]
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.doContainerStartLevel(ModuleContainer.java:1571) [?:?]
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1514) [?:?]
        at org.eclipse.osgi.container.ModuleContainer$ContainerStartLevel.dispatchEvent(ModuleContainer.java:1) [?:?]
        at org.eclipse.osgi.framework.eventmgr.EventManager.dispatchEvent(EventManager.java:230) [?:?]
        at org.eclipse.osgi.framework.eventmgr.EventManager$EventThread.run(EventManager.java:340) [?:?]
Caused by: java.util.concurrent.TimeoutException: Timeout after waiting 5 seconds to acquire the lock.
        at org.eclipse.osgi.container.Module.lockStateChange(Module.java:334) ~[?:?]
        ... 8 more

maybe that is helpfull in some way

openHAB 2.2.0 Build # 1120.
Warnings are displayed.

Then I have no idea how to help there further :frowning:
Best if you adapt your logging configuration to ignore this logger.

where you can learn more about this?

https://logging.apache.org/log4j/2.x/

Thank you! I’ll try to figure it out, it looks complicated.

Yes, it did. But I found the problem: don’t load your tmp folder via tmpfs in memory and make it too small :sunglasses:

2 Likes

This is quite a long thread. I had the original error last night when everything stopped working in Build 1131. Was there a fix and should it have been in this build?
Apologies if I missed the important information in the thread.

Thanks!

2017-12-24 06:25:38.561 [WARN ] [org.apache.karaf.services.eventadmin] - EventAdmin: Blacklisting ServiceReference [{org.osgi.service.event.EventHandler, org.eclipse.smarthome.core.events.EventPublisher}={event.topics=smarthome, component.name=org.eclipse.smarthome.core.internal.events.OSGiEventManager, component.id=32, service.id=137, service.bundleid=110, service.scope=bundle} | Bundle(org.eclipse.smarthome.core_0.10.0.201712111157 [110])] due to timeout!