I note that there are also errors in here for Hue as well -:
15:50:55.373 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing hue:0200:1:light17 takes more than 5000ms for processing event
@kai to me this seems to be more widespread than a binding or two. I see other messages on the forum about mgtt, here we have hue, sonos and zwave and I think I’ve also seen messages about exec and network as well.
Same here for the “milight” binding.
Lights won’t switch from time to time…
2017-02-08 23:15:51.300 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing 'milight:rgbLed:b827ebe64882:9' takes more than 5000ms for processing event
2017-02-08 23:15:51.301 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@d060bf' takes more than 5000ms.
Does anybody have a log with DEBUG enabled for (at least) org.eclipse.smarthome.core.common.*? There should be something like Timeout of [...]ms exceeded, thread [...] in state [...] is at [...]).
This would help to get an idea where it got stuck.
09:13:56.286 [INFO ] [smarthome.event.ItemCommandEvent ] - Item 'ZW_WP2_Switch' received command ON
09:14:01.301 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:14:01.307 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-856 (22744) in state RUNNABLE is at sun.misc.Unsafe.park(Unsafe.java:-2).
09:14:01.317 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:14:01.301 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:14:01.311 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing 'zwave:device:controller:node10' takes more than 5000ms for processing event
09:14:01.321 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing zwave:device:controller:node10 takes more than 5000ms for processing event
09:14:01.323 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 10: Command received zwave:device:controller:node10:switch_binary --> ON
09:14:01.311 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-857 (22745) in state BLOCKED is at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:231).
09:14:01.324 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 10: Creating new message for application command SWITCH_BINARY_SET
09:14:01.325 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:14:01.327 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - Took message from queue for sending. Queue length = 0
09:14:01.331 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 0A 00 13 0A 03 25 01 FF 25 9F 8E
09:14:01.331 [DEBUG] [ve.internal.protocol.ZWaveController] - Message queued. Queue length = 0. Queue={}
09:14:01.335 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 10: Sending REQUEST Message = 01 0A 00 13 0A 03 25 01 FF 25 9F 8E
09:14:01.343 [INFO ] [marthome.event.ItemStateChangedEvent] - ZW_WP2_Switch changed from OFF to ON
Sonos:
09:19:00.204 [INFO ] [smarthome.event.ItemCommandEvent ] - Item 'Sonos_PLAY51_Control' received command PLAY
09:19:05.211 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:19:05.211 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:19:05.215 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing sonos:zoneplayer:3 takes more than 5000ms for processing event
09:19:05.211 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-859 (22839) in state RUNNABLE is at sun.misc.Unsafe.park(Unsafe.java:-2).
09:19:05.215 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing 'sonos:zoneplayer:3' takes more than 5000ms for processing event
09:19:05.216 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:19:05.216 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-860 (22840) in state TIMED_WAITING is at sun.nio.cs.UTF_8$Encoder.encodeArrayLoop(UTF_8.java:596).
09:19:05.217 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:19:05.232 [INFO ] [marthome.event.ItemStateChangedEvent] - Sonos_PLAY51_Control changed from PAUSE to PLAY
It seems as if the thread pools go out of threads. The warning logs actually blame the wrong ones in that case: It’s the ones that do not get a thread assigned anymore in time. This also explains why the binding receives the events only afterwards.
One short term workaround would be to increase the safeCall threadpool size. You could try to add the line
org.eclipse.smarthome.threadpool:safeCall=10
to conf/services/runtime.cfg and see if that helps.
We will look at improving the thread assignment and the warning logging within the ThingManager.
It seems that some recent refactoring causes multiple safeMethodCalls in a row, which can lead to an exhausted thread pool very easily. @sjka is working on a fix, I hope it won’t take long. I’ll wait with the next ESH stable build until this is available.
Kai, I also haven’t seen any repository build for the unstable snapshot version on bintray for the last couple of days. The last build is from February 5th. Is this intentionally or is there a broken mechanism?
@sjka did multiple fixes and improvements on that part (thanks for this!) and the problem should hopefully be gone now (without the need to reconfigure the pool size). I have just done a new ESH stable build and an openHAB distro that contains the fix. Please test and report back (especially in case you still see a problem!).
We had to deactivate the snapshot apt builds due to quota issues. @Benjy is working on moving them to a new place. Meanwhile, I will trigger manual apt snapshots, whenever something important happens (like just right now), so the latest build is available via apt as well).
@philKrylc Sounds weird - I wouldn’t know how the fixes could be causing this; maybe something else is wrong in your setup. If the problem remains, could you provide more detailed logs (including org.eclipse.smarthome.core.common debug) here?
A bit more information, I wasn’t having this issue until I increased the number of things I own. Now, when I restart openHAB two of my five Amazon Dash buttons (and it is always two, but could be different each time) will get stuck on initialising, one of my six hue bulbs will respond very slowly too. bundle:restart ### fixes this problem each time.
2017-02-13 17:25:36.282 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created thread pool 'ConfigStatusService' with size 5
2017-02-13 17:25:37.110 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.thing.internal.ChannelItemProvider' has been added.
2017-02-13 17:25:37.157 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'thingManager' of size 5
2017-02-13 17:25:38.234 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.model.item.internal.GenericItemProvider' has been added.
2017-02-13 17:25:41.585 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'mapdb.persist'
2017-02-13 17:25:46.604 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'other.rules'
2017-02-13 17:25:48.691 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'alexwork.rules'
2017-02-13 17:25:49.281 [INFO ] [el.core.internal.ModelRepositoryImpl] - Validation issues found in configuration model 'bedroom.rules', using it anyway:
There is no context to infer the closure's argument types from. Consider typing the arguments or put the closures into a typed context.
2017-02-13 17:25:49.288 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'bedroom.rules'
2017-02-13 17:25:49.628 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'livingroom.rules'
2017-02-13 17:25:51.176 [INFO ] [el.core.internal.ModelRepositoryImpl] - Validation issues found in configuration model 'hallway.rules', using it anyway:
There is no context to infer the closure's argument types from. Consider typing the arguments or put the closures into a typed context.
2017-02-13 17:25:51.181 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'hallway.rules'
2017-02-13 17:25:51.637 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'kitchen.rules'
2017-02-13 17:25:52.855 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'default.sitemap'
2017-02-13 17:25:52.988 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.model.thing.internal.GenericItemChannelLinkProvider' has been added.
2017-02-13 17:25:53.021 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.model.thing.internal.GenericThingProvider' has been added.
2017-02-13 17:25:53.350 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created thread pool 'safeCall' with size 3
2017-02-13 17:25:53.456 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.thing.link.ManagedItemChannelLinkProvider' has been added.
2017-02-13 17:25:53.839 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.thing.ManagedThingProvider' has been added.
2017-02-13 17:25:54.013 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.items.ManagedItemProvider' has been added.
2017-02-13 17:25:54.031 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'discovery' of size 3
2017-02-13 17:25:55.251 [INFO ] [.dashboard.internal.DashboardService] - Started dashboard at /start
2017-02-13 17:25:55.516 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'thingHandler' of size 3
2017-02-13 17:25:57.057 [INFO ] [basic.internal.servlet.WebAppServlet] - Started Basic UI at /basicui/app
2017-02-13 17:25:57.202 [INFO ] [arthome.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui
2017-02-13 17:25:57.304 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'pcapNetworkInterfaceService' of size 5
2017-02-13 17:25:58.399 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created thread pool 'items' with size 5
2017-02-13 17:25:58.611 [WARN ] [org.pcap4j.util.PropertiesLoader ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.inet, use default value: null
2017-02-13 17:25:58.614 [WARN ] [org.pcap4j.util.PropertiesLoader ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.inet6, use default value: null
2017-02-13 17:25:58.616 [WARN ] [org.pcap4j.util.PropertiesLoader ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.packet, use default value: null
2017-02-13 17:25:58.617 [WARN ] [org.pcap4j.util.PropertiesLoader ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.link, use default value: null
2017-02-13 17:25:58.675 [INFO ] [ui.habmin.internal.servlet.HABminApp] - Started HABmin servlet at /habmin
2017-02-13 17:25:58.812 [WARN ] [org.pcap4j.util.PropertiesLoader ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.dlt.raw, use default value: null
2017-02-13 17:25:58.823 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
2017-02-13 17:26:01.732 [INFO ] [ing.zwave.handler.ZWaveSerialHandler] - Connecting to serial port '/dev/ttyACM0'
2017-02-13 17:26:01.832 [INFO ] [ing.zwave.handler.ZWaveSerialHandler] - Serial port is initialized
2017-02-13 17:26:01.859 [INFO ] [mmandclass.ZWaveSecurityCommandClass] - Update networkKey
2017-02-13 17:26:01.901 [INFO ] [ve.internal.protocol.ZWaveController] - Starting ZWave controller
2017-02-13 17:26:01.903 [INFO ] [ve.internal.protocol.ZWaveController] - ZWave timeout is set to 5000ms. Soft reset is false.
2017-02-13 17:26:02.315 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-2 (99) in state RUNNABLE is at org.pcap4j.core.NativeMappings.pcap_findalldevs(NativeMappings.java:-2).
2017-02-13 17:26:02.316 [WARN ] [ome.core.thing.internal.ThingManager] - Initializing handler for thing 'amazondashbutton:dashbutton:50-f5-da-93-9f-67' takes more than 5000ms.
This is perfect - just exactly what the changes were about: To correctly identify badly behaving handlers and blame them (and no one else anymore):
2017-02-13 17:26:02.315 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-2 (99) in state RUNNABLE is at org.pcap4j.core.NativeMappings.pcap_findalldevs(NativeMappings.java:-2).
2017-02-13 17:26:02.316 [WARN ] [ome.core.thing.internal.ThingManager] - Initializing handler for thing 'amazondashbutton:dashbutton:50-f5-da-93-9f-67' takes more than 5000ms.
This is absolutely true: The handler takes more than 5s to initialize() because it does some fancy pcap stuff in there. Instead, it should schedule a job to do such long running activities asynchronously. @oliver Could you improve the implementation that way?