Handler takes more than 5000ms for processing event

chris · February 8, 2017, 8:42pm

I note that there are also errors in here for Hue as well -:

15:50:55.373 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing hue:0200:1:light17 takes more than 5000ms for processing event

@kai to me this seems to be more widespread than a binding or two. I see other messages on the forum about mgtt, here we have hue, sonos and zwave and I think I’ve also seen messages about exec and network as well.

https://community.openhab.org/search?q=more%20than%205000ms

poertner · February 8, 2017, 10:38pm

Same here for the “milight” binding.
Lights won’t switch from time to time…

2017-02-08 23:15:51.300 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing 'milight:rgbLed:b827ebe64882:9' takes more than 5000ms for processing event
2017-02-08 23:15:51.301 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@d060bf' takes more than 5000ms.

J-N-K · February 9, 2017, 6:10am

Not much help: the new DMX binding shows it, Too. But It’s not everytime and I can’ figure our how to reproduce it.

sjka · February 9, 2017, 8:06am

Does anybody have a log with DEBUG enabled for (at least) org.eclipse.smarthome.core.common.*? There should be something like
Timeout of [...]ms exceeded, thread [...] in state [...] is at [...]).
This would help to get an idea where it got stuck.

philKrylc · February 9, 2017, 8:14am

Here you go:

ZWave:

09:13:56.286 [INFO ] [smarthome.event.ItemCommandEvent    ] - Item 'ZW_WP2_Switch' received command ON
09:14:01.301 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:14:01.307 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-856 (22744) in state RUNNABLE is at sun.misc.Unsafe.park(Unsafe.java:-2).
09:14:01.317 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:14:01.301 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:14:01.311 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing 'zwave:device:controller:node10' takes more than 5000ms for processing event
09:14:01.321 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing zwave:device:controller:node10 takes more than 5000ms for processing event
09:14:01.323 [DEBUG] [ding.zwave.handler.ZWaveThingHandler] - NODE 10: Command received zwave:device:controller:node10:switch_binary --> ON
09:14:01.311 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-857 (22745) in state BLOCKED is at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:231).
09:14:01.324 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 10: Creating new message for application command SWITCH_BINARY_SET
09:14:01.325 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:14:01.327 [DEBUG] [ocol.ZWaveController$ZWaveSendThread] - Took message from queue for sending. Queue length = 0
09:14:01.331 [DEBUG] [wave.internal.protocol.SerialMessage] - Assembled message buffer = 01 0A 00 13 0A 03 25 01 FF 25 9F 8E 
09:14:01.331 [DEBUG] [ve.internal.protocol.ZWaveController] - Message queued. Queue length = 0. Queue={}
09:14:01.335 [DEBUG] [ing.zwave.handler.ZWaveSerialHandler] - NODE 10: Sending REQUEST Message = 01 0A 00 13 0A 03 25 01 FF 25 9F 8E 
09:14:01.343 [INFO ] [marthome.event.ItemStateChangedEvent] - ZW_WP2_Switch changed from OFF to ON

Sonos:

09:19:00.204 [INFO ] [smarthome.event.ItemCommandEvent    ] - Item 'Sonos_PLAY51_Control' received command PLAY
09:19:05.211 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:19:05.211 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded with no thread info available.
09:19:05.215 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing sonos:zoneplayer:3 takes more than 5000ms for processing event
09:19:05.211 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-859 (22839) in state RUNNABLE is at sun.misc.Unsafe.park(Unsafe.java:-2).
09:19:05.215 [WARN ] [ome.core.thing.internal.ThingManager] - Handler for thing 'sonos:zoneplayer:3' takes more than 5000ms for processing event
09:19:05.216 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:19:05.216 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-860 (22840) in state TIMED_WAITING is at sun.nio.cs.UTF_8$Encoder.encodeArrayLoop(UTF_8.java:596).
09:19:05.217 [WARN ] [ore.internal.events.OSGiEventManager] - Dispatching event to subscriber 'org.eclipse.smarthome.core.thing.internal.ThingManager@1ddd675' takes more than 5000ms.
09:19:05.232 [INFO ] [marthome.event.ItemStateChangedEvent] - Sonos_PLAY51_Control changed from PAUSE to PLAY

chris · February 9, 2017, 8:52am

In the case of the ZWave log, the binding doesn’t receive the command message until after the error.

This message -:

09:14:01.324 [DEBUG] [dclass.ZWaveBinarySwitchCommandClass] - NODE 10: Creating new message for application command SWITCH_BINARY_SET

:- is when the binding is called (the first line in the handler is to log this message).

Kai · February 9, 2017, 8:58am

It seems as if the thread pools go out of threads. The warning logs actually blame the wrong ones in that case: It’s the ones that do not get a thread assigned anymore in time. This also explains why the binding receives the events only afterwards.
One short term workaround would be to increase the safeCall threadpool size. You could try to add the line

org.eclipse.smarthome.threadpool:safeCall=10

to conf/services/runtime.cfg and see if that helps.
We will look at improving the thread assignment and the warning logging within the ThingManager.

Kai · February 9, 2017, 12:17pm

It seems that some recent refactoring causes multiple safeMethodCalls in a row, which can lead to an exhausted thread pool very easily. @sjka is working on a fix, I hope it won’t take long. I’ll wait with the next ESH stable build until this is available.

philKrylc · February 9, 2017, 5:06pm

I added this line in my config and for now it looks promising.

jaydee73 · February 9, 2017, 5:32pm

Kai, I also haven’t seen any repository build for the unstable snapshot version on bintray for the last couple of days. The last build is from February 5th. Is this intentionally or is there a broken mechanism?

Kai · February 10, 2017, 1:39pm

@sjka did multiple fixes and improvements on that part (thanks for this!) and the problem should hopefully be gone now (without the need to reconfigure the pool size). I have just done a new ESH stable build and an openHAB distro that contains the fix. Please test and report back (especially in case you still see a problem!).

We had to deactivate the snapshot apt builds due to quota issues. @Benjy is working on moving them to a new place. Meanwhile, I will trigger manual apt snapshots, whenever something important happens (like just right now), so the latest build is available via apt as well).

philKrylc · February 10, 2017, 11:21pm

Updated to the latest snapshot and removed the “workaround” from runtime.cfg.

My dashbuttons stuck at INITIALIZING, error in the log is:

Initializing handler for thing ‘amazondashbutton:dashbutton:ac-63-be-55-0b-1a’ takes more than 5000ms.

Moreover, all my Hues and the bridge remain offline.

rgerrans · February 11, 2017, 12:31am

I just updated to the latest snapshot (didn’t previously apply the workaround but was having the same issue) and mine started back up with no problem.

Kai · February 11, 2017, 10:16am

@philKrylc Sounds weird - I wouldn’t know how the fixes could be causing this; maybe something else is wrong in your setup. If the problem remains, could you provide more detailed logs (including org.eclipse.smarthome.core.common debug) here?

Benjy · February 13, 2017, 4:42pm

A bit more information, I wasn’t having this issue until I increased the number of things I own. Now, when I restart openHAB two of my five Amazon Dash buttons (and it is always two, but could be different each time) will get stuck on initialising, one of my six hue bulbs will respond very slowly too. bundle:restart ### fixes this problem each time.

Will get a debug log when it next happens.

Kai · February 13, 2017, 5:10pm

@Benjy And you are on a snapshot that has the fixes already?

Benjy · February 13, 2017, 5:13pm

I believe so yes, this is with the build on 12-02-17 10:46:42 (#789)

Kai · February 13, 2017, 5:15pm

Ok, then we’ll need logs…

Benjy · February 13, 2017, 5:43pm

2017-02-13 17:25:36.282 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created thread pool 'ConfigStatusService' with size 5
2017-02-13 17:25:37.110 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.thing.internal.ChannelItemProvider' has been added.
2017-02-13 17:25:37.157 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'thingManager' of size 5
2017-02-13 17:25:38.234 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.model.item.internal.GenericItemProvider' has been added.
2017-02-13 17:25:41.585 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'mapdb.persist'
2017-02-13 17:25:46.604 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'other.rules'
2017-02-13 17:25:48.691 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'alexwork.rules'
2017-02-13 17:25:49.281 [INFO ] [el.core.internal.ModelRepositoryImpl] - Validation issues found in configuration model 'bedroom.rules', using it anyway:
There is no context to infer the closure's argument types from. Consider typing the arguments or put the closures into a typed context.
2017-02-13 17:25:49.288 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'bedroom.rules'
2017-02-13 17:25:49.628 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'livingroom.rules'
2017-02-13 17:25:51.176 [INFO ] [el.core.internal.ModelRepositoryImpl] - Validation issues found in configuration model 'hallway.rules', using it anyway:
There is no context to infer the closure's argument types from. Consider typing the arguments or put the closures into a typed context.
2017-02-13 17:25:51.181 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'hallway.rules'
2017-02-13 17:25:51.637 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'kitchen.rules'
2017-02-13 17:25:52.855 [INFO ] [el.core.internal.ModelRepositoryImpl] - Loading model 'default.sitemap'
2017-02-13 17:25:52.988 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.model.thing.internal.GenericItemChannelLinkProvider' has been added.
2017-02-13 17:25:53.021 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.model.thing.internal.GenericThingProvider' has been added.
2017-02-13 17:25:53.350 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created thread pool 'safeCall' with size 3
2017-02-13 17:25:53.456 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.thing.link.ManagedItemChannelLinkProvider' has been added.
2017-02-13 17:25:53.839 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.thing.ManagedThingProvider' has been added.
2017-02-13 17:25:54.013 [DEBUG] [ore.common.registry.AbstractRegistry] - Provider 'org.eclipse.smarthome.core.items.ManagedItemProvider' has been added.
2017-02-13 17:25:54.031 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'discovery' of size 3
2017-02-13 17:25:55.251 [INFO ] [.dashboard.internal.DashboardService] - Started dashboard at /start
2017-02-13 17:25:55.516 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'thingHandler' of size 3
2017-02-13 17:25:57.057 [INFO ] [basic.internal.servlet.WebAppServlet] - Started Basic UI at /basicui/app
2017-02-13 17:25:57.202 [INFO ] [arthome.ui.paper.internal.PaperUIApp] - Started Paper UI at /paperui
2017-02-13 17:25:57.304 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created scheduled thread pool 'pcapNetworkInterfaceService' of size 5
2017-02-13 17:25:58.399 [DEBUG] [rthome.core.common.ThreadPoolManager] - Created thread pool 'items' with size 5
2017-02-13 17:25:58.611 [WARN ] [org.pcap4j.util.PropertiesLoader    ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.inet, use default value: null
2017-02-13 17:25:58.614 [WARN ] [org.pcap4j.util.PropertiesLoader    ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.inet6, use default value: null
2017-02-13 17:25:58.616 [WARN ] [org.pcap4j.util.PropertiesLoader    ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.packet, use default value: null
2017-02-13 17:25:58.617 [WARN ] [org.pcap4j.util.PropertiesLoader    ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.af.link, use default value: null
2017-02-13 17:25:58.675 [INFO ] [ui.habmin.internal.servlet.HABminApp] - Started HABmin servlet at /habmin
2017-02-13 17:25:58.812 [WARN ] [org.pcap4j.util.PropertiesLoader    ] - [org/pcap4j/pcap4j.properties] Could not get value by org.pcap4j.dlt.raw, use default value: null
2017-02-13 17:25:58.823 [INFO ] [panel.internal.HABPanelDashboardTile] - Started HABPanel at /habpanel
2017-02-13 17:26:01.732 [INFO ] [ing.zwave.handler.ZWaveSerialHandler] - Connecting to serial port '/dev/ttyACM0'
2017-02-13 17:26:01.832 [INFO ] [ing.zwave.handler.ZWaveSerialHandler] - Serial port is initialized
2017-02-13 17:26:01.859 [INFO ] [mmandclass.ZWaveSecurityCommandClass] - Update networkKey
2017-02-13 17:26:01.901 [INFO ] [ve.internal.protocol.ZWaveController] - Starting ZWave controller
2017-02-13 17:26:01.903 [INFO ] [ve.internal.protocol.ZWaveController] - ZWave timeout is set to 5000ms. Soft reset is false.
2017-02-13 17:26:02.315 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-2 (99) in state RUNNABLE is at org.pcap4j.core.NativeMappings.pcap_findalldevs(NativeMappings.java:-2).
2017-02-13 17:26:02.316 [WARN ] [ome.core.thing.internal.ThingManager] - Initializing handler for thing 'amazondashbutton:dashbutton:50-f5-da-93-9f-67' takes more than 5000ms.

Hope this helps.

Kai · February 13, 2017, 5:56pm

This is perfect - just exactly what the changes were about: To correctly identify badly behaving handlers and blame them (and no one else anymore):

2017-02-13 17:26:02.315 [DEBUG] [arthome.core.common.SafeMethodCaller] - Timeout of 5000ms exceeded, thread safeCall-2 (99) in state RUNNABLE is at org.pcap4j.core.NativeMappings.pcap_findalldevs(NativeMappings.java:-2).
2017-02-13 17:26:02.316 [WARN ] [ome.core.thing.internal.ThingManager] - Initializing handler for thing 'amazondashbutton:dashbutton:50-f5-da-93-9f-67' takes more than 5000ms.

This is absolutely true: The handler takes more than 5s to initialize() because it does some fancy pcap stuff in there. Instead, it should schedule a job to do such long running activities asynchronously. @oliver Could you improve the implementation that way?