Polling Thread dies after some time - why!?

Hi binding developers,

I hope I find someone having the same issue and may have solved it. I’m writing a binding. The binding starts a polling thread in the initialize method of the handler like this:

Runnable runnable = new Runnable() {
                @Override
                public void run() {
                    try {
                        refreshMowerInfo();
                        updateStatus(ThingStatus.ONLINE);
                    } catch (RobonectCommunicationException rce) {
                        logger.error("Failed to communicate with the mower. Taking it offline.", rce);
                        updateState(CHANNEL_STATUS, new DecimalType(MowerStatus.OFFLINE.getStatusCode()));
                        updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.COMMUNICATION_ERROR, rce.getMessage());
                    } catch (Exception e) {
                        logger.error("Unexpected exception. Setting thing offline", e);
                        updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.NONE, e.getMessage());
                    } catch (Throwable t){
                        // Should not do... however, thread seems to die unexpected soemtimes... try to debug
                        logger.error("Unexpected throwable. Setting thing offline", t);
                        updateStatus(ThingStatus.OFFLINE, ThingStatusDetail.NONE, t.getMessage());
                        
                    }
                }
            };
            int pollInterval = robonectConfig.getPollInterval() > 0 ? robonectConfig.getPollInterval() : 30;
            pollingJob = scheduler.scheduleAtFixedRate(runnable, 0, pollInterval, TimeUnit.SECONDS);

After a while (usually around 3 days), the binding is not updating the channels anymore in the pollInterval. No updates at all. In the log I always see the following

2017-05-31 20:02:05.708 [DEBUG] [nhab.binding.robonect.RobonectClient] - send HTTP GET to: http://192.168.1.210/json?cmd=status 
2017-05-31 20:03:43.017 [ERROR] [lfe.hap.impl.http.impl.BinaryHandler] - Exception in binary handler
java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.8.0_65]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.8.0_65]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.8.0_65]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.8.0_65]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)[:1.8.0_65]
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:880)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[187:org.openhab.io.homekit:2.1.0.2017052823
06]
        at java.lang.Thread.run(Thread.java:745)[:1.8.0_65]
2017-05-31 20:03:43.024 [ERROR] [.hap.impl.http.impl.AccessoryHandler] - Exception caught in web handler
java.io.IOException: Connection timed out
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method)[:1.8.0_65]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)[:1.8.0_65]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)[:1.8.0_65]
        at sun.nio.ch.IOUtil.read(IOUtil.java:192)[:1.8.0_65]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)[:1.8.0_65]
        at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:447)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:880)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:112)[187:org.openhab.io.homekit:2.1.0.201705282306]
        at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)[187:org.openhab.io.homekit:2.1.0.2017052823
06]
        at java.lang.Thread.run(Thread.java:745)[:1.8.0_65]

The first line (DEBUG) is within my refreshMowerInfo() method. So it is definitely started. But the HTTP Request which is done after this log, obviously never retrieves a response, nor an exception, as these would be logged in DEBUG respectively ERROR. The errors appearing are from the home kit addon…
After this, just a restart of my binding helps…

  • Why do I not get any ERROR log from my own binding ( at least from the catch(Throawable t)? Actually this would mean the thread rather blocks while doing the http request with jetty…
  • Homekit is not related at all with my binding. None of the items associated with my bindings channel are exposed via home kit. But it seem when home kit gets connection issues, my polling thread is blocked. Maybe this is rather a sign of a general network issue at some time and therefore the issues appear always together in the log!?
  • I’m using Jetty HTTP Client in the method refreshMowerInfo. Could it be that jetty is blocked because home kit may also use some jetty in the background?

The usage of jetty looks as follow

private String sendCommand(Command command) {
        ContentResponse response = null;
        try {
            if (logger.isDebugEnabled()) {
                logger.debug("send HTTP GET to: {} ", command.toCommandURL(baseUrl));
            }
            response = httpClient.GET(command.toCommandURL(baseUrl));
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            logger.error("InterruptedException while trying to send command.", ie);           
        } catch (ExecutionException | TimeoutException e) {
            throw new RobonectCommunicationException("Could not send command " + command.toCommandURL(baseUrl), e);
        }
        String responseString = response.getContentAsString();
        if (logger.isDebugEnabled()) {
            logger.debug("Response body was: {} ", responseString);
        }
        return responseString;
    }

Any help in further tracking down the issue would be highly appreciated! Thanx

At a first glimpse, you are using an outdated JAVA version (1.8.0_65), please update to the latest and try again.

I agree that I should update Java as a good practice, however, I doubt this is the root cause.

After being sure the code stucks in the jetty client a checked the threads via karat console. And indeed this is the threads stack

"ESH-thingHandler-3509" Id=25337 in WAITING on lock=java.util.concurrent.CountDownLatch$Sync@12a838b
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    at java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    at org.eclipse.jetty.client.util.FutureResponseListener.get(FutureResponseListener.java:100)
    at org.eclipse.jetty.client.HttpRequest.send(HttpRequest.java:653)
    at org.eclipse.jetty.client.HttpClient.GET(HttpClient.java:334)
    at org.eclipse.jetty.client.HttpClient.GET(HttpClient.java:322)
    at org.openhab.binding.robonect.RobonectClient.sendCommand(RobonectClient.java:193)
    at org.openhab.binding.robonect.RobonectClient.getMowerInfo(RobonectClient.java:139)
    at org.openhab.binding.robonect.handler.RobonectHandler.refreshMowerInfo(RobonectHandler.java:250)
    at org.openhab.binding.robonect.handler.RobonectHandler.access$0(RobonectHandler.java:248)
    at org.openhab.binding.robonect.handler.RobonectHandler$1.run(RobonectHandler.java:345)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

So the thread is waiting for a lock…

Why don’t you use the ESH httpUtil (org.eclipse.smarthome.io.net.http.HttpUtil).
I use it in the WeMo binding and don’t face those issues.

I just followed the binding developer guide which recommends jetty http client. But obviously after googling I found many concurrency issues with it, plus it seems that OH also includes quite an outdated version.

Thanks for the hint with esh HttpUtil. I have to check on how to use it then.

AFAIK it uses jetty http client …

true, just checked the code. In this case I do not see much sense in switching. However, I changed the usage of jetty to include a 30 second timeout. Maybe this will help. Will see. New code looks like this

private String sendCommand(Command command) {
        ContentResponse response = null;
        try {
            if (logger.isDebugEnabled()) {
                logger.debug("send HTTP GET to: {} ", command.toCommandURL(baseUrl));
            }
            Request request = httpClient.newRequest(command.toCommandURL(baseUrl)).method(HttpMethod.GET).timeout(30000, TimeUnit.MILLISECONDS);
            response = request.send();
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            logger.error("InterruptedException while trying to send command.", ie);           
        } catch (ExecutionException | TimeoutException e) {
            throw new RobonectCommunicationException("Could not send command " + command.toCommandURL(baseUrl), e);
        }
        String responseString = response.getContentAsString();
        if (logger.isDebugEnabled()) {
            logger.debug("Response body was: {} ", responseString);
        }
        return responseString;
    }

finally setting a timeout solved the issue.
Maybe this helps somebody else. seems like the default timeout is not used in the case one uses httpClient.GET(...). By having now a timeout defined the request is interrupted after the timeout and the thread is triggered again on next planned schedule.