OH2: major bug with scheduled jobs?

The Freebox binding implements a scheduled job to poll state every 30 seconds.
After a certain time (it can be several days), I think the job is no longer executed!
Unfortunately there is no debug log in the job, so I cannot be 100% sure. Is there a way to check from the console which jobs are scheduled?
I initially thought that updates had suddenly stopped because the connection was lost (session expired), but that is not the case because I can still send commands from openHAB. The only thing that is not working is polling the state through the scheduled job.
My feeling is that the job simply no longer runs.
How can I help to debug this?

For information, here is how the job is scheduled in the binding:

    @Override
    public void initialize() {
        logger.debug("initializing Freebox Server handler.");
        if (authorize()) {
            updateStatus(ThingStatus.ONLINE);

            if (globalJob == null || globalJob.isCancelled()) {
                long polling_interval = getConfigAs(FreeboxServerConfiguration.class).refreshInterval;
                globalJob = scheduler.scheduleAtFixedRate(globalRunnable, 1, polling_interval, TimeUnit.SECONDS);
            }
        } else {
            updateStatus(ThingStatus.OFFLINE);
        }
    }

    private Runnable globalRunnable = new Runnable() {
        @Override
        public void run() {

            try {
                fetchSystemConfig();
                fetchLCDConfig();
                fetchWifiConfig();
                fetchxDslStatus();
                fetchConnectionStatus();
                fetchFtpConfig();
                fetchAirMediaConfig();
                fetchUPnPAVConfig();
                fetchSambaConfig();
                LanHostsConfig lanHostsConfiguration = fetchLanHostsConfig();

                // Trigger a new discovery of things
                for (FreeboxDataListener dataListener : dataListeners) {
                    dataListener.onDataFetched(getThing().getUID(), lanHostsConfiguration);
                }

            } catch (FreeboxException e) {
                logger.error(e.getMessage());
                updateStatus(ThingStatus.OFFLINE);
            }

        }
    };

I have now added a debug log at the beginning of each scheduled job and restarted OH. I will wait until the problem occurs again and see whether the jobs are still running. I will let you know.

@Lolodomo, I have also seen this, and this week I started adding print statements to my bindings that rely on the shared executor service. I have not discovered anything yet, but after some amount of time (hours, days) my jobs stop running as well. I took a thread dump and also attached Eclipse to my running instance, but I don’t see anything obvious that would cause the lockup.

OK, very interesting.
We should open an issue as soon as either of us has proof that the jobs suddenly stop being scheduled. It would be a critical issue in the core framework.
After half a day, my jobs are still running. I will be patient.
@Kai, FYI: this may turn out to be a critical bug.

Bugs that only occur after running for a few days sound like fun to debug :open_mouth:

If there are no bundle/handler restarts involved, it would actually mean that it is a JVM bug, since what is used for scheduling is simply Executors.newScheduledThreadPool. As a JVM bug is fairly unlikely, we need to have a close look at anything that could influence the scheduling.

Afaik, if a job throws an exception, the scheduler automatically stops scheduling it. This is imho the most likely cause here.

OK, in my case exceptions are caught, but maybe not all of them.

In fact I should just add a generic catch for Exception, I think!

…or even a “catch(Throwable t)”.
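For illustration, here is a minimal sketch of what such a catch-all could look like as a reusable wrapper. The names `SafeJob`, `wrap`, `onError` and `keepsRunningAfterFailure` are made up for this example and are not part of openHAB or the Freebox binding; in a binding, the error callback would typically be a `logger.error(...)` call. Note that catching `Throwable` also swallows `Error`s such as `OutOfMemoryError`, so whether to go that far is a judgment call.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

public class SafeJob {

    /** Wraps a task so that nothing it throws ever reaches the scheduler. */
    public static Runnable wrap(Runnable task, Consumer<Throwable> onError) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                // Report the failure instead of letting it cancel the periodic job
                onError.accept(t);
            }
        };
    }

    /** Returns true if a wrapped job reaches 5 runs even though its 2nd run throws. */
    public static boolean keepsRunningAfterFailure() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch fiveRuns = new CountDownLatch(5);
        try {
            Runnable flaky = () -> {
                int n = runs.incrementAndGet();
                fiveRuns.countDown();
                if (n == 2) {
                    throw new IllegalStateException("simulated unchecked failure");
                }
            };
            scheduler.scheduleAtFixedRate(
                    wrap(flaky, t -> System.err.println("Job failed: " + t)),
                    0, 20, TimeUnit.MILLISECONDS);
            // Wait until the job has run 5 times, or give up after 5 seconds
            return fiveRuns.await(5, TimeUnit.SECONDS);
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Still running after failure: " + keepsRunningAfterFailure());
    }
}
```

With the wrapper in place, the failing run is reported and the next run still happens at the normal interval.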

I just submitted a change: https://github.com/openhab/openhab2-addons/pull/1115

@digitaldan: you should check how errors are caught in your scheduled jobs.

FTR: https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledExecutorService.html#scheduleAtFixedRate(java.lang.Runnable,%20long,%20long,%20java.util.concurrent.TimeUnit):

If any execution of the task encounters an exception, subsequent executions are suppressed.
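This behaviour is easy to reproduce outside openHAB. The following is a small standalone sketch (the class name `SuppressedJobDemo` and the method `countRuns` are invented for the demonstration) in which the task throws an unchecked exception on its third run. Nothing is logged by the executor itself; the exception is only visible through the returned `ScheduledFuture`.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SuppressedJobDemo {

    /** Schedules a task that throws on its third run and returns how often it ran. */
    public static int countRuns() throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        AtomicInteger runs = new AtomicInteger();
        try {
            ScheduledFuture<?> job = scheduler.scheduleAtFixedRate(() -> {
                if (runs.incrementAndGet() == 3) {
                    throw new IllegalStateException("simulated unchecked failure");
                }
            }, 0, 20, TimeUnit.MILLISECONDS);

            try {
                // For a periodic task, get() only returns once the task has terminated
                job.get();
            } catch (ExecutionException e) {
                System.out.println("Job stopped by: " + e.getCause());
            }

            // Leave room for further runs; none happen because they are suppressed
            Thread.sleep(200);
            return runs.get();
        } finally {
            scheduler.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Runs: " + countRuns());
    }
}
```

The run counter stays at 3 even though the scheduler is still alive, which matches the symptom of a polling job that silently stops while the rest of the binding keeps working.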

With my fix, I think it should be OK. An error will now be logged whenever the job throws an exception.
We will see what happens over the next few days.

I took a look at how this is handled by the different bindings. In general, exceptions are caught (as Exception rather than Throwable). But, for example, it seems that this is not done by the netatmo binding, and only partially by the RFXCOM binding.

Thanks all, I will take a look. I assumed I was catching everything, but obviously I am not :frowning: