Exception in thread "OH-eventexecutor-1"

  • Platform information:
    • Hardware: Pi4
    • OS: OpenHabian 3.0.2
    • openHAB version: 3.0.2

So today when I tried to toggle a switch and got ignored I went to look at the logs and found my openhab was stuck in some kind of “trance”. The web UI responded but nothing meaningful happened. Looking at my openhab log I found this moment in time when things got strange:

2021-06-18 04:01:45.590 [INFO ] [org.openhab.rule.NodeREDLog--       ] - S1 Motion Detected 2021-06-18T08:01:45.482Z
2021-06-18 05:32:03.937 [INFO ] [io.openhabcloud.internal.CloudClient] - Disconnected from the openHAB Cloud service (UUID = f2dcfd60-b4ea-4e15-ada0-10b5a42b8055, base URL = http://localhost:8080)
2021-06-18 05:33:01.337 [INFO ] [io.openhabcloud.internal.CloudClient] - Connected to the openHAB Cloud service (UUID = f2dcfd60-b4ea-4e15-ada0-10b5a42b8055, base URL = http://localhost:8080)
2021-06-18 06:50:01.380 [INFO ] [org.openhab.rule.31f37e7819         ] - vTimeOfDay: EARLY -> SCHOOL_PREP
2021-06-18 06:50:01.380 [INFO ] [org.openhab.rule.31f37e7819         ] - vTimeOfDay: EARLY -> SCHOOL_PREP
2021-06-18 06:51:00.177 [INFO ] [org.openhab.rule.31f37e7819         ] - vTimeOfDay: EARLY -> SCHOOL_PREP

That was strange because that rule shouldn’t do the same thing every minute. It seemed like Item vTimeOfDay wasn’t getting updates…

Then looking at my syslog I found this:

Jun 18 03:55:01 openHABianDevice CRON[18632]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 18 03:56:15 openHABianDevice systemd[1]: Stopping Network Time Synchronization...
Jun 18 03:56:15 openHABianDevice systemd[1]: systemd-timesyncd.service: Succeeded.
Jun 18 03:56:15 openHABianDevice systemd[1]: Stopped Network Time Synchronization.
Jun 18 03:56:15 openHABianDevice systemd[1]: Starting Network Time Synchronization...
Jun 18 03:56:15 openHABianDevice systemd[1]: Started Network Time Synchronization.
Jun 18 03:56:15 openHABianDevice systemd-timesyncd[18674]: Synchronized to time server for the first time 198.60.22.240:123 (0.debian.pool.ntp.org).
Jun 18 04:01:19 openHABianDevice systemd[1]: Stopping Network Time Synchronization...
Jun 18 04:01:20 openHABianDevice systemd[1]: systemd-timesyncd.service: Succeeded.
Jun 18 04:01:20 openHABianDevice systemd[1]: Stopped Network Time Synchronization.
Jun 18 04:01:20 openHABianDevice systemd[1]: Starting Network Time Synchronization...
Jun 18 04:01:20 openHABianDevice systemd[1]: Started Network Time Synchronization.
Jun 18 04:01:20 openHABianDevice systemd-timesyncd[18782]: Synchronized to time server for the first time 185.77.225.135:123 (0.debian.pool.ntp.org).
Jun 18 04:04:27 openHABianDevice karaf[667]: Exception in thread "OH-eventexecutor-1" Exception in thread "OH-eventexecutor-2" java.lang.NullPointerException
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:217)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.lang.Thread.run(Thread.java:834)
Jun 18 04:04:27 openHABianDevice karaf[667]: Exception in thread "OH-eventexecutor-3" java.lang.NullPointerException
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:214)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
Jun 18 04:04:27 openHABianDevice karaf[667]: Exception in thread "OH-eventexecutor-4" #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.lang.Thread.run(Thread.java:834)
Jun 18 04:04:27 openHABianDevice karaf[667]: java.lang.NullPointerException
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.dequeue(LinkedBlockingQueue.java:214)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1054)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1114)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
Jun 18 04:04:27 openHABianDevice karaf[667]: #011at java.base/java.lang.Thread.run(Thread.java:834)

My syslog was 4G big full of these messages.

Not nice. How can I trust it if it might go into wonderland in the middle of the night without a way for me to see this happened?

Any clues of what might have happened here and how do I fix it or collect more data to fix it?

I would say that this points to a bigger issue with your system, possibly a failing SD card. At least it seems openHAB is not the only thing being affected, so the cause most likely lies outside of it.

Well, the syslog is normal, until the karaf message. NTP sync every x minutes is a common thing. I left the other lines in there to show everything seemed normal until that point.

Karaf is a component of OpenHAB’s tech stack. In OpenHAB’s log there is nothing at 04:04:27, only the issue at 05:32:03 evidences something went wrong in OpenHAB.

Finally, I don’t run on an SD card, but on a SSD hard drive. In any case, both are too new to already have failed.

If this is a common problem in OpenHABian, does someone have a way to monitor it and restart the service? A bash script of some kind?

Some extra info I’ve found in my logs. kern.log has this:

Jun 18 04:09:35 openHABianDevice kernel: [322042.913439] cgroup: fork rejected by pids controller in /system.slice/openhab.service

Googleing around this seems to be related to reaching a limit of tasks or threads. That would be strange, as I don’t have many Things/Items and I don’t run many rules.

Some extra info from my system:

openhabian@openHABianDevice:/var/log $ sudo systemctl status user.slice
[sudo] password for openhabian: 
● user.slice - User and Session Slice
   Loaded: loaded (/lib/systemd/system/user.slice; static; vendor preset: enabled)
   Active: active since Thu 2019-02-14 05:11:58 EST; 2 years 4 months ago
     Docs: man:systemd.special(7)
    Tasks: 8
   CGroup: /user.slice
           └─user-1000.slice
             ├─session-849.scope
             │ ├─25345 sshd: openhabian [priv]
             │ ├─25364 sshd: openhabian@pts/0
             │ ├─25365 -bash
             │ ├─25945 sudo systemctl status user.slice
             │ ├─25967 systemctl status user.slice
             │ └─25969 pager
             └─user@1000.service
               └─init.scope
                 ├─25349 /lib/systemd/systemd --user
                 └─25350 (sd-pam)

Anything looking too small there?

Might be, even though it seems excessive to sync every 5 minutes, but why would the service be restarted every time?

That is indeed a good question. It doesn’t happen in another linux box I have, but, that’s not Pi so I don’t know what this distribution is like. Yours doesn’t do this, right?

No, it has been running constantly since the last reboot almost 3 months ago

● systemd-timesyncd.service - Network Time Synchronization
   Loaded: loaded (/lib/systemd/system/systemd-timesyncd.service; enabled; vendor preset:   Drop-In: /usr/lib/systemd/system/systemd-timesyncd.service.d
           └─disable-with-time-daemon.conf
   Active: active (running) since Wed 2021-03-31 19:46:07 CEST; 2 months 19 days ago

But if you suspect too many tasks is run, what does systemctl status openhab show? Here’s mine for reference

● openhab.service - openHAB - empowering the smart home
   Loaded: loaded (/lib/systemd/system/openhab.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-06-11 13:42:03 CEST; 1 weeks 1 days ago
     Docs: https://www.openhab.org/docs/
           https://community.openhab.org
 Main PID: 659 (java)                                                                         
Tasks: 229 (limit: 4915)
   Memory: 1.2G
   CGroup: /system.slice/openhab.service
           └─659 /usr/lib/jvm/zulu11-ca-amd64//bin/java -XX:-UsePerfData -Dopenhab.home=/u

Is your number of tasks close to the limit?

This is what it shows now, and no it doesn’t seem close to the limit:

openhabian@openHABianDevice:~ $ sudo systemctl status openhab
[sudo] password for openhabian: 
● openhab.service - openHAB - empowering the smart home
   Loaded: loaded (/lib/systemd/system/openhab.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2021-06-18 20:30:38 EDT; 22h ago
     Docs: https://www.openhab.org/docs/
           https://community.openhab.org
 Main PID: 8485 (java)
    Tasks: 188 (limit: 3978)
   CGroup: /system.slice/openhab.service
           └─8485 /usr/bin/java -XX:-UsePerfData -Dopenhab.home=/usr/share/openhab -Dopenhab.conf=/etc/openhab -Dope

Jun 18 20:30:38 openHABianDevice systemd[1]: Started openHAB - empowering the smart home.

This was the first time it happened, I’ve been using OpenHAB for about 6 months now, but one time is enough to want to understand why and prevent.

Well, then your best bet is probably to monitor this value and see if it increases significantly over time. If it does, it might be a binding creating a lot of threads for some reason, so you should start looking into that. Unfortunately, about the only way to do this is by removing bindings one at a time until the problem disappears, then you at least know which binding causes this so you can look into it more.

Please open a bug on github so this can be tracked. It looks like something is causing a NPE to happen in the event handler. This may be resolved in 3.1 but it would be good to find out.

Good idea. I’ve added a crude log through cron, but should collect data to give an indication of when problems start to happen, if it happens again.

* * * * * echo `date --rfc-3339=seconds`" "`systemctl status openhab | grep Tasks` >> /var/log/openhab/task-count.log

Done