Issue of the topic:
sporadic error in log with full outage of openhab
looks like there might be also an outOfMemory connected (sometimes in log, too)
message always: “Dispatching event to subscriber … takes more than 5000ms.”
I see a connection to using Visual Studio Code with Openhab Extension.
this seems to start most of the time when using Visual Code with Extension and REST API and get live Item state (or something like this)
It is not really reproducible, but looks like, if this happens, the live item state does stop working, too.
trying to "service openhab restart " is not really working anymore, I need to “kill -9” the java thread to get back to normal operation.
This behavior exists with last versions of OH3 and still exists with OH4.
Changed from Windows 10 Visual Code to MacOS with Visual Code without any change.
Need help to narrow this down, please.
If logs where generated please post these here using code fences:
[WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.thing.internal.CommunicationManager@36b870' takes more than 5000ms.
2023-10-01 08:50:40.928 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.io.monitor.internal.EventLogger@16baa09' takes more than 5000ms.
2023-10-01 08:50:42.326 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.thing.internal.CommunicationManager@36b870' takes more than 5000ms.
While it may not solve root cause of you issue, it might help with early termination of openHAB when first symptoms of failure occur.
Uncomment or add in config.properties following line: karaf.systemd.enabled=true and enable systemd watchog for jvm in openhab.service as described here: systemd for Administrators, Part XV (WatchdogSec=30s In [Service] section).
This way restart of OH will be conducted when CPU gets too busy to react on time. You might even shorter watchog time to lower value.
The actual issue you see is definitely a memory issue. Usually when JVM gets in such situation it is so busy reclaiming memory (garbage collecting) that it does not process incoming signals at all. In order to heal issue you need to find root cause - it might be binding, custom rule (especially ones with timers and improper resource management). Most accurate way is - take a memory dump of running process, before it gets stuck, and browse it with tool such as jprofiler, yourkit, eclipse memory analyzer or jvisualvm. It should help you find something which occupy more memory than it should.
Thanks for the quick response. I will have to dig into your info, as I am not familiar with it. But I like to learn, so it may take some days, but I will come back after I tried it. If may be quicker, if the error is more … present. :o)
Again, many thanks for the path to follow.
Best regards,
Christian
Ok, had some extra time to spend into this:
First: karaf.systemd.enable=true was uncommented in /var/lib/openhab/etc
then:
I added the WatchdogSec=30s into [Service] section in /lib/systemd/system/openhab/openhab.service
I am not sure, if I need to restart or retrigger something, so now I will see, if I am finished here.
Thanks. Ok, updated this and did a daemon-reload.
Now, I am testing the memory dump.
Tried sudo kill -3 PID, but nothing happens. jstack is not installed with OpenJDK 17 runtime, so … going through my options…
FYI - there are several bindings which may cause OOM errors. Please have a look: OH4 runs out of memory
You may simply disable all bindings and enable them one by one to see which causes issue. It will be simpler than getting memory dump and analyzing it.
Yes, agreed, and again thank you for time! It is very appreciated.
My problem here is: it is not reproducible. No time/duration, no special action.
I only see a connection to “when I am doing my programming with Visual Code” (on MacMini with FTP to my Raspberry). I am pretty sure, it has something to do with the connections the “Openhab Extension” is using to do fancy stuff in Visual Code (Hover-Info and such things). Observation: I log into my ssh connection to watch the openhab.log and in the same time click into Visual code to search for the code area my next change/enhancement needs to be placed, then I see the bad loggings and most of the time, they just started…
So, disabling the bindings sounds good, but as it is not reproducible and I guess it is no binding… :o)
So, if there is an additional idea in this direction, that might help also…
I am watching this behavior a pretty long time not, changing some points in this time: Using Visual Code on MacOS (coming from Win10), Did several updates in OpenHab: coming from 3.(? unsure when it starts), now 4.0.3.
Of course I might be “on the wooden path” (german proverb ) or barking up the wrong tree with my observations by now. So every idea is very welcome, to help me track it down faster.
There is an issue with the watchdog: openHAB is not starting because the watchdog seems to kill the process before it is up and running the watchdog code.
I changed it from 30s to 360s but it still restarts, so I disabled it for now.
Check the other thread Lukasz posted above, they are making progress narrowing down the problem. Tell us what bindings you have installed please. Are you using the Shelly binding?
I had this issue unfixed until last update. Now memory is stable again on my side. Still not sure about the cause. But looks like it was no add-on, as i did not change my configuration, but updated to last Version of OH.