Openhab Stop working

Hello,

I’m running openhab 5 on openhabian on a new rpi5.

Is the Second time in 2 days that openhab Stop working completely. I can connect the uis and olso sitemap, but no item tespond

Here is the log.

‘19:43.452[WARN] [org.openhab.core.internal.events.EventHandler] - Dispatching event to subscriber ‘org.openhab.core.io.monitor.internal.EventLogger@6863fd22’ takes more than 5000ms.
19:03:44.351[INFO] [org.openhab.core.io.websocket.log.LogWebSocket] - WebSocket error: java.util.concurrent.TimeoutException: Idle timeout expired: 10000/10000 ms
19:06:19.003[WARN] [org.openhab.core.internal.events.EventHandler] - The queue for a subscriber of type ‘class org.openhab.core.io.monitor.internal.EventLogger’ exceeds 5000 elements. System may be unstable.
19:07:51.681[INFO] [org.openhab.core.io.websocket.log.LogWebSocket] - WebSocket error:
21:47:19.348[WARN] [org.openhab.core.io.rest.sse.internal.SseItemStatesEventBuilder] - Attempting to send a state update of an item which doesn’t exist: Widget_Settings_Bayesian_Sensor_Details
21:47:21.678[WARN] [org.openhab.core.io.rest.sse.internal.SseItemStatesEventBuilder] - Attempting to send a state update of an item which doesn’t exist: Widget_Settings_Bayesian_Sensor_Details
21:47:22.555[WARN] [org.openhab.core.io.rest.sse.internal.SseItemStatesEventBuilder] - Attempting to send a state update of an item which doesn’t exist: Widget_Settings_Bayesian_Sensor_Details’

The “attempting to send a state update to an item which doesn’t exist” comes from MainUI having an Item referenced in a widget on the screen that doesn’t exist. That’s not related to the other errors.

As for the other errors, I’m not sure what could be the cause for these errors. Maybe one of the thread deadlocks that have been discovered recently are causing problems.

You can run ttopon the karaf console to see if youj have lots of threads stuck in a blocked or waiting state. See High CPU Usage since upgrade to 5.0.1 - #4 by rlkoshak. Beyond that I’d start to disable add-ons until the problem goes away to help pinpoint where the problem lies.

Thank you,

sometime problem seek is frustrating but I will test the TTOP bash in the console and let you know.

Thank for your hint!

I tried out the hint - but I do not get anywhere with that.

ttop points to the thread OH-OSGiEventManager:

Thread 464 OH-OSGiEventManager BLOCKED
Stacktrace:
org.openhab.core.io.websocket.log.LogWebSocket.sendMessage line: 228
org.openhab.core.io.websocket.log.LogWebSocket.logged line: 258
org.ops4j.pax.logging.spi.support.LogReaderServiceImpl.fire line: 105
org.ops4j.pax.logging.spi.support.LogReaderServiceImpl.fireEvent line: 83
org.ops4j.pax.logging.log4j2.internal.PaxLoggingServiceImpl.handleEvents line: 414
org.ops4j.pax.logging.log4j2.internal.PaxLoggerImpl.doLog0 line: 1134
org.ops4j.pax.logging.log4j2.internal.PaxLoggerImpl.doLog line: 1099
org.ops4j.pax.logging.log4j2.internal.PaxLoggerImpl.warn line: 465
org.ops4j.pax.logging.internal.TrackingLogger.warn line: 267
org.ops4j.pax.logging.slf4j.Slf4jLogger.warn line: 937
org.openhab.core.internal.events.EventHandler.dispatchEvent line: 160
org.openhab.core.internal.events.EventHandler.handleEvent line: 122
org.openhab.core.internal.events.EventHandler.handleEvent line: 95
org.openhab.core.internal.events.ThreadedEventHandler.lambda$0 line: 66
org.openhab.core.internal.events.ThreadedEventHandler$$Lambda/0x0000000100ae5fc8.run line: -1
java.lang.Thread.runWith line: -1
java.lang.Thread.run line: -1

all threads are blocked by this one:

Thread 34241 OH-LogWebSocket-24 WAITING
Stacktrace:
jdk.internal.misc.Unsafe.park line: -2
java.util.concurrent.locks.LockSupport.park line: -1
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block line: -1
java.util.concurrent.ForkJoinPool.unmanagedBlock line: -1
java.util.concurrent.ForkJoinPool.managedBlock line: -1
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await line: -1
org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block line: 241
org.eclipse.jetty.websocket.common.BlockingWriteCallback$WriteBlocker.block line: 89
org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.blockingWrite line: 110
org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendString line: 400
org.openhab.core.io.websocket.log.LogWebSocket.sendMessage line: 232
org.openhab.core.io.websocket.log.LogWebSocket.flush line: 299
org.openhab.core.io.websocket.log.LogWebSocket$$Lambda/0x0000000101ef8000.run line: -1
java.util.concurrent.Executors$RunnableAdapter.call line: -1
java.util.concurrent.FutureTask.run line: -1
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run line: -1
java.util.concurrent.ThreadPoolExecutor.runWorker line: -1
java.util.concurrent.ThreadPoolExecutor$Worker.run line: -1
java.lang.Thread.runWith line: -1
java.lang.Thread.run line: -1

This does not help me to nail down to a rule or similar.
Anyone with an idea beside rebooting the system every few hours?

running openHAB 5.0.3 - Release Build on a Raspberry Pi 5 Model B Rev 1.0
with NVM SSD (512GB, with 13GB used)

There’s clearly a bug in LogWebSocket:

    private synchronized void sendMessage(String message) throws IOException {
        RemoteEndpoint remoteEndpoint = this.remoteEndpoint;
        if (remoteEndpoint == null) {
            return;
        }
        remoteEndpoint.sendString(message);
    }

remoteEndpoint.sendString() is a blocking operation that won’t return until the recipient has received the message (which could be a browser that has been closed, for example), and the operation is done while holding a lock (ref. synchronized).

Any other thread that attempts to acquire this lock will end up waiting for the external recipient to receive the message, which might mean waiting for a timeout.

I would think this should bite many, my guess is that not that many leave the MainUI log viewer open, so that the impact has been limited so far?

A bug should be filed for this.

1 Like

Thanks for that find.
Does that mean I have to restart OH after using the LogViewer to be sure?

I can’t really say, but I would use the LogViewer very sparingly until the bug has been fixed.

I’ve created a GitHub issue to track this bug:

I have also created what I believe to be a fix for the issue, but so far, it hasn’t gotten any attention/review: