Hi,
I've been using openHAB for some years now. I migrated to OH3 at the beginning of this year, which also meant moving from file-based configuration to UI configuration.
Unfortunately, my OH3 Docker instance crashes after a few days. It uses 100% CPU and a lot of memory, and the only thing that helps is restarting the container.
The load does not jump from one second to the next; it builds up gradually over several hours.
I could not find the root cause and only have some observations:
The first thing I can find in the logs is a ConcurrentModificationException from the KNX binding:
2021-02-25 07:05:05.502 [DEBUG] [.internal.handler.DeviceThingHandler] - onGroupWrite Thing 'knx:device:bridge:CO2_Sensor_Wohnen' received a GroupValueWrite telegram from '1.0.41' for destination '5/0/7'
2021-02-25 07:05:05.502 [DEBUG] [.internal.handler.DeviceThingHandler] - onGroupWrite Thing 'knx:device:bridge:heizung1' received a GroupValueWrite telegram from '1.0.41' for destination '3/1/5'
2021-02-25 07:05:05.502 [DEBUG] [.internal.handler.DeviceThingHandler] - onGroupWrite Thing 'knx:device:bridge:CO2_Sensor_Wohnen' received a GroupValueWrite telegram from '1.0.41' for destination '5/0/8'
2021-02-25 07:05:05.502 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception:
java.util.ConcurrentModificationException: null
at java.util.HashMap$HashIterator.remove(HashMap.java:1507) ~[?:?]
at java.util.Collection.removeIf(Collection.java:545) ~[?:?]
at org.openhab.binding.knx.internal.handler.DeviceThingHandler.rememberRespondingSpec(DeviceThingHandler.java:223) ~[?:?]
at org.openhab.binding.knx.internal.handler.DeviceThingHandler.lambda$11(DeviceThingHandler.java:366) ~[?:?]
at org.openhab.binding.knx.internal.handler.DeviceThingHandler.withKNXType(DeviceThingHandler.java:148) ~[?:?]
at org.openhab.binding.knx.internal.handler.DeviceThingHandler.onGroupWrite(DeviceThingHandler.java:349) ~[?:?]
at org.openhab.binding.knx.internal.client.AbstractKNXClient$1.lambda$0(AbstractKNXClient.java:107) ~[?:?]
at org.openhab.binding.knx.internal.client.AbstractKNXClient.lambda$8(AbstractKNXClient.java:257) ~[?:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
As you can see, the exception occurs when several telegrams are received at exactly the same time (07:05:05.502), which points to a concurrency problem.
Any idea how I can avoid this?
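The stack trace shows `Collection.removeIf` failing inside `HashMap$HashIterator.remove`. `removeIf` is not atomic: it walks an iterator and calls `Iterator.remove()`, and a `HashMap`-backed iterator fails fast with `ConcurrentModificationException` if the collection was structurally modified after the iterator was created. The following is a minimal Java sketch of that failure mode and of one common thread-safe alternative. It assumes (not confirmed from the binding source) that `rememberRespondingSpec` works on a plain `HashSet`-like collection shared between KNX callback threads; the class name, method names and group-address strings are hypothetical. It is not a patch for the binding; any real fix would presumably have to go into `DeviceThingHandler` itself.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RemoveIfRace {

    // Deterministically reproduces the exception from the log: the set is
    // structurally modified between the iterator's next() and remove() calls,
    // which is what a second thread does in the concurrent case.
    static boolean reproduceWithHashSet() {
        Set<String> specs = new HashSet<>(Set.of("3/1/5", "5/0/7"));
        try {
            specs.removeIf(ga -> {
                specs.add("5/0/8"); // stands in for the "other thread"
                return true;
            });
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Two worker threads concurrently add and removeIf on the same set.
    // Returns how many workers ended with an exception.
    static int hammer(Set<String> specs) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Future<?>> futures = new ArrayList<>();
        for (int t = 0; t < 2; t++) {
            final int id = t;
            futures.add(pool.submit(() -> {
                for (int i = 0; i < 10_000; i++) {
                    String ga = id + "/0/" + (i % 50); // hypothetical group address
                    specs.add(ga);
                    specs.removeIf(s -> s.equals(ga));
                }
            }));
        }
        int failures = 0;
        for (Future<?> f : futures) {
            try {
                f.get();
            } catch (ExecutionException e) {
                failures++;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                failures++;
            }
        }
        pool.shutdown();
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("HashSet removeIf CME reproduced: " + reproduceWithHashSet());
        Set<String> safe = ConcurrentHashMap.newKeySet();
        System.out.println("failed workers with ConcurrentHashMap.newKeySet(): " + hammer(safe));
    }
}
```

`ConcurrentHashMap.newKeySet()` is used here because its iterators are weakly consistent and never throw `ConcurrentModificationException`; guarding every access to a plain `HashSet` with `synchronized` on one shared lock would be the other usual option. Running `hammer` on a plain `HashSet` instead may or may not fail on a given run, which would fit an error that only shows up occasionally.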
As soon as this exception pops up, other plugins also start throwing exceptions:
2021-02-25 07:07:41.074 [INFO ] [ng.tr064.internal.soap.SOAPConnector] - Failed to get Tr064ChannelConfig{channelType=wanMaxUpstreamRate, getAction=GetCommonLinkProperties, dataType='ui4, parameter='null'}: java.util.concurrent.TimeoutException: Total timeout 2000 ms elapsed
2021-02-25 07:07:44.907 [INFO ] [ng.tr064.internal.soap.SOAPConnector] - Failed to get Tr064ChannelConfig{channelType=wanAccessType, getAction=GetCommonLinkProperties, dataType='string, parameter='null'}: java.util.concurrent.TimeoutException: Total timeout 2000 ms elapsed
or show errors, like the KM200 binding:
2021-02-25 07:16:29.322 [WARN ] [00.internal.handler.KM200DataHandler] - Communication is not possible!
From then on, it takes up to 10 hours until openHAB is completely dead. During this time I see some TimeoutExceptions from the TR064 plugin, some connection issues from the KM200 binding, and some strange behaviour from the KNX plugin, such as:
2021-02-25 10:47:19.838 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@5f433dc8' takes more than 5000ms.
Do you have any ideas what I could try next, or how I can dig deeper into this issue? The main problem is that it takes several days until the issue shows up and openHAB crashes, so debugging is not easy.
Best regards
ole