OH3 unstable, crashes after a few days, high load and memory usage, KNX

Hi,

I’ve been using openHAB for some years now. I migrated to OH3 at the beginning of this year, which also meant moving from file-based configuration to UI configuration.
Unfortunately my OH3 Docker instance crashes after a few days. The instance uses 100% CPU and also a lot of memory. The only thing that helps is restarting the container.

There is no specific point at which openHAB jumps to high load from one second to the next; it builds up over several hours.

I could not find the root cause and only have some observations:

The first thing I can find in the logs is a ConcurrentModificationException from the KNX binding:

2021-02-25 07:05:05.502 [DEBUG] [.internal.handler.DeviceThingHandler] - onGroupWrite Thing 'knx:device:bridge:CO2_Sensor_Wohnen' received a GroupValueWrite telegram from '1.0.41' for destination '5/0/7'
2021-02-25 07:05:05.502 [DEBUG] [.internal.handler.DeviceThingHandler] - onGroupWrite Thing 'knx:device:bridge:heizung1' received a GroupValueWrite telegram from '1.0.41' for destination '3/1/5'
2021-02-25 07:05:05.502 [DEBUG] [.internal.handler.DeviceThingHandler] - onGroupWrite Thing 'knx:device:bridge:CO2_Sensor_Wohnen' received a GroupValueWrite telegram from '1.0.41' for destination '5/0/8'
2021-02-25 07:05:05.502 [WARN ] [mmon.WrappedScheduledExecutorService] - Scheduled runnable ended with an exception: 
java.util.ConcurrentModificationException: null
	at java.util.HashMap$HashIterator.remove(HashMap.java:1507) ~[?:?]
	at java.util.Collection.removeIf(Collection.java:545) ~[?:?]
	at org.openhab.binding.knx.internal.handler.DeviceThingHandler.rememberRespondingSpec(DeviceThingHandler.java:223) ~[?:?]
	at org.openhab.binding.knx.internal.handler.DeviceThingHandler.lambda$11(DeviceThingHandler.java:366) ~[?:?]
	at org.openhab.binding.knx.internal.handler.DeviceThingHandler.withKNXType(DeviceThingHandler.java:148) ~[?:?]
	at org.openhab.binding.knx.internal.handler.DeviceThingHandler.onGroupWrite(DeviceThingHandler.java:349) ~[?:?]
	at org.openhab.binding.knx.internal.client.AbstractKNXClient$1.lambda$0(AbstractKNXClient.java:107) ~[?:?]
	at org.openhab.binding.knx.internal.client.AbstractKNXClient.lambda$8(AbstractKNXClient.java:257) ~[?:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]

As you can see, the exception happens when something is received at exactly the same time → a concurrency problem.
Any idea how I can avoid this?
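
From the stack trace it looks like removeIf() is called on a HashMap-backed collection while another thread modifies it at the same time. Just to illustrate the pattern (this is only a minimal sketch with made-up names, not the binding’s actual code): a plain HashSet breaks under concurrent removeIf(), while a set backed by ConcurrentHashMap survives the same access pattern:

import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class CmeSketch {
    public static void main(String[] args) throws InterruptedException {
        // Plain HashSet: not thread-safe, concurrent removeIf() may end in a
        // ConcurrentModificationException like the one in the log above.
        Set<Integer> unsafe = new HashSet<>();
        // One possible fix pattern: a set view backed by ConcurrentHashMap.
        Set<Integer> safe = ConcurrentHashMap.newKeySet();
        for (int i = 0; i < 100_000; i++) {
            unsafe.add(i);
            safe.add(i);
        }
        removeFromTwoThreads(unsafe); // may print ConcurrentModificationException stack traces
        removeFromTwoThreads(safe);   // completes without exceptions
    }

    private static void removeFromTwoThreads(Set<Integer> set) throws InterruptedException {
        Thread even = new Thread(() -> set.removeIf(n -> n % 2 == 0));
        Thread odd = new Thread(() -> set.removeIf(n -> n % 2 != 0));
        even.start();
        odd.start();
        even.join();
        odd.join();
    }
}

Of course the real fix has to happen inside the binding (or come with a newer release); the sketch only shows why the exception appears and the general direction of a fix.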

As soon as this exception pops up, other bindings also start throwing exceptions:

2021-02-25 07:07:41.074 [INFO ] [ng.tr064.internal.soap.SOAPConnector] - Failed to get Tr064ChannelConfig{channelType=wanMaxUpstreamRate, getAction=GetCommonLinkProperties, dataType='ui4, parameter='null'}: java.util.concurrent.TimeoutException: Total timeout 2000 ms elapsed
2021-02-25 07:07:44.907 [INFO ] [ng.tr064.internal.soap.SOAPConnector] - Failed to get Tr064ChannelConfig{channelType=wanAccessType, getAction=GetCommonLinkProperties, dataType='string, parameter='null'}: java.util.concurrent.TimeoutException: Total timeout 2000 ms elapsed

or showing errors, like the KM200 binding:

2021-02-25 07:16:29.322 [WARN ] [00.internal.handler.KM200DataHandler] - Communication is not possible!

From then on it takes up to 10 hours until openHAB is really dead. During this time I see some TimeoutExceptions from the TR064 binding, some connection issues from the KM200 binding, and some strange behaviour from the KNX binding, like

2021-02-25 10:47:19.838 [WARN ] [ab.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.openhab.core.internal.items.ItemUpdater@5f433dc8' takes more than 5000ms.

Do you have any ideas what I can try next or how I can dig deeper into this issue? The main problem is that it takes several days until the issue shows up and openHAB crashes, so debugging is not easy.
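
One thing I could try in the meantime is to sample the JVM from outside to see when load and memory start to climb, e.g. over JMX. Below is a minimal sketch; it assumes remote JMX has been enabled on the openHAB JVM (e.g. via EXTRA_JAVA_OPTS) and that the port, here 9010 just as an example, is published from the container:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JvmWatcher {
    public static void main(String[] args) throws Exception {
        // Example JMX endpoint of the openHAB container (host and port are assumptions).
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9010/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);
            while (true) {
                long usedMb = memory.getHeapMemoryUsage().getUsed() / (1024 * 1024);
                // One sample per minute; a steadily growing heap or thread count
                // would show the leak building up long before the crash.
                System.out.println("heap used: " + usedMb + " MB, live threads: " + threads.getThreadCount());
                Thread.sleep(60_000);
            }
        }
    }
}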

Best regards
ole

What version are you running?

There was a nasty bug fixed that had to do with rules.
Look at this thread; it’s long, the link is toward the end.

Here is a link to the merged issue.

I believe you may have to run a snapshot version to pick up the fix.

Hi,

thanks for your response! I’m running 3.0.1.
I don’t want to run a snapshot on my “production” environment. Will there be a 3.0.2 including this fix?

Regards
ole

Most probably. The CME shown in the KNX binding is a bug, though.


I’ve observed the memory consumption and can confirm there is a memory leak, which has probably been fixed already, as mentioned by @Andrew_Rowe.

Since I don’t want to run a snapshot, I’ll wait for 3.0.2 and restart the container every few days.

Solved with 3.1.0.M3
