Version: 0.9.0.201612202037
I moved back to this version because a version from a few days later broke the ability for the controller to be recognized on the port.
Anyway, I've noticed a few times that the openhab user will spike the CPU to over 100% and basically render the system useless.
I also have 100%+ Java usage. It does not go away after startup and can go on for a couple of days; eventually my Pi will freeze and I have to reboot. A backup from more than 3 months ago works fine, but recent 2.3 OH versions have this.
I too am often struggling with high load on my RPi3 install; typically this is when I am modifying rules and OH starts re-running "System started" rule sets.
This used to take around ten minutes to settle down, but more recently it is over half an hour, possibly 45 minutes. During the last 15 minutes OH will respond positively to an external command change from, say, Hue devices or Alexa, but may not action the rules for that command-change trigger for another 15 or 20 minutes.
I also see that any CPU load at or above 1.0 seems to cause lag in rule execution and the Basic UI. Is OH2 single-threaded by chance?
I too have tried to see what is using the CPU, but top and htop only identify it as java, which I presume means it's OH2 as I have nothing else running on this Pi. I cannot seem to get the next level of granularity to troubleshoot further. Does someone know how to work out whether something specific in my OH2 configuration/setup is using too many processing cycles?
I am considering moving to a PC architecture platform, but if it's single-threaded I may reassess my options.
Yes, I have the same issue too. I am running Raspbian and the latest build of OH2 (I update OH2 to the latest build every few days), and I started to notice this high utilization a few weeks ago.
I have had a similar issue for a few days. The java process uses ~200% CPU with 5 GB reserved RAM and 13 GB virtual RAM. I am running the latest nightly on a 16.04 Ubuntu LTS machine; it's an Intel NUC with an i5 processor.
Before this issue openHAB induced ~1% CPU load. Now it works for some hours (I don't know the exact timespan) and then starts to act crazy.
Latest log messages:
2018-02-26 20:00:08.487 [WARN ] [l.serialmessage.SendDataMessageClass] - NODE 2: Already processed another send data request for this callback Id, ignoring.
2018-02-26 20:00:44.731 [WARN ] [l.serialmessage.SendDataMessageClass] - NODE 2: Already processed another send data request for this callback Id, ignoring.
2018-02-26 20:00:58.897 [WARN ] [ocol.ZWaveController$ZWaveSendThread] - Receive queue TIMEOUT:
2018-02-26 20:00:37.590 [WARN ] [com.zaxxer.hikari.pool.HikariPool ] - yank-default - Thread starvation or clock leap detected (housekeeper delta=1m5s724ms902µs973ns).
2018-02-26 20:01:13.188 [ERROR] [io.gcal.internal.GCalEventDownloader] - Event fetch failed: Remote host closed connection during handshake
2018-02-26 20:05:24.400 [WARN ] [me.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.eclipse.smarthome.io.monitor.internal.EventLogger@e15ab5e' takes more than 5000ms.
2018-02-26 20:13:50.751 [WARN ] [me.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.eclipse.smarthome.io.rest.sse.internal.listeners.SseEventSubscriber@4b1724f8' takes more than 5000ms.
2018-02-26 20:31:05.797 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1103d390 in HttpClient@521526855{STARTED,8<=8<=200,i=1,q=0}
2018-02-26 20:35:20.103 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1103d390 in HttpClient@521526855{STARTED,8<=8<=200,i=2,q=0}
2018-02-26 20:34:37.023 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1103d390 in HttpClient@521526855{STARTED,8<=8<=200,i=1,q=0}
Needless to say, none of the web interfaces is working, and neither is the SSH console. I can't even kill the java process via
The "CPU time" and "Usr time" columns at the right should give an indication of how much CPU time a thread has consumed: the higher the number, the more CPU time the thread has used.
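That listing comes from the Karaf console's threads command. Assuming a default openHAB 2 install (console on port 8101; the exact options may vary by Karaf version), you can produce it with:
ssh -p 8101 openhab@localhost
openhab> threads --list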
When you've found a thread that has run amok, you can acquire stack traces to pinpoint the code location with either:
threads
Lists stack traces for all threads.
threads <id>
Lists the stack trace for a certain thread, where you replace <id> with the id of the thread.
The stack traces show which instructions a thread is executing. It will print the exact file and line number, e.g. something like:
"CM Configuration Updater" Id=25 in WAITING on lock=java.util.LinkedList@2bf61c
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:83)
at java.lang.Thread.run(Thread.java:748)
"CM Event Dispatcher" Id=26 in WAITING on lock=java.util.LinkedList@1b1975c
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:83)
at java.lang.Thread.run(Thread.java:748)
"fileinstall-/openhab/addons" Id=38 in TIMED_WAITING on lock=org.apache.felix.fileinstall.internal.DirectoryWatcher@ea6bf5
at java.lang.Object.wait(Native Method)
at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316)
If you can't access the console because openHAB no longer responds (due to the high CPU usage), you could create a cron job that periodically saves all this thread info to a file, for instance like the sketch below.
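This is only a sketch (untested); it assumes the default openHAB 2 paths, the Karaf client shipped in runtime/bin, and the default console credentials (openhab/habopen):
# /etc/cron.d/openhab-threads: append the thread list to a log file every 5 minutes
*/5 * * * * openhab /usr/share/openhab2/runtime/bin/client -u openhab -p habopen "threads --list" >> /var/log/openhab2/threads.log 2>&1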
@Laurens This is the top 10 of your threads, sorted descending on CPU time, with an extra computed "Total CPU time %" column (using a spreadsheet):
Id    │ Name                  │ CPU time │ Total CPU time %
86    │ features-1-thread-1   │ 5119732  │ 28.1%
153   │ HTTP Refresh Service  │ 4918099  │ 27.0%
447   │ pool-60-thread-1      │ 3439530  │ 18.9%
266   │ ESH-persist-Monitor-1 │ 1150952  │ 6.3%
38    │ Thread-14             │ 512823   │ 2.8%
37    │ Thread-13             │ 504785   │ 2.8%
256   │ pool-47-thread-1      │ 338963   │ 1.9%
178   │ upnp-main-3           │ 232411   │ 1.3%
35378 │ ESH-thingHandler-12   │ 207675   │ 1.1%
177   │ upnp-main-2           │ 181246   │ 1.0%
The features-1-thread-1 seems to be a Karaf thread. It'll probably be used for starting and installing/uninstalling openHAB bundles and bindings. Did you start from scratch or install/uninstall any bindings? How long had openHAB been running when you created the thread list? It may be a Karaf issue if this thread remains busy and its CPU time keeps increasing.
The HTTP Refresh Service thread seems to be used by the HTTP binding for refreshing the state of your items, so it will be busier the more items you have and the more frequently they are refreshed.
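As an illustration, here is a hypothetical item using the 1.x HTTP binding syntax; the 60000 is the refresh interval in milliseconds, so shorter intervals and more such items mean more work for this thread:
// .items file: polls the URL every 60 s and stores the raw response
String OutsideTemp "Outside [%s]" { http="<[http://example.com/weather:60000:REGEX((.*))]" }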
It's not really clear what pool-60-thread-1 is used for. Maybe you can create a stack trace for it and see what code it is executing?
This is the thread list of my own openHAB 2.2.0 instance, which has been running without CPU issues for about 2 months now:
Id  │ Name                            │ CPU time │ Total CPU time %
213 │ Thread-74                       │ 49340842 │ 16.4%
143 │ ESH-thingHandler-1              │ 36876063 │ 12.3%
151 │ ESH-thingHandler-4              │ 36601637 │ 12.2%
150 │ ESH-thingHandler-3              │ 36326230 │ 12.1%
152 │ ESH-thingHandler-5              │ 35857121 │ 11.9%
149 │ ESH-thingHandler-2              │ 35318407 │ 11.7%
215 │ Plugwise MessageProcessorThread │ 12027739 │ 4.0%
38  │ Thread-14                       │ 10925330 │ 3.6%
37  │ Thread-13                       │ 10898461 │ 3.6%
214 │ Plugwise MessageSenderThread    │ 3436501  │ 1.1%
The stack trace of Thread-74 contains "gnu.io.RXTXPort", which means it is used for serial communications. In my setup that means it is used for communicating with my Plugwise Stick via USB. The thing handler threads are mostly busy with polling state for my other openHAB bindings (TCP and UDP communications in my setup).
Thanks for the research! My CPU is actually not going to 100%, but rather hovering between 12% and 35% while idle. It's running on an Atom with 8 cores, and it had already been running for days at that time.
I have one HTTP URL on a 5-minute interval, one on a 2-minute interval, and one on a 6-minute interval, all set up with the HTTP caching function.
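(For reference, such a caching setup looks roughly like this; the cache name and URL below are made up. The cache is defined in services/http.cfg and the item polls the cache id instead of the remote URL:)
# services/http.cfg: fetch the URL at most once per 300000 ms (5 minutes)
weatherCache.url=http://example.com/weather
weatherCache.updateInterval=300000
// .items file: reads from the cache, so multiple items can share one fetch
String Weather "Weather [%s]" { http="<[weatherCache:300000:REGEX((.*))]" }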
How can I create a stack trace for a thread (threads 86 and 447 in my case) from the Karaf console?
I have 119 rules and 225 items.
Installed bindings:
197 │ Active │ 80 │ 2.0.0.201708311514 │ resol Binding
208 │ Active │ 80 │ 0.10.0.b1 │ Astro Binding
216 │ Active │ 80 │ 1.11.0 │ openHAB Mail Action
217 │ Active │ 80 │ 1.11.0 │ openHAB Pushover Action
218 │ Active │ 80 │ 1.11.0 │ openHAB Denon Binding
219 │ Active │ 80 │ 2.2.0 │ Exec Binding
220 │ Active │ 80 │ 1.11.0 │ openHAB HTTP Binding
221 │ Active │ 80 │ 1.11.0 │ openHAB KNX Binding
222 │ Active │ 80 │ 2.2.0 │ Kodi Binding
223 │ Active │ 80 │ 2.2.0 │ Network Binding
224 │ Active │ 80 │ 1.11.0 │ openHAB SNMP Binding
225 │ Active │ 80 │ 1.11.0 │ openHAB WoL Binding
226 │ Active │ 80 │ 2.2.0 │ ZWave Binding
227 │ Active │ 80 │ 1.11.0 │ openHAB InfluxDB Persistence bundle
228 │ Resolved │ 75 │ 2.2.0 │ openHAB Basic UI Fragment, Hosts: 214
229 │ Active │ 80 │ 2.2.0 │ HABmin User Interface
230 │ Active │ 80 │ 2.2.0 │ HABPanel User Interface
231 │ Resolved │ 75 │ 2.2.0 │ openHAB Paper UI Theme Fragment, Hosts: 215
232 │ Active │ 80 │ 2.2.0 │ LG webOS Binding
233 │ Active │ 80 │ 1.11.0 │ openHAB Sonance Binding
234 │ Active │ 80 │ 0.10.0.b1 │ DMX Binding
235 │ Active │ 80 │ 2.2.0 │ Nest Binding
237 │ Active │ 80 │ 1.11.0 │ openHAB FritzboxTr064 Binding
The resol binding comes from /usr/share/openhab2/addons; all others come from the internal addons location.
@Laurens
Looks like those threads were idle at that particular moment. You could repeatedly get stack traces from those threads to see what they are doing when they are not idle.
Since openHAB 2.3.0-SNAPSHOT build #1230 a new Karaf version has been integrated, which you might want to try to see if it also fixes the high CPU usage due to features-1-thread-1.
Somehow this topic wasn't set to "Watching" on my end, so I missed all your answers.
I am trying to keep running on the latest nightly with ~5 days delay.
I still have the problem that openHAB uses up pretty much all of my RAM and all of my CPU power. Since I use the same machine to serve some web pages this is pretty bad, as it almost always leads to my Apache process being killed. @wborn I will try your cron suggestion, since I can never get into the Karaf console anymore by the time I notice this happening. Both issues you listed were already merged in the version I used when reporting here.
Do you also see OutOfMemoryError exceptions in the log files?
When openHAB starts consuming all your RAM there may be a memory leak. To reclaim memory, Java will start to garbage collect continuously, and because garbage collection is an expensive operation this will increase the CPU load.
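One way to verify this (a sketch; it assumes the JDK's jstat tool is installed and <pid> is the PID of the openHAB java process):
# print GC utilization statistics every 5 seconds
jstat -gcutil <pid> 5000
# an old generation (O) stuck near 100% combined with rapidly growing full-GC
# counters (FGC/FGCT) points to the continuous garbage collection described above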
In my experience the most common cause of memory leaks is adding objects to a list but never removing them. Such bugs could also exist in your own rules.
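A minimal sketch of that pattern in the rules DSL (the item and rule names are hypothetical):
import java.util.ArrayList

// global list: it lives as long as the rule file stays loaded
val ArrayList<String> history = new ArrayList<String>()

rule "Remember every temperature update"
when
    Item Temperature received update
then
    // an entry is added on every update but never removed, so memory use grows without bound
    history.add(Temperature.state.toString)
end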