Version: 0.9.0.201612202037
I moved back to this version because a version from a few days later broke the ability for the controller to be recognized on the port.
Anyway, I've noticed a few times that the openhab user will spike the CPU to over 100% and basically render the system useless.
I also have 100%+ Java usage. It does not go away after startup and can go on for a couple of days; eventually my Pi will freeze and I have to reboot. A backup from more than 3 months ago works fine, but recent 2.3 OH versions have this.
I too am often struggling with high load on my RPi3 install; typically this is when I am modifying rules and OH starts re-running "System started" rule sets.
This used to take around ten minutes to settle down, but more recently it is over half an hour, possibly 45 minutes. During the last 15 minutes OH will respond positively to an external command change from, say, Hue devices or Alexa, but may not action the rules for that command-change trigger for another 15 or 20 minutes.
I also see that any CPU load at or above 1.0 seems to cause lag in rule execution and the Basic UI. Is OH2 single-threaded by chance?
I too have tried to see what is using the CPU, but top and htop only identify it as java, which I presume means it's OH2 as I have nothing else running on this Pi. I cannot seem to get the next level of granularity to troubleshoot further. Does someone know how to work out whether something specific in my OH2 configuration/setup is using too many processing cycles?
I am considering moving to a PC architecture platform, but if it's single-threaded I may reassess my options.
Yes, I have the same issue too. I am running Raspbian and the latest build of OH2 (I update OH2 to the latest build every few days), and I started to notice this high utilization a few weeks ago.
I have had a similar issue for a few days. The java process uses ~200% CPU with 5 GB reserved RAM and 13 GB virtual RAM. I am running the latest nightly on a 16.04 Ubuntu LTS machine; it's an Intel NUC with an i5 processor.
Before this issue openHAB induced ~1% CPU load. Now it works for some hours (I don't know the exact timespan) and then starts to act crazy.
Latest log messages:
2018-02-26 20:00:08.487 [WARN ] [l.serialmessage.SendDataMessageClass] - NODE 2: Already processed another send data request for this callback Id, ignoring.
2018-02-26 20:00:44.731 [WARN ] [l.serialmessage.SendDataMessageClass] - NODE 2: Already processed another send data request for this callback Id, ignoring.
2018-02-26 20:00:58.897 [WARN ] [ocol.ZWaveController$ZWaveSendThread] - Receive queue TIMEOUT:
2018-02-26 20:00:37.590 [WARN ] [com.zaxxer.hikari.pool.HikariPool ] - yank-default - Thread starvation or clock leap detected (housekeeper delta=1m5s724ms902µs973ns).
2018-02-26 20:01:13.188 [ERROR] [io.gcal.internal.GCalEventDownloader] - Event fetch failed: Remote host closed connection during handshake
2018-02-26 20:05:24.400 [WARN ] [me.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.eclipse.smarthome.io.monitor.internal.EventLogger@e15ab5e' takes more than 5000ms.
2018-02-26 20:13:50.751 [WARN ] [me.core.internal.events.EventHandler] - Dispatching event to subscriber 'org.eclipse.smarthome.io.rest.sse.internal.listeners.SseEventSubscriber@4b1724f8' takes more than 5000ms.
2018-02-26 20:31:05.797 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1103d390 in HttpClient@521526855{STARTED,8<=8<=200,i=1,q=0}
2018-02-26 20:35:20.103 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1103d390 in HttpClient@521526855{STARTED,8<=8<=200,i=2,q=0}
2018-02-26 20:34:37.023 [WARN ] [e.jetty.util.thread.QueuedThreadPool] - Unexpected thread death: org.eclipse.jetty.util.thread.QueuedThreadPool$2@1103d390 in HttpClient@521526855{STARTED,8<=8<=200,i=1,q=0}
Needless to say, none of the web interfaces is working, and neither is the SSH console. I can't even kill the java process via
The "CPU time" and "Usr time" columns at the right should give an indication of how much CPU time a thread has consumed: the higher the number, the more CPU time the thread has used.
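That listing comes from the Karaf console's threads command. Assuming a default openHAB 2 install (console on port 8101; the exact options may vary by Karaf version), you can produce it with:
ssh -p 8101 openhab@localhost
openhab> threads --list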
When you've found a thread that has run amok, you can acquire stack traces to pinpoint the code location with either:
threads
Lists stack traces for all threads.
threads <id>
Lists the stack trace for a certain thread, where you replace <id> with the id of the thread.
The stack traces show which instructions a thread is executing. It will print the exact file and line number, e.g. something like:
"CM Configuration Updater" Id=25 in WAITING on lock=java.util.LinkedList@2bf61c
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:83)
at java.lang.Thread.run(Thread.java:748)
"CM Event Dispatcher" Id=26 in WAITING on lock=java.util.LinkedList@1b1975c
at java.lang.Object.wait(Native Method)
at java.lang.Object.wait(Object.java:502)
at org.apache.felix.cm.impl.UpdateThread.run(UpdateThread.java:83)
at java.lang.Thread.run(Thread.java:748)
"fileinstall-/openhab/addons" Id=38 in TIMED_WAITING on lock=org.apache.felix.fileinstall.internal.DirectoryWatcher@ea6bf5
at java.lang.Object.wait(Native Method)
at org.apache.felix.fileinstall.internal.DirectoryWatcher.run(DirectoryWatcher.java:316)
If you can't access the console because openHAB no longer responds (due to the high CPU usage), you could create a cron job that periodically saves all this thread info to a file, for instance like the sketch below.
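This is only a sketch (untested); it assumes the default openHAB 2 paths, the Karaf client shipped in runtime/bin, and the default console credentials (openhab/habopen):
# /etc/cron.d/openhab-threads: append the thread list to a log file every 5 minutes
*/5 * * * * openhab /usr/share/openhab2/runtime/bin/client -u openhab -p habopen "threads --list" >> /var/log/openhab2/threads.log 2>&1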
@Laurens This is the top 10 of your threads, sorted descending on CPU time, with an extra computed "Total CPU time %" column (using a spreadsheet):
Id    │ Name                  │ CPU time │ Total CPU time %
86    │ features-1-thread-1   │ 5119732  │ 28.1%
153   │ HTTP Refresh Service  │ 4918099  │ 27.0%
447   │ pool-60-thread-1      │ 3439530  │ 18.9%
266   │ ESH-persist-Monitor-1 │ 1150952  │ 6.3%
38    │ Thread-14             │ 512823   │ 2.8%
37    │ Thread-13             │ 504785   │ 2.8%
256   │ pool-47-thread-1      │ 338963   │ 1.9%
178   │ upnp-main-3           │ 232411   │ 1.3%
35378 │ ESH-thingHandler-12   │ 207675   │ 1.1%
177   │ upnp-main-2           │ 181246   │ 1.0%
The features-1-thread-1 seems to be a Karaf thread. It'll probably be used for starting and installing/uninstalling openHAB bundles and bindings. Did you start from scratch or install/uninstall any bindings? How long had openHAB been running when you created the thread list? It may be a Karaf issue if this thread remains busy and its CPU time keeps increasing.
The HTTP Refresh Service thread seems to be used by the HTTP binding for refreshing the state of your items, so it will be busier the more items you have and the more frequently they are refreshed.
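As an illustration, here is a hypothetical item using the 1.x HTTP binding syntax; the 60000 is the refresh interval in milliseconds, so shorter intervals and more such items mean more work for this thread:
// .items file: polls the URL every 60 s and stores the raw response
String OutsideTemp "Outside [%s]" { http="<[http://example.com/weather:60000:REGEX((.*))]" }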
It's not really clear what pool-60-thread-1 is used for. Maybe you can create a stack trace for it and see what code it is executing?
This is the thread list of my own openHAB 2.2.0 instance, which has been running without CPU issues for about 2 months now:
Id  │ Name                            │ CPU time │ Total CPU time %
213 │ Thread-74                       │ 49340842 │ 16.4%
143 │ ESH-thingHandler-1              │ 36876063 │ 12.3%
151 │ ESH-thingHandler-4              │ 36601637 │ 12.2%
150 │ ESH-thingHandler-3              │ 36326230 │ 12.1%
152 │ ESH-thingHandler-5              │ 35857121 │ 11.9%
149 │ ESH-thingHandler-2              │ 35318407 │ 11.7%
215 │ Plugwise MessageProcessorThread │ 12027739 │ 4.0%
38  │ Thread-14                       │ 10925330 │ 3.6%
37  │ Thread-13                       │ 10898461 │ 3.6%
214 │ Plugwise MessageSenderThread    │ 3436501  │ 1.1%
The stack trace of Thread-74 contains "gnu.io.RXTXPort", which means it is used for serial communications. In my setup that means it is used for communicating with my Plugwise Stick via USB. The thing handler threads are mostly busy with polling state for my other openHAB bindings (TCP and UDP communications in my setup).
Thanks for the research! My CPU is actually not going to 100%, but rather hovering between 12% and 35% while idle. It's running on an Atom with 8 cores, and it had already been running for days at that time.
I have one HTTP URL on a 5-minute interval, one on a 2-minute interval, and one on a 6-minute interval, all set up with the HTTP caching function.
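(For reference, such a caching setup looks roughly like this; the cache name and URL below are made up. The cache is defined in services/http.cfg and the item polls the cache id instead of the remote URL:)
# services/http.cfg: fetch the URL at most once per 300000 ms (5 minutes)
weatherCache.url=http://example.com/weather
weatherCache.updateInterval=300000
// .items file: reads from the cache, so multiple items can share one fetch
String Weather "Weather [%s]" { http="<[weatherCache:300000:REGEX((.*))]" }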
How can I create a stack trace for a thread (threads 86 and 447 in my case) from the Karaf console?
I have 119 rules and 225 items.
Installed bindings:
197 │ Active │ 80 │ 2.0.0.201708311514 │ resol Binding
208 │ Active │ 80 │ 0.10.0.b1 │ Astro Binding
216 │ Active │ 80 │ 1.11.0 │ openHAB Mail Action
217 │ Active │ 80 │ 1.11.0 │ openHAB Pushover Action
218 │ Active │ 80 │ 1.11.0 │ openHAB Denon Binding
219 │ Active │ 80 │ 2.2.0 │ Exec Binding
220 │ Active │ 80 │ 1.11.0 │ openHAB HTTP Binding
221 │ Active │ 80 │ 1.11.0 │ openHAB KNX Binding
222 │ Active │ 80 │ 2.2.0 │ Kodi Binding
223 │ Active │ 80 │ 2.2.0 │ Network Binding
224 │ Active │ 80 │ 1.11.0 │ openHAB SNMP Binding
225 │ Active │ 80 │ 1.11.0 │ openHAB WoL Binding
226 │ Active │ 80 │ 2.2.0 │ ZWave Binding
227 │ Active │ 80 │ 1.11.0 │ openHAB InfluxDB Persistence bundle
228 │ Resolved │ 75 │ 2.2.0 │ openHAB Basic UI Fragment, Hosts: 214
229 │ Active │ 80 │ 2.2.0 │ HABmin User Interface
230 │ Active │ 80 │ 2.2.0 │ HABPanel User Interface
231 │ Resolved │ 75 │ 2.2.0 │ openHAB Paper UI Theme Fragment, Hosts: 215
232 │ Active │ 80 │ 2.2.0 │ LG webOS Binding
233 │ Active │ 80 │ 1.11.0 │ openHAB Sonance Binding
234 │ Active │ 80 │ 0.10.0.b1 │ DMX Binding
235 │ Active │ 80 │ 2.2.0 │ Nest Binding
237 │ Active │ 80 │ 1.11.0 │ openHAB FritzboxTr064 Binding
The resol binding comes from /usr/share/openhab2/addons; all others come from the internal addons location.
@Laurens
Looks like those threads were idle at that particular moment. You could repeatedly get stack traces from those threads to see what they are doing when they are not idle.
Since openHAB 2.3.0-SNAPSHOT build #1230 a new Karaf version has been integrated, which you might want to try to see if it also fixes the high CPU usage due to features-1-thread-1.
Somehow this topic wasn't set to "Watching" on my end, so I missed all your answers.
I am trying to keep running on the latest nightly with ~5 days delay.
I still have the problem that openHAB uses up pretty much all of my RAM and all of my CPU power. Since I use the same machine to serve some web pages this is pretty bad, as it almost always leads to my Apache process being killed. @wborn I will try your cron suggestion, since I can never get into the Karaf console anymore by the time I notice this happening. Both issues you listed were already merged in the version I used when reporting here.
Do you also see OutOfMemoryError exceptions in the log files?
When openHAB starts consuming all your RAM there may be a memory leak. To reclaim memory, Java will start to garbage collect continuously, and because garbage collection is an expensive operation this will increase the CPU load.
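One way to verify this (a sketch; it assumes the JDK's jstat tool is installed and <pid> is the PID of the openHAB java process):
# print GC utilization statistics every 5 seconds
jstat -gcutil <pid> 5000
# an old generation (O) stuck near 100% combined with rapidly growing full-GC
# counters (FGC/FGCT) points to the continuous garbage collection described above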
In my experience the most common cause of memory leaks is adding objects to a list but never removing them. Such bugs could also exist in your own rules.
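A minimal sketch of that pattern in the rules DSL (the item and rule names are hypothetical):
import java.util.ArrayList

// global list: it lives as long as the rule file stays loaded
val ArrayList<String> history = new ArrayList<String>()

rule "Remember every temperature update"
when
    Item Temperature received update
then
    // an entry is added on every update but never removed, so memory use grows without bound
    history.add(Temperature.state.toString)
end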