CPU spike in latest snapshots

I’m seeing recurring CPU spikes +/ every minute since I moved to snapshot #1369 and higher.
It seems to correlate with the following log entries, load jumps by about 100% for +/- 10 seconds on a RPi3 (which is a 4-core machine so that’s effectively 1 CPU core going to 100%). Load drops right back to ~20% which is normal level right whenever those messages appear, so I strongly believe they’re related.
The problem disappeared when I moved back to #1360 for a test.

Any ideas ? Is it any known issue to be related to the Karaf update, @wborn ?

EDIT:
I disabled the marketplace in addons.cfg/misc but problem persists so it does not seem to be related to that.

I was guessing that it is in fact org.apache.karaf.features.internal.service.FeaturesServiceImpl so
I’ve already set org.apache.karaf to debug level but I don’t get any other output off that class tree.
Don’t think this normal.


2018-09-24 16:10:51.082 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-24 16:10:51.156 [INFO ] [internal.service.FeaturesServiceImpl] - Done.
2018-09-24 16:11:53.372 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-24 16:11:53.418 [INFO ] [internal.service.FeaturesServiceImpl] - Done.
2018-09-24 16:12:49.279 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-24 16:12:49.334 [INFO ] [internal.service.FeaturesServiceImpl] - Done.
2018-09-24 16:13:49.872 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-24 16:13:49.951 [INFO ] [internal.service.FeaturesServiceImpl] - Done.
2018-09-24 16:14:47.550 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-24 16:14:47.600 [INFO ] [internal.service.FeaturesServiceImpl] - Done.
2018-09-24 16:15:51.107 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-24 16:15:51.158 [INFO ] [internal.service.FeaturesServiceImpl] - Done.
2018-09-24 16:16:49.144 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-24 16:16:49.200 [INFO ] [internal.service.FeaturesServiceImpl] - Done.

I have a RPi3 that runs openHAB 2.4.0 Build #1371 using Docker and it doesn’t have these load issues. Though its addons directory also does not have any KAR or JAR files which could make a difference. Does yours have any?

I used to have kars and thought of that already, too, so switched to online repo and deleted kars (and cache before restart), unfortunately it did not make a difference and is still spiking.

Maybe you can check which thread is consuming your CPU cycles?

If you’re into Java, you can also hookup a profiler to find the root cause. I’ve successfully used JProfiler with openHAB before.

That seems to be either of these guys (DirWatcher being very suspicious to use that much),
but what does it tell us ?
org.apache.karaf is already at debug level but does not output anything more than what I showed.

openhab> threads 97
Thread 97 features-3-thread-1 WAITING
Stacktrace:
java.lang.Object.wait line: -2
java.lang.Object.wait line: 502
org.apache.karaf.features.internal.download.impl.MavenDownloadManager$MavenDownloader.await line: 101
org.apache.karaf.features.internal.region.Subsystem.downloadBundles line: 537
org.apache.karaf.features.internal.region.Subsystem.downloadBundles line: 452
org.apache.karaf.features.internal.region.SubsystemResolver.resolve line: 224
org.apache.karaf.features.internal.service.Deployer.deploy line: 388
org.apache.karaf.features.internal.service.FeaturesServiceImpl.doProvision line: 1025
org.apache.karaf.features.internal.service.FeaturesServiceImpl.lambda$doProvisionInThread$13 line: 964
org.apache.karaf.features.internal.service.FeaturesServiceImpl$$Lambda$218/713629.call line: -1
java.util.concurrent.FutureTask.run line: 266
java.util.concurrent.ThreadPoolExecutor.runWorker line: 1149
java.util.concurrent.ThreadPoolExecutor$Worker.run line: 624
java.lang.Thread.run line: 748
openhab> threads 106
Thread 106 Dir Watcher WAITING
Stacktrace:
sun.misc.Unsafe.park line: -2
java.util.concurrent.locks.LockSupport.park line: 175
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await line: 2039
java.util.concurrent.LinkedBlockingDeque.takeFirst line: 492
java.util.concurrent.LinkedBlockingDeque.take line: 680
sun.nio.fs.AbstractWatchService.take line: 118
org.eclipse.smarthome.core.service.WatchQueueReader.run line: 210
java.lang.Thread.run line: 748

I’m sorry I’m not.

The load seems to be caused by Karaf continuously (re)deploying features. It seems to have already been an issue for some in 2.2.0. I.e. the features-1-thread-1 consuming CPU cycles in Openhab2 java cpu 100%. Perhaps @splatch knows what may be causing this?

I’ve failed to install and test latest snapshots (unstable network while pulling docker image), but I observed similar issue with some of updates with 2.3.0.

I did not debug the issue yet, however there are several situations when it might occur. First - is when you attempt to install or update feature which refers any of Karaf core features - such shell or features or any other which you expect to be there. Since shell is core feature (it is in fact part of framework feature) and by default feature resolver will attempt to refresh referenced features too it causes a semi restart of everything. Motivation for doing that is because your newly installed feature might contribute something to referenced core features, which moves us to second scenario.
Second situation which might happen is update which doesn’t affect base karaf features, but can be effectively caused by update process which fetches updates and passes them to Karaf feature resolver. Update of feature might fail causing (for some reason endless loop).

Feature resolver itself is relatively fresh thing based on external component developed as part of Apache Felix framework and it has its own devils. I would look for these two cases first and try to correlate troubles with cpu usage with feature updates as this mechanism does not unless requested.

Thanks for background explanation. Can you also advise how to approach troubleshooting ?
I’m not aware of any config type of change to match what you describe, and as written I’ve already ramped up org.apache.karaf logging to debug but all I get to see is those lines about every minute.

Thanks for your insights on this @splatch. I’ve seen this happen myself too when I update a bundle on which a lot of other bundles are dependent. Usually I can resolve that by just restarting openHAB. Do you still have the issue after restarting openHAB @mstormi?

Maybe you know of a way to reproduce your issue with a new openHAB installation @mstormi ? It could also be due to custom configurations in addons.cfg, runtime.cfg. Maybe you can share those with us?

yes

Isn’t a snapshot package upgrade right that ? Or do you mean me to start with a default config and add all the addons one-by-one until the problem shows up ?

Possible, but remind you it works with #1360

# The installation package of this openHAB instance
# Note: This is only regarded at the VERY FIRST START of openHAB
# Note: If you want to specify your add-ons yourself through entries below, set the package to "minimal"
# as otherwise your definition might be in conflict with what the installation package defines.
#
# Optional. If not set, the dashboard (https://<yourserver>:8080/) will ask you to choose a package.
#
# Valid options:
#   - minimal  : Installation only with dashboard, but no UIs or other add-ons. Use this for custom setups.
#   - simple   : Setup for using openHAB purely through UIs - you need to expect MANY constraints in functionality!
#   - standard : Default setup for normal users, best for textual setup
#   - expert   : Setup for expert users, especially for people migrating from openHAB 1.x
#   - demo     : A demo setup which includes UIs, a few bindings, config files etc.
#
# See http://docs.openhab.org/configuration/packages.html for a detailed explanation of these packages.
#
package = expert

# Access Remote Add-on Repository
# Defines whether the remote openHAB add-on repository should be used for browsing and installing add-ons.
# This not only makes latest snapshots of add-ons available, it is also required for the installation of
# any legacy 1.x add-on. (default is true)
#
remote = true
#remote = false

# Include legacy 1.x bindings. If set to true, it also allows the installation of 1.x bindings for which there is
# already a 2.x version available (requires remote repo access, see above). (default is false)
#
#legacy = true
legacy = false

# A comma-separated list of bindings to install (e.g. "sonos,knx,zwave")
#binding = astro,exec,http1,hue,max,mqtt1,mqttitude1,network,ntp,pioneeravr,weather1,zwave
# in Reihenfolge der Wichtigkeit/Anzahl items, die moeglichst frueh benötigt werden
#binding = astro,exec,expire1,http1,hue,kodi,max,mqtt1,mqttitude1,network,ntp,serial1,wol1,weather1
#binding = astro,exec,expire1,http1,hue,kodi,mqtt1,mqttitude1,network,ntp,serial1,wol1,weather1
#binding = astro,exec,expire1,http1,hue,kodi,mqtt1,mqttitude1,network,ntp,pioneeravr,serial1,valloxmv,wol1,weather1
#binding = astro,exec,expire1,http1,hue,kodi,max,mqtt1,mqttitude1,network,ntp,pioneeravr,serial1,valloxmv,wol1,weather1,zwave
#binding = astro,exec,expire1,http1,hue,kodi,max,mqtt1,mqttitude1,network,ntp,pioneeravr,valloxmv,wol1,weather1,zwave
binding = astro,exec,expire1,http1,hue,kodi,max,mqtt1,mqttitude1,network,ntp,pioneeravr,samsungtv,valloxmv,wol1,weather1,zwave

# A comma-separated list of UIs to install (e.g. "basic,paper")
ui = basic,paper,habmin,homebuilder
#ui = classic,basic,paper,habmin

# A comma-separated list of persistence services to install (e.g. "rrd4j,jpa")
persistence = rrd4j,mapdb
#persistence = mapdb

# A comma-separated list of actions to install (e.g. "mail,pushover")
action = telegram
#action = mqtt,xmpp,xbmc,telegram

# A comma-separated list of transformation services to install (e.g. "map,jsonpath")
transformation = map,jsonpath,regex,javascript,xslt,scale

# A comma-separated list of voice services to install (e.g. "marytts,freetts")
#voice =
voice = marytts

# A comma-separated list of miscellaneous services to install (e.g. "myopenhab")
#misc = market
misc =

##################### LOCALE ####################

# The default language that should be used. If not specified, the system default locale is used.
# The ISO 639 alpha-2 or alpha-3 language code (if there is no alpha-2 one).
# Example: "en" (English), "de" (German), "ja" (Japanese), "kok" (Konkani)
#
org.eclipse.smarthome.core.localeprovider:language=de

# The region that should be used.
# ISO 3166 alpha-2 country code or UN M.49 numeric-3 area code.
# Example: "US" (United States), "DE" (Germany), "FR" (France), "029" (Caribbean)
#
org.eclipse.smarthome.core.localeprovider:region="DE"

################ PERSISTENCE ####################

#  The persistence service to use if no other is specified.
#
org.eclipse.smarthome.persistence:default=rrd4j

################### AUDIO #######################

# This parameter defines the default audio source to use (if not set, the first available one will be used.
#
#org.eclipse.smarthome.audio:defaultSource=

# This parameter defines the default audio sink to use (if not set, the first available one will be used.
#
#org.eclipse.smarthome.audio:defaultSink=

##################### VOICE ####################

# This parameter defines the default text-to-speech service to use (if not set, the first available one will be used.
#
org.eclipse.smarthome.voice:defaultTTS=marytts

# This parameter defines the default speech-to-text service to use (if not set, the first available one will be used.
#
#org.eclipse.smarthome.voice:defaultSTT=

# The default voice to use if no specific TTS service or voice is specified.
#
org.eclipse.smarthome.voice:defaultVoice=bits3hsmm

# The default human language interpreter to use if no other is specified.
#
#org.eclipse.smarthome.voice:defaultHLI=

################ MISCELLANOUS ####################

# Setting this to true will automatically approve all inbox entries and create Things for them,
# so that they are immediately available in the system (default is false)
#
#autoapprove:enabled=true
autoapprove:enabled=false

# This setting allows to switch between a "simple" and an "advanced" mode for item management.
# In simple mode (autoLinks=true), links and their according items are automatically created for new Things.
# In advanced mode (autoLinks=false), the user has the full control about which items channels are linked to.
# Existing links will remain untouched. (default is true)
#
org.eclipse.smarthome.links:autoLinks=false

# This is the default configuration file, which comes with every openHAB distribution.
# You should do a copy of it with the name 'openhab.cfg' and configure your personal
# settings in there. This way you can be sure that they are not overwritten, if you
# update openHAB one day.


# aus OH1 uebernommen, klappt evtl. nicht
# Configuration folders (must exist as a subdirectory of "configurations"; the value
# tells the number of seconds for the next scan of the directory for changes. A
# value of -1 deactivates the scan).
# A comma separated list can follow after the refresh value. This list defines a filter
# for valid file extensions for the models.
# VORHER: items=items
#org.eclipse.smarthome.folder:items=120,items
#org.eclipse.smarthome.folder:sitemaps=10,sitemap
#org.eclipse.smarthome.folder:rules=60,rules
#org.eclipse.smarthome.folder:scripts=60,script
#org.eclipse.smarthome.folder:persistence=60,persist

# Configuration of thread pool sizes
org.eclipse.smarthome.threadpool:thingHandler=20
org.eclipse.smarthome.threadpool:discovery=5
org.eclipse.smarthome.threadpool:safeCall=20
#org.openhab.kodi:enableAutoDiscovery=false

org.apache.karaf.shell:sshIdleTimeout = 0

@mstormi In both cases you need to find a root bundle and/or feature which starts to cause troubles. Core bundles are likely to be “guilty” ones. A possible workaround while using feature commands is to hold bundle refresh with -r switch. They will be refreshed upon manual restart or system shutdown.

Err, and how to accomplish that ? Unload all modules (shown on bundle:list) one-by-one ? Would break a running OH right away, wouldn’t it ? Also Karaf help does not show me any -r on neither bundle:refresh nor feature:refresh commands.

Upgraded to #1372, moved back to .kar (no more online repo) and removed legacy package (not required anyway).
Problem persistent.
Enabled debugging for org.apache.felix, now seeing the following lines right when load increases by 100%:

2018-09-25 19:41:52.606 [DEBUG] [org.apache.felix.configadmin        ] - getConfiguration(pid=org.openhab.addons)
2018-09-25 19:41:52.607 [DEBUG] [org.apache.felix.configadmin        ] - Found cached configuration org.openhab.addons bound to mvn:org.openhab.core/org.openhab.core.karaf/2.4.0-SNAPSHOT
2018-09-25 19:41:52.609 [DEBUG] [org.apache.felix.configadmin        ] - getConfiguration(pid=org.ops4j.pax.url.mvn, location=null)
2018-09-25 19:41:52.611 [DEBUG] [org.apache.felix.configadmin        ] - Found cached configuration org.ops4j.pax.url.mvn bound to ?
2018-09-25 19:41:52.884 [DEBUG] [org.apache.felix.configadmin        ] - getConfiguration(pid=org.ops4j.pax.url.mvn, location=null)
2018-09-25 19:41:52.887 [DEBUG] [org.apache.felix.configadmin        ] - Found cached configuration org.ops4j.pax.url.mvn bound to ?

When load drops back, this is output:
2018-09-25 19:42:11.486 [INFO ] [internal.service.FeaturesServiceImpl] - No deployment change.
2018-09-25 19:42:11.596 [INFO ] [internal.service.FeaturesServiceImpl] - Done.

That first output made me look at the Maven(?) config file org.ops4j.pax.url.mvn.cfg to effectively point to local directories only.
I played with that and set org.ops4j.pax.url.mvn.defaultRepositories= <empty>

org.ops4j.pax.url.mvn.defaultRepositories=
#org.ops4j.pax.url.mvn.defaultRepositories=\
#    ${karaf.home.uri}${karaf.default.repository}@id=system.repository@snapshots, \
#    ${karaf.data.uri}tmp/kar@id=kar.repository@multi@snapshots, \
#    ${karaf.base.uri}${karaf.default.repository}@id=child.system.repository@snapshots

Now guess what - that has cured my CPU spike problem!
Seems Karaf/Maven/felix/whatever was searching those repositories every time.

Then again, openHAB did not start any more after that, java just hanging around, seemed Karaf didn’t want to proceed for no obvious reason…

I came to re-install the openhab2 packages and switched config to use online repo again instead of .kar. That did not succeed on first attempt although I had deleted the cache for sure
Got it to be working on second attempt (fingers crossed) - and no more spikes.
That Maven .cfg looks still looks the same as before though.
Now if I was to explain what happened and what’s the difference now … *shrug*.
Possibly the re-install step deleted something I am not aware of, but I’m lacking a proper explanation.