High CPU usage after migration to OH4

With UPnP, remember that the size of the UPnP network is very important, not necessarily the size of what's configured. jupnp tracks ALL UPnP devices on the network once it's active, even if they aren't configured as Things. It may be prudent to dial up the jupnp thread pools to see if there's a contention issue.
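If you want to experiment with that, the jupnp pool sizes can be overridden in a config file. A sketch only: the path below is the one used by apt installs, and the key names are my assumption, so verify them against the jupnp documentation for your version:

```
# /var/lib/openhab/etc/org.jupnp.cfg  (create the file if it does not exist)
# Key names are an assumption -- check the jupnp docs for your version.
threadPoolSize = 40
asyncThreadPoolSize = 40
```

Restart openHAB afterwards so jupnp picks up the new sizes.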

Someone stated that the high CPU could be due to a bug in the JS script engine.
If you use JSScripting, it's worth uninstalling it temporarily to confirm whether that solves the problem.

FWIW I don’t seem to have this problem all the time. I had it in the afternoon when checking, but somehow it resolved itself - for now (without a restart). So maybe you are excluding them too fast, I’m not sure.

The reason why I mentioned sonos specifically is because of the thread name upnp-main-queue and the usage of openhab-transport-upnp in sonos, which I remembered.

Here is the full list:

  • autelis
  • avmfritz
  • deconz
  • fsinternetradio
  • heos
  • homematic
  • hue
  • kodi
  • konnected
  • lametrictime
  • lgwebos
  • loxone
  • magentatv
  • miele
  • onkyo
  • openwebnet
  • pioneeravr
  • pulseaudio
  • samsungtv
  • sonos
  • sonyaudio
  • squeezebox
  • upnpcontrol
  • wemo
  • yamahamusiccast
  • yamahareceiver
  • hueemulation

You are right about hue; it's also on the list, and it's common to the bindings installed by @rene54321 and me. I very recently migrated to API v2. However, I think hue only uses UPnP for discovery, so there shouldn't be any difference between using API v1 and v2?

From the full list I use deconz, hue, kodi, lgwebos, miele, samsungtv, sonos, squeezebox and wemo.

Start by removing samsungtv; it is known to be problematic.

Okay, I could temporarily modify some rules to remove dependencies on channels from this binding, but out of curiosity: can you share a bit more about this? I might have a closer look at that binding then.

I've been battling this issue since April, first reported here and then in the safeCall thread here.

At one point I found a JRuby rule with a syntax error. At that point the CPU was pinning at 100% every half hour or 45 minutes. Disabling that rule made things calm down a little, but it would still happen once or twice a day. I've since disabled most of my rules, but it still happens.
For a while I was setting safeCall to 100 as described here by Cody, but it did not survive reboots and seemed to only put the problem off for a while, not cure it.
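For what it's worth, a pool-size override like that can be made persistent instead of being lost on reboot. A sketch, on the assumption that openHAB's ThreadPoolManager reads pool sizes from the org.openhab.threadpool configuration PID; verify the exact key against the openHAB runtime configuration docs for your version:

```
# conf/services/runtime.cfg (or services/runtime.cfg on openHABian)
# PID and key are my assumption from ThreadPoolManager's configuration PID.
org.openhab.threadpool:safeCall=100
```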
Jan also mentioned that a bug in the RRD persistence service (since fixed) might be the culprit, but I've disabled all persistence services and the problem persists.
I use the following bindings:
Amazon Echo
IP camera
Hue (still on V1)

OH installed via apt on a Dell desktop with Linux Mint 19, Intel i3, 8 GB RAM, 1 TB spinning HD.
DSL and JRuby rules, all UI-based; no file-configured anything.
Heck, I just got home from work and it’s pinned right now!

I have read that it is only spikes, meaning the CPU is at 100% only at a few moments in the day. So maybe I also have it but I do not see it.

Could it be worth disabling all persistence services in case the problem comes from one of them?

What is the easiest way to monitor CPU usage with a chart? Is it to install the systeminfo binding?
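If you go the systeminfo route, a minimal file-based setup looks something like this. A sketch only: the Thing parameters and the cpu#load channel id are taken from my reading of the systeminfo binding docs, so double-check them against the docs for your binding version:

```
// conf/things/systeminfo.things (sketch; parameter names are an assumption)
Thing systeminfo:computer:local [interval_high=3, interval_medium=60]

// conf/items/systeminfo.items
Number CPU_Load "CPU Load [%.1f %%]" { channel="systeminfo:computer:local:cpu#load" }
```

Persist CPU_Load with your preferred persistence service and the UI's analyzer can then chart it over time.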

For me this is not the case. In my case, when it happens, the CPU pins at 100% and stays there until I do something about it. Either I must restart openHAB or run a script Cody wrote, found in this post.

Here is the script. I just use the "Run Now" button to run it:

org.openhab.core.common.ThreadPoolManager.field_reader :pools
tp = org.openhab.core.common.ThreadPoolManager.pools["safeCall"]

# Flood the stuck pool with more no-op tasks than its maximum size,
# which kicks the blocked LinkedTransferQueue consumers loose.
def unblock_thread_pool(tp)
  (tp.maximum_pool_size + 1).times do
    tp.submit { sleep 1 }
  end
end

# The pasted snippet was cut off here; without this call the script only
# defines the method but never runs it:
unblock_thread_pool(tp)

I have also migrated to the new hue API v2, but I also have the CPU peaks without the binding installed at all. The high CPU usage is not constant. I have attached my CPU usage over the last 24 hours, shown by the white line.


With version 3.4.4 I had only 5% CPU usage; now it is over 50%.

best regards René

I probably also have a higher CPU usage than before, even if I am not sure, and even if it seems to have zero impact on my home automations.

I have now installed and set up the systeminfo binding (refreshing every 3 s for high-priority channels), and I have the strange feeling that the CPU usage is even higher now. I can see in top that my java process is around 65% now; it was less before installing the systeminfo binding.
But at least, I will have charts now and see during a day if it is stable or not.

Does someone have a clear understanding of the Linux "load average"? For example, what exactly does 0.20 mean? Is it a good value for an RPi3?
Is it normal to have a load average of 0.16 0.17 0.18 while the %CPU of the java process is at 62.2%?

After some reading, I understand that a load average of 4 on an RPi3 with 4 cores would mean the CPUs are fully loaded.
So a value between 0.2 and 0.4 is probably considered a low CPU load for such a processor.

Be careful. On Linux, the load average is about process load and includes waiting processes (e.g. waiting for I/O) that may not impact CPU usage. A high load average is indeed not what you want, but it can exceed the number of cores, depending on what the processes are doing. CPU usage on Linux is per processor/core and adds up. There is a CPU problem if load average and CPU usage evolve in the same way.
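The interpretation above can be sketched with a tiny helper (my own illustration, not from any poster; the method name is hypothetical):

```ruby
# Interpret a Linux load average relative to the number of CPU cores.
# load_avg: one of the three numbers from /proc/loadavg (1/5/15-minute windows).
def load_ratio(load_avg, cores)
  load_avg / cores.to_f
end

# On a 4-core RPi3, a 1-minute load of 0.20 is 5% of run-queue capacity:
load_ratio(0.20, 4)  # => 0.05
# A load of 4.0 means that, on average, all four cores had runnable work:
load_ratio(4.0, 4)   # => 1.0
# Ratios above 1.0 indicate a backlog of runnable (or I/O-waiting) tasks.
```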

Can confirm. On my rpi4/openhabian/OH4.0.1 install I track a number of parameters from the rpi using persistence. The ‘load average’ parameter is generally between 0.2 and 0.5 with occasional spikes up to 2-4 that seem to quickly settle.

No issues with my upgrade as I’ve been running the pre-prod OH4 versions since they came out, but this is load I’m seeing even since OH3.


I just read another article saying that you can calculate the overall CPU utilization from the idle time using the formula below:

CPU utilization = 100 - idle time

where the idle time is the value you can find in the third line of top.

I see in top an idle time between 94 and 98, meaning an overall CPU usage between 2% and 6%.
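As a trivial one-liner sketch of that formula (the helper name is mine, not from the article):

```ruby
# Overall CPU utilization from top's idle percentage (the "id" field in the
# %Cpu(s) line): everything that is not idle counts as utilization.
def cpu_utilization(idle_percent)
  100.0 - idle_percent
end

cpu_utilization(96)  # => 4.0  (96% idle means 4% busy)
cpu_utilization(94)  # => 6.0
```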

For those mentioning a high CPU usage, where do you get this information ?

The number I usually look at is the per-process CPU usage in top (currently showing 21% for me).

That number is relative to a single CPU core. So, for example, with the items-queue problem that number will always be >= 100%, because a single CPU core is completely pegged, and anything above that is the other threads in the process using some CPU.

The percentage in the third line (by default), which starts with %Cpu(s), should sum to 100% and is relative to all available CPU cores. You can press 1 to toggle between the total and per-CPU-core views. This can be useful if you have a thread pinned to a specific CPU core and using all of its resources (I've seen this in very high I/O environments where IRQs can only be processed by a specific core). If the idle number is consistently near 0%, it's a good indication that your system as a whole is short on CPU resources.
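The point about per-process %CPU being relative to a single core can be illustrated like this (an illustration with made-up thread names and numbers, not real measurements):

```ruby
# top's per-process %CPU is the sum of the usage of all its threads, each
# measured against one core -- so a single pegged thread already puts the
# whole java process at 100%, and other threads push it beyond that.
thread_usage = {
  "items-queue"  => 100.0,  # one core completely pegged by the stuck queue
  "safeCall-1"   => 7.5,
  "OH-scheduler" => 3.2,
}
process_cpu = thread_usage.values.sum
# process_cpu comes out around 110.7 here, i.e. more than 100% of one core.
```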

The load average is defined as the number of processes (over the specified period of time) that are ready to run (or running). The general rule of thumb is that if this number is greater than the number of CPU cores you have, the CPUs are overworked and accumulating a backlog. But as @Mherwege says, this can be deceptive depending on what exactly the load is. Consistently very high load numbers can mean your system is overworked, but they are not necessarily a strong signal of CPU-only starvation.

On an apt-installed OH running on a Mint Linux host, I simply use top, which reads out in percent. When it would run away, the java process would go over 100%.
I would notice this because the cooling fan would throttle up and get very loud.
Load average is listed at the top; I think it shows the 1-minute, 5-minute and 15-minute intervals. When the CPU runs away, it reads over 4.
Here is what it looks like right now (normal):

load average: 0.58, 0.63, 0.34

I thought I had found my problem, but I wanted to test it for a few days and be sure. I had a shelly device that I don’t really use anymore. When I upgraded to OH4, I didn’t bother to reinitialize it, but I also didn’t remove the Thing from my Things. It occurred to me about a week ago that perhaps this was the problem. I had read another thread a few weeks back where a guy had a shelly device that was at the edge of his wifi coverage and it had caused problems when it was unreachable. So I deleted the shelly device.

I had previously disabled the vast majority of my JRuby rules because it was the only way to keep my system from constantly pinning the CPU at 100%. My system has been crippled by this problem since April. After deleting the shelly device and waiting a day or two, I figured I'd re-enable my JRuby rules and see what happened. Sure enough, my problem with 100% CPU usage is back.
I hate to point the finger at JRuby rules, but I have a couple dozen DSL rules running which don't cause a problem. As soon as I re-enable a few of my JRuby rules, the problem immediately returns. This problem is highly reproducible for me, in case anybody wants me to try anything or can give a clue as to how to track the problem down myself.
I have a very strong suspicion this has something to do with the safeCall queue problem discussed at length here

I just want to add that these same JRuby rules ran happily on OH3 with no problems. This problem started with the upgrade to OH4. But just for context, OH4 also introduced a new JRuby helper library; I don't know if that has anything to do with it.

The problem is identical to the safeCall-queue one. And the correlation with the OH4 upgrade isn't necessarily due to JRuby changes: it's a bug in Java 17 (which OH4 switched to) in the LinkedTransferQueue that openHAB core uses. Certain workloads are clearly more likely to trigger the bug, even though they're not buggy in and of themselves. I also wouldn't be surprised if the bug only happens on certain hardware combinations. I didn't quite follow the chatter on the upstream bug (https://bugs.java.com/bugdatabase/view_bug?bug_id=8301341), which is now fixed, but only in a pre-release of Java 22. OH4 did also start using SafeCaller more for executing JSR rules (to help prevent errant rules from running amok with no way to stop them short of restarting openHAB), so they are probably more likely to trigger the safeCall-queue incarnation.


Thank you Cody for the link. It is interesting to me that the bug report is authored by an openHAB user (apparently) who uses openHAB in his steps to reproduce the problem.
If I understand what I’ve read, the problem is rare and not easily back-ported to prior versions so there is zero chance of it being fixed in the Java version currently used in openHAB. I would further assume that considering the effort required to move openHAB from Java version 11 to version 17, moving to version 22 isn’t going to happen any time soon.
When you say that openHAB did also start using SafeCaller more for executing JSR rules, I’m guessing you mean javascript? I guess this is why some users above who also reported suffering from this issue report having problems with javascript rules?
It seems I am in the minority and this is not anything which is going to be fixed any time soon.