I have also migrated to the new Hue API v2, but I also have the CPU peaks without the binding at all. The high CPU usage is not constant. I have attached my CPU usage over the last 24 hours, shown by the white line.
I probably also have higher CPU usage than before, although I am not sure, and it seems to have zero impact on my home automations.
I have now installed and set up the systeminfo binding (refresh every 3 s for high-priority channels), and I have the strange feeling that the CPU usage is even higher now. In top I can see my java process at around 65% now; it was less before installing the systeminfo binding.
But at least I will have charts now and can see over a day whether it is stable or not.
Does anyone have a clear understanding of the “Linux load average”? For example, what exactly does 0.20 mean? Is it a good value for an RPi3?
Is it normal to have “load average 0.16 0.17 0.18” while the %CPU of the java process is at 62.2%?
After some reading, I understand that a load average of 4 on an RPi3 with its 4 cores would correspond to full utilization, i.e. a high load.
So a value between 0.2 and 0.4 on such a processor is probably considered a low CPU load.
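To put that rule of thumb in concrete terms, here is a minimal shell sketch (assuming a Linux host where `/proc/loadavg` and `nproc` are available) that expresses the 1-minute load average as a percentage of total core capacity:

```shell
#!/bin/sh
# Read the three load averages (1, 5 and 15 minutes) from the kernel.
read one five fifteen rest < /proc/loadavg
cores=$(nproc)
# Express the 1-minute load as a percentage of total core capacity:
# e.g. 0.20 on a 4-core RPi3 is about 5% of overall capacity.
pct=$(awk -v l="$one" -v c="$cores" 'BEGIN { printf "%.0f", l / c * 100 }')
echo "load=$one cores=$cores => ${pct}% of total capacity"
```

On a 4-core RPi3 with a load average of 0.20 this prints 5%, which matches the intuition above that 0.2–0.4 is a low load for that board.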
Be careful: on Linux, load average is about process load and includes waiting processes (e.g. waiting for I/O) that may not impact CPU usage. A high load average is indeed not what you want, but it can be more than the number of cores, depending on what the processes are doing. CPU usage on Linux is per processor/core and adds up. There is a CPU problem if load average and CPU usage evolve in the same way.
Can confirm. On my rpi4/openhabian/OH4.0.1 install I track a number of parameters from the rpi using persistence. The ‘load average’ parameter is generally between 0.2 and 0.5 with occasional spikes up to 2-4 that seem to quickly settle.
No issues with my upgrade, as I’ve been running the pre-prod OH4 versions since they came out, but this is the load I’ve been seeing since OH3.
That number is relative to a single CPU core. So, for example, with the items-queue problem that number will always be >= 100%, because a single CPU core is completely pegged, and anything beyond that is the other threads in the process using some CPU.
The percentage in the third line (by default) that starts with %Cpu(s) should sum to 100%, and is relative to all available CPU cores. You can press 1 to alternate between total and per-CPU-core though. This could be useful if you have a thread pinned to a specific CPU core, and using all of its resources (I’ve seen this in very high I/O environments where IRQ requests can only be processed by a specific core). If the idle number is near 0% consistently, it’s a good indication that your system as a whole is short on CPU resources.
The load average is defined as the number of processes (over the specific period of time) that are ready to be run (or running). The “general” rule of thumb is that if this number is greater than the number of CPU cores you have, that means the CPUs are overworked and accumulating a backlog. But like @Mherwege says, this can be deceptive depending on what exactly the load is. Consistently very high load numbers can mean your system is overworked, but not necessarily a strong signal of CPU-only starvation.
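As a rough way to watch that idle figure without keeping top open, here is a sketch that samples the aggregate `cpu` line of `/proc/stat` twice (field 5 of that line is cumulative idle time) and prints the idle percentage over one second. This is an illustration of the idea, not exactly how top computes its numbers:

```shell
#!/bin/sh
# Sample overall idle time from /proc/stat (aggregate "cpu" line).
# Field 5 of that line is the cumulative idle jiffies; summing all
# fields gives total jiffies across all cores.
snap() { awk '/^cpu /{ idle=$5; total=0; for (i=2;i<=NF;i++) total+=$i; print idle, total }' /proc/stat; }
set -- $(snap); i1=$1; t1=$2
sleep 1
set -- $(snap); i2=$1; t2=$2
# Idle percentage over the sampling interval; consistently near 0%
# means the system as a whole is short on CPU.
awk -v di=$((i2 - i1)) -v dt=$((t2 - t1)) 'BEGIN { printf "idle: %.1f%%\n", dt ? di / dt * 100 : 0 }'
```

Because it reads the aggregate line, the result is relative to all cores combined, like the default `%Cpu(s)` row in top rather than the per-process column.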
On an apt-installed OH running on a Linux Mint host, I simply use top, which reads out in percent. When it ran away, the java thread would go over 100%.
I would notice this because the cooling fan would throttle up and get very loud.
Load average is listed at the top; it shows the 1-minute, 5-minute and 15-minute intervals. When the CPU ran away, it would read over 4.
Here is what it looks like right now (normal)
I thought I had found my problem, but I wanted to test it for a few days and be sure. I had a shelly device that I don’t really use anymore. When I upgraded to OH4, I didn’t bother to reinitialize it, but I also didn’t remove the Thing from my Things. It occurred to me about a week ago that perhaps this was the problem. I had read another thread a few weeks back where a guy had a shelly device that was at the edge of his wifi coverage and it had caused problems when it was unreachable. So I deleted the shelly device.
I had previously disabled the vast majority of my JRuby rules because it was the only way to keep my system from constantly pinning the CPU at 100%. My system has been crippled by this problem since April. After deleting the shelly device and waiting a day or two, I figured I’d re-enable my JRuby rules and see what happened. Sure enough, my problem with 100% CPU usage is back.
I hate to point the finger at JRuby rules, but I have a couple dozen DSL rules running which don’t cause a problem. As soon as I re-enable a few of my JRuby rules, the problem immediately returns. This problem is highly reproducible for me, in case anybody wants me to try anything or give a clue as to how to track the problem down myself.
I have a very strong suspicion this has something to do with the safeCall queue problem discussed at length here
I just want to add that these same JRuby rules ran happily on OH3 with no problems; this problem started with the upgrade to OH4. But just for context, OH4 also introduced a new JRuby helper library, and I don’t know if that has anything to do with it.
The problem is identical to the safeCall-queue one. And the OH4 trigger isn’t necessarily due to JRuby changes: it’s a bug in Java 17 (which OH4 switched to) in the LinkedTransferQueue that openHAB core uses. Certain workloads are clearly more likely to trigger the bug, even though they’re not buggy in and of themselves. I also wouldn’t be surprised if the bug only happens on certain hardware combinations; I didn’t quite follow the chatter on the upstream bug (https://bugs.java.com/bugdatabase/view_bug?bug_id=8301341), which is now fixed, but only available in a pre-release of Java 22. OH4 also started using SafeCaller more for executing JSR rules (to help prevent errant rules from running amok with no way to stop them other than restarting openHAB), so those rules are probably more likely to trigger the safeCall-queue incarnation.
Thank you Cody for the link. It is interesting to me that the bug report is authored by an openHAB user (apparently) who uses openHAB in his steps to reproduce the problem.
If I understand what I’ve read, the problem is rare and not easily back-ported to prior versions so there is zero chance of it being fixed in the Java version currently used in openHAB. I would further assume that considering the effort required to move openHAB from Java version 11 to version 17, moving to version 22 isn’t going to happen any time soon.
When you say that openHAB did also start using SafeCaller more for executing JSR rules, I’m guessing you mean javascript? I guess this is why some users above who also reported suffering from this issue report having problems with javascript rules?
It seems I am in the minority and this is not anything which is going to be fixed any time soon.
I switched back to the latest OH3 release with the same configuration as on OH4 (same bindings, things, items and rules), installed Java 11 again, and now have my 4% CPU usage back; CPU temperature went down from 68°C to 49°C. The chart is also attached; the switch to OH3 was completed at 4 PM.
Same problems here, on an RPi 3B+. Thank you for the visualisation. I’m interested in whether this will be addressed soon. As mentioned above, a fix will probably take some time, since Java 17 is the newest LTS JDK and 21 will only be released in September 2023. Nevertheless, if @ccutrer is right, the fix is in a pre-release of JDK 22.
I noticed that the CPU load on my Synology ARM is constantly high since a couple of days. I can confirm @ccutrer analysis: htop shows upnp-main-queue as the offending thread.
Thread dump shows:
```
"upnp-main-queue" #300878 prio=5 os_prio=0 cpu=296024047.59ms elapsed=298256.24s tid=0x566655e0 nid=0x57ab runnable [0x48fde000]
   java.lang.Thread.State: RUNNABLE
	at java.util.concurrent.LinkedTransferQueue.awaitMatch(java.base@17.0.8/LinkedTransferQueue.java:652)
	at java.util.concurrent.LinkedTransferQueue.xfer(java.base@17.0.8/LinkedTransferQueue.java:616)
	at java.util.concurrent.LinkedTransferQueue.poll(java.base@17.0.8/LinkedTransferQueue.java:1294)
	at org.jupnp.QueueingThreadPoolExecutor$1.run(QueueingThreadPoolExecutor.java:194)
	at java.lang.Thread.run(java.base@17.0.8/Thread.java:833)
```
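For anyone wanting to confirm the same thing on their own system: `top -H` (and htop) show thread IDs in decimal, while the `nid` field in a Java thread dump is hexadecimal, so the hot thread has to be converted before it can be matched. A small sketch of that conversion (the TID value below is just an example chosen to match this dump’s `nid=0x57ab`):

```shell
#!/bin/sh
# Convert a decimal thread ID (as shown by `top -H` or htop) to the
# hex "nid" used in Java thread dumps, so the spinning thread found in
# the process monitor can be matched to a stack trace.
tid=22443   # example: the decimal TID of the hot thread (22443 == 0x57ab)
printf 'look for nid=0x%x in the thread dump\n' "$tid"
```

Running this with the hot thread’s TID and then searching the output of `jstack <pid>` for that `nid` is enough to identify which Java thread is burning the CPU.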
I hope they will downport the fix for https://bugs.java.com/bugdatabase/view_bug?bug_id=8301341.
Or is it OK to run openHAB 4 with a Java version higher than 17? The time of breaking changes (removal of Nashorn, JDK encapsulation) seems to be over for now.
openHAB 4.0.2 has a “fix” (just using the LinkedTransferQueue from Java 11) automatically included. So you shouldn’t need to worry about changing Java versions.
Installed from a downloaded deb ? Then you won’t see any updates. The package needs to be installed from the repository to get updates from the same repository.
I have the apt repo installed, but its key has expired. I have tried to add the new one, but I don’t think that is working. The update from 3.4 to 4.0.1 went well a week before. Strange…