openhabian running openHAB 3.2.0.M2 on a Pi4 4GB with a 32GB SD Card
I am seeing a very slow response to changes in Jython scripts. It can take 1-2 minutes after making a change to the script for the Rule to be added (as evidenced in the log). Normally changes are detected and Rule compiled in a few seconds.
This happens after the system has been running for some long period of time, and it is solved by rebooting the system.
Question: What information would be useful to see to know what the source of this delay is coming from? i.e. what debug logs should I enable and what should I look for? The same problem occurred when I was running 3.2.0 build #2469 as well as in 3.1.0.x
Look at top/htop periodically. Is the CPU pegged? What’s the RAM usage and swap doing? What about the system load?
See if there is some sort of pattern. For example, maybe the CPU utilization from the Java process (openHAB) grows over time until it consumes all the resources. Or maybe the CPU and RAM are fine but the system load is high. The system load is a number representing the number of processes that are waiting for some resource.
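A minimal sketch of a one-shot sampler that could be run from cron every minute to make such a pattern visible over hours. It assumes a Linux `/proc` filesystem (true on openhabian), and the log path is just an example:

```python
#!/usr/bin/env python3
"""Append one timestamped load/memory snapshot to a log file."""
import os
import time

LOG = "/tmp/oh-perf.log"  # example path; put it wherever is convenient

def snapshot():
    # 1-, 5-, and 15-minute load averages
    load1, load5, load15 = os.getloadavg()
    # Parse /proc/meminfo into a dict of {field: kB value}
    with open("/proc/meminfo") as f:
        mem = {line.split(":")[0]: int(line.split()[1])
               for line in f if ":" in line}
    return (f"{time.strftime('%Y-%m-%d %H:%M:%S')} "
            f"load={load1:.2f}/{load5:.2f}/{load15:.2f} "
            f"MemAvailable={mem['MemAvailable'] // 1024}MB "
            f"SwapFree={mem['SwapFree'] // 1024}MB")

if __name__ == "__main__":
    with open(LOG, "a") as out:
        out.write(snapshot() + "\n")
```

Graphing or even just eyeballing the resulting file makes it easy to see whether load or swap creeps up in the hours before the slowdown starts.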
I’m not sure if there will be anything in the logs, but you can change the logging to trace level and see if any pattern emerges there. Since there is no error, though, you’ll have to look at timestamps and such to figure out what seems to be going on.
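For example, from the openHAB (Karaf) console you could raise the level on the rule automation bundle and watch the log live. The package name below is an assumption; check `log:list` for the actual logger names on your system:

```text
openhab> log:set TRACE org.openhab.core.automation    # enable trace logging
openhab> log:tail                                     # follow the log live
openhab> log:set DEFAULT org.openhab.core.automation  # restore afterwards
```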
Problems like these are really hard to diagnose. They are often heisenbugs too. As soon as you try to diagnose or measure it, the bad behavior disappears or changes.
Thanks Rich - Yes, I do see the CPU% getting pegged at 100+ right after I make a change (I insert a single CR for testing). This lasts until the “Rule Added” log finally occurs, then the CPU% returns to “normal” (meaning it ranges from a few percent to 99% every 10-20 seconds or so - watching the one-second raw updates, not the averages, which also start to go up, but more slowly of course).
So clearly the CPU is being heavily taxed during that time … I haven’t rebooted yet to see what “Rule Added” does when the system is running fine.
Just be careful that you understand what the numbers mean though. The RPi 4 has 4 CPUs. So if all four are pegged, the CPU utilization will be 400%.
100% doesn’t seem like it’d be too much of a problem. But what is important is to compare what you observe now with what happens when the system is behaving.
Yes, I see it at 120% or so but never 400% … the average seems to go to 96% (Is that on the 0-100% scale?)
Agreed. Didn’t want to reboot until I gathered more data.
Another data point, it seems to be only when a Rule is being added that this occurs, i.e. the system seems to be behaving normally otherwise (but of course compiling could be very CPU intensive).
Each CPU core gets 100. 4 cores means a 400% scale.
A little more data … when I make many changes to a file, this behavior lasts much longer. It’s as if the pending “file changed” messages get queued up and dealt with one at a time, which normally is fine since they are processed fairly quickly. But in this case they keep the CPU “pegged” for a long time (5-10 minutes) until finally all of the “file changed” messages are exhausted. Of course I could turn off “Auto Save” but that’s not the root of the issue.
I’m referring to the High Water Marks at the top of the page, not the raw CPU%. I never see them above 100% but often see them at 100% … the raw value often goes above 100.
Ok so I rebooted the system. Now when I perform the same tests, the “Rule Added” log occurs within seconds, the CPU% goes to 100-120% briefly, and the HWM never gets anywhere close to 100%.
So, basically something is getting slowed down dramatically during “Auto save” events.
Not sure when the next time this will happen …
Is the file being saved incrementally? Some editors will save a file as you type. That can build up quite a backlog of parsing.
Beyond that I don’t really know. There haven’t been any recent reports of this sort of behavior that I recall.
Yes, it is saving very often (not sure if it’s while I type), but every few seconds or so … but this isn’t an issue normally.
Also, if it wasn’t for the 45 minutes to an hour it takes for Zigbee to become usable, I wouldn’t care much about this bug and just reboot. … getting pretty frustrated with OH at this point
But it’s an issue that can build. Every save could kick off a reparse of the file. Only one parse of a given file can occur at a time so each change gets queued. Five saves, five parses of the file.
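The queueing behavior described above can be sketched with a toy single-worker queue (this is an illustration of the mechanism, not openHAB’s actual implementation): five rapid saves produce five full reparses, even though only the last one matters.

```python
import queue
import threading
import time

events = queue.Queue()   # pending "file changed" notifications
results = []

def parse(path):
    time.sleep(0.01)     # stand-in for the (normally fast) reparse
    return f"Rule added from {path}"

def worker():
    # Only one parse runs at a time; everything else waits in the queue.
    while True:
        path = events.get()
        if path is None:         # sentinel: no more events
            break
        results.append(parse(path))
        events.task_done()

t = threading.Thread(target=worker)
t.start()
for _ in range(5):               # five rapid auto-saves of the same file...
    events.put("rules.py")
events.put(None)
t.join()
# ...yield five serialized reparses of rules.py
```

If each individual parse also gets slower over time, the backlog compounds: a burst of auto-saves that normally clears in seconds can keep the CPU busy for many minutes.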
But this was always mainly an issue with Rules DSL and it’s supposed to be better in OH 3.
Correct. And when the system is behaving, this isn’t an issue. I can make dozens of changes and it auto-saves all the time, often with failed compiles, and then after I finish typing, it recompiles and it’s done. Works perfectly all the time… until it starts quickly getting slower and slower, when even a single-character change can take 45 seconds to complete a compile.
You really don’t know if you are looking at cause or effect here. Rule compiling will not be optimized to be particularly efficient - why should it, it’s not an everyday activity. So if something else is nibbling away resources, you’ll see the slower and slower result. Equally, it might be the rule compiling process that actually is the culprit, gobbling something up each time. Clearly your editor is giving it a thrashing to highlight the issue, but that gives no clue as to where the issue lies.
Agreed. The slowness seems to be only with the Rule compiling process as nothing else is noticeably slow (and the Rule compiling is incredibly slow) … but certainly, something else could be the culprit.
Unrelated, I’m pretty sure, but the boot-up time for my Zigbee network of 55 devices is also ridiculously slow (45-60 minutes to be fully usable) … but after the Zigbee network is up, it seems to be mostly fine. (As I say, likely unrelated.)
BTW, I am using VS Code on a Mac with Auto-Save on … doesn’t sound like a particularly unusual use-case.
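For what it’s worth, if the rapid saves are aggravating the backlog, the auto-save delay can be lengthened in VS Code’s settings.json. These are standard VS Code settings; the 5000 ms value is just an example:

```json
{
    // Save after a pause in typing instead of near-continuously
    "files.autoSave": "afterDelay",
    "files.autoSaveDelay": 5000
}
```

That only papers over the symptom, of course; the underlying slowdown would still need to be found.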