HABApp performance considerations

Next topic :slight_smile: I’m in the midst of migrating from Jython and run into many new topics…

I have migrated 80% of my rules now, this sums up to 150 rule objects. Average number of triggers is 1 or 2, in addition many rules are triggered by time intervals regularly. I do a lot of energy monitoring, so a number of rules is triggered frequently. A rough estimate is the execution of 5 to 10 rules per second as the HABApp system load.

Observation:

After restarting the HAPApp service, the process system load is low - lower than that of openHAB running on the same machine (RPi5). CPU% shows 1 to 3% for both HABApp and openHAB. I log execution time by trigger executed. Initially they are all quick and finish faster than 0.01 seconds.

After about 5 minutes, I see the HABApp process suddenly in top position (top shell command) with 100% load. A short time later, HABApp starts issuing warnings on long rule execution time

2024-11-29 17:57:25.021 [WARN ] [HABApp.Worker                       ] - 12/16 threads have been in use and execution of Rule_Wind_Power_String._on_event took too long: 3.36s

The amount of threads goes up slowly and reaches the max number allowed. From now on, these warnings all show 16/16. I changed the thread pool size to 32, same effect. The number of threads used is going up to 32/32 and stays there.

When the max number of threads is reached, I get Starting time warnings like this:

2024-11-29 18:01:27.789 [WARN ] [HABApp.Worker                       ] - Starting of Rule_Security_System_Alarm._on_event took too long: 526.57s. Maybe there are not enough threads?

The time shown raises and is roughly the same for all warnings shown. I assume this is the time rule execution is deferred because no threads are available any more…

I have made sure no rules executed an endless loop. It is just that 100% load that first slows down executing triggers of rules and finally can’t process the rule load anymore and run out of threads.

Any hint what I can do to troubleshoot that? Thanks.

It seems you are doing something that congests the threads.
Do you use any sleeps or expensive io? Something cumulative that adds up?
Any idea what would make the rule take that long?

What does that mean?

Updated post…

What I meant with “log execution time” is this code in every rule’s triggered function:

	def _on_event(self, event = None):
		info(LOG, self.rule_name + " triggered by " + str(event))
		start_time = time.time()
		self.on_event(event) # do the actual work
		info(LOG, self.rule_name + " finished after a processing time {:.1f} s", time.time()-start_time)

The log output shows the named <0.01s processing time initially. It is going up to several seconds after 5 minutes (see below). I have made sure self.on_event() is not blocking for any rule by checking the pairing of the log outputs above.

I have added some thread monitoring showing all threads of the HABApp process - one during expected operation, and one one minute later when the system is throttled completely:

Report

The system is running as expected for around 5 minutes. During this time, the per rule execution time is mostly below 0.01s with some smaller (and expected) exceptions. ps is showing HABApp with 1 to 3% processor time and here is a thread report at this time:

Thread ID: 884201 | Name: MainThread           | User Time: 4.79s | System Time: 0.43s
Thread ID: 884203 | Name: Thread-1             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884204 | Name: HABApp_LogBufferEven | User Time: 0.11s | System Time: 0.01s
Thread ID: 884205 | Name: Thread-3             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884206 | Name: Thread-2             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884207 | Name: Thread-5             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884208 | Name: Thread-4             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884209 | Name: Thread-7             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884210 | Name: Thread-6             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884211 | Name: Thread-9             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884212 | Name: Thread-8             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884213 | Name: asyncio_0            | User Time: 0.00s | System Time: 0.00s
Thread ID: 884224 | Name: HabAppWorker_0       | User Time: 1.14s | System Time: 0.04s
Thread ID: 884233 | Name: HabAppWorker_1       | User Time: 0.36s | System Time: 0.01s
Thread ID: 884236 | Name: HabAppWorker_2       | User Time: 0.08s | System Time: 0.01s
Thread ID: 884237 | Name: HabAppWorker_3       | User Time: 0.06s | System Time: 0.01s
Thread ID: 884278 | Name: HabAppWorker_4       | User Time: 0.09s | System Time: 0.02s
Thread ID: 884279 | Name: HabAppWorker_5       | User Time: 0.05s | System Time: 0.01s
Thread ID: 884280 | Name: HabAppWorker_6       | User Time: 0.09s | System Time: 0.01s
Thread ID: 884281 | Name: HabAppWorker_7       | User Time: 0.05s | System Time: 0.02s
Thread ID: 884282 | Name: HabAppWorker_8       | User Time: 0.04s | System Time: 0.01s
Thread ID: 884349 | Name: HabAppWorker_9       | User Time: 0.05s | System Time: 0.01s
Thread ID: 884795 | Name: Thread-10            | User Time: 0.00s | System Time: 0.00s

Then, after these 5 minutes, HAPApp is going to 100% within a second and stays there. Worker thread HabAppWorker_10 start to show a lot of user time / processing.

Thread ID: 884201 | Name: MainThread           | User Time: 5.94s | System Time: 0.63s
Thread ID: 884203 | Name: Thread-1             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884204 | Name: HABApp_LogBufferEven | User Time: 0.17s | System Time: 0.04s
Thread ID: 884205 | Name: Thread-3             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884206 | Name: Thread-2             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884207 | Name: Thread-5             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884208 | Name: Thread-4             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884209 | Name: Thread-7             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884210 | Name: Thread-6             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884211 | Name: Thread-9             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884212 | Name: Thread-8             | User Time: 0.00s | System Time: 0.00s
Thread ID: 884213 | Name: asyncio_0            | User Time: 0.00s | System Time: 0.00s
Thread ID: 884224 | Name: HabAppWorker_0       | User Time: 1.18s | System Time: 0.04s
Thread ID: 884233 | Name: HabAppWorker_1       | User Time: 0.39s | System Time: 0.02s
Thread ID: 884236 | Name: HabAppWorker_2       | User Time: 0.11s | System Time: 0.03s
Thread ID: 884237 | Name: HabAppWorker_3       | User Time: 0.09s | System Time: 0.02s
Thread ID: 884278 | Name: HabAppWorker_4       | User Time: 0.11s | System Time: 0.03s
Thread ID: 884279 | Name: HabAppWorker_5       | User Time: 0.07s | System Time: 0.02s
Thread ID: 884280 | Name: HabAppWorker_6       | User Time: 0.12s | System Time: 0.01s
Thread ID: 884281 | Name: HabAppWorker_7       | User Time: 0.08s | System Time: 0.03s
Thread ID: 884282 | Name: HabAppWorker_8       | User Time: 0.16s | System Time: 0.02s
Thread ID: 884349 | Name: HabAppWorker_9       | User Time: 0.09s | System Time: 0.02s
Thread ID: 884795 | Name: Thread-10            | User Time: 0.00s | System Time: 0.00s
Thread ID: 884940 | Name: HabAppWorker_10      | User Time: 59.28s | System Time: 0.06s
Thread ID: 884941 | Name: HabAppWorker_11      | User Time: 0.02s | System Time: 0.01s
Thread ID: 884942 | Name: HabAppWorker_12      | User Time: 0.02s | System Time: 0.00s
Thread ID: 885080 | Name: HabAppWorker_13      | User Time: 0.01s | System Time: 0.00s
Thread ID: 885081 | Name: HabAppWorker_14      | User Time: 0.02s | System Time: 0.00s
Thread ID: 885082 | Name: HabAppWorker_15      | User Time: 0.01s | System Time: 0.00s

The odd thing is although I have instrumented all Rule triggered functions with output on user_time and active thread id, none of them seems to run on HabAppWorker_10. No rule uses excessive user_time at all, everything below 0.05s.

Is there anything except rules using the HABApp worker threads?

I will start to work with a Python profiler next, but any idea how to troubleshoot that is appreciated.

So, problem solved… Other than analyzed initially, there has been one rule with a trigger / event handling that did not return / terminate after Jython to Python3 migration. I had a bug in my analysis script to check proper pairing from call and return - which actually didn’t show the issue when I ran the analysis initially. This one rule has been scheduled to run every 10 minutes. Once started, it blocked thread HabAppWorker_10 (or any other). Due to the way CPython / GIL works, this actually throttled the execution of all threads and resulted in the problem described.

Thanks for looking into it, I hope this thread at least can help others when running is similar situations.

Maybe it makes sense to add a watchdog in a future HABApp version. A too long execution time gets a warning already, a watchdog could detect this even in case the function never returns.

Thanks.