The more I get used to HABApp and Python, the more useful it gets, and the more I like it! I’m relying more and more on it for all sorts of home automation tasks. Unfortunately I’m also running into a performance issue more often the more I use it.
I’m humbly asking for help here; there’s no way I can sort this one out on my own, and this is going to get deep. Pretty please with sugar on top @Spaceman_Spiff
First of all: my whole system runs on MQTT. Mosquitto runs in a Docker container on the same 8 GB RPi4 that openHAB and HABApp run on, connected via 1 Gb Ethernet, and every. single. device in my entire system is MQTT, whether Homie- or HA-style.
So, here’s what happens. I’ve written a script to apply scenes, by setting the state of multiple items at once. It can apply the state of openHAB items as well as set MQTT topics. Everything happens in an asynchronous, staggered fashion, precisely to avoid contention.
Here is a link to the entire script on pastebin
Here is how the script might be called:
def fnStartWorkScene(self):
    scene.Apply(self, delayTime=0.3,
        #log=log,
        actions=[
            [ "garagemcu_WorkbenchLights=on" ],
            [ "garagemcu_SinkLights=on" ],
            [ "garagemcu_SideLights=on" ],
            [ "GaragePC=on" ],
            [ "GarageStatus=on" ],
            [ "GarageLights=on", "dimmer*GarageLightsDimmer=2" ],
            [ "WorkshopLights=on", "dimmer*WorkshopLightsDimmer=2" ],
            [ "number*mezzaninependants_Dimmer=0.5" ],
            [ "GarageAudio=on" ],
            [ "BasementAudio=on" ],
            [ "garagemcu_SinkOutlet=on" ],
            [ "garagemcu_BenchOutlet6=on" ],
        ]
    )
If it sounds like a lot of things for a garage, it is - my garage also holds my workshop and electronics lab.
If you look at the scene script link above, you can see that everything happens asynchronously with self.rule.run.soon or self.rule.run.at, only one item at a time.
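For readers who don't open the pastebin link, here is a minimal sketch of the staggering idea. This is not the actual scene.py, just an illustration assuming delayTime spaces successive action groups apart:

```python
from typing import List, Tuple

def stagger_actions(actions: List[List[str]], delay_time: float) -> List[Tuple[float, str]]:
    """Assign each action group an increasing delay so items are
    commanded one group at a time instead of all at once."""
    schedule = []
    for index, group in enumerate(actions):
        when = index * delay_time  # group 0 fires immediately, group 1 after delay_time, ...
        for action in group:
            schedule.append((when, action))
    return schedule

# Example: two groups, 0.3 s apart
plan = stagger_actions(
    [["GarageLights=on", "dimmer*GarageLightsDimmer=2"], ["GarageAudio=on"]],
    delay_time=0.3,
)
```

In the real rule, each (delay, action) pair would be handed to self.rule.run.soon or self.rule.run.at rather than executed directly.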
Here is what works perfectly with no performance issues:
openHAB item:
String GarageScene "Garage Scene" {expire="1s,command=IDLE"}
openHAB sitemap:
Switch item=GarageScene label="Garage Scene []" icon=wallswitch mappings=[OFF="OFF",WORK="WORK",COZY="COZY", HOUSEKEEPING="CLEAN"]
HABApp script:
StringItem.get_item('GarageScene').listen_event(self.item_scene_command, ItemCommandEvent)
def item_scene_command(self, event):
    assert isinstance(event, ItemCommandEvent)
    #log.info(f"current: {event}")
    if event.value == "OFF":
        self.fnAllStop(False)
    elif event.value == "WORK":
        self.fnStartWorkScene()
    elif event.value == "HOUSEKEEPING":
        self.fnStartHousekeepingScene()
    elif event.value == "COZY":
        self.fnStartCozyScene()
Again, the above works perfectly, performance is fine, no contention, no warnings.
So, what doesn’t work?
Well, here’s where it gets more complicated. My control paradigm for my house is not really scenes triggered from openHAB… it is 433 MHz wall-mounted remotes, with messages coming in through MQTT. When I act on those, HABApp seems to run into a contention issue.
So, here’s the flow.
Multiple ESP8266 and ESP32 units have 433 MHz receivers and run custom software with rc-switch filtered through my own superbuttons library.
Publishing raw 433 MHz received codes to MQTT would certainly cause a flood, so to avoid that, I wrote the superbuttons library which tracks and distills received codes down to the minimal number of messages while keeping maximum flexibility.
What gets published to MQTT (one topic for each receiver unit) is as follows. The button code in these examples is e6d798.
Let’s say that you press the button once, quickly.
e6d798*TALLY,1,0
e6d798*RELEASE,1,0
e6d798*DONE,1,0
Let’s say you hold it down for two seconds:
e6d798*TALLY,1,0
e6d798*SOLID,0,1
e6d798*MEDIUMPRESS,0,3
e6d798*LONGPRESS,0,7
e6d798*VERYLONGPRESS,0,15
e6d798*RELEASE,1,15
e6d798*DONE,1,15
Let’s say you triple-click the button:
e6d798*TALLY,1,0
e6d798*SOLID,0,1
e6d798*RELEASE,1,1
e6d798*TALLY,2,1
e6d798*RELEASE,2,1
e6d798*TALLY,3,1
e6d798*RELEASE,3,1
e6d798*DONE,3,1
So, there are a few messages, but not that many.
Especially if you hold a button down, for example to raise or lower volume or to dim a light, it saves quite a few messages.
The data that is posted is just enough to make it easy to script actions or scenes.
For example, if you wanted to select scenes based on how many times the button is pressed, you would simply look for the DONE message, with the first parameter being the number of presses, and ignore everything else.
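To make that concrete, here is a hedged sketch of how such a message could be parsed and mapped to a scene in plain Python. The ButtonMessage field names and the press-count-to-scene mapping are my own illustrative choices, not part of the actual rules; the wire format is the `code*EVENT,presses,duration` shown above:

```python
from typing import NamedTuple, Optional

class ButtonMessage(NamedTuple):
    code: str      # 433 MHz button code, e.g. "e6d798"
    event: str     # TALLY, SOLID, RELEASE, DONE, LONGPRESS, ...
    presses: int   # first numeric field: number of presses so far
    duration: int  # second numeric field: hold-duration ticks

def parse_button_message(payload: str) -> ButtonMessage:
    """Split 'e6d798*DONE,3,1' into its four fields."""
    head, presses, duration = payload.split(",")
    code, event = head.split("*")
    return ButtonMessage(code, event, int(presses), int(duration))

def scene_for_presses(payload: str) -> Optional[str]:
    """Pick a scene from the press count in the DONE message;
    every other event type is ignored."""
    msg = parse_button_message(payload)
    if msg.event != "DONE":
        return None
    return {1: "WORK", 2: "COZY", 3: "HOUSEKEEPING"}.get(msg.presses)
```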
For a volume up button, you might increment 1 for every TALLY message, start a repeating timer on LONGPRESS, and stop the timer on RELEASE.
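That volume logic can be sketched as a tiny state machine, shown here without any real timer. This is a hypothetical illustration: in an actual rule the `repeating` flag would be a HABApp countdown or repeating job instead.

```python
class VolumeButton:
    """Tracks superbuttons events for a volume-up button:
    each TALLY bumps the volume one step, LONGPRESS arms
    repeat mode, RELEASE disarms it."""

    def __init__(self, step: int = 1):
        self.step = step
        self.volume = 0
        self.repeating = False  # in a real rule: a running scheduler job

    def handle(self, event: str) -> None:
        if event == "TALLY":
            self.volume += self.step
        elif event == "LONGPRESS":
            self.repeating = True   # real code: start a repeating timer here
        elif event == "RELEASE":
            self.repeating = False  # real code: cancel the timer here

    def tick(self) -> None:
        """Called by the repeating timer while the button is held."""
        if self.repeating:
            self.volume += self.step

btn = VolumeButton()
btn.handle("TALLY")      # volume -> 1
btn.handle("LONGPRESS")  # repeat mode on
btn.tick()               # volume -> 2
btn.tick()               # volume -> 3
btn.handle("RELEASE")    # repeat mode off
```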
The key is that this makes it possible to script anything without having to modify the ESP8266/ESP32 code. All codes are passed through, but in a distilled fashion, precisely to avoid flooding things downstream.
Of course, in a large room you need more than one receiver for full coverage. In the garage I have four receivers in different locations, publishing to different topics. The script that acts on a certain fixed wall-mounted remote only listens to one topic, but multiple receivers may still hear the remote, even if not perfectly reliably, so they will publish messages too.
And, there’s where I’m running into problems.
If I trigger my scenes from 433 MHz remotes, then I get messages like this almost every time:
[2022-05-03 12:44:46,021] [ HABApp.Worker] WARNING | Execution of GarageRemotes.run took too long: 1.38s
[2022-05-03 12:44:46,046] [ HABApp.Worker] WARNING | ncalls tottime percall cumtime percall filename:lineno(function)
[2022-05-03 12:44:46,050] [ HABApp.Worker] WARNING | 1 0.000 0.000 1.384 1.384 /config/lib/leifutil/scene.py:98(run)
[2022-05-03 12:44:46,052] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.384 0.692 /usr/local/lib/python3.8/site-packages/HABApp/rule/scheduler/habappschedulerview.py:20(at)
[2022-05-03 12:44:46,054] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.383 0.692 /usr/local/lib/python3.8/site-packages/eascheduler/scheduler_view.py:18(at)
[2022-05-03 12:44:46,056] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.383 0.692 /usr/local/lib/python3.8/site-packages/eascheduler/jobs/job_one_time.py:11(_schedule_first_run)
[2022-05-03 12:44:46,057] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.382 0.691 /usr/local/lib/python3.8/site-packages/eascheduler/jobs/job_base.py:37(_set_next_run)
[2022-05-03 12:44:46,059] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.382 0.691 /usr/local/lib/python3.8/site-packages/HABApp/rule/scheduler/scheduler.py:49(add_job)
[2022-05-03 12:44:46,061] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.357 0.678 /usr/local/lib/python3.8/concurrent/futures/_base.py:416(result)
[2022-05-03 12:44:46,063] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.357 0.678 /usr/local/lib/python3.8/threading.py:270(wait)
[2022-05-03 12:44:46,064] [ HABApp.Worker] WARNING | 4 1.356 0.339 1.356 0.339 {method 'acquire' of '_thread.lock' objects}
[2022-05-03 12:44:46,066] [ HABApp.Worker] WARNING | 1 0.000 0.000 0.678 0.678 /usr/local/lib/python3.8/site-packages/HABApp/rule/scheduler/habappschedulerview.py:65(soon)
[2022-05-03 12:44:46,066] [ HABApp.Worker] WARNING | 2 0.000 0.000 0.024 0.012 /usr/local/lib/python3.8/asyncio/tasks.py:911(run_coroutine_threadsafe)
Sometimes I get multiple of these warnings during a single scene application.
It’s always acquiring _thread.lock that takes time.
It never happens when scenes are triggered through an openHAB item… it really seems to be caused by there being more MQTT traffic than normal.
Still, there aren’t all that many messages in total. I wouldn’t have thought this would be an issue; we’re talking about a multi-core, multi-GHz machine, after all.
Edit: Okay, maybe there are way more messages than I first thought. I set up a mosquitto subscription on homie/+/properties/rx433 and pushed a single scene select button once. Here is the result. The code was 7d784 but some receivers also misheard the code as 7d780, resulting in additional messages. Multiple receivers heard the code even though only the callback for garage-loft-ws281x actually acts on it. This is of course the nature of airwaves, signals go well beyond their useful range.
Multiple rule callbacks in multiple files will be called in response to each of these. Still, while it may look like a lot to the naked eye, we have to remember this is running on a quad-core 1.5 GHz CPU, not an Arduino Uno. It’s tens of events, not thousands. Also, there’s a separate callback function for each receiver topic (and separate rules for most), although they are all going to the same HABApp; that is sort of the point.
Having written lots of multi-threaded software myself (realtime audio processing, C++) I know this is not an easy or particularly fun issue to track down.
Could it be that some part of HABApp is holding on to a thread lock while performing something time-consuming, when instead it might have copied the shared data, released the thread lock, and then done its time consuming action?
Help, please?