HABApp potential thread / event contention issue

How about making it possible to supply command-line options in the environment? That sounds much easier; one could then just modify docker-compose.yml instead of rebuilding the entire container. There is already a mechanism for supplying environment variables this way, which openHAB itself uses, for example:

    environment:
      OPENHAB_HTTP_PORT: "80"
      OPENHAB_HTTPS_PORT: "443"
      EXTRA_JAVA_OPTS: "-Duser.timezone=Asia/Bangkok"
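
HABApp could do the same. It already accepts a config path via -c on the command line, so a hypothetical variable (HABAPP_EXTRA_ARGS does not exist today; it is just to illustrate the idea) might look like this:

    environment:
      HABAPP_EXTRA_ARGS: "-c /habapp/config"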

I’m not against running HABApp in a more direct development environment; I am sure I will get there one of these days. But it would still be useful to be able to measure the performance of the actual runtime environment, as opposed to the performance of running in the development environment.

How can we find out what is blocking the thread-to-async path? More detailed logging? I think that might be easier than attempting to duplicate my environment. :slight_smile:

This path is part of the Python standard library, so it’s not easy to add logging there.
Would it be possible to “record” your MQTT messages and re-publish them from a Python script? That way you can check whether the issue can be reproduced. You don’t have to use your home automation machine; you can do this on your workstation by just starting a broker and connecting HABApp locally.
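
A minimal capture script could look roughly like this (a sketch assuming the paho-mqtt package; the file name and the catch-all topic filter are placeholders):

import json
import time

import paho.mqtt.client as mqtt


def on_message(client, userdata, msg):
    # store topic, payload and a timestamp so a replay can keep the original pacing
    rec = {'t': time.time(), 'topic': msg.topic, 'payload': msg.payload.decode(errors='replace')}
    userdata.write(json.dumps(rec) + '\n')


with open('capture.jsonl', 'w') as f:
    client = mqtt.Client(userdata=f)    # paho-mqtt 1.x style constructor
    client.on_message = on_message
    client.connect('localhost', 1883)
    client.subscribe('#')    # everything - narrow this down to the topics of interest
    client.loop_forever()    # stop with Ctrl+C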


Ahh, I did not realize that.

That is a good idea! You are right – I keep forgetting how modular HABApp is. I will try that. Thank you.

I’m also learning to manage my expectations. For example, I tried to implement “push and hold to increase volume” like this (pseudo-code). I made my own repeating timer based on Thread. The volume control is a Homie MQTT item.

On remote press: self.curvalue=NumberItem.get_value(), self.curvalue+=5, oh_send_command(self.curvalue), start repeating timer (0.25 second interval)
On each timer: self.curvalue+=5, oh_send_command(self.curvalue)
On remote release: stop repeating timer

That did not work without stuttering and stalling; there was no chance in hell. So, rather than trying to go through openHAB, I redid it as follows:

On remote press: send_udp(audio_processor_ip, "volume+=2"), start repeating timer (0.125 second interval)
On each timer: send_udp(audio_processor_ip, "volume+=2")
On remote release: stop repeating timer

Since the audio processor is actually my own, I could make it accept UDP packets – in fact I had already added that feature and forgot about it. :slight_smile:

This way, it works perfectly smoothly. It does break the structure a bit but I can live with making exceptions for heavy things like repeated actions, while sticking to the proper structure for simpler things like on/off and scenes.
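
The repeating timer part is easy to build on threading.Timer; here is a rough sketch of the UDP variant (the address, port and helper names are made up for illustration):

import socket
import threading

AUDIO_PROCESSOR = ('192.168.1.50', 7777)    # placeholder address/port
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)


class Repeater:
    """Call func immediately, then every interval seconds until stop()."""

    def __init__(self, interval, func):
        self.interval = interval
        self.func = func
        self._timer = None

    def start(self):
        self.func()
        self._timer = threading.Timer(self.interval, self.start)
        self._timer.daemon = True
        self._timer.start()

    def stop(self):
        if self._timer is not None:
            self._timer.cancel()


repeater = Repeater(0.125, lambda: sock.sendto(b'volume+=2', AUDIO_PROCESSOR))
# on remote press:   repeater.start()
# on remote release: repeater.stop()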

Since you already have another script with MQTT set up, you could use it to publish the messages without going through HABApp. That way you ensure that there is no interference from HABApp itself.
It would just be a bit of copy and paste. The script would just:

  • connect to mqtt
  • publish the messages
  • disconnect.

That way you should be able to reproduce the issue.
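
Something like this sketch, assuming paho-mqtt and the line-per-message format from the capture sketch above:

import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()    # paho-mqtt 1.x style constructor
client.connect('localhost', 1883)    # connect to mqtt
client.loop_start()

with open('capture.jsonl') as f:
    prev = None
    for line in f:
        rec = json.loads(line)
        if prev is not None:
            time.sleep(max(0.0, rec['t'] - prev))    # keep the original pacing
        prev = rec['t']
        client.publish(rec['topic'], rec['payload'])    # publish the messages

client.loop_stop()
client.disconnect()    # disconnect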

Hello Sebastian,

Just checking in. Today I decided to start tackling the project of capturing and replaying MQTT messages, in order to try to reproduce the issues I’ve seen.

So, I have a lot to learn.

I’ve only ever programmed Python in two environments: JSR223/Jython in OH 2.4, and HABApp in Docker with OH 3. I’ve never properly set up a development environment.
While setting up Python on my workstation and reading the Installing Packages guide, I suddenly realized that the term “virtual environment” has a very specific meaning in Python!

I really had no idea.

I now finally understand what you meant.

I’m gonna be a while, but I’m not giving up. I’ve been wanting to dive deeper into Python for a while, I guess today is the day. It’s even raining outside! :slight_smile:

Okay: Python and PyCharm are working nicely on my workstation, and I have written MQTT capture and replay tools from scratch; they work nicely too.

Now I just have to figure out how to use them to reproduce the problem.

I will start with a clone of my Pi4 image, but with fresh OH and HABApp containers, and see if I can make it happen.


Okay, so with fresh OH and HABApp containers, I was not able to reproduce the problem with MQTT replay and a few test items in OH. Everything flowed smoothly and was lightning fast.

So, I’ve now spent the rest of the day trying other things… like faster SSDs in the RPi. Nothing made any appreciable difference.

Then, I discovered the docker stats command.

CONTAINER ID   NAME                CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O     PIDS
b0fe37a38f44   openhab_openhab_1   133.28%   0B / 0B             0.00%     8.21MB / 23.2MB   448MB / 0B    175
86d103476284   samba_samba_1       0.01%     0B / 0B             0.00%     121kB / 130kB     40.9MB / 0B   5
f4a36d129c6f   mosquitto_mqtt_1    1.65%     0B / 0B             0.00%     8.67MB / 17.3MB   6.09MB / 0B   1
f3b07ba3f563   unifi               29.63%    0B / 0B             0.00%     3.67MB / 596kB    281MB / 0B    132
38c1a9f43a85   habapp_habapp_1     48.89%    0B / 0B             0.00%     3.13MB / 692kB    0B / 0B       21

Hoooooooly crap. I was not expecting that. When HABApp is running, the openhab and habapp containers constantly use 30-40% CPU each, with occasional huge peaks.
If I shut HABApp down, OH drops to 2-3% at idle.

So, that’s some crazy CPU usage. But, I figured, maybe it’s one of my scripts?
So, I moved ALL of my scripts out.
Usage dropped way down to single digit %, for both HA and OH.
Then I moved the scripts back, one by one.
Then, before I knew it, I had moved ALL the scripts back.
Usage was still in the single digits, and performance was perfect!

So then, I shut the habapp container down… and started it again.
As you can probably guess, CPU is pegged again!

And then, I moved the scripts out again, folder by folder… and this time CPU usage STAYED high. With no scripts at all remaining.

Then I moved them all back in, and all back out. CPU usage dropped again!
…and then I moved them back in. CPU usage stayed low.

Restarting the container without scripts was totally fine too.

So, what do you think @Spaceman_Spiff?

Actually, I have an idea what you might say:

Do you think this might be the same issue that is solved in the beta? If so I’d sure love to try it.

I myself have my doubts as to whether it’s the same issue.
Because I noticed, while modifying a particular script, that not only was CPU usage high and refusing to go down when I moved the script out… the script also kept running, and HABApp would not reload it no matter what I did, until I restarted the HABApp container.

So, could it be that something in the auto file refresh part of HABApp got stuck in a tight loop?

By the way, this is the performance on my system when CPU is not pegged.

CONTAINER ID   NAME                CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O     PIDS
b0fe37a38f44   openhab_openhab_1   1.93%     0B / 0B             0.00%     22.9MB / 54.5MB   448MB / 0B    168
86d103476284   samba_samba_1       0.02%     0B / 0B             0.00%     2.01MB / 2.48MB   40.9MB / 0B   5
f4a36d129c6f   mosquitto_mqtt_1    1.03%     0B / 0B             0.00%     26.9MB / 54.8MB   6.09MB / 0B   1
f3b07ba3f563   unifi               6.71%     0B / 0B             0.00%     10.7MB / 1.59MB   283MB / 0B    132
df5b4d6d6dc3   habapp_habapp_1     2.40%     0B / 0B             0.00%     2.52MB / 340kB    0B / 0B       19

Edit:
Turns out I still had one script that was using an MqttItem.
Once I changed that to use my own directmqtt object, HABApp seems to start up fine, without getting into the pegged-CPU state.
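
(For reference, directmqtt is roughly a thin wrapper around a dedicated paho-mqtt client, used from inside the rules and bypassing HABApp’s MQTT connection entirely; this is a simplified sketch, not the actual code:)

import paho.mqtt.client as mqtt


class DirectMqtt:
    def __init__(self, host='localhost', port=1883):
        self.client = mqtt.Client()
        self.client.connect(host, port)
        self.client.loop_start()    # network loop runs in its own thread

    def subscribe(self, topic, callback):
        # callback receives the raw paho message object
        self.client.message_callback_add(topic, lambda c, u, msg: callback(msg))
        self.client.subscribe(topic)

    def publish(self, topic, payload):
        self.client.publish(topic, payload)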

No - I think this is something different.

Does the high CPU load disappear when you disable the MQTT connection in the config file?
Maybe there really is too much traffic coming in from MQTT? Is it more than these 10 messages?
Have you tried subscribing more selectively?
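
(This is the relevant part of HABApp’s config.yml; if I remember correctly an empty host disables the connection entirely, but please double-check against the docs:)

    mqtt:
      connection:
        host: ''    # empty host - MQTT connection disabled
        port: 1883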

For me, CPU is at < 1% with very light MQTT traffic.

That’s not quite it. HABApp sometimes ran just fine with low CPU load with the MQTT connection and MqttItems – as long as it didn’t get into the high CPU load state first!

Using the MqttItem simply seems to make it more likely that it gets into the high CPU load state. It doesn’t seem to be the amount of messages.

Mosquitto itself is using 2% CPU, processing ~2200 messages received and ~6900 sent per minute.

Since replacing the last (I think) MqttItem with directmqtt, I have not seen it get into the high-CPU state.

To me, the fact that it can also work fine with MqttItems seems to indicate some kind of race condition that sometimes triggers some kind of tight loop.

I don’t really have enough information to get to the root cause of this issue right now; I’m just glad it’s stable again. If I run across it again, I will see if I can figure out what really triggers it.

But, I’m wondering about one thing. (For some reason my second edit with this question got lost).

The normal load is now OH 3% and HA 5%.
But, if I update a single NumberItem four times a second, the load becomes OH 10% and HA 14%.
Is it supposed to be that slow?

I’m even getting the object at init, so I don’t have to look it up later:
self.itemWatts = NumberItem.get_item("EnergyMeter_Power")

And then, in the MQTT message handler, I decode the incoming JSON and post this one value:
self.itemWatts.oh_post_update(watts_readout)

If I comment out just the oh_post_update but leave everything else, load is OH 3% and HA 5%. Just posting this one numeric value adds a huge load. Any idea what that could be? Is it really supposed to be that slow?
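
For context, the whole rule boils down to roughly this (trimmed sketch; the topic name and JSON key are placeholders for my actual setup):

import json

from HABApp import Rule
from HABApp.mqtt.items import MqttItem
from HABApp.mqtt.events import MqttValueUpdateEvent
from HABApp.openhab.items import NumberItem


class EnergyMeterRule(Rule):
    def __init__(self):
        super().__init__()
        # look the openHAB item up once at init instead of on every message
        self.itemWatts = NumberItem.get_item('EnergyMeter_Power')
        self.mqtt_item = MqttItem.get_create_item('energymeter/status')    # placeholder topic
        self.mqtt_item.listen_event(self.on_msg, MqttValueUpdateEvent)

    def on_msg(self, event):
        watts_readout = json.loads(event.value)['power']    # placeholder JSON key
        self.itemWatts.oh_post_update(watts_readout)


EnergyMeterRule()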

No, not at all! I just tested it and it is still below 1% CPU usage.
Are you using BasicAuth on the openHAB side, as described in the docs?

Rule which will forward ~10 msgs/sec from MQTT to openHAB:

from HABApp import Rule
from HABApp.mqtt.items import MqttItem
from HABApp.mqtt.events import MqttValueUpdateEvent
from HABApp.openhab.items import StringItem
from uuid import uuid4


class TestRule(Rule):
    def __init__(self):
        super().__init__()
        # publish a new value every 0.1 seconds
        self.task_pub = self.run.every(None, 0.1, self.pub)

        self.mqtt_item = MqttItem.get_create_item('test_bench/topic')
        self.mqtt_item.listen_event(self.rec, MqttValueUpdateEvent)

        self.oh_item = StringItem.get_item('BenchTestItem')

    def pub(self):
        # a fresh uuid ensures every message is an actual value change
        self.mqtt_item.publish(str(uuid4()))

    def rec(self, event):
        # forward the received mqtt value to the openHAB item
        self.oh_item.post_value(event.value)


TestRule()

Edit:
With 40 msgs/sec from an external script to MQTT, forwarding takes ~1% CPU. However, this is my main machine, so running it on an embedded device will probably take more.


I sure am! Belt and suspenders.

[screenshot: 2022-05-11_14-31-56]

Thank you for the bench test rule! I’ve tested it; here are my results.

Idle:
OH 3%, HA 5%

TestRule interval 0.1 to a brand new BenchTestItem which is not displayed anywhere:
OH 3%, HA ~17% !!

CONTAINER ID   NAME                CPU %     MEM USAGE / LIMIT   MEM %     NET I/O           BLOCK I/O     PIDS
241b7491c6a2   habapp_habapp_1     16.30%    0B / 0B             0.00%     373MB / 132MB     213kB / 0B    21
b0fe37a38f44   openhab_openhab_1   3.04%     0B / 0B             0.00%     217MB / 305MB     446MB / 0B    133
86d103476284   samba_samba_1       0.01%     0B / 0B             0.00%     6.18MB / 7.39MB   40.9MB / 0B   6
f4a36d129c6f   mosquitto_mqtt_1    1.62%     0B / 0B             0.00%     422MB / 785MB     6.09MB / 0B   1
f3b07ba3f563   unifi               1.29%     0B / 0B             0.00%     158MB / 22.4MB    329MB / 0B    120

After that test, I added the BenchTestItem to my sitemap (see the snippet below).
But with oh_item.post_value, the value is not actually posted to openHAB, so I changed that to oh_item.oh_post_update.
That brought it to its knees: HA is showing ~40%, OH is showing ~22%, and it’s not keeping up; it’s stalling frequently. No error messages or warnings in HABApp.log, though!
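
The sitemap entry is just a plain Text element (minimal example; the sitemap name and label are arbitrary):

    sitemap bench label="Bench" {
        Text item=BenchTestItem
    }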

Changing the interval to 100 (from 0.1) immediately brought load down to OH 3%, HA 5% again.

Edit: I just got memory stats working; RAM is definitely not the problem. This is idle load.

CONTAINER ID   NAME                CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O         PIDS
241b7491c6a2   habapp_habapp_1     3.57%     52.43MiB / 7.629GiB   0.67%     4.85MB / 1.29MB   31.2MB / 73.7kB   19
b0fe37a38f44   openhab_openhab_1   4.53%     728.1MiB / 7.629GiB   9.32%     1.84MB / 1.95MB   447MB / 73.1MB    133
86d103476284   samba_samba_1       0.01%     27.73MiB / 7.629GiB   0.35%     145kB / 129kB     40.9MB / 3.82MB   6
f4a36d129c6f   mosquitto_mqtt_1    1.22%     11.73MiB / 7.629GiB   0.15%     6.46MB / 10.9MB   6.22MB / 0B       1
f3b07ba3f563   unifi               1.71%     891.6MiB / 7.629GiB   11.41%    2.78MB / 510kB    271MB / 9.17MB    127

@Spaceman_Spiff I just wrote the following test code of my own to benchmark the OH REST API without HABApp, and ran it in Python on my workstation.

import requests
import time
from uuid import uuid4

headers = {'Content-Type': 'text/plain', 'Accept': 'application/json'}

try:
    while True:
        myuuid = str(uuid4())
        print(myuuid)
        # post a plain-text state update to the item via the REST API
        requests.post("http://openhab.lan:80/rest/items/BenchTestItem", data=myuuid, headers=headers)
        time.sleep(0.1)
except KeyboardInterrupt:
    pass

With BenchTestItem present in the sitemap, openHAB CPU jumps up to 25%.
If I don’t include it in the sitemap, CPU usage is “just” 15%, but that’s still huge for a few text updates.

This is certainly not HABApp’s fault.

Any ideas? Would still appreciate any pointers from you since you have a lot of experience using the REST API. :slight_smile:

Sorry - missed that one.
After the change I get ~1.5% CPU on OH and ~6% on HABApp.
This seems high, but it might translate to the percentages you are seeing.
I guess we’ll have to dig further, but imho this does not correlate with the thread contention.


I agree, these are separate issues. They did all contribute to the unacceptably poor performance I was experiencing though, which made me dive in and take a look.

You mentioned you’re running HA on your workstation, but what are you running OH on?

I am just spinning up instances of OH, Mosquitto and HA on my workstation and connecting them all through localhost.

I’ve tested OH 3.2 and 3.3 and it’s roughly the same:

3.2

Bench item operations ... done!

            |  dur   | per sec | median |  min   |  max   |  mean
create item | 6.365s |  47.132 | 19.0ms | 18.0ms | 0.343s | 21.2ms
update item | 6.932s |  43.276 | 20.0ms | 18.6ms | 0.619s | 23.1ms
delete item | 2.433s | 123.300 | 8.00ms | 6.93ms | 23.0ms | 8.11ms

Bench item state update .... done!

                      |  dur   | per sec | median |  min   |  max   |  mean
             rtt idle |  15.0s | 114.920 | 9.00ms | 7.62ms | 66.5ms | 8.70ms
       async rtt idle |  15.0s | 118.704 | 8.00ms | 6.99ms | 16.0ms | 8.42ms
      rtt load (+10x) |  15.0s |  13.624 | 73.0ms | 50.0ms | 0.132s | 73.4ms
async rtt load (+10x) |  15.1s |  13.752 | 72.0ms | 70.0ms | 84.0ms | 72.7ms

3.3

Bench item operations ... done!

            |  dur   | per sec | median |  min   |  max   |  mean
create item | 7.025s |  42.704 | 22.0ms | 21.0ms | 42.0ms | 23.4ms
update item | 5.500s |  54.544 | 17.0ms | 14.9ms | 41.0ms | 18.3ms
delete item | 2.644s | 113.463 | 8.00ms | 6.96ms | 15.0ms | 8.81ms

Bench item state update .... done!

                      |  dur   | per sec | median |  min   |  max   |  mean
             rtt idle |  15.0s | 109.070 | 9.00ms | 7.87ms | 42.0ms | 9.17ms
       async rtt idle |  15.0s | 108.571 | 9.00ms | 7.59ms | 19.0ms | 9.21ms
      rtt load (+10x) |  15.0s |  12.716 | 78.0ms | 59.0ms | 0.117s | 78.6ms
async rtt load (+10x) |  15.1s |  12.673 | 78.0ms | 64.9ms | 99.0ms | 78.9ms

Oh wait, that’s a huge jump! You mentioned ~1% before.
If you’re really seeing ~6%, that’s a jump of 5 percentage points. I saw a jump of about 12 percentage points.

It’s not inconceivable that your workstation is two and a half times faster than my Raspberry Pi!

That means we’re actually seeing the same performance; the only difference is your much beefier machine.

Considering the small amount of actual work being done – that is, updating a tiny text string – that is hugely inefficient. I mean, if we compare the performance of MQTT to the performance of sending a message to OH through the REST API, the difference is a couple of orders of magnitude.

Have you ever considered making some kind of (optional) OH add-on for HA to communicate more directly with the OH event bus, bypassing the REST API for anything performance-critical?
In fact, one might even consider using MQTT as the communication layer between the HABApp openHAB module and HABApp itself?

Okay, 109 per second… that is really not many for a modern multi-gigahertz workstation.

Creating and deleting, fine – that’s not done that often… but updating could be a lot faster, couldn’t it?

Well - there is #2891, but so far there has been no progress. Unfortunately my Java skills are far too limited, and HABApp has been eating all my precious spare time, so thankfully @J-N-K with all his wisdom and Java magic has been jumping in.
As you can see from the minimal working example in the issue, 800 msg/sec is the maximum that can theoretically be achieved (on my machine).
Add some overhead from HABApp internals, logging, etc., and you’re down to 100 msg/sec.
You have to pay something for all the convenience.

PS:
Internally HABApp can handle 1.2k / 3.3k msg/sec, and with MQTT it’s 500 / 1.2k msg/sec, for async/sync respectively.
While this is not much, I also think it’s enough (at least it is for me :wink: ).
And who knows - maybe the performance will even improve (or worsen :smiley: ) over time.


This sounds like plenty for me too :).
But I don’t understand how REST at 800/sec plus HABApp at 1200/sec becomes 100/sec? I mean, if you cascaded two 800/sec stages you should still get 400/sec? :slight_smile:

#2891 looks very promising! It looks like @J-N-K already did the work of adding the functionality, and it is waiting for review / merging?

800/sec is the theoretical maximum. That’s just a couple of functions in one file for receiving/transmitting.
Add the event bus, the rule engine, etc., and that goes down.
On top of that, event logging eats a large chunk of the performance because it’s thread-safe.
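
One way to see why it is not a simple halving: the per-message costs of the serial stages add up, so the rates combine like parallel resistors. A back-of-the-envelope sketch with assumed numbers:

# per-message costs of serial stages add up
costs = {
    'rest transport': 1 / 800,              # theoretical maximum from the issue
    'event bus + rule engine': 1 / 800,     # assumed to cost about the same
    'thread-safe event logging': 1 / 250,   # assumed; logging eats a large chunk
}
per_msg = sum(costs.values())               # seconds per message
print(f'{1 / per_msg:.0f} msg/sec')         # ~154 msg/sec - the observed ballpark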

Disabled event log:

+------------------------------------------------------------------------------+
|                                    HABApp                                    |
+------------------------------------------------------------------------------+

Bench events .. done!

               |  dur   | per sec | median |  min   |  max   |  mean 
      rtt idle | 1.714s |  29.17k |  0.0us |  0.0us | 1.43ms | 34.3us
async rtt idle | 6.032s |   1.63k | 1.00ms |  0.0us | 21.0ms | 0.61ms

+------------------------------------------------------------------------------+
|                                     MQTT                                     |
+------------------------------------------------------------------------------+

Bench events .. done!

               |  dur   | per sec | median |  min   |  max   |  mean 
      rtt idle | 5.953s |   2.99k |  0.0us |  0.0us | 15.0ms | 0.33ms
async rtt idle | 6.020s | 570.264 | 2.00ms | 0.81ms | 6.00ms | 1.75ms

+------------------------------------------------------------------------------+
|                                   openHAB                                    |
+------------------------------------------------------------------------------+

Bench item operations ... done!

            |  dur   | per sec | median |  min   |  max   |  mean 
create item | 5.701s |  52.622 | 18.0ms | 16.7ms | 77.1ms | 19.0ms
update item | 4.842s |  61.959 | 15.0ms | 13.6ms | 66.5ms | 16.1ms
delete item | 2.199s | 136.435 | 7.00ms | 5.79ms | 27.8ms | 7.33ms

Bench item state update .... done!

                      |  dur   | per sec | median |  min   |  max   |  mean 
             rtt idle |  15.0s | 135.547 | 7.00ms | 6.00ms | 12.0ms | 7.38ms
       async rtt idle |  15.0s | 135.671 | 7.00ms | 5.93ms | 64.0ms | 7.37ms
      rtt load (+10x) |  15.0s |  15.697 | 63.0ms | 60.0ms | 79.0ms | 63.7ms
async rtt load (+10x) |  15.0s |  15.655 | 63.0ms | 61.0ms | 0.125s | 63.9ms

You can see that for openHAB it’s a 10% performance increase, so it’s not that critical.
For internal/MQTT traffic it’s another story.

Correct. Let’s just hope the maintainers decide to invest their time there.

Oh wow. Okay, understood. The REST communication with OH is definitely the current bottleneck. Makes sense: a full HTTP request for each command is a large number of layers to get through just to flip a bit.

Let’s hope indeed. I see @J-N-K seems to be the most active openHAB contributor at the moment, which bodes well; but for some reason many of his newer PRs have already been approved, just not this one.

Thanks so much for all your help so far @Spaceman_Spiff