The more I get used to HABApp and Python, the more useful it gets, and the more I like it! I’m relying more and more on it for all sorts of home automation tasks. Unfortunately I’m also running into a performance issue more often the more I use it.
I’m humbly asking for help here; there’s no way I can sort this one out on my own, and this is going to get deep. Pretty please with sugar on top @Spaceman_Spiff
First of all: my whole system runs on MQTT. Mosquitto runs in a Docker container on the same 8 GB RPi4 that openHAB and HABApp run on, connected via 1 Gb Ethernet, and every. single. device in my entire system is MQTT, whether Homie- or HA-style.
So, here’s what happens. I’ve written a script to apply scenes, by setting the state of multiple items at once. It can apply the state of openHAB items as well as set MQTT topics. Everything happens in an asynchronous, staggered fashion, precisely to avoid contention.
Here is a link to the entire script on pastebin
Here is how the script might be called:
def fnStartWorkScene(self):
    scene.Apply(self, delayTime=0.3,
        #log=log,
        actions=[
            [ "garagemcu_WorkbenchLights=on" ],
            [ "garagemcu_SinkLights=on" ],
            [ "garagemcu_SideLights=on" ],
            [ "GaragePC=on" ],
            [ "GarageStatus=on" ],
            [ "GarageLights=on", "dimmer*GarageLightsDimmer=2" ],
            [ "WorkshopLights=on", "dimmer*WorkshopLightsDimmer=2" ],
            [ "number*mezzaninependants_Dimmer=0.5" ],
            [ "GarageAudio=on" ],
            [ "BasementAudio=on" ],
            [ "garagemcu_SinkOutlet=on" ],
            [ "garagemcu_BenchOutlet6=on" ],
        ]
    )
If it sounds like a lot of things for a garage, it is - my garage also holds my workshop and electronics lab.
If you look at the scene script link above, you can see that everything happens asynchronously with self.rule.run.soon or self.rule.run.at, only one item at a time.
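For readers who don't open the pastebin link, here is a minimal sketch of the staggering idea. This is not the actual scene.py, just an illustration assuming delayTime spaces successive action groups apart:

```python
from typing import List, Tuple

def stagger_actions(actions: List[List[str]], delay_time: float) -> List[Tuple[float, str]]:
    """Assign each action group an increasing delay so items are
    commanded one group at a time instead of all at once."""
    schedule = []
    for index, group in enumerate(actions):
        when = index * delay_time  # group 0 fires immediately, group 1 after delay_time, ...
        for action in group:
            schedule.append((when, action))
    return schedule

# Example: two groups, 0.3 s apart
plan = stagger_actions(
    [["GarageLights=on", "dimmer*GarageLightsDimmer=2"], ["GarageAudio=on"]],
    delay_time=0.3,
)
```

In the real rule, each (delay, action) pair would be handed to self.rule.run.soon or self.rule.run.at rather than executed directly.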
Here is what works perfectly with no performance issues:
openHAB item:
String GarageScene "Garage Scene" {expire="1s,command=IDLE"}
openHAB sitemap:
Switch item=GarageScene label="Garage Scene []" icon=wallswitch mappings=[OFF="OFF",WORK="WORK",COZY="COZY", HOUSEKEEPING="CLEAN"]
HABApp script:
StringItem.get_item('GarageScene').listen_event(self.item_scene_command, ItemCommandEvent)
def item_scene_command(self, event):
    assert isinstance(event, ItemCommandEvent)
    #log.info(f"current: {event}")
    if event.value == "OFF":
        self.fnAllStop(False)
    elif event.value == "WORK":
        self.fnStartWorkScene()
    elif event.value == "HOUSEKEEPING":
        self.fnStartHousekeepingScene()
    elif event.value == "COZY":
        self.fnStartCozyScene()
Again, the above works perfectly, performance is fine, no contention, no warnings.
So, what doesn’t work?
Well, here’s where it gets more complicated. My control paradigm for my house is not really scenes triggered from openHAB… it is 433 MHz wall-mounted remotes, with messages coming in through MQTT. When I act on those, HABApp seems to run into a contention issue.
So, here’s the flow.
Multiple ESP8266 and ESP32 units have 433 MHz receivers and run custom software with rc-switch filtered through my own superbuttons library.
Publishing raw 433 MHz received codes to MQTT would certainly cause a flood, so to avoid that, I wrote the superbuttons library which tracks and distills received codes down to the minimal number of messages while keeping maximum flexibility.
What gets published to MQTT (one topic for each receiver unit) is as follows. The button code in these examples is e6d798.
Let’s say that you press the button once, quickly.
e6d798*TALLY,1,0
e6d798*RELEASE,1,0
e6d798*DONE,1,0
Let’s say you hold it down for two seconds:
e6d798*TALLY,1,0
e6d798*SOLID,0,1
e6d798*MEDIUMPRESS,0,3
e6d798*LONGPRESS,0,7
e6d798*VERYLONGPRESS,0,15
e6d798*RELEASE,1,15
e6d798*DONE,1,15
Let’s say you triple-click the button:
e6d798*TALLY,1,0
e6d798*SOLID,0,1
e6d798*RELEASE,1,1
e6d798*TALLY,2,1
e6d798*RELEASE,2,1
e6d798*TALLY,3,1
e6d798*RELEASE,3,1
e6d798*DONE,3,1
So, there are a few messages, but not that many.
Especially if you hold a button down, for example to raise or lower volume or to dim a light, it saves quite a few messages.
The data that is posted is just enough to make it easy to script actions or scenes.
For example, if you wanted to select scenes based on how many times the button is pressed, you would simply look for the DONE message, with the first parameter being the number of presses, and ignore everything else.
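To make that concrete, here is a hedged sketch of how such a message could be parsed and mapped to a scene in plain Python. The ButtonMessage field names and the press-count-to-scene mapping are my own illustrative choices, not part of the actual rules; the wire format is the `code*EVENT,presses,duration` shown above:

```python
from typing import NamedTuple, Optional

class ButtonMessage(NamedTuple):
    code: str      # 433 MHz button code, e.g. "e6d798"
    event: str     # TALLY, SOLID, RELEASE, DONE, LONGPRESS, ...
    presses: int   # first numeric field: number of presses so far
    duration: int  # second numeric field: hold-duration ticks

def parse_button_message(payload: str) -> ButtonMessage:
    """Split 'e6d798*DONE,3,1' into its four fields."""
    head, presses, duration = payload.split(",")
    code, event = head.split("*")
    return ButtonMessage(code, event, int(presses), int(duration))

def scene_for_presses(payload: str) -> Optional[str]:
    """Pick a scene from the press count in the DONE message;
    every other event type is ignored."""
    msg = parse_button_message(payload)
    if msg.event != "DONE":
        return None
    return {1: "WORK", 2: "COZY", 3: "HOUSEKEEPING"}.get(msg.presses)
```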
For a volume up button, you might increment 1 for every TALLY message, start a repeating timer on LONGPRESS, and stop the timer on RELEASE.
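That volume logic can be sketched as a tiny state machine, shown here without any real timer. This is a hypothetical illustration: in an actual rule the `repeating` flag would be a HABApp countdown or repeating job instead.

```python
class VolumeButton:
    """Tracks superbuttons events for a volume-up button:
    each TALLY bumps the volume one step, LONGPRESS arms
    repeat mode, RELEASE disarms it."""

    def __init__(self, step: int = 1):
        self.step = step
        self.volume = 0
        self.repeating = False  # in a real rule: a running scheduler job

    def handle(self, event: str) -> None:
        if event == "TALLY":
            self.volume += self.step
        elif event == "LONGPRESS":
            self.repeating = True   # real code: start a repeating timer here
        elif event == "RELEASE":
            self.repeating = False  # real code: cancel the timer here

    def tick(self) -> None:
        """Called by the repeating timer while the button is held."""
        if self.repeating:
            self.volume += self.step

btn = VolumeButton()
btn.handle("TALLY")      # volume -> 1
btn.handle("LONGPRESS")  # repeat mode on
btn.tick()               # volume -> 2
btn.tick()               # volume -> 3
btn.handle("RELEASE")    # repeat mode off
```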
The key is that this makes it possible to script anything without having to modify the ESP8266/ESP32 code. All codes are passed through, but in a distilled fashion, precisely to avoid flooding things downstream.
Of course, in a large room you need more than one receiver for full coverage. In the garage I have four receivers in different locations, publishing to different topics. The script that acts on a certain fixed wall-mounted remote only listens to one topic, but multiple receivers may still hear the remote, even if not perfectly reliably, so they will publish messages too.
And, there’s where I’m running into problems.
If I trigger my scenes from 433 MHz remotes, then I get messages like this almost every time:
[2022-05-03 12:44:46,021] [ HABApp.Worker] WARNING | Execution of GarageRemotes.run took too long: 1.38s
[2022-05-03 12:44:46,046] [ HABApp.Worker] WARNING | ncalls tottime percall cumtime percall filename:lineno(function)
[2022-05-03 12:44:46,050] [ HABApp.Worker] WARNING | 1 0.000 0.000 1.384 1.384 /config/lib/leifutil/scene.py:98(run)
[2022-05-03 12:44:46,052] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.384 0.692 /usr/local/lib/python3.8/site-packages/HABApp/rule/scheduler/habappschedulerview.py:20(at)
[2022-05-03 12:44:46,054] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.383 0.692 /usr/local/lib/python3.8/site-packages/eascheduler/scheduler_view.py:18(at)
[2022-05-03 12:44:46,056] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.383 0.692 /usr/local/lib/python3.8/site-packages/eascheduler/jobs/job_one_time.py:11(_schedule_first_run)
[2022-05-03 12:44:46,057] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.382 0.691 /usr/local/lib/python3.8/site-packages/eascheduler/jobs/job_base.py:37(_set_next_run)
[2022-05-03 12:44:46,059] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.382 0.691 /usr/local/lib/python3.8/site-packages/HABApp/rule/scheduler/scheduler.py:49(add_job)
[2022-05-03 12:44:46,061] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.357 0.678 /usr/local/lib/python3.8/concurrent/futures/_base.py:416(result)
[2022-05-03 12:44:46,063] [ HABApp.Worker] WARNING | 2 0.000 0.000 1.357 0.678 /usr/local/lib/python3.8/threading.py:270(wait)
[2022-05-03 12:44:46,064] [ HABApp.Worker] WARNING | 4 1.356 0.339 1.356 0.339 {method 'acquire' of '_thread.lock' objects}
[2022-05-03 12:44:46,066] [ HABApp.Worker] WARNING | 1 0.000 0.000 0.678 0.678 /usr/local/lib/python3.8/site-packages/HABApp/rule/scheduler/habappschedulerview.py:65(soon)
[2022-05-03 12:44:46,066] [ HABApp.Worker] WARNING | 2 0.000 0.000 0.024 0.012 /usr/local/lib/python3.8/asyncio/tasks.py:911(run_coroutine_threadsafe)
Sometimes I get multiple of these warnings during a single scene application.
It’s always acquiring _thread.lock that takes time.
It never happens when scenes are triggered through an openHAB item… it really seems to be caused by there being more MQTT traffic than normal.
Still, there aren’t all that many messages in total. I wouldn’t have thought this would be an issue; we’re talking about a multi-core, multi-GHz machine, after all.
Edit: Okay, maybe there are way more messages than I first thought. I set up a mosquitto subscription on homie/+/properties/rx433 and pushed a single scene select button once. Here is the result. The code was 7d784 but some receivers also misheard the code as 7d780, resulting in additional messages. Multiple receivers heard the code even though only the callback for garage-loft-ws281x actually acts on it. This is of course the nature of airwaves, signals go well beyond their useful range.
Multiple rule callbacks in multiple files will be called in response to each of these. Still, while it may look like a lot to the naked eye, we have to remember this is running on a quad-core 1.5 GHz CPU, not an Arduino Uno. It’s tens of events, not thousands. Also, there’s a separate callback function for each receiver topic (and separate rules for most), although they are all going to the same HABApp; that is sort of the point.
Having written lots of multi-threaded software myself (realtime audio processing, C++) I know this is not an easy or particularly fun issue to track down.
Could it be that some part of HABApp is holding on to a thread lock while performing something time-consuming, when instead it might have copied the shared data, released the thread lock, and then done its time consuming action?
Help, please?