HABApp potential thread / event contention issue

leif · May 11, 2022, 9:46am

Okay, 109 per second… that is really not many for a multi gigahertz modern workstation.

Creating and deleting, fine – that’s not done that often… but updating could be a lot faster couldn’t it?

Spaceman_Spiff · May 11, 2022, 10:14am

Well - there is #2891 but so far there has been no progress. Unfortunately my Java skills are far too limited and HABApp has been eating all my precious spare time so thankfully @J-N-K with all his wisdom and java magic has been jumping in.
As you see from the issue with a minimal working example the 800msg/sec is that maximum that can be theoretically achieved (on my machine).
Add some overhead from HABApp internals, logging, etc. and you’re down to 100 msg/sec.
You have to pay something for all the convenience.

PS:
Internally HABApp can handle 1.2k / 3.3k msg/sec and with mqtt it’s 500 / 1.2k msgs/sec for async/sync.
While this is not much, I also think it’s enough (at least it is for me ).
And who knows - may the performance will even improve (or worsen ) over time.

leif · May 11, 2022, 11:51am

This sounds like plenty enough for me too :).
But, I don’t understand how REST 800/sec + HABAPP 1200/sec becomes 100/sec? I mean, if you cascaded two 800/sec things you should have 400/sec?

#2891 looks very promising! It looks like @J-N-K already did the work of adding the functionality, and it is waiting for review / merging?

Spaceman_Spiff · May 11, 2022, 12:14pm

800/sec is the theoretical maximum. That’s just a couple of functions in one file for receiving/transmitting.
Add event bus, rule engine , etc. and that goes down.
However event logging eats a large chunk of that performance because it’s thread safe.

Disabled event log:

+------------------------------------------------------------------------------+
|                                    HABApp                                    |
+------------------------------------------------------------------------------+

Bench events .. done!

               |  dur   | per sec | median |  min   |  max   |  mean 
      rtt idle | 1.714s |  29.17k |  0.0us |  0.0us | 1.43ms | 34.3us
async rtt idle | 6.032s |   1.63k | 1.00ms |  0.0us | 21.0ms | 0.61ms

+------------------------------------------------------------------------------+
|                                     MQTT                                     |
+------------------------------------------------------------------------------+

Bench events .. done!

               |  dur   | per sec | median |  min   |  max   |  mean 
      rtt idle | 5.953s |   2.99k |  0.0us |  0.0us | 15.0ms | 0.33ms
async rtt idle | 6.020s | 570.264 | 2.00ms | 0.81ms | 6.00ms | 1.75ms

+------------------------------------------------------------------------------+
|                                   openHAB                                    |
+------------------------------------------------------------------------------+

Bench item operations ... done!

            |  dur   | per sec | median |  min   |  max   |  mean 
create item | 5.701s |  52.622 | 18.0ms | 16.7ms | 77.1ms | 19.0ms
update item | 4.842s |  61.959 | 15.0ms | 13.6ms | 66.5ms | 16.1ms
delete item | 2.199s | 136.435 | 7.00ms | 5.79ms | 27.8ms | 7.33ms

Bench item state update .... done!

                      |  dur   | per sec | median |  min   |  max   |  mean 
             rtt idle |  15.0s | 135.547 | 7.00ms | 6.00ms | 12.0ms | 7.38ms
       async rtt idle |  15.0s | 135.671 | 7.00ms | 5.93ms | 64.0ms | 7.37ms
      rtt load (+10x) |  15.0s |  15.697 | 63.0ms | 60.0ms | 79.0ms | 63.7ms
async rtt load (+10x) |  15.0s |  15.655 | 63.0ms | 61.0ms | 0.125s | 63.9ms

You can see that for openhab it’s 10% performance increase so it’s not that critical.
For internal/mqtt it’s another issue.

Correct. Let’s just hope the maintainers decide to invest their time there.

leif · May 11, 2022, 2:37pm

Oh wow. Okay, understood. Definitely the OH communication with REST is currently the bottleneck. Makes sense – making a new HTTP connection for each command is a large number of layers to get through just to flip a bit.

Let’s hope indeed. I see J-N-K seems to be the most active openHAB contributor at the moment, so that bodes well, but for some reason many of his newer change suggestions have been approved already, just not this one.

Thanks so much for all your help so far @Spaceman_Spiff

Spaceman_Spiff · May 12, 2022, 4:19am

Could you try with the event log set to WARNING and see if congestion happens, too?
Are you using the HABApp.core.lib.handler.MidnightRotatingFileHandler?

leif · May 12, 2022, 5:22am

I think it is set to warning already? Below is my entire logging.yml. There’s still a “MyRule” config from the very beginning which I never actually log anything to or use – the only log I look at is HABApp.log. events.log was constantly flooded (multi gigabytes a day) until I changed the level to WARNING last week, and after that absolutely nothing hits events.log, it’s zero size.

formatters:
  HABApp_format:
    format: '[%(asctime)s] [%(name)25s] %(levelname)8s | %(message)s'

  MyRule_format:
    format: '[%(asctime)s] [%(name)s] %(message)s'
    datefmt: '%H:%M.%S'


handlers:
  # There are several Handlers available:
  #  - logging.handlers.RotatingFileHandler:
  #    Will rotate when the file reaches a certain size (see python logging documentation for args)
  #  - HABApp.core.lib.handler.MidnightRotatingFileHandler:
  #    Will wait until the file reaches a certain size and then rotate on midnight
  #  - More handlers:
  #    https://docs.python.org/3/library/logging.handlers.html#rotatingfilehandler

  HABApp_default:
    class: HABApp.core.lib.handler.MidnightRotatingFileHandler
    filename: 'HABApp.log'
    maxBytes: 1_048_576
    backupCount: 3

    formatter: HABApp_format
    level: DEBUG

  EventFile:
    class: HABApp.core.lib.handler.MidnightRotatingFileHandler
    filename: 'events.log'
    maxBytes: 1_048_576
    backupCount: 3

    formatter: HABApp_format
    level: DEBUG

  BufferEventFile:
    class: logging.handlers.MemoryHandler
    capacity: 10
    formatter: HABApp_format
    target: EventFile
    level: DEBUG

  MyRuleHandler: # <-- This is the name of the handler
    class: HABApp.core.lib.handler.MidnightRotatingFileHandler
    filename: 'myrule.log'
    maxBytes: 10_000_000
    backupCount: 3
    
    formatter: MyRule_format  # use this format
    level: DEBUG
    
loggers:
  HABApp:
    level: INFO
    handlers:
      - HABApp_default
    propagate: False

  HABApp.EventBus:
    level: WARNING
    handlers:
      - BufferEventFile
    propagate: False

  MyRule:              # <-- Name of the logger
    level: DEBUG       # <-- minimum Logging level, e.g. use INFO if you don't want the output of log.debug()
    handlers:
      - MyRuleHandler  # This logger uses the MyRuleHandler
    propagate: False

I would still very much like to have a way to apply a filter to the event log, to change events with certain names to DEBUG level, so they’re not normally logged, without disabling everything else.

For example, this MQTT topic:
homie/main-energy-meter/properties/json-total
Which is updated four times a second, 24/7, containing:
{"U":[245.3,247.4,247.3],"A":[6.58,-5.89,-5.36],"W":[985,-205,-702],"VA":[1613,1459,1324],"VAr":[-72,-923,-65],"PF":[0.610,-0.140,-0.529],"Hz":50.04}
As you can imagine this absolutely hammers the events.log.
This is one of the most important topics in my system. It is my current grid power consumption status, and it is monitored by my virtual smart meters and fed to each of my solar inverters, so they can prevent exporting energy back to the grid (which I would get charged for – the utility meter counts usage both ways).
It needs to go into habapp because that’s how I sum up the current net usage in watts so I can keep an eye on it and display it in my sitemap. This is, incidentally, the single NumberItem update that was using 20% between OH and HA. I have since implememented averaging in the HA rule, and I now only update the NumberItem every 1.5 seconds, so CPU use is now manageable.

Anyway. This value needs to stay at the resolution it is, but it does not normally need to be logged in events.log – at all! If I wanted to log the energy usage over time, I think influxDB would be more useful :).

So, having RegEx → Log Level filters in the HA logging system would really be handy.

Hey by the way. You mentioned logging being slow because of being thread-safe.
Wouldn’t it be possible to be both efficient and thread-safe by keeping the logging in its own thread, so that only the buffer needs to be thread safe?
That is how I implemented it in my own logger in my code base, so that I can generate log items from a high priority low latency Audio DSP thread without it stalling waiting for a disk write. In fact, in my case, I made my buffer lock-free, and if completely flooded it will actually drop log items rather than stall the DSP thread.

If so, the disk I/O thread would be a good place to implement regex filtering, as it would then not affect throughput, provided there are cores left over.

Spaceman_Spiff · May 12, 2022, 8:19am

Does congestion still happen with the disabled event log?

At least for the event log I need to build something like that.
I think for the normal logging from rules it’s not necessary since there is not that much traffic.

leif · May 12, 2022, 12:33pm

YES it did. Disabling the event log didn’t seem to help at all. It has been disabled (or rather, set to WARNING) since last week, that was actually the first thing I did when I started to troubleshoot the performance issue. Everything else has happened since then, with event log disabled.

Ah! Okay this explains why it affects performance.

Agreed.

Spaceman_Spiff · May 13, 2022, 6:24am

That’s a pity because that was currently the only way how I could somehow reproduce the congestion.

As I said, this is already possible. I’ll add an example to the docs.

leif · May 13, 2022, 6:52am

Wait, what?? It is? You did? Yes, please do.