ESPHome Audio Streaming on ESP32-S3 – Any OpenHAB Integration Ideas?

Hi everyone,

I’ve been exploring the ESP32-S3 for offline voice control, and I wanted to share some info and see if anyone has insights or is already working on it.

Background

  • The ESP32-S3 chip has been around since 2021 and comes with AI/DSP acceleration, which makes it ideal for audio tasks like microphone input, keyword spotting, and simple speech processing.

  • ESPHome added experimental audio pipeline support in 2023; as far as I can tell, the voice_assistant: component first shipped with ESPHome 2023.4 and has been extended in later releases.

What voice_assistant: Does

  • Streams microphone input in real time from ESP32-S3 devices.

  • Uses ESPHome’s native API (protobuf) for efficient, low-latency audio streaming.

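  • Currently designed only for Home Assistant, where the audio can be sent to an STT engine (Whisper, Google STT, etc.). Piper is often mentioned alongside these, but it is a TTS engine used for the spoken reply, not for recognition.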
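To give a feel for what "real-time streaming" means in numbers, here is a rough Python sketch. It assumes 16 kHz, 16-bit mono PCM, which is a common format for speech pipelines, and a chunk size of 1024 bytes that I picked purely for illustration; neither value is taken from the ESPHome source.

```python
from typing import Iterator

SAMPLE_RATE = 16_000      # assumption: 16 kHz sampling
BYTES_PER_SAMPLE = 2      # assumption: 16-bit mono samples
CHUNK_BYTES = 1024        # hypothetical chunk size, for illustration only


def bitrate_kbps() -> float:
    """Raw PCM bit rate in kbit/s for the assumed format."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * 8 / 1000


def chunk_duration_ms(chunk_bytes: int = CHUNK_BYTES) -> float:
    """How many milliseconds of audio one chunk carries."""
    return chunk_bytes * 1000 / (SAMPLE_RATE * BYTES_PER_SAMPLE)


def chunks(pcm: bytes, size: int = CHUNK_BYTES) -> Iterator[bytes]:
    """Split a PCM buffer into fixed-size chunks; the last one may be shorter."""
    for offset in range(0, len(pcm), size):
        yield pcm[offset:offset + size]


one_second = bytes(SAMPLE_RATE * BYTES_PER_SAMPLE)  # one second of silence
parts = list(chunks(one_second))
print(bitrate_kbps(), chunk_duration_ms(), len(parts), len(parts[-1]))
```

Under these assumptions the stream is 256 kbit/s and each chunk carries about 32 ms of audio, so the network load is small; the main concern is latency, not bandwidth.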

Limitation for OpenHAB

  • OpenHAB has no native support for ESPHome audio streaming.

  • That means the efficient streaming path from the ESP32-S3 is not accessible natively.

  • Workarounds today would require:

    1. Sending audio via HTTP uploads (WAV/RAW clips) or MQTT (Base64), which is less efficient.

    2. Implementing a bridge/gateway that understands the ESPHome audio protocol and forwards audio to OpenHAB/STT.

    3. Developing a dedicated OpenHAB binding to support ESPHome’s protobuf audio stream.
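To put a number on the "less efficient" part of workaround 1, here is a minimal Python sketch. It assumes 16 kHz, 16-bit mono PCM (not confirmed for any particular device), and the helper names are made up for this example. It wraps a raw clip in a WAV container, as you would for an HTTP upload, and Base64-encodes the same clip, as you would for an MQTT text payload.

```python
import base64
import io
import wave

SAMPLE_RATE = 16_000   # assumption: 16 kHz sampling
SAMPLE_WIDTH = 2       # assumption: 16-bit samples
CHANNELS = 1           # assumption: mono


def pcm_to_wav(pcm: bytes) -> bytes:
    """Wrap raw PCM in a WAV container (hypothetical helper)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)
    return buf.getvalue()


def pcm_to_base64(pcm: bytes) -> str:
    """Encode raw PCM as Base64 text, e.g. for an MQTT payload (hypothetical helper)."""
    return base64.b64encode(pcm).decode("ascii")


# Three seconds of silence stand in for a recorded voice command.
clip = bytes(SAMPLE_RATE * SAMPLE_WIDTH * CHANNELS * 3)
wav_clip = pcm_to_wav(clip)
b64_clip = pcm_to_base64(clip)

print(len(clip), len(wav_clip), len(b64_clip))  # prints: 96000 96044 128000
```

The WAV header costs only 44 bytes, while Base64 adds a third to the payload. The bigger cost of both clip-based options is probably that the whole utterance has to be recorded before recognition can start, which a streaming path avoids.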

Why This Is Exciting

  • ESP32-S3 + I²S microphone = a low-cost, distributed, offline voice control solution.

  • With wake-word detection running on the device itself (e.g., microWakeWord), audio is streamed only after the wake word has been heard, not continuously.

  • OpenHAB could greatly benefit from the same efficiency and low-latency pipeline if someone implements support.
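The "stream only when needed" idea can be sketched as a small gate in Python. This is only an illustration of the logic, not how microWakeWord or ESPHome implement it: the class name, the RMS threshold, and the "three quiet chunks end the command" rule are all assumptions of mine.

```python
import struct
from dataclasses import dataclass, field
from typing import List


def rms(chunk: bytes) -> float:
    """Root-mean-square level of a chunk of 16-bit little-endian samples."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return (sum(s * s for s in samples) / count) ** 0.5


@dataclass
class WakeGate:
    """Forwards audio only between a wake-word event and trailing silence."""
    silence_rms: float = 500.0        # hypothetical silence threshold
    max_silent_chunks: int = 3        # hypothetical end-of-speech rule
    streaming: bool = False
    silent_run: int = 0
    sent: List[bytes] = field(default_factory=list)  # stands in for the network

    def on_wake_word(self) -> None:
        """Called by the (external) wake-word detector."""
        self.streaming = True
        self.silent_run = 0

    def feed(self, chunk: bytes) -> None:
        """Called for every microphone chunk."""
        if not self.streaming:
            return  # idle: nothing leaves the device
        self.sent.append(chunk)
        self.silent_run = self.silent_run + 1 if rms(chunk) < self.silence_rms else 0
        if self.silent_run >= self.max_silent_chunks:
            self.streaming = False  # command finished, go back to idle


loud = struct.pack("<4h", 4000, -4000, 4000, -4000)
quiet = struct.pack("<4h", 10, -10, 10, -10)

gate = WakeGate()
gate.feed(loud)                       # before the wake word: dropped
gate.on_wake_word()
for chunk in [loud, loud, quiet, quiet, quiet, loud]:
    gate.feed(chunk)
print(len(gate.sent), gate.streaming)  # prints: 5 False
```

In a real setup the `sent` list would be replaced by whatever transport ends up carrying the audio to openHAB or the STT engine.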

So my question to the community:

  • Has anyone started working on an OpenHAB binding or middleware for ESPHome audio?

  • Any ideas for a lightweight approach to integrate the ESP32-S3 voice_assistant: pipeline into OpenHAB?

Would love to hear thoughts, tips, or if someone has already experimented with this.

Thanks!

PS: I read this thread: Creation an voice audio satellite with the help of an Esp32 - #11 by moe
But I did not see any result there…

PPS: Waveshare ESP32-S3 1.46 Inch Round Display Development Board, 412 x 412, Supports Wi-Fi & BLT, Accelerometer and Gyroscope Sensor, Onboard Speaker and Microphone, with Protective Cover Glass: Amazon.de: Computer & Accessories
Hardware is getting really cheap (on AliExpress it is even about half of that price).

I think there is an ESPHome add-on on the marketplace. It seems like adding audio support to that would be easier than creating a whole new add-on.

But in general, an add-on is probably the “correct” approach.

I think Willow - Open Source Echo/Google Home Quality Speech Hardware for $50 is based on the ESP32-S3 too so there might be something you can take from there as an alternative to ESPHome or in addition to ESPHome.

“ESPHome add-on on the marketplace”

That was my intention :folded_hands:

“I think Willow - Open Source Echo/Google Home Quality Speech Hardware for $50 is based on the ESP32-S3 too so there might be something you can take from there as an alternative to ESPHome or in addition to ESPHome.”

I have not jumped into the code yet, but ESPHome is prepared for custom extensions, so I hope we can stay with that “jack of all trades” :sweat_smile:
