With https://github.com/openhab/openhab-webui/pull/2285 being merged, we now have an easy way to access (especially during development/testing) openHAB’s dialog system.
I am currently working on modernising the Piper JNI wrapper used by the Piper TTS voice add-on; the next thing will probably be to look into modernising the Whisper JNI wrapper used by the Whisper STT voice add-on as well. In their current state, those are already really good and provide a solid foundation for TTS and STT.
The only thing currently missing is a good human language interpreter (HLI), but there are plans, and work in progress, to implement one based on LLMs.
Once that’s finished, we need hardware. You can either buy hardware yourself and create your own firmware based on our PCM Audio WebSocket (supported since openHAB 5.1; a documentation PR is already open).
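To give a rough idea of what such firmware would stream: the WebSocket carries raw PCM audio. The snippet below is only an illustration, not the actual openHAB API; the sample rate, frame size, and sine-wave "microphone" are assumptions standing in for real captured audio.

```python
import math
import struct

# Assumed audio format for illustration: 16 kHz, mono, 16-bit signed little-endian PCM.
SAMPLE_RATE = 16_000
FRAME_MS = 20  # one binary WebSocket message per 20 ms of audio
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples

def pcm_frames(freq_hz: float = 440.0, seconds: float = 0.1):
    """Yield 20 ms chunks of a sine tone as raw PCM bytes (stand-in for mic input)."""
    total = int(SAMPLE_RATE * seconds)
    for start in range(0, total, SAMPLES_PER_FRAME):
        samples = [
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(start, min(start + SAMPLES_PER_FRAME, total))
        ]
        # Pack as little-endian signed 16-bit integers.
        yield struct.pack("<%dh" % len(samples), *samples)

frames = list(pcm_frames())
# Each full 20 ms frame is 320 samples * 2 bytes = 640 bytes.
```

A real satellite would read these frames from its microphone and push them over the WebSocket connection with whatever client library its platform offers; the exact endpoint and handshake are what the pending documentation PR describes.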
Though the better option would be to have the (unofficial) ESPHome binding support the ESPHome Voice protocol, which would allow you to buy a Home Assistant Voice device.
Pine64 is also working on cheap hardware, the PineVox (they told me around $30), but it isn’t finished yet (they said it will likely take about three more months). IIRC, at the moment it only supports HA’s Wyoming protocol (whose voice-satellite part is kind of deprecated in favour of ESPHome Voice), but it could be made compatible with openHAB in several ways: if the PineVox used ESPHome Voice, we would again only need to support that protocol; alternatively, we could add Wyoming satellite support, or extend the PineVox with our WebSocket API.
If I understand correctly, the FutureProofHomes hardware stack is built on something like this with an added ESP32: a ReSpeaker Mic Array v2.0 (Seeed Studio), which then triggers a (local) LLM, which in turn can call endpoints on a smart home system like openHAB.
Do you have more information on the PineVox? To me it seems like an abandoned track?
I had a quick look at FutureProofHomes and it seems (from the website; I haven’t looked at the code) that their devices are ESPHome-compatible, so we could tick many hardware-support boxes by supporting the ESPHome Voice protocol.
Not yet; I’m waiting for the release and focussing on the core tasks at the moment. But we talked to them a bit at FOSDEM, and it’s in development, just not finished yet.
Yes, indeed. The Satellite1 DevKit is itself ESPHome-based and supports all of it. In HA it appears automatically as an ESPHome device. If OH also supported the ESPHome Voice protocol, it would work out of the box, I think.
As I understand it, you can use the DevKit to build your own smart speaker (e.g. with ceiling speakers) or use the 3D-printed enclosures provided by FutureProofHomes themselves.
For me, this finally sounds like a way to avoid having to feed cloud services like Amazon. I just wish the reaction time were a bit faster; the longer processing and thinking time compared to what you see on Echo devices really is a drawback.
I think that really depends on the hardware you are running on and on where the LLM you use runs. Cloud-based LLMs have good inference speed but add network latency; local LLMs depend on your hardware. BTW, reaction time is also one reason for modernising the JNI wrappers. Especially for Whisper, using a newer version of whisper.cpp can really improve performance depending on your platform, as support for hardware acceleration is added over time. For example, on Apple Silicon the latest whisper.cpp can accelerate using Metal and Core ML.
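To make the latency discussion a bit more concrete: the perceived reaction time is roughly the sum of the pipeline stages (STT, HLI, TTS), so the slowest stage (usually LLM or STT inference) dominates. A minimal sketch of measuring that breakdown, where all three stage functions are stubs rather than real openHAB or whisper.cpp calls:

```python
import time

def measure(fn, *args):
    """Run one pipeline stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stubs standing in for the real STT / HLI / TTS add-ons.
def stt(audio: bytes) -> str:
    return "turn on the kitchen light"

def hli(text: str) -> dict:
    return {"item": "KitchenLight", "command": "ON"}

def tts(reply: str) -> bytes:
    return b"\x00" * 640  # fake PCM answer

audio = b"\x00" * 32000  # 1 s of silence at 16 kHz / 16-bit
text, t_stt = measure(stt, audio)
intent, t_hli = measure(hli, text)
speech, t_tts = measure(tts, "Done")
total = t_stt + t_hli + t_tts  # the reaction time the user actually feels
```

With real back-ends plugged in, a breakdown like this shows where hardware acceleration (e.g. Metal/Core ML for whisper.cpp) actually buys you response time.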