Revisiting local LLM-powered Voice Assistants in 2026?

Currently I’m running 8 Amazon Echos and mainly experience these three issues:

  1. it’s Jeff Bezos
  2. it’s cloud based
  3. the current amazonechocontrol binding is somewhat unstable

So I’m wondering who has a stable, convenient voice assistant based on a local LLM that would cover my main use cases:

  1. smarthome integration (turn on lights, tell me states of items, …)
  2. playing music (currently from Spotify, but eyeing Deezer or anything that pays artists more and keeps AI music out)
  3. using the speakers as an intercom (like what Echos do with Drop In or announcements)
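For use case 1, the smarthome side is already easy to script against openHAB’s REST API (POST a plain-text command to an item, GET its state). A minimal sketch; host and item names are placeholders, and it assumes the default port 8080 with no auth:

```python
import urllib.request

OPENHAB = "http://openhab.local:8080"  # assumption: adjust to your server


def item_url(base: str, item: str) -> str:
    """Build the REST endpoint URL for an openHAB item."""
    return f"{base}/rest/items/{item}"


def send_command(item: str, command: str) -> None:
    """POST a command (e.g. 'ON') to an item; openHAB expects text/plain."""
    req = urllib.request.Request(
        item_url(OPENHAB, item),
        data=command.encode(),
        headers={"Content-Type": "text/plain"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=5)


def get_state(item: str) -> str:
    """GET an item's current state as a plain string."""
    with urllib.request.urlopen(item_url(OPENHAB, item) + "/state", timeout=5) as resp:
        return resp.read().decode()
```

So `send_command("LivingRoom_Light", "ON")` would switch a light, and `get_state("LivingRoom_Light")` returns the raw state string — which is exactly what a voice assistant backend needs to call.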

There are some more or less old threads that may or may not meet the requirements:

Does anyone have one or two experiences with those solutions?
There is a fairly new and ambitious project with a prepared Home Assistant pipeline on GitHub: FutureProofHomes/Satellite1-ESPHome: Open Source ESPHome Firmware for Your Private AI-Powered Satel… and these are the docs: FutureProofHomes


I’m also looking for some assistance on the Voice Assistant.

Anyone?

With https://github.com/openhab/openhab-webui/pull/2285 being merged, we now have an easy way to access openHAB’s dialog system (especially useful during development/testing).
I am currently working on modernising the Piper JNI wrapper used by the Piper TTS voice add-on; the next step will probably be to modernise the Whisper JNI wrapper used by the Whisper STT voice add-on as well. In their current state, those are already really good and provide a solid foundation for TTS and STT.

The only thing currently missing is a good human language interpreter (HLI), but there are plans and there is WIP to implement one based on LLMs.
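Since the HLI is still WIP, purely as an illustration of how an LLM-based interpreter *could* work (prompt wording and JSON schema below are my own assumptions, not the actual openHAB design): prompt the model with the known items and the user’s utterance, ask it to answer with a structured JSON action, and parse that out of the reply:

```python
import json

# Assumption: we instruct the LLM to answer ONLY with a JSON action; the real
# openHAB HLI work in progress may use a completely different scheme.
PROMPT_TEMPLATE = (
    "You control a smart home. Known items: {items}.\n"
    'Reply ONLY with JSON like {{"action": "command", "item": "...", "value": "..."}}.\n'
    "User said: {utterance}"
)


def parse_action(llm_reply: str) -> dict:
    """Extract the JSON object from the model's reply, tolerating
    chatty text before and after it."""
    start = llm_reply.find("{")
    end = llm_reply.rfind("}") + 1
    if start == -1 or end == 0:
        raise ValueError("no JSON action in reply")
    return json.loads(llm_reply[start:end])
```

The parsed dict can then be mapped onto item commands; the hard part the WIP has to solve is making the model reliably emit valid actions for real-world utterances.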

Once that’s finished, we need hardware. One option is to buy hardware yourself and create your own firmware based on our PCM Audio WebSocket (supported since openHAB 5.1; a documentation PR is already open).
The better option, though, would be to have the (unofficial) ESPHome binding support the ESPHome Voice protocol, which would allow you to buy a Home Assistant Voice device.
Pine64 is also working on cheap hardware, the PineVox (they told me about $30), but that isn’t finished yet (they told me it will likely be finished in three months). IIRC, at the moment they only support HA’s Wyoming, whose voice-satellite part is kind of deprecated in favour of ESPHome Voice. But this could be made compatible with openHAB: if PineVox used ESPHome Voice, again we would only need to support that; alternatively by having Wyoming satellite support, or by extending PineVox with our WS API.
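For anyone curious what firmware talking to the PCM Audio WebSocket mentioned above boils down to: mostly fixed-size framing of raw audio. The endpoint path and frame length below are hypothetical placeholders until the documentation PR lands; only the byte arithmetic is general:

```python
# Assumptions: endpoint path and audio format are illustrative only; check the
# official openHAB 5.1 PCM Audio WebSocket docs once the documentation PR is merged.
WS_URL = "ws://openhab.local:8080/ws/audio-pcm"  # hypothetical path

SAMPLE_RATE = 16000   # Hz, mono (a common rate for STT pipelines)
BYTES_PER_SAMPLE = 2  # 16-bit PCM


def chunk_size_bytes(ms: int = 20) -> int:
    """Bytes per frame of mono 16 kHz / 16-bit audio for a given frame length."""
    return SAMPLE_RATE * BYTES_PER_SAMPLE * ms // 1000


def frames(pcm: bytes, ms: int = 20):
    """Split raw PCM into fixed-size frames, ready to send over the socket."""
    n = chunk_size_bytes(ms)
    for i in range(0, len(pcm), n):
        yield pcm[i:i + n]
```

At 16 kHz / 16-bit mono, a 20 ms frame is 640 bytes; a microcontroller firmware would capture from the mic, frame like this, and push each frame down the socket.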


Awesome detailed response!

Impressed by all the hard work you put into the product. Thank you.

Regards /Daniel

Wow! That’s good news, thanks for that!

If I understand correctly, the “FutureProofHomes” hardware stack is built on something like this with an added ESP32: ReSpeaker Mic Array v2.0 - Seeed Studio - which then triggers a (local) LLM, which in turn can trigger some endpoints on, let’s say, a smarthome system like openHAB. :wink:

Do you have more information on the PineVox? To me it seems like an abandoned track?

I had a swift look at FutureProofHomes and it seems (from the website; I haven’t looked at the code) that they are compatible with ESPHome, so we could tick many hardware-support boxes by supporting the ESPHome voice protocol.

Not yet, I’m waiting for the release and focussing on the core tasks at the moment. But we talked to them a bit at FOSDEM, and it’s in development, just not finished yet.


Yes, indeed. The Satellite1 DevKit is itself ESPHome-based and supports all of it. In HA it appears automatically as an ESPHome device. If OH also supported the ESPHome voice protocol, it would work ootb, I think.

As I understand it, you can use the DevKit to build your own smart speaker (like ceiling speakers), or use the 3D-printed enclosures provided by FutureProofHomes themselves.

For me, this finally sounds like a way to avoid feeding cloud services like Amazon. I just wish the reaction time were a bit faster. The longer processing and thinking time compared to Echo devices really is a drawback.

I think that really depends on the hardware you are running on, and on where the LLM you use runs. Cloud-based LLMs have good inference speed but added network latency; local LLMs depend on your hardware. BTW, reaction time is also a reason to modernise the JNI wrappers. Especially for Whisper, using a newer version of whisper.cpp could really improve performance depending on your platform, as support for special hardware acceleration is added over time. For example, on Apple Silicon the latest whisper.cpp can accelerate using Metal and Core ML.
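Since the bottleneck varies so much by platform, it’s worth actually measuring each stage instead of guessing. A tiny, generic timing helper (works with whatever STT/TTS function you plug in; `transcribe` in the usage comment is a placeholder for your own function):

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn and return (result, wall-clock seconds) — handy for comparing
    STT/TTS latency across whisper.cpp builds or acceleration backends."""
    t0 = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - t0

# usage, with your own STT function:
# text, secs = timed(transcribe, "sample.wav")
```

Measuring STT, HLI, and TTS separately quickly shows which stage dominates the end-to-end reaction time on your hardware.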
