Hi everyone,
I recently finished a voice control project for my OpenHAB setup, and I thought some of you might be interested in my experience and the tools I built.
Full disclosure right out of the gate: this is not an offline system, and it is a paid service (though there is a free DEV tier available). I’m using the Gemini Live API for voice processing. The result is basically a mini-Jarvis tailored for my OpenHAB environment.
Here’s the core idea: the Gemini Live API supports function calling during a live streaming session. The model figures out what you want to do based on your function descriptions and, right in the middle of the conversation, pushes a tool-call event down to your local agent. The agent catches this event and fires off the specific command you’ve defined.
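To make that flow concrete, here’s a minimal sketch of the local side of it. All names here (`set_light`, the handler registry) are hypothetical illustrations, not the actual agent’s code; the declaration uses the JSON-schema style that Gemini tool definitions follow, and the actual Live API session wiring is omitted.

```python
# Sketch of the local side of streaming function calling.
# The model matches the spoken request against declarations like this
# and streams back a tool-call event that the agent executes locally.

# Hypothetical function declaration in JSON-schema style.
LIGHT_TOOL = {
    "name": "set_light",
    "description": "Turn a light on or off in a given room.",
    "parameters": {
        "type": "object",
        "properties": {
            "room": {"type": "string"},
            "state": {"type": "string", "enum": ["on", "off"]},
        },
        "required": ["room", "state"],
    },
}

HANDLERS = {}

def register(name):
    """Associate a declared function name with a local handler."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap

@register("set_light")
def set_light(room, state):
    # In a real agent this would publish to MQTT, call a webhook, etc.
    return f"light in {room} turned {state}"

def dispatch(tool_call):
    """Execute a tool-call event pushed down mid-conversation."""
    return HANDLERS[tool_call["name"]](**tool_call["args"])

print(dispatch({"name": "set_light", "args": {"room": "kitchen", "state": "on"}}))
# → light in kitchen turned on
```

The nice part is that the model handles all the intent matching; the local agent only ever sees a structured event with a function name and arguments.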
To make this work, I wrote a local voice agent that runs directly in your environment. Mine runs on a Raspberry Pi right next to my OpenHAB instance, but there are binaries for Windows, Linux, and macOS. I also built a web interface to manage it, which includes a function editor, billing, etc.
In a nutshell, the architecture looks like this: A very simple hardware interface handles voice I/O and wake-word detection. This connects to the voice assistant agent, which is responsible for executing your custom functions. Out of the box, it supports:
- MQTT
- Webhooks (great for n8n)
- Exec (local shell commands)
- Direct GPIO pin control on the controller itself
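For a rough idea of what those action types boil down to, here’s a hedged sketch of two of them using only the standard library. The function names and the webhook payload shape are my own illustration; the agent’s real config format isn’t documented in this post.

```python
import json
import shlex
import subprocess
import urllib.request

def run_exec(command):
    """Exec action: run a local shell command and capture its output."""
    out = subprocess.run(shlex.split(command), capture_output=True, text=True)
    return out.stdout.strip()

def call_webhook(url, payload):
    """Webhook action: POST a JSON payload (e.g. to an n8n trigger URL)."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status

# MQTT and GPIO would plug in the same way, e.g. paho-mqtt's
# client.publish(topic, payload) or a gpiozero OutputDevice.

print(run_exec("echo lights off"))  # → lights off
```

Because every action type is just a handler behind the same event interface, adding new integrations is mostly a matter of writing another small function like these.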
If you’re curious and want to check out more details, you can find it here: https://voice-assistant.io
Any feedback or questions are welcome!
Small demo: https://youtu.be/fAkMAtZZeuk