Building a Local Voice Assistant with OpenHab Integration

Hi everyone,

I am working on building a local voice assistant that will run on a server where OpenHab is installed. Here are the details of my setup:

Server Specifications:

CPU: Ryzen 5900X
RAM: 64GB
GPU: RTX 3060 8GB

Satellite Specifications:

Devices: Dell OptiPlex 3000 Wyse
CPU: Celeron N5150
RAM: 16GB
Storage: 256GB
Peripherals: USB mic and speaker (e.g., Jabra)

Plan:

  1. Capture Sound: Use satellites to capture sound.
  2. Send to Server: Transmit audio data over the network to the server for processing.
  3. MQTT Command: Send MQTT commands to OpenHab.
  4. Feedback: Send the response as a sound file back to the satellite for feedback.
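Steps 1–2 boil down to capturing PCM on the satellite and shipping it over the network to the server. As a rough illustration (the wire format, field names, and the 16 kHz mono default are my own assumptions, not anything openHAB defines), the audio chunks could be framed like this:

```python
import struct

# Hypothetical wire format for satellite -> server audio chunks:
# a 10-byte header (sample rate, channels, payload length), then raw PCM bytes.
HEADER = struct.Struct("!IHI")

def pack_chunk(pcm: bytes, rate: int = 16000, channels: int = 1) -> bytes:
    """Satellite side: prefix a raw PCM chunk with a header the server can parse."""
    return HEADER.pack(rate, channels, len(pcm)) + pcm

def unpack_chunk(msg: bytes) -> tuple[int, int, bytes]:
    """Server side: split a received message back into metadata and PCM payload."""
    rate, channels, length = HEADER.unpack_from(msg)
    pcm = msg[HEADER.size:HEADER.size + length]
    return rate, channels, pcm
```

The actual capture (ALSA/PulseAudio) and transport (TCP socket) would wrap around these two helpers.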

Testing Results with Whisper Models on the Satellite (processing time, real-time factor):

tiny: 45.6 sec (~4.15x) - Fastest, least accurate
base: 51.3 sec (~4.67x) - Better, still phonetic
small: 224.3 sec (~20.4x) - Most accurate but too slow

Given these results, I believe sending the data to the server is the best solution unless someone can recommend a better approach for the satellite hardware.

Language Preference: I want to use Romanian as the conversation language since the system will primarily be used by my mother.

Questions:

  1. Satellite Setup: What software should I install on the satellite side to implement wake word detection and STT+TTS with the current hardware?
  2. Server Setup: If STT+TTS needs to be installed on the server, what software should I use?

I hope this helps! Looking forward to your thoughts and suggestions.

Hello,

There are many, many possibilities.

As we are on an openHAB forum, I will advocate for a solution using what openHAB already provides.

Here is ONE example for a working audio transmission architecture:

  • the satellite could run Linux with PulseAudio for sound. Set up PulseAudio to load module-simple-protocol-tcp.
  • install the pulseaudio binding on the openHAB server, and set up a bridge thing to connect to your satellite. Create a sink and a source thing for this bridge (available in autodiscovery once the bridge is created). Inside the two things, enable the sink and source configuration switches so openHAB can use the audio transfer functionality.
  • install the whisper binding. With this binding, openHAB can host the whisper service itself (simpler) or connect to an external service exposing an OpenAI-compatible API (you then have to use the whisper binding in its dev version, as this functionality is not yet in an official openHAB add-on release).
  • if you choose to use an external service for whisper, you can use “speaches-ai” on your server, and it will leverage your 3060 for lightning-fast GPU inference.
  • install any TTS add-on available for openHAB. For example, I think Piper TTS is pretty good, but I don’t know if it has a good Romanian voice.
  • now it is time to link all this to openHAB. For the “assistant commands openHAB” part, it depends: you can let openHAB handle it with one of the HLI (human language interpreter) add-ons, or you can output the result of the STT process to a String item and then handle it yourself (using a JSR223 scripting language, sending it to MQTT for external processing, or whatever you are comfortable with).
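For the "handle the STT result yourself" option, the interpreter can start as something very simple: a lookup from a recognized Romanian phrase to an (item, command) pair that a rule then sends to openHAB or publishes over MQTT. A minimal sketch — the item names and phrases here are invented placeholders:

```python
# Hypothetical phrase -> (openHAB item, command) table. Item names are made up;
# replace them with the real items configured on your openHAB server.
COMMANDS = {
    "aprinde lumina": ("Living_Light", "ON"),     # "turn on the light"
    "stinge lumina": ("Living_Light", "OFF"),     # "turn off the light"
    "porneste televizorul": ("TV_Power", "ON"),   # "turn on the TV"
}

def interpret(text: str):
    """Return (item, command) for a recognized phrase, or None if unknown."""
    normalized = text.strip().lower().rstrip(".!?")
    return COMMANDS.get(normalized)
```

A real interpreter would want fuzzy matching (STT output is rarely this clean), but an exact-match table is enough to validate the end-to-end chain.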

You may have seen that I didn’t mention the wake word service. That’s because openHAB doesn’t match your premise: you want the satellite to implement the wake word detection.
If you do it the openHAB way, the satellite has to stream the audio to the server continuously. openHAB then feeds the audio to a wake word spotter service (currently Rustpotter, as it is the only one available for the moment) (spoiler: I’m working on another one).

An alternative is to code the wake word part yourself on the satellite (in Python, perhaps):

  • with, for example, openWakeWord (which seems to be the best open-source option), or Porcupine by Picovoice (probably the best proprietary option with a free plan)
  • then, when the wake word is spotted, send an HTTP request to openHAB to activate an item. Have a rule trigger when the item is activated. The rule can then issue a listenAndAnswer command (see the openHAB multimedia documentation), which launches the whole “audio in from satellite - STT - command interpreter - TTS - audio out to satellite” chain.
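This alternative can be sketched roughly as follows. The openHAB server URL, item name, and detection threshold are my own placeholders, and the detection loop itself needs the third-party openwakeword and sounddevice packages, so treat the loop as a sketch rather than a finished implementation. The HTTP part uses the standard openHAB REST API (POST a plain-text command to /rest/items/{item}):

```python
from urllib import request

OPENHAB_URL = "http://openhab.local:8080"   # assumption: adjust to your server
WAKE_ITEM = "VoiceAssistant_Wake"           # hypothetical Switch item in openHAB

def build_wake_request(base_url: str, item: str) -> request.Request:
    """Build the REST call that sends ON to an openHAB item (POST, text/plain)."""
    return request.Request(
        url=f"{base_url}/rest/items/{item}",
        data=b"ON",
        headers={"Content-Type": "text/plain"},
        method="POST",
    )

def run_satellite() -> None:
    """Detection loop sketch; requires the openwakeword and sounddevice packages."""
    import numpy as np
    import sounddevice as sd
    from openwakeword.model import Model

    model = Model()  # loads the bundled pretrained wake word models
    with sd.InputStream(samplerate=16000, channels=1, dtype="int16") as stream:
        while True:
            frame, _ = stream.read(1280)  # 80 ms of 16 kHz audio
            scores = model.predict(np.squeeze(frame))
            if any(score > 0.5 for score in scores.values()):  # threshold is a guess
                request.urlopen(build_wake_request(OPENHAB_URL, WAKE_ITEM))
```

On the openHAB side, a rule triggered by the VoiceAssistant_Wake item receiving ON would then fire listenAndAnswer.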

Another alternative is to wait several months until I finish my own solution (a python audio in/out satellite using openwakeword and the wyoming protocol, with a relevant binding for openHAB) :joy:


Looking forward to checking it out!


Thank you for your detailed answer, Gwendal.
I have installed Ubuntu Server 24.04 on the satellites and tried the competition’s “way of voice” with the Wyoming protocol and Rhasspy. The only success I had was with the wake word.
This weekend I will try your solution and keep you posted.

In the worst case I can use Google services for a while, but the goal is to have everything done on my server.

Thank you again.

Any thoughts on using ESPHome? I stumbled onto this project today: GitHub - pham-tuan-binh/glados-respeaker (Building Potato GLaDOS with Home Assistant and ESPHome). It builds a GLaDOS “potato”, but it uses Home Assistant and ESPHome.

We already have an ESPHome binding for openHAB that has been very active. It might be ideal to make a translation layer for Home Assistant Assist devices, or at least the ESPHome knock-off, haha. The binding doesn’t currently support voice, but I have a feeling that it just isn’t configured and wouldn’t be too hard to add.

It uses microWakeWord and runs on a Seeed ReSpeaker Lite voice kit, which is pretty cheap and accessible. All the Home Assistant-specific parts would have to be rewritten, but the ESPHome base seemed pretty handy! Plus, Home Assistant Voice Assist seems to cover just the satellite, TTS and STT, so knock on wood, you could set up satellites with some updates to the ESPHome binding.

One more thing we could add to the chain would be the ChatGPT binding pointed at a local LocalAI instance. This would likely end up a little nicer than just trying to use HABot, especially because you can use the systemMessage option to pass in “personalities” like the GLaDOS project does.

Hello, nice to see other people with the same chain of thought!

A closely related subject was also discussed here.
If you know of other relevant threads or people working on something, let me know.

I have had an ESP32-S3 box for more than a year and never found time to open and test it. I would also like openHAB to support its audio capability.
But I’m not so sure that it “wouldn’t be too hard to add”, though :sweat_smile:

seime, the author of the ESPHome binding, says the binding is open to contributions on this subject.
I’m still working on the Wyoming protocol add-on for openHAB (the low-level Java classes for audio input/output are done, but not much more, unfortunately), but I definitely also think about other solutions like the ESPHome devices, or the Home Assistant Voice PE (assuming they use the same protocol?). Not enough time! :smiling_face_with_tear:

I have this working:

Not sure if it would cope with Romanian, though 😕