Completely local voice control using community add-ons

Hello everyone, I spent the past weeks refining a configuration and workflow for completely local voice control in OpenHAB.

The overall design relies solely on marketplace add-ons, except for ollama-cpp. In total, the tools used are:

  1. Whisper - Speech to Text
  2. Piper - Text to Speech
  3. Rustpotter - Wakeword detection
  4. Cuevox - Rule interpreter
  5. HABSpeaker - Wireless microphone and speaker
  6. Ollama-cpp - Large Language Model (LLM) server
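
To give an idea of how these services fit together: openHAB lets you select default voice services in `$OPENHAB_CONF/services/runtime.cfg` (or through the UI under Settings → Voice). A minimal sketch, with hypothetical service ids — the exact ids depend on the installed add-ons, so check each add-on's documentation:

```ini
# Hypothetical voice service selection; the service ids are assumptions
org.openhab.voice:defaultSTT=whisperstt     # Whisper speech-to-text
org.openhab.voice:defaultTTS=pipertts      # Piper text-to-speech
org.openhab.voice:defaultKS=rustpotterks   # Rustpotter keyword spotter (wake word)
org.openhab.voice:defaultHLI=cuevox        # Cuevox human language interpreter
```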

I covered my solution in a guide on my blog here: OpenHAB voice control for home automation | Dantali0n

I hope that this way it can be easily indexed by search engines, so that the many people looking for it can find it online.

The steps are elaborate but not hard: it basically boils down to installing all the add-ons and configuring them, recording a whole bunch of audio to train the wake word detection, and creating some glue-logic items and things.
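
As a rough illustration of what the glue logic can look like (the item name, label, and channel below are hypothetical; the real configuration is covered in the blog post), a voice-controllable item can be as simple as a labelled switch that the interpreter matches against spoken commands:

```
// Hypothetical .items file entry: a rule-based interpreter such as Cuevox
// matches the item label, so "turn on the desk lamp" toggles this switch.
Switch DeskLamp "Desk Lamp" <light> { channel="hue:0220:bridge1:lamp1:switch" }
```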

The audio for speech recognition is streamed to OpenHAB using an Android app called HABSpeaker, developed by @Miguel_M.A.D, who also developed the Rustpotter wake word detection, so big thanks to them for making this possible. I also want to thank @JanMattner for the collaboration on improving the Cuevox rule interpreter: Cuevox - A Rule Based Voice Interpreter [4.0.0.0;4.9.9.9] - #9 by JanMattner

I wanted to submit this topic here so users can try to get the setup working for themselves, and so that I can help them through the discussion here if things are unclear. I can then go back and iteratively improve the blog post.

Hope that is okay.

All the best,
Corne (Dantali0n)


Awesome - thanks for this!! :grinning_face:

I’ve been looking at how to integrate something like this, but got stuck after getting Whisper and Piper working, so I will definitely be checking out the blog for implementing wake word detection, a working interpreter, and the LLM when I get the time to dive into this.

I will definitely post back once I try this out. Also, when you have been running this for some time, do feel free to update the post with your experiences of the setup.

Also, what language have you ended up using? I haven’t read your blog in detail, but I see Dutch mentioned there — how has that fared? I’m in Denmark, and my initial trials with both TTS and STT weren’t great in terms of accuracy :zany_face:

BR
Mark

I have done everything in English because I found there was a large gap in accuracy and quality between multilingual models and the ones that solely recognize English. For TTS, in Piper specifically, I use ‘Ljspeech’. For Whisper I use ‘tiny.en’ because otherwise the delay before the response becomes too large.

For a similar reason I use ‘Qwen3:1.7b’ in Ollama, because larger models would add too much delay. The Rustpotter model I do run at ‘large’, because otherwise I found it too inaccurate, triggering many false positives. The delay of Rustpotter seems mainly related to the length of your wake word, with shorter wake words having a smaller delay.

All of this together runs on an i5 4690K, and it struggles a bit at times.

This topic was automatically closed 41 days after the last reply. New replies are no longer allowed.