Searching for: Offline Voice Assistant + Speech-To-Text + Privacy

Hello,
I know there are already similar topics from the past.
I'm trying to sum up what I found, and to ask whether my view is correct and what the best direction to take could be.

My current setup:

  • HomeServer with Docker
  • openHAB running in Docker
  • Google Home Assistant Speakers and openHAB connected to Google Home Assistant

What I am looking for:

  1. Offline voice assistant
  2. Privacy regarding voice commands
  3. (optional) Custom hotword for activating voice commands
  4. Availability of each triggered command as text in openHAB rules (Speech-to-Text)
  5. Home speakers/microphones → re-using the Google Home devices would be nice, but I think they are limited to Google Home Assistant only
  6. YouTube Music support

What i found:

From the other topics I found these possibilities, and I will try my best to give a quick summary of each one (as a non-expert):

Google Home Assistant

  • Already working for me
  • Lack of privacy
  • If Speech-to-Text is possible (which I'm not sure about), I would need a paid Google Cloud service

Mycroft

  • Not fully offline
  • Privacy should be OK
  • Hardware has to be acquired from Mycroft (Mycroft Mark II)
  • YouTube Music not working
  • Connection to openHAB via Binding
  • Pre-Defined Hotwords can be chosen

Snips

  • Seems to be not as maintained as others?

Sepia

  • Powerful and maintained
  • Not sure about the possibilities regarding home speaker hardware
  • Communication to openHAB via MQTT

Bottom line

  1. Did I get the possibilities more or less right?
  2. Is it possible to use e.g. Sepia in parallel to my current setup, but only for Speech-to-Text conversion of all my voice commands recognized by Google Home Assistant?
  3. Are there hardware solutions for home speakers that I missed?
  4. Does anyone have advice on where to go, or is the current situation limited to the options described above?

Thanks in advance, and sorry if I got one thing or another not fully correct.

Edit:
Seems that I missed one very important topic related to Sepia, which is very helpful for me:

SEPIA Open Assistant - Privacy respecting, self-hosted voice-control - Apps & Services / 3rd Party - openHAB Community

Talk to your Smart Home via SEPIA Open Assistant | by Florian Quirin | SEPIA Framework | Medium

I'm very impressed, and I think this looks very promising.

There are multiple TTS engines available on Linux. I have used this one previously, which is nice.

Thank you!
Basically I am searching for Speech-To-Text, to analyze the commands in openHAB more individually than e.g. Google Assistant allows.
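
As a rough illustration of what analyzing a command as text could look like once an STT engine has produced a transcript, here is a minimal sketch in Python. The item names (`LivingRoom_Light`, `Kitchen_Light`) and the keyword tables are purely hypothetical, and nothing here is specific to Sepia, Google Assistant or openHAB; it only shows the string-matching idea.

```python
import re

# Hypothetical mapping from spoken room names to openHAB item names.
ROOM_ITEMS = {
    "living room": "LivingRoom_Light",
    "kitchen": "Kitchen_Light",
}

# Spoken keywords mapped to the command that would be sent to the item.
ACTIONS = {"on": "ON", "off": "OFF"}


def parse_command(text):
    """Return (item, command) for a transcribed sentence, or None if not understood."""
    # Lowercase and drop punctuation so "Kitchen," matches "kitchen".
    normalized = re.sub(r"[^a-z ]", "", text.lower())
    words = normalized.split()
    # Actions are matched as whole words, rooms as substrings (they may span two words).
    action = next((cmd for word, cmd in ACTIONS.items() if word in words), None)
    item = next((name for room, name in ROOM_ITEMS.items() if room in normalized), None)
    if action is None or item is None:
        return None
    return item, action
```

Calling `parse_command("Turn on the kitchen light, please!")` would give `("Kitchen_Light", "ON")`, while a sentence without a known room or action gives `None`. Real commands will need far more than keyword matching, which is where most of the effort goes.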

That’s not right, you can also deploy it on a Raspi.

To put this into some greater context:
I’d advise anyone with this starting point to reconsider the absolute requirement that speech recognition has to be offline.
This results in A LOT of work for you at installation time as well as on any later extension, lots of issues in maintenance and use (offline speech recognition is never as good as online), and limited applicability.
There’s always a tradeoff between privacy and efforts.
Funnily, this ‘absolute’ requirement is very often raised by people who in turn pay nowhere near the same level of attention to keeping ALL of their communications and private data private in applications ‘next door’ (you already own Google Home devices, you use YouTube Music, you shop @Amazon, you are on Fazebuck etc).
You can, just for example, as well install an Alexa with a dummy account outside your LAN and have it talk to your openHAB in a very well-defined and controlled way through myOpenHAB only.
Simple to set up and maintain, and ultimately in fact not really any less ‘private’, let alone ‘secure’.
And the Alexa skill for openHAB is really good and flexible. The amazonechocontrol binding also provides TTS.

Thx for the hint with Alexa talking to myOpenHAB
@mstormi could you just give me a tiny explanation about the hardware I could use in this case?

This results in A LOT of work to you …

I fully understand and will experiment a bit with Sepia for fun → for sure I will see the limitations soon

The amazonechocontrol binding also provides TTS.

Again, I'm in search of STT and NOT TTS → this is again a topic where I would like to analyze the voice commands as strings myself, but in the end I will probably see that the effort is extremely high :sweat_smile:
→ Anyway, I will experiment a bit because it's fun: sepia/stt-server - Docker Image | Docker Hub

huh? Any Echo.

With the Alexa skill you can define arbitrary names for your items, and with the amazonechocontrol binding you can even get the literal ‘last voice command’ to analyze further.
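
For anyone who wants to pick that text up outside of openHAB rules, here is a minimal sketch in Python that reads an item's state through the openHAB REST API (`/rest/items/<name>/state` returns the state as plain text). It assumes a String item, here hypothetically named `Echo_LastVoiceCommand`, has been linked to the binding's last-voice-command channel, and that the server allows unauthenticated reads; the base URL is also just an example.

```python
from urllib.parse import quote
from urllib.request import urlopen


def item_state_url(base_url, item_name):
    """Build the openHAB REST URL that returns an item's state as plain text."""
    return f"{base_url.rstrip('/')}/rest/items/{quote(item_name, safe='')}/state"


def fetch_last_voice_command(base_url, item_name, opener=urlopen):
    """Fetch the item state and return it as a trimmed, lowercase string.

    `opener` is injectable so the function can be tried without a running server.
    """
    with opener(item_state_url(base_url, item_name)) as response:
        return response.read().decode("utf-8").strip().lower()
```

Something like `fetch_last_voice_command("http://openhab.local:8080", "Echo_LastVoiceCommand")` would then return the spoken text as a string that can be analyzed further. Inside openHAB itself a rule triggered by the item update is the more natural place for this.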
