Hello,
I know there are already similar topics around from the past.
I'm trying to sum up what I found, and to ask whether my view is correct and what the best direction to take would be.
My current setup:
HomeServer with Docker
openHAB running in Docker
Google Home Assistant Speakers and openHAB connected to Google Home Assistant
What I am looking for:
Offline voice assistant
Privacy regarding voice commands
(optional) Custom hotword for activating voice commands
Availability of each triggered command as text in openHAB rules (Speech-to-Text)
Home speakers/microphones → re-using the Google Home devices would be nice, but I think they are limited to Google Home Assistant only
YouTube Music support
What I found:
In the other topics I found these possibilities, and I'll try my best to give a quick summary of each one (as a non-expert):
Google Home Assistant
Already working for me
Lack of privacy
If Speech-to-Text is possible (which I'm not sure about), I would need a paid Google Cloud service
Mycroft
Not fully offline
Privacy should be OK
Hardware has to be acquired from Mycroft (Mycroft Mark II)
YouTube Music not working
Connection to openHAB via Binding
Pre-Defined Hotwords can be chosen
Snips
Seems to be less actively maintained than the others?
Sepia
Powerful and maintained
Not sure about the possibilities regarding home speaker hardware
Communication to openHAB via MQTT
Bottom line
Did I get the possibilities more or less right?
Is it possible to use e.g. Sepia in parallel to my current setup, but only for Speech-to-Text conversion of all my voice-commands, recognized from Google-Home-Assistant?
Are there hardware solutions for home speakers that I missed?
Does anyone have advice for me on where to go from here, or is the current situation limited to the options described above?
Thanks in advance, and sorry if I got one or the other not fully correct.
Thank you!
Basically I am searching for Speech-to-Text, to analyze the commands in openHAB more individually than e.g. Google Assistant provides for me.
That’s not right, you can also deploy it on a Raspi.
To put this into some greater context:
I’d advise anyone with this starting point to reconsider the absolute requirement that speech recognition has to be offline.
This results in A LOT of work to you ahead, at installation time as well as for any later extension, lots of issues in maintenance and use (offline speech recognition is never as good as online recognition), and a lack of applicability.
There’s always a tradeoff between privacy and effort.
Funnily, this ‘absolute’ requirement is very often raised by people who in turn pay nowhere near the same level of attention to keeping ALL of their communications and private data private in applications ‘next door’ (you already own Google Home devices, you use YouTube Music, you shop @Amazon, you are on Fazebuck etc).
You can, just for example, just as well install an Alexa with a dummy account outside your LAN and have it talk to your openHAB in a very well-defined and controlled way, through myOpenHAB only.
Simple to set up and maintain, and ultimately in fact not really any less ‘private’, let alone ‘secure’.
And the Alexa skill for openHAB is really good and flexible. The amazonechocontrol binding also provides TTS.
Thx for the hint with Alexa talking to myOpenHAB
→ @mstormi could you just give me a tiny explanation of the hardware I could use in this case?
This results in A LOT of work to you …
I fully understand and will experiment a bit with Sepia for fun → I'm sure I will see the limitations soon
The amazonechocontrol binding also provides TTS.
Again, I'm searching for STT and NOT TTS → For sure this is again a topic where I would like to analyze the voice commands as strings myself, but in the end I will probably see that the effort is extremely high
→ Anyway, I will experiment a bit because it's fun: sepia/stt-server (Docker image on Docker Hub)
With the Alexa skill you can define arbitrary names for your items, and with the amazonechocontrol binding you can even get the literal ‘last voice command’ to analyze further.
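To give a rough idea of what "analyzing the last voice command as a string" could look like outside of openHAB rules, here is a minimal Python sketch. It assumes you have linked the binding's last voice command channel to a String item and that your openHAB REST API allows unauthenticated reads of item states. The host name `openhab.local`, the item name `Echo_LastVoiceCommand`, and the two command patterns are made up for illustration, so adjust them to your own setup.

```python
import re
import urllib.request
from typing import Optional

# Hypothetical values: replace with your own openHAB host and item name.
OPENHAB_URL = "http://openhab.local:8080"
ITEM_NAME = "Echo_LastVoiceCommand"  # assumed String item holding the last voice command


def fetch_item_state(base_url: str, item_name: str) -> str:
    """Read an item's current state as plain text via the openHAB REST API."""
    url = f"{base_url}/rest/items/{item_name}/state"
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8").strip()


# Illustrative patterns only; extend them to match how you actually speak.
PATTERNS = [
    (re.compile(r"^(?:turn|switch) (?P<state>on|off) (?:the )?(?P<target>.+)$"), "switch"),
    (re.compile(r"^set (?:the )?(?P<target>.+) to (?P<value>\d+)(?: percent)?$"), "set_level"),
]


def parse_command(text: str) -> Optional[dict]:
    """Map a recognized sentence to an intent dict, or None if nothing matches."""
    # Lower-case and collapse whitespace so the patterns stay simple.
    normalized = " ".join(text.lower().split())
    for pattern, intent in PATTERNS:
        match = pattern.match(normalized)
        if match:
            return {"intent": intent, **match.groupdict()}
    return None
```

You would then call something like `parse_command(fetch_item_state(OPENHAB_URL, ITEM_NAME))` and decide what to do with the result. The same kind of string matching can of course be done directly in an openHAB rule that triggers when the item changes; the script only illustrates the idea.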