OH voice control using Rhasspy and GStreamer - installation and configuration


The goal of this tutorial is to share my experiences, as it took a lot of searching in many forums - so others might save that time. My project aims at full voice control of an openHAB installation, using multiple hardware devices (“satellites”) for audio input. After intensive testing of several STT (speech-to-text) solutions for performance, speed and resource demand, I decided to use Rhasspy. As the openHAB server is most often located somewhere without direct user access, the audio commands need to be collected by separate devices (satellite Rhasspy installation, mobile Android app, microphones on laptops, tablets and workstations) to provide full coverage. This is done via MQTT and/or UDP.

System (Hard/Software)

  • Raspberry Pi 4
  • openHAB 2.5.10
  • Mosquitto MQTT server and MQTT binding
  • Rhasspy (docker installation)
  • GStreamer

Rhasspy setup
Install Rhasspy as described here.
Important: Modify the docker start command as follows:

docker run -d -p 12101:12101 -p 12333:12333/udp \
    --name rhasspy \
    --restart unless-stopped \
    -v "$HOME/.config/rhasspy/profiles:/profiles" \
    -v "/etc/localtime:/etc/localtime:ro" \
    --device /dev/snd:/dev/snd \
    rhasspy/rhasspy \
    --user-profiles /profiles \
    --profile de

as port 12333/udp will be needed later for GStreamer.

Set up and test Rhasspy as the base station of a base/satellite configuration, as described here.

For testing purposes I set up a second Rhasspy installation on a Raspberry Pi 3, configured as a satellite and equipped with a ReSpeaker Mic Array as a powerful microphone for audio recording.

After intense testing, I decided to use

  • MQTT: external (connect to your MQTT server)
  • Wake word: Porcupine
  • Speech to text: Kaldi
  • Intent recognition: Fsticuffs
  • Audio playing: aplay
  • Dialogue management: Rhasspy

Connect Rhasspy to Openhab

In the Rhasspy setup you will find some preconfigured sentences which you can use for testing, e.g.

[ChangeLightState]
light_name = (wohnzimmerlampe) {name}
light_state = (ein | aus) {state}
schalte (die | das) <light_name> <light_state>
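Rhasspy expands such templates into every concrete sentence combination before training the speech model. The expansion logic can be sketched in plain Python (illustration only, not Rhasspy code):

```python
from itertools import product

# Alternatives taken from the sentences.ini example above
articles = ["die", "das"]
light_names = ["wohnzimmerlampe"]   # tagged {name}
light_states = ["ein", "aus"]       # tagged {state}

# Every combination becomes one trainable sentence
sentences = [
    f"schalte {article} {name} {state}"
    for article, name, state in product(articles, light_names, light_states)
]

for s in sentences:
    print(s)   # e.g. "schalte die wohnzimmerlampe ein"
```

With one lamp, two articles and two states this yields four sentences; adding more lamps to light_name grows the set multiplicatively.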

Go to the inbox and manually create a generic MQTT thing (e.g. “Rhasspy”), add a channel (e.g. “Lightswitch”) and enter “hermes/intent/ChangeLightState” as the MQTT state topic.
Connect the channel to an item (e.g. “String Sprachbefehl_Lichtschalter”).
Now you will receive the Rhasspy-output as string into that item, whenever Rhasspy detects a voice command.
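If you prefer file-based configuration, the same thing/channel/item setup could be sketched roughly as below (broker host, thing IDs and labels here are placeholders - adapt them to your system):

```
// rhasspy.things - generic MQTT thing with a state channel (IDs are examples)
Bridge mqtt:broker:mosquitto [ host="192.168.1.10", secure=false ] {
    Thing topic rhasspy "Rhasspy" {
        Channels:
            Type string : lightswitch "Lightswitch" [ stateTopic="hermes/intent/ChangeLightState" ]
    }
}

// rhasspy.items
String Sprachbefehl_Lichtschalter "Sprachbefehl Lichtschalter" { channel="mqtt:topic:mosquitto:rhasspy:lightswitch" }
```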

A typical rule for testing could look like:

rule "Lichtschalter"
when
    Item Sprachbefehl_Lichtschalter received update
then
    var String intentItemSlot = transform("JSONPATH", "$.slots[?(@.slotName=='name')].value.value", Sprachbefehl_Lichtschalter.state.toString)
    var String intentStatusSlot = transform("JSONPATH", "$.slots[?(@.slotName=='state')].value.value", Sprachbefehl_Lichtschalter.state.toString)
    logInfo("rhasspy.rules testing", "Rhasspy raw JSON: " + Sprachbefehl_Lichtschalter.state.toString)
    logInfo("rhasspy.rules testing", "Rhasspy intentItemSlot: " + intentItemSlot)
    logInfo("rhasspy.rules testing", "Rhasspy intentStatusSlot: " + intentStatusSlot)
end

and would provide the following log entries:

2020-11-15 16:09:22.786 [INFO ] [e.model.script.rhasspy.rules testing] - Rhasspy raw JSON: {"input": "schalte die sofalampe aus", "intent": {"intentName": "ChangeLightState", "confidenceScore": 1.0}, "siteId": "RhasspySatS10", "id": null, "slots": [{"entity": "name", "value": {"kind": "Unknown", "value": "sofalampe"}, "slotName": "name", "rawValue": "sofalampe", "confidence": 1.0, "range": {"start": 12, "end": 21, "rawStart": 12, "rawEnd": 21}}, {"entity": "state", "value": {"kind": "Unknown", "value": "aus"}, "slotName": "state", "rawValue": "aus", "confidence": 1.0, "range": {"start": 22, "end": 25, "rawStart": 22, "rawEnd": 25}}], "sessionId": "d7118b4d-df15-2b76-e0f2-10ba12a70615", "customData": null, "asrTokens": [[{"value": "schalte", "confidence": 1.0, "rangeStart": 0, "rangeEnd": 7, "time": null}, {"value": "die", "confidence": 1.0, "rangeStart": 8, "rangeEnd": 11, "time": null}, {"value": "sofalampe", "confidence": 1.0, "rangeStart": 12, "rangeEnd": 21, "time": null}, {"value": "aus", "confidence": 1.0, "rangeStart": 22, "rangeEnd": 25, "time": null}]], "asrConfidence": null, "rawInput": "schalte die sofalampe aus", "wakewordId": null, "lang": null}
2020-11-15 16:09:22.800 [INFO ] [e.model.script.rhasspy.rules testing] - Rhasspy intentItemSlot: sofalampe
2020-11-15 16:09:22.803 [INFO ] [e.model.script.rhasspy.rules testing] - Rhasspy intentStatusSlot: aus
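Outside openHAB, the slot extraction that the JSONPATH transforms perform can be sketched in plain Python - useful when debugging the payload structure (the payload below is a trimmed-down version of the log entry above):

```python
import json

# Trimmed-down hermes/intent payload, as published by Rhasspy
payload = json.loads("""
{"input": "schalte die sofalampe aus",
 "intent": {"intentName": "ChangeLightState", "confidenceScore": 1.0},
 "siteId": "RhasspySatS10",
 "slots": [
   {"entity": "name",  "slotName": "name",  "rawValue": "sofalampe",
    "value": {"kind": "Unknown", "value": "sofalampe"}, "confidence": 1.0},
   {"entity": "state", "slotName": "state", "rawValue": "aus",
    "value": {"kind": "Unknown", "value": "aus"}, "confidence": 1.0}
 ]}
""")

def slot(payload, slot_name):
    """Return the value of the slot with the given slotName, or None."""
    for s in payload["slots"]:
        if s["slotName"] == slot_name:
            return s["value"]["value"]
    return None

print(slot(payload, "name"))   # sofalampe
print(slot(payload, "state"))  # aus
```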

BTW, for testing purposes I found the “MQTT Explorer” tool extremely useful.

Audio input from Satellites

To be flexible, one should be able to issue voice commands anywhere in the house.
I identified three possible device types (there might be more):

  • Satellite Raspi with ReSpeaker-Mic-Array - e.g. in the living room
  • Mobile phone(s)
  • All laptops, workstations, tablets (as one is sitting near them anywhere most of the time…)

The setup of a “satellite” installation of Rhasspy on another Raspi (or any other hardware) is described here in detail. I personally find this a bit over the top, as its only function is to collect audio input and send it via MQTT to the base station.

The Rhasspy mobile app for Android literally worked “out of the box”. I set it up for MQTT, with no wake word, and activated “silence detection”.

Important: Don’t forget to enter the IDs of your satellites (e.g. “RhasspySatS10”) and of the mobile phone app in the base station’s Rhasspy settings - otherwise their commands will be ignored.

Audio input using Gstreamer

The idea is to speak an openHAB command into your laptop’s or tablet’s microphone. The tool for this is “GStreamer” - its setup proved to be the trickiest part.

First you need to install it on all devices (Windows PC, Raspberry Pi, etc.) you want to use in the setup.
The installation on the Raspi looks like this:

sudo apt-get install gstreamer1.0-tools
sudo apt-get install gstreamer1.0-plugins-good
sudo apt-get install gstreamer1.0-plugins-bad
sudo apt-get install gstreamer1.0-plugins-ugly

The next trap might be that port 12333, used for the UDP transport of the audio, is not open.
To fix this, install and run ufw:

sudo apt-get install ufw
pi@RhasspySat1:~ $ sudo ufw allow 12333
Rules updated
Rules updated (v6)

You might also need to do the same on your Windows devices (firewall) and allow UDP in your router.

Finally, set the audio recording on the base Rhasspy to “Local Command”.
The example provided here on the Rhasspy site did not work for me - these settings finally did:

Record program: gst-launch-1.0
Record arguments: udpsrc port=12333 ! rawaudioparse use-sink-caps=false format=pcm pcm-format=s16le sample-rate=16000 num-channels=1 ! queue ! audioconvert ! audioresample ! filesink location=/dev/stdout
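For reference, what the receiving end does can be illustrated with a plain-Python UDP listener (stdlib only). This is just a debugging aid to verify that audio datagrams actually arrive on port 12333, not a replacement for the GStreamer pipeline:

```python
import socket

def count_audio_bytes(port=12333, packets=50):
    """Receive `packets` UDP datagrams on `port` and return the total number
    of raw audio bytes that arrived (the stream is s16le, 16 kHz, mono, so
    32000 bytes correspond to roughly one second of audio)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    total = 0
    try:
        for _ in range(packets):
            data, _addr = sock.recvfrom(4096)
            total += len(data)
    finally:
        sock.close()
    return total
```

If this reports zero bytes while a satellite is streaming, the problem is almost certainly the firewall or router, not Rhasspy.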

To use my Windows computers as input devices, I installed GStreamer and then entered e.g. the following command in “cmd”:


gst-launch-1.0 autoaudiosrc ! audioconvert ! audioresample ! audio/x-raw, rate=16000, channels=1, format=S16LE ! udpsink host=<IP-of-base-station> port=12333

Setting pipeline to PAUSED …
Pipeline is live and does not need PREROLL …
Pipeline is PREROLLED …
Setting pipeline to PLAYING …
New clock: GstAudioSrcClock
Redistribute latency…
0:00:02.0 / 99:99:99.

This works well; even if you put the PC into sleep mode or reboot the Raspi, the connection continues to work afterwards. The above command can be put into a Windows batch file located in the Windows startup folder: How to find auto startup folder in Windows 10/11 (Where is it)?.
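Such a batch file could look roughly like this - note that the GStreamer install path and the base station IP below are placeholders for my setup, adjust them to yours:

```bat
@echo off
rem Stream the default microphone as s16le/16 kHz/mono to the Rhasspy base station
rem Path and IP are placeholders - adapt to your installation
"C:\gstreamer\1.0\msvc_x86_64\bin\gst-launch-1.0.exe" autoaudiosrc ! audioconvert ! audioresample ! audio/x-raw, rate=16000, channels=1, format=S16LE ! udpsink host=192.168.1.10 port=12333
```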

Another advantage of the “continuous streaming” done by GStreamer is that you can train and set wake words on the base Rhasspy, and they will be detected from everywhere.