Cloud Free Voice Control

Hi all,

I’ve decided to share my experience with offline voice control and openHAB, as it’s working surprisingly well and I haven’t seen it covered here (yet). All of this is still under heavy development, so if you have any improvement ideas, please feel free to share them with me.

What’s involved:
I’m running some Raspberry Pis as satellites around the house to capture the voice commands. (Lately it’s some Raspberry Pi Zero 2 Ws with Seeed ReSpeaker mic HATs.)
Those satellites use Picovoice’s wake word engine (“Porcupine”) as well as Picovoice’s speech-to-intent engine “Rhino”.
The software on the satellites publishes the recognized intents via MQTT to openHAB.
Within openHAB I’m using a rule to map the received commands to actions, as well as VoiceRSS to output feedback via my Sonos speakers.

Setting up the satellites:
Start with a fresh Raspbian installation and follow the instructions for the ReSpeaker (e.g. the 4-Mic Array or the 2-Mic HAT). This should actually work with any kind of microphone, but I had the ReSpeakers lying around.
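For reference, the driver setup for the ReSpeaker HATs is roughly the following (check the Seeed docs for your exact model):

git clone https://github.com/respeaker/seeed-voicecard.git
cd seeed-voicecard
sudo ./install.sh
sudo reboot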

Once the ReSpeakers (and their nice LEDs) are working, we can install Porcupine and Rhino. I guess it would also work with the combined Picovoice SDK, but I started with both engines separately and don’t really see a reason to migrate.
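Both engines are available as pip packages. Roughly what I used (PyAudio needs the PortAudio headers to build):

sudo apt install portaudio19-dev
pip3 install pvporcupine pvrhino pyaudio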

Now we have to create our context model and/or wake word via the Picovoice Console.
I’m using the built-in wake words (why should I call this thing anything but Jarvis :grin:) and a German Rhino context model: Jarvis.txt (3.2 KB). This file should actually be called *.yml, but the board does not allow *.yml files.

If I recall correctly, we should now have nearly everything available on the satellites except the MQTT functionality, so we install the Paho MQTT client with

pip3 install paho-mqtt

Now we’re ready to run everything on our satellite and publish to our MQTT server. So let’s create everything we need.

Create a folder for jarvis. (I’m doing this directly in the home folder)

cd ~
mkdir jarvis

From the Seeed examples we’ll use the code that deals with the LEDs on the ReSpeakers and copy it into our folder.
Depending on the type of ReSpeaker used (we know this from the ReSpeaker setup), this is either

cp 4mics_hat/interfaces/* jarvis/

or

cp mic_hat/interfaces/* jarvis/

Now let’s move into the folder:

cd jarvis

In case we want to use a context model that’s not English, we have to download the corresponding Rhino model file.
For German that would be (the -O option keeps wget from saving the file with the ?raw=true suffix):

wget -O rhino_params_de.pv https://github.com/Picovoice/rhino/blob/master/lib/common/rhino_params_de.pv?raw=true

Last but not least, we need to get the Rhino context model file (*.rhn, not the *.yml I’ve posted here) into the folder. It was created earlier via the Picovoice Console and can be downloaded from there. (I’ve used scp to copy it there.)
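For example, from the machine where you downloaded the model (hostname and filename are obviously placeholders):

scp xxxxxx.rhn pi@<satellite-ip>:~/jarvis/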

Now we can create the actual code that publishes the commands to MQTT:

nano jarvis.py
#!/usr/bin/env python3
import struct
import pyaudio
import pvporcupine
import pvrhino
import time
import pixels as p #LED helper copied from the Seeed examples
import paho.mqtt.client as mqtt
import json

p.pixels.wakeup() #Short LED animation to show the script is starting


mqtt_broker_url = "xxx.xxx.xxx" #Set this to your MQTT Broker URL that's known to openHAB
mqtt_broker_port = 1883
mqtt_base_topic ="jarvis"

access_key = "xxx123xxx123xxx123" #Your Picovoice AccessKey from the console
rhino_context = "xxxxxx.rhn" #Filename of Rhino's context model
rhino_model = "rhino_params_de.pv" #In case you use a German Rhino model; can be changed to other languages
listen_seconds = 7 #How long Jarvis keeps listening after the wake word was detected

porcupine = None
rhino = None
pa = None
audio_stream = None
mqtt_client = None

def say(text): #Not used at the moment, but I've used it for debugging and might use it going forward as well.
    mqtt_client.connect(mqtt_broker_url, mqtt_broker_port)
    mqtt_client.publish(topic=mqtt_base_topic + "/say", payload=text, qos=0, retain=False)
    mqtt_client.disconnect()

def sendCommand(payload): #Parameter renamed from "json" so it doesn't shadow the json module
    mqtt_client.connect(mqtt_broker_url, mqtt_broker_port)
    mqtt_client.publish(topic=mqtt_base_topic + "/command", payload=payload, qos=0, retain=False)
    mqtt_client.disconnect()

try:
    porcupine = pvporcupine.create(access_key=access_key, keywords=["jarvis"])
    rhino = pvrhino.create(access_key=access_key, context_path=rhino_context, model_path=rhino_model)

    pa = pyaudio.PyAudio()

    audio_stream = pa.open(
                    rate=porcupine.sample_rate,
                    channels=1,
                    format=pyaudio.paInt16,
                    input=True,
                    frames_per_buffer=porcupine.frame_length)

    mqtt_client = mqtt.Client()

    time.sleep(0.7) #Let the startup LED animation finish
    p.pixels.off()
    while True:
        pcm = audio_stream.read(porcupine.frame_length, exception_on_overflow = False)
        pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)

        keyword_index = porcupine.process(pcm)

        if keyword_index >= 0:
            print("Hotword Detected")
            p.pixels.wakeup()
            is_finalized = False
            timeout = time.time() + listen_seconds
            while True:
                pcm = audio_stream.read(rhino.frame_length, exception_on_overflow = False)
                pcm = struct.unpack_from("h" * rhino.frame_length, pcm)
                is_finalized = rhino.process(pcm)
                if is_finalized:
                    inference = rhino.get_inference()
                    if inference.is_understood:
                        p.pixels.think()
                        sendCommand(json.dumps(inference)) #Inference is a namedtuple, so this serializes to a JSON array
                        break
                if (time.time() > timeout):
                    break
            p.pixels.speak()
            time.sleep(2)
            p.pixels.off()
finally:
    if porcupine is not None:
        porcupine.delete()

    if rhino is not None:
        rhino.delete()

    if audio_stream is not None:
        audio_stream.close()

    if pa is not None:
        pa.terminate()

OK, now we get a nicely usable JSON payload posted to our MQTT server whenever a command was recognized and understood.
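Since pvrhino’s Inference is a namedtuple, json.dumps() turns it into a JSON array: index 0 is is_understood, index 1 the intent, index 2 the slots. With my context model an understood command looks roughly like this:

[true, "changeState", {"location": "Küche", "state": "an"}]

You can watch the messages coming in with the mosquitto clients, e.g. mosquitto_sub -h <broker-ip> -t "jarvis/#" -v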

I’m running jarvis.py as a systemd service on the satellites.
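A minimal unit file sketch (assuming the script lives in /home/pi/jarvis and runs as the pi user; adjust paths to your setup), e.g. /etc/systemd/system/jarvis.service:

[Unit]
Description=Jarvis voice satellite
After=network-online.target sound.target

[Service]
User=pi
WorkingDirectory=/home/pi/jarvis
ExecStart=/usr/bin/python3 /home/pi/jarvis/jarvis.py
Restart=on-failure

[Install]
WantedBy=multi-user.target

Then enable it with sudo systemctl enable --now jarvis.service.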

Setting up openHAB:
I assume we have a working openHAB instance including a functioning MQTT setup.

First we have to create a Jarvis Thing in openHAB with two channels.

UID: mqtt:topic:localMQTTBroker:jarvis
label: Jarvis MQTT
thingTypeUID: mqtt:topic
configuration: {}
bridgeUID: mqtt:broker:localMQTTBroker
channels:
  - id: say
    channelTypeUID: mqtt:string
    label: say
    description: ""
    configuration:
      stateTopic: jarvis/say
  - id: command
    channelTypeUID: mqtt:string
    label: command
    description: ""
    configuration:
      stateTopic: jarvis/command

I’m still using textual item creation, so I’ve created two items for the two channels:

String jarvis_Say "Jarvis sagt: [%s]" (others)  {channel="mqtt:topic:localMQTTBroker:jarvis:say"}
String jarvis_Command "Kommando: [%s]" (others)  {channel="mqtt:topic:localMQTTBroker:jarvis:command"}

Now we should be getting the input from our satellites into openHAB, and we can start interpreting it via a rule. The intents or commands provided by the satellites are partially translated via map files. Create your own based on your Rhino context model and what you’re trying to achieve.

I’m using the following:

jarvisColor.map:
blau=240
grün=120
orange=30
pink=300
lila=270
rot=0
gelb=45
weiß=T100
weiss=T100
warm=T100
kalt=T0

jarvisIntensity.map:
Null\ Prozent=0
Zehn\ Prozent=10
Zwanzig\ Prozent=20
Dreissig\ Prozent=30
Vierzig\ Prozent=40
Fünfzig\ Prozent=50
Sechzig\ Prozent=60
Siebzig\ Prozent=70
Achtzig\ Prozent=80
Neunzig\ Prozent=90
Hundert\ Prozent=100

jarvisLocation.map:
Badezimmer=Badezimmer
Bad=Badezimmer
Schlafzimmer=Schlafzimmer
Flur=Flur
Küche=Küche
Wohnzimmer=Wohnzimmer
Ess\ bereich=Essbereich
Büro=Büro
Eingangsbereich=Flur
=not\ defined

jarvisScene.map:
Ambiente=abend
Abendstimmung=abend
Abendessen=abendessen
Fernsehen=tv
Gammel\ Stimmung=tv
Fernseh\ Stimmung=tv
Fauler\ Sonntag=tv

jarvisState.map:
an=1
aus=0
ein=1
auf=1
ab=0

Here’s my rule at the moment. (I’m sure this is far from ideal, and I could potentially also use openHAB’s semantic model to make things more efficient.)

rule "Say Rule"
when
    Item jarvis_Say received update
then
    say(jarvis_Say.state.toString, "voicerss:deAT", "sonos:PLAY5:RINCON_XXXXXXXXXXXXXXXX", new PercentType(50))
end


rule "Command Rule"
when
    Item jarvis_Command received update
then
    
    val cmd = jarvis_Command.state.toString
    var intend = transform("JSONPATH", "$[1]", cmd)
    var location = transform("MAP", "jarvisLocation.map", transform("JSONPATH", "$[2].location", cmd))
    if(location == cmd) location = "not defined"
    
    var success = false
    switch(intend) {
        case "changeState":
        {
            var state = transform("JSONPATH", "$[2].state", cmd)
            var onOff = if( transform("MAP", "jarvisState.map", state) == "1") ON else OFF
            switch(location)   {
                case "Badezimmer": {
                    EG_Bad_L.sendCommand(onOff)
                    success = true
                }
                case "Schlafzimmer": {
                    //OG_Schlafz_L.sendCommand(onOff)
                    success = true
                }
                case "Flur": {
                    EG_Eing_L.sendCommand(onOff)
                    success = true
                }
                case "Wohnzimmer": {
                    EG_Wohn_L.sendCommand(onOff)
                    success = true
                }
                case "Essbereich": {
                    EG_Ess_L.sendCommand(onOff)
                    success = true
                }
                case "Büro": {
                    EG_Buero_L.sendCommand(onOff)
                    success = true
                }
                case "Küche": {
                    EG_Kueche_L.sendCommand(onOff)
                    success = true
                }
                case "not defined": {
                    if(onOff == ON) {
                        EG_Ess_I.sendCommand(onOff)
                        rgbwLamp_WohnzTV.sendCommand(onOff)
                        EG_Eing_L.sendCommand(onOff)
                    }
                    else egAllLightsKNX.sendCommand(OFF)
                    success = true
                }
            }
        }
        case "changeColor":
        {
            var colorCommand = transform("JSONPATH", "$[2].color", cmd)
            colorCommand = transform("MAP", "jarvisColor.map", colorCommand)
            
            if(colorCommand.startsWith("T"))
            {
                var temp = Integer::parseInt(colorCommand.substring(1))
                switch(location)   {
                    case "Wohnzimmer": {
                        if(rgbwLamp_WohnzTV.state == ON) rgbwLamp_WohnzTV_Temp.sendCommand(temp)
                        if(rgbwLamps_Living.state == ON) rgbwLamps_Living_Temp.sendCommand(temp)
                        success = true
                    }
                    case "Essbereich": {
                        if(rgbwLamps_EssbInd.state == ON) rgbwLamps_EssbInd_Temp.sendCommand(temp)
                        success = true
                    }
                    case "Küche": {
                        if(ikZb_Kueche.state == ON) ikZb_Kueche_Temp.sendCommand(temp)
                        success = true
                    }
                    case "not defined": {
                        if(rgbwLamp_WohnzTV.state == ON) rgbwLamp_WohnzTV_Temp.sendCommand(temp)
                        if(rgbwLamps_Living.state == ON) rgbwLamps_Living_Temp.sendCommand(temp)
                        if(rgbwLamps_EssbInd.state == ON) rgbwLamps_EssbInd_Temp.sendCommand(temp)
                        if(ikZb_Kueche.state == ON) ikZb_Kueche_Temp.sendCommand(temp)
                        success = true
                    }
                }
            }
            else
            {
                var hue = Integer::parseInt(colorCommand)
                switch(location)   {
                    case "Wohnzimmer": {
                        if(rgbwLamp_WohnzTV.state == ON) rgbwLamp_WohnzTV_Color.sendCommand(hue+",100,100")
                        if(rgbwLamps_Living.state == ON) rgbwLamps_Living_Color.sendCommand(hue+",100,100")
                        success = true
                    }
                    case "Essbereich": {
                        if(rgbwLamps_EssbInd.state == ON) rgbwLamps_EssbInd_Color.sendCommand(hue+",100,100")
                        success = true
                    }
                    case "not defined": {
                        if(rgbwLamp_WohnzTV.state == ON) rgbwLamp_WohnzTV_Color.sendCommand(hue+",100,100")
                        if(rgbwLamps_Living.state == ON) rgbwLamps_Living_Color.sendCommand(hue+",100,100")
                        if(rgbwLamps_EssbInd.state == ON) rgbwLamps_EssbInd_Color.sendCommand(hue+",100,100")
                        success = true
                    }
                }
            }
            success = true
        }
        case "setLightIntesity":
        {
            var intensity = Integer::parseInt(transform("MAP", "jarvisIntensity.map", transform("JSONPATH", "$[2].intensity", cmd)))
            //logInfo("Jarvis Light intensity", "Setting light level to " + intensity + " in location " + location)
            switch(location)   {
                case "Flur": {
                    ikZb_Eingang_Dim.sendCommand(intensity)
                    success = true
                }
                case "Wohnzimmer": {
                    rgbwLamps_Living_Dim.sendCommand(intensity)
                    rgbwLamp_WohnzTV_Dim.sendCommand(intensity)
                    success = true
                }
                case "Essbereich": {
                    ikZb_Essbereich_Dim.sendCommand(intensity)
                    success = true
                }
                case "Küche": {
                    ikZb_Kueche_Dim.sendCommand(intensity)
                    success = true
                }
                case "not defined": {
                    if(rgbwLamp_WohnzTV.state == ON) rgbwLamp_WohnzTV_Dim.sendCommand(intensity)
                    if(rgbwLamps_Living.state == ON) rgbwLamps_Living_Dim.sendCommand(intensity)
                    if(rgbwLamps_EssbInd.state == ON) rgbwLamps_EssbInd_Dim.sendCommand(intensity)
                    if(ikZb_Eingang.state == ON) ikZb_Eingang_Dim.sendCommand(intensity)
                    if(ikZb_Kueche.state == ON) ikZb_Kueche_Dim.sendCommand(intensity)
                    success = true
                }
            }
        }
        case "setRaffstores":
        {
            jarvis_Say.postUpdate("Ich weiß, dass ich das können sollte, aber so weit bin ich leider noch nicht.")
            success = true
        }
        case "setScene":
        {
            var scene = transform("JSONPATH", "$[2].scene", cmd)
            var targetScene =  transform("MAP", "jarvisScene.map", scene)
            scn_House.postUpdate(targetScene)
            success = true
        }
        case "setRaffstoreLamellen":
        {
            jarvis_Say.postUpdate("Ich weiß, dass ich das können sollte, aber so weit bin ich leider noch nicht.")
            success = true
        }
        case "setCleaning":
        {
            if(location == "not defined") roboRockcontrol.sendCommand("START")
            else
            {
                val segmentCommand = "{\"segment_ids\": [\"" + transform("MAP", "rockoSegments.map", location) + "\"],\"iterations\": 1,\"customOrder\": true}" 
                roboRockSegment.sendCommand(segmentCommand)

            }
            jarvis_Say.postUpdate("OK, wir kümmern uns drum.")
            success = true
        }
        case "stopCleaning":
        {
            roboRockcontrol.sendCommand("HOME")
            //jarvis_Say.postUpdate("OK, wir kümmern uns darum.")
            success = true
        }
    }
    if(!success) jarvis_Say.postUpdate("Ich fürchte das geht nicht.")
end
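By the way, you can test the rule without a satellite by publishing a fake inference to the command topic (again assuming the mosquitto clients are installed):

mosquitto_pub -h <broker-ip> -t "jarvis/command" -m '[true, "changeState", {"location": "Küche", "state": "an"}]'

If everything is wired up correctly, the kitchen light should switch on.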

I hope this write-up is useful for someone. As said: I’m at the very beginning of this whole thing and there’s much room for improvement, but I’m sure there are some smart people in the community who can actually give me some hints. :slight_smile:

Best, C


Using VoiceRSS doesn’t quite square with your “Cloud Free” claim. :thinking:

Actually, the audio output is cached locally, so it isn’t fetched from the cloud on execution.
Anyway, that’s not part of the voice control per se; it’s mainly used for feedback during development.


Good point. :+1:

Thanks for the write up, will have to check it out over the Christmas break.

How does it compare to this method used by HA?


Hi,

that’s actually a pretty neat solution too, thanks for sharing it. (I didn’t know it until now.)
It seems to be mainly a Home Assistant integration of Rhasspy, so if I wanted to integrate it with openHAB, I’d go for Rhasspy directly. It’s remarkably close to Picovoice as such. Could be that both started from what was left of Snips AI after it was acquired by Sonos.

I originally stumbled across Picovoice, and I like the UI (the “Console”) for creating the context models, as it’s really easy to use. (No coding needed.)

