@JGKK thanks for your questions! Here are some answers:
Has it got the tools built in for offline training of the statistical language model, dictionary and acoustic model that you need for Pocketsphinx or Kaldi?
The STT server has an endpoint to adapt the language model, and I’ve recently added an endpoint to the SEPIA server to export all custom commands as well. Automatic conversion of missing words to phonemes for the dictionary is not built in yet, but it is at the top of the priority list. Actually, I spoke to Michael from Rhasspy yesterday and he mentioned that he’s using Zamia (Kaldi) as well, so it might be that models trained with Rhasspy are 100% compatible with SEPIA. I will check this out soon. About acoustic model training: obviously there are recipes for Kaldi, but I would not recommend that any “normal” user do this, since Peter from Zamia trains them on powerful graphics cards and it usually takes over a week. In my experience it is usually not required if your LM is of a size typical for Snips or Rhasspy.
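Just to sketch what LM adaptation could look like from a script: you post your custom sentences to the STT server and let it rebuild the model. The server address, endpoint path, and payload fields below are simplified placeholders for illustration, not the exact API; check the SEPIA STT server documentation for the real format.

```python
# Sketch: send custom sentences to the STT server to adapt the LM.
# Address, endpoint path, and field names are placeholders - see the
# SEPIA STT server docs for the actual API.
import requests

STT_SERVER = "http://localhost:20741"  # placeholder address/port

sentences = [
    "turn on the lights in the living room",
    "set the heater to twenty degrees",
]

resp = requests.post(
    f"{STT_SERVER}/lm/adapt",          # placeholder path
    json={"sentences": sentences, "model": "custom"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())
```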
A word about Pocketsphinx: I worked with it intensively when I built the ILA voice assistant, including all the things mentioned above (AM, LM and dictionary adaptation), but ultimately gave up because the technology stack is not up to date and the WER (word error rate) was usually way too high. Because of that I’ve decided not to support it in SEPIA.
A choice of different hotword services with custom wake words for offline use, for example an integration of Snowboy’s training service API to create wake words from the interface?
Porcupine is the only service that is integrated deeply into the client because of its browser support, but you can use any hotword detection engine or any remote trigger you want via SEPIA’s remote action endpoint, as demonstrated in this little video (there is a Python library to help with the integration). There is no web interface to build your own hotword though.
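To sketch the idea: a custom hotword engine only needs to make one HTTP call to wake up a client. The port, endpoint path, and field names below are simplified placeholders; the Python library wraps the actual API, so check its docs for the exact format.

```python
# Sketch: wake up a specific SEPIA client from your own hotword engine
# via the remote action endpoint. Endpoint path, port, and field names
# are placeholders - see the SEPIA docs / Python library for the
# exact format.
import requests

ASSIST_SERVER = "http://localhost:20721"  # placeholder address/port

requests.post(
    f"{ASSIST_SERVER}/remote-action",     # placeholder path
    json={
        "KEY": "user-id;auth-token",      # placeholder credentials
        "type": "hotkey",                 # act as if the mic button was pressed
        "targetDeviceId": "speaker1",     # which client should wake up
    },
    timeout=5,
)
```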
Which VAD do you use and how do you handle multiple voice input sides/ satellite sites?
The client has support for the Mozilla VAD library by Kelly Davis, but VAD is usually handled on the STT server, which has support for WebRTC VAD. Currently I’m limiting input to 4 s though. What exactly do you mean by “voice input sides/ satellite sites”? Different clients that get activated at the same time by a user speaking the hotword? Each hotword trigger can target a specific device ID and user ID. If there are two devices with the same ID and the same user logged in, the last active device is triggered (the WebSocket server keeps track of the activation state).
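In case anyone wants to experiment with the same VAD locally: WebRTC VAD is available as a small Python package (py-webrtcvad). This is a generic usage sketch to show the kind of per-frame check involved, not SEPIA’s actual server code:

```python
# Generic WebRTC VAD usage sketch (py-webrtcvad) - not SEPIA's code.
import webrtcvad

vad = webrtcvad.Vad(2)           # aggressiveness 0 (lenient) to 3 (strict)
SAMPLE_RATE = 16000              # WebRTC VAD accepts 8/16/32/48 kHz
FRAME_MS = 30                    # frames must be 10, 20, or 30 ms
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 16-bit mono PCM

def speech_frames(pcm: bytes):
    """Yield True/False per frame: does the frame contain speech?"""
    for i in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        yield vad.is_speech(pcm[i:i + FRAME_BYTES], SAMPLE_RATE)
```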
Do you have an open API over MQTT or REST to do your own intent parsing, in addition to the Python interface?
Yes. For example, there are REST APIs for intent recognition (interpret) and dialog management (answer), and in theory you can access the same APIs via the WebSocket server. There is no official support for the MQTT protocol yet. The Python interface is one of several modules that can supply the ‘interpret’ endpoint with results.
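As a rough sketch, calling the interpret endpoint from your own code could look like this (server address, path, and field names are simplified placeholders; see the API docs for the real format):

```python
# Sketch: ask the assist server to run intent recognition on a sentence.
# Address, path, and field names are placeholders.
import requests

ASSIST_SERVER = "http://localhost:20721"  # placeholder address/port

def interpret(text, lang="en"):
    resp = requests.post(
        f"{ASSIST_SERVER}/interpret",     # placeholder path
        json={"text": text, "lang": lang, "KEY": "user-id;auth-token"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()  # expected: recognized command plus extracted slots

print(interpret("turn on the lights in the kitchen"))
```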
[…] a fairly limited way to train your own complex intents with easy slot value extraction, is this right?
With the Teach-UI inside the app you can define custom sentences for intents that already exist. With the Java SDK you can define arbitrarily complex (or simple) services that can use SEPIA’s existing “slots” or define your own, including questions SEPIA should ask you if parameters are missing to fulfill an intent. Existing parameters/slots that you can use out-of-the-box for services like smart home are, for example: DeviceType (lights, heater, shutter, sensor, etc.), Action (on, off, toggle, set, show, …), DeviceValue (70%, 20°C, 11, etc.), TimeDate (tomorrow at 8 a.m., …), Room (living-room, office, hallway, …) and more.
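To make the slot idea concrete, the interpreted result for “set the living-room lights to 70%” could carry values roughly like this (the field names here are illustrative, not SEPIA’s exact wire format):

```python
# Illustrative shape of an interpreted smart-home command using the
# slots listed above - field names are examples, not SEPIA's format.
result = {
    "command": "smartdevice",
    "parameters": {
        "DeviceType": "lights",     # what to control
        "Action": "set",            # what to do
        "DeviceValue": "70%",       # target value
        "Room": "living-room",      # where
    },
}
```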
Do you use the general German model for offline speech recognition or do you train a custom one based on the intents?
“General” is the default model of the STT server. When you define your own LM, you can use the SEPIA Control HUB to switch between the models.
@miba:
Please correct me if I’m wrong, but as far as I can tell there are no regular contributors to this project other than you. A community apparently has not yet emerged. That can and should change, of course, and I wish you every success! I will definitely keep an eye on the project.
Yes, that is unfortunately true; it seems I’m pretty bad at marketing. When I uploaded the source code to GitHub around 1.5 years ago, the project was already pretty big, since I had been working on it with a small start-up (I was the only programmer for most of the time) and it was meant to be a replacement for Siri and Google Assistant that gives you back control over your data. The start-up and I went our separate ways and I decided to make my code 100% open-source. Since then I’ve been working on breaking everything down into smaller parts and rewriting things for developers (besides adding new features).
Because of this history SEPIA has always been more focused on the app and end-users compared to Rhasspy or Snips. It is similar to Mycroft, at least in its goal to offer a voice assistant that works out-of-the-box with minimal configuration yet gives developers the tools to improve it and build their own services. The same is true for the openHAB integration: install SEPIA, add your openHAB server, control your devices.
I’ve decided to contact Michael from Rhasspy to start a discussion about how we could bring together the best of both worlds. Rhasspy basically covers SEPIA’s interpret and STT modules and offers a nice web interface to manage the things I discussed above. From what I’ve seen so far both projects might actually work very well together … let’s see.
(Sorry for the wall of text!)