Some new voice add-ons: PorcupineKS, GoogleSTT, WatsonSTT, VoskSTT

Hi openHAB community.

For the last few months I have been working on adding services related to voice control.

This is the list:
Keyword Spotter Services:
PorcupineKS: Requires a PicoVoice API key, limited to three devices. README.md
SnowboyKS: PR

Speech-to-Text Services:
GoogleSTT: Uses Google Cloud, 60 min/month free tier. README.md
WatsonSTT: Uses IBM Cloud, 500 min/month free tier. README.md
VoskSTT: Works offline; you need to download the model for your language. README.md

I don’t think I’ll add more of these services. I’m still missing a good keyword spotter that does not require a license, but I haven’t found one. Let me know if you know a good tool for it.

This is the one I’m working on right now, and it is what motivated me to build the others. It’s a customizable Human Language Interpreter that lets you define your own templates to match commands that write to or read from your items. It can also fall back to another interpreter, so you can use this one just for customization.

Human Language Interpreter:
ActionTemplateInterpreter: template system powered by OpenNLP library. README.md PR
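
As a rough illustration of the fallback idea described above (not the add-on's actual code), a customized interpreter can be tried first and another interpreter consulted only when nothing matches. The `Interpreter` interface and the `withFallback` helper below are hypothetical names made up for this sketch; the real openHAB interface is different.

```java
import java.util.Optional;

final class FallbackSketch {
    /** Hypothetical minimal interpreter contract; NOT the openHAB interface. */
    interface Interpreter {
        Optional<String> interpret(String text);
    }

    /** Tries the primary interpreter first and delegates to the fallback when nothing matches. */
    static Interpreter withFallback(Interpreter primary, Interpreter fallback) {
        return text -> {
            Optional<String> answer = primary.interpret(text);
            return answer.isPresent() ? answer : fallback.interpret(text);
        };
    }

    public static void main(String[] args) {
        // Customized interpreter that only knows a single (made-up) sentence.
        Interpreter custom = text -> text.equalsIgnoreCase("turn on the light")
                ? Optional.of("Light switched on")
                : Optional.empty();
        // Stand-in for whatever interpreter handles everything else.
        Interpreter builtIn = text -> Optional.of("Handled by the fallback interpreter");

        Interpreter chain = withFallback(custom, builtIn);
        System.out.println(chain.interpret("turn on the light").orElse("no answer"));
        System.out.println(chain.interpret("what time is it").orElse("no answer"));
    }
}
```

With this shape, the customized interpreter only has to cover the sentences you care about, and everything else goes to the fallback.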

Note that the only audio source ready to use in openHAB (at least the one I was using to test all this) is the System Audio Source, which you should select in the openHAB audio config. Other general configuration options for these services are under the voice category.

Let me know if you are able to take advantage of these services and how well they perform for you. For the HLI I will open another post when it is fully ready, but it’s already in a functional state; you can test the examples at the end of the readme.

I hope they are useful to you. For me, having a customizable voice system (on openHAB) was my New Year challenge! :smiley:

If you need some guidance on how to set them up, don’t hesitate to ask.

UPDATE:
I have added a PR here to access the PulseAudio sources from openHAB. As commented there, it still needs some work to run perfectly with the dialog processor.

I’d also like to add that I’m able to start the dialog processor through the console command “voice startdialog”. This was added by the user lolodomo (GitHub), among other commands, rule actions, REST endpoints… Thank you very much for the help!

Thank you for your work!

I saw your previous topic on the subject but unfortunately didn’t have time to check it.
I’m also very interested in a fully autonomous openHAB voice assistant, in a distributed model across the house. This is my (very long-term) goal.

Among many other things, I plan to write another AudioSource service besides the existing local one (within the pulseaudio binding, as I already did the pulseaudio sink).
I would also like to have an external KS, but that’s another discussion that I hope to develop one day.

Regarding the lack of an “open” KS, did you check Precise? It’s the one used (and created) by Mycroft.
It is fully open source, and there are some precompiled binaries available.
Unfortunately, there is no Java SDK, only a Python wrapper, so a good chunk of reverse engineering is probably needed.

As I’m always late, I don’t know when I will have time to try your work, but I wanted to let you know that it is much appreciated!

Thank you for the comment, it’s really nice to know there are other people who want a “fully autonomous openHAB voice assistant”. I think we are not so far from having something that covers most of the common scenarios.

I had already checked Precise, but it’s built on top of “pocketsphinx”, which is more of a full speech recognition toolkit, so I think it should perform poorly compared to tools specific to keyword spotting. That is why I decided to look for other options.

I found out yesterday that the code for model generation with snowboy is available, so I have created another add-on for that one. I have cross-compiled the binary so it supports macOS, Debian x86_64, Debian armv7l and Debian aarch64 (these were almost all the platforms supported by the tool).
It performs a little worse than Porcupine (some false positives when I tried a single-word model trained by me), but at least it does not require a license. I have been blocked multiple times in Porcupine (three-device monthly limit) because of testing things on different OSs, so for me it was important to have an alternative. I’ll add the PR to the main post for visibility.

Yesterday I also found this one, https://github.com/wenet-e2e/wekws, which seems to be at an early stage but looks really promising to me. Hopefully they will add more documentation; I’ll keep an eye on it.

Again, thanks for your comments, and let me know how your experience goes when you try to set things up. I think I’ll start setting up everything on my “production” openHAB when 3.3.0.M2 gets published. I’ll write here how it goes.

I was thinking that an interpreter based on WolframAlpha, to add some knowledge to the house, could be a good addition. WDYT? I still need to check the language support they offer. I’ll be using Spanish or Galician myself (I try to build things that I can use :grin:, but I’m also trying to think of others).

Also, if you read this README at some point, please let me know if it’s clear enough. Since I designed the process it seems clear to me, but I have doubts whether it is enough for another user.

Regards!

Hello,

Are you sure about this? pocketsphinx was the old keyword spotter in Mycroft. It was indeed “generic”, and it was possible to use any phoneme (with poor-quality results). Precise is a neural network and needs a model file.
But maybe I’m wrong and they are not so different.
Anyway, you made a KS service for snowboy, so another free KS is not as needed as before.

I saw you made an AudioSource for the pulseaudio bundle. I’m glad you beat me to it; my favorite kind of development is the one I don’t have to do :smile:

I tried to use the Porcupine KS on my dev (Windows) environment, but didn’t manage to make it work.
First, I had a small issue in PorcupineKSService, line 291, with the use of “/” instead of File.separator. (I can make a PR, but I think I should wait until I get it working.)
Second, I get an “ai.picovoice.porcupine.PorcupineInvalidArgumentException: Initialization failed” exception. It’s strange, because I stepped through with the debugger and all the parameters seem perfectly OK.
The lack of information, as it is a native method, is an impediment. Do you have any idea?
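
For illustration only, here is a minimal sketch of building a file-system path without hard-coding “/”. It is not the code at line 291 of PorcupineKSService; the class name, the base folder and the keyword file name are hypothetical placeholders.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

final class KeywordPathSketch {
    /**
     * Joins the segments with the separator of the current platform.
     * The folder layout and names are hypothetical, chosen only for this example.
     */
    static Path keywordFile(String baseFolder, String keywordFileName) {
        return Paths.get(baseFolder, "porcupine", keywordFileName);
    }

    public static void main(String[] args) {
        Path file = keywordFile("userdata", "my_keyword.ppn");
        System.out.println(file);

        // Equivalent manual form using File.separator instead of a hard-coded "/".
        String manual = "userdata" + File.separator + "porcupine" + File.separator + "my_keyword.ppn";
        System.out.println(manual.equals(file.toString()));
    }
}
```

This applies to files on disk only: resource paths inside a jar always use “/”, whatever the platform.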

Unrelated, but how are the built-in keywords distributed? I didn’t find them in the porcupineks project.
(When I tried to use the “ok google” keyword, I got an error, as the service seems to want an ok_google_windows model file and didn’t find it. I trained my own model on the Picovoice console to continue testing.)

And another question: how do you start the dialog manager? I managed to start it with the startDialog action in a rule, but I don’t know if that is the handy way. I also found the REST endpoint you mentioned (it needs a Postman-like tool to trigger) and a console command (but the console is not available in my dev environment).
Did I miss this information somewhere (I had to dig through the code to find it)? If not, we should find a place to document this.

Thanks, I will try to continue my tests (but the days are so short) and maybe participate later.

Regarding the new actions (and new console commands), updating the existing documentation page is on my TODO list. I am just waiting for another PR to be merged so that I only have to do it once.

I am talking about this page:

I will check it again; maybe I got confused, I reviewed a lot of stuff that afternoon.

After trying snowboy with a personal model, the results were very poor, so I have closed the PR for now.

You are right, the path at line 291 is wrong. I suggest you train a word on the Picovoice console and use that instead.
The default KS files are distributed inside the jar, as they are included with the Porcupine library. So you can also extract one of those and place it under ‘/porcupine/’; that way this part is not executed.

I had this problem for a long time. The openHAB for my house runs on a small arm64 cluster using Docker and Kubernetes. When I tried to enter the console from the command line through ‘kubectl exec’, it hung the terminal. The solution for me was to use kubectl port-forward and enter the console through SSH. I hope it’s the same problem you are facing; not being able to use the console was a nightmare.

There is no handy way at the moment. In the end we need a configuration panel that allows configuring the dialog processing for multiple source/sink pairs, but for now I think the easiest way to do it is from a rule.

I can open the PR to fix the Porcupine add-on tonight; the service name should also be fixed. It should not display the “Keyword Spotter” part of the name in the KS list inside the voice config.

EDIT:
The PR: [porcupineks] fix build-in keywords on windows, fix service name and add missed modified by GiviMAD · Pull Request #12410 · openhab/openhab-addons · GitHub