Some new voice add-ons: PorcupineKS, GoogleSTT, WatsonSTT, VoskSTT

Miguel_M.A.D · February 20, 2022, 10:21am

Hi openHAB community.

I have spent the last few months adding services related to voice control.

This is the list:
Keyword Spotter Services:
PorcupineKS: Requires a PicoVoice API key, limited to three devices. README.md

Speech-to-Text Services:
GoogleSTT: Uses Google Cloud, 60 min/month free tier. README.md
WatsonSTT: Uses IBM Cloud, 500 min/month free tier. README.md
VoskSTT: works offline, you need to download the model for your language. README.md

I don’t think I will add more of these services. ~~I still miss a good keyword spotter that does not require a license, but I haven’t found anything; let me know if you know a good tool for it.~~

This is the one I’m working on right now, and it is what motivated me to build the other ones. It’s a customizable Human Language Interpreter that lets you define your own templates to match commands that write to or read from your items. It can fall back to another interpreter, so you can use this one just for customization.

Human Language Interpreter:
ActionTemplateInterpreter: template system powered by OpenNLP library. README.md PR

Note that the only audio source ready to use in openHAB (at least the one I used to test all this) is the System Audio Source, which you should select in the openHAB audio config. Other general settings for these services are under the voice category.

Let me know if you are able to take advantage of these services and how well they perform for you. For the HLI I will open another post when it is fully ready, but it’s already in a functional state; you can test the examples at the end of the README.

Hope they are useful to you, for me having a customizable voice system (on openHAB) was my new year challenge!

If you need guidance on how to set them up, don’t hesitate to ask.

UPDATE:
~~I have added a PR here to access the PulseAudio sources from openHAB; as commented there, it still needs some work to run perfectly with the dialog processor.~~ The PulseAudio source is available.

UPDATE:
Please check out HAB Speaker (dialog processing in the browser).

I’d also like to add that I’m able to start the dialog processor through the console command “voice startdialog”. This was added by the user lolodomo (GitHub), along with other commands, rule actions, REST endpoints… Thank you very much for the help!

dalgwen · February 25, 2022, 4:10pm

Thank you for your work !

I saw your previous topic on the subject but unfortunately didn’t have time to check it.
I’m also very interested in a fully autonomous openHAB voice assistant, in a distributed model across the house. This is my (very long-term) goal.

Among many things, I plan to make another AudioSource service besides the local one that already exists (within the pulseaudio binding, as I already did the pulseaudio sink).
(and I would like to have an external KS, and it’s another discussion I hope to develop one day)

Regarding the lack of an “open” KS, did you check Precise? It’s the one used (and created) by Mycroft.
It is fully open source, and there are some precompiled binaries available.
Unfortunately, there is no Java SDK, only a wrapper in Python, so a good chunk of reverse engineering is probably needed.

As I’m always late, I don’t know when I will have time to try your work, but I wanted to let you know that it is much appreciated !

Miguel_M.A.D · March 1, 2022, 9:49am

Thank you for the comment, it’s really nice to know there are other people who want a “fully autonomous openHAB voice assistant”. I think we are not so far from having something that covers most of the common scenarios.

I had already checked Precise, but it’s built on top of “pocketsphinx”, which is more of a full speech recognition toolkit, so I think it should perform poorly compared to tools built specifically for keyword spotting. That is why I decided to look for other options.

I found out yesterday that the code for model generation with snowboy is available, so I have created another add-on for that one. I have cross-compiled the binary so it supports macOS, Debian x86_64, Debian armv7l and Debian aarch64 (these were almost all the platforms supported by the tool).
It performs a little worse than Porcupine (some false positives when I tried a single-word model trained by me), but at least it does not require a license. I have been blocked multiple times in Porcupine (three-device monthly limit) because of testing things on different OSs, so for me it was important to have an alternative. I’ll add the PR to the main post for visibility.

I also found this one yesterday, https://github.com/wenet-e2e/wekws, which seems to be at an early stage but looks really promising to me. Hopefully they will add more documentation; I’ll keep an eye on it.

Again, thanks for your comments, and let me know how your experience goes when you try to set things up. I think I’ll start setting up everything on my “production” openHAB when 3.3.0.M2 gets published. I’ll write here how it goes.

I was thinking that an interpreter based on WolframAlpha, to add some knowledge to the house, could be a good addition. WDYT? I still need to check the language support they offer. Personally, I’ll be using Spanish or Galician (I try to build things that I can use, but I’m also trying to think of others).

Also, if you read this README at some point, please let me know whether it’s clear enough. As I designed the process, it looks clear to me, but I have doubts about whether it is enough for another user.

Regards!

dalgwen · March 2, 2022, 1:04am

Hello,

Are you sure about this ? pocketsphinx was the old keyword spotter in Mycroft. It was indeed “generic” and it was possible to use any phoneme (with poor quality result). Precise is a neural network, and needs a model file.
But maybe I’m wrong and they are not so different.
Anyway, you made a KS service for snowboy, so another free KS is not as needed as before.

I saw you made an audio source for the pulseaudio bundle. I’m glad you beat me to it; my favorite development is the one I don’t have to do.

I tried to use the Porcupine KS in my dev (Windows) environment, but didn’t manage to make it work.
First, I had a small issue in PorcupineKSService, line 291, with the use of “/” instead of File.separator. (I can make a PR, but I think I should wait until I get it working.)
Second, I get an “ai.picovoice.porcupine.PorcupineInvalidArgumentException: Initialization failed” exception. It’s strange, because I stepped through with the debugger and all the parameters seem perfectly OK.
The lack of information, as it is a native method, is an impediment. Do you have any idea?

Not related, but how are the built-in keywords distributed? I didn’t find them in the porcupineks project.
(When I tried to use the “ok google” keyword, I got an error, as the service seems to want an ok_google_windows model file and didn’t find it. I trained my own model on the Picovoice console to continue testing.)

And another question: how do you start the dialog manager? I managed to start it with the startDialog action in a rule, but I don’t know if that is the most convenient way. I also found the REST endpoint you mentioned (needs a Postman-like tool to trigger) and a console command (but the console is not available in my dev env).
Did I miss this information somewhere (I had to dig through the code to find it)? If not, we should find a place to document this.

Thanks, I will try to continue my tests (but the days are so short) and maybe participate later.

Lolodomo · March 2, 2022, 6:56am

Regarding new actions (and new console commands), this is in my TODO list to update the existing documentation page. I am just waiting for the merge of another PR to do it only one time.

I am talking about this page:

Miguel_M.A.D · March 3, 2022, 6:49pm

I will check it again; maybe I got confused, as I reviewed a lot of stuff that afternoon.

After trying snowboy with a personal model the results were very poor, so I have closed the pr for now.

You are right, the path at line 291 is wrong. I suggest you train a word on the Picovoice console and use that instead.
The default KS files are distributed inside the jar, as they are included within the Porcupine library. So you can also extract one of those and place it under ‘/porcupine/’; that way this part is not executed.

I had this problem for a long time. The openHAB for my house runs on a small arm64 cluster using Docker and Kubernetes. When I tried to enter the console from the command line through ‘kubectl exec’, it hung the terminal. The solution for me was to use kubectl port forwarding and enter the console through ssh. I hope it’s the same problem you are facing; it was a nightmare not being able to use the console.

There is no handy way at the moment. In the end we need a configuration panel that allows configuring the dialog processing for multiple source/sink pairs, but for now I think the easiest way to do it is from a rule.

I can open the PR to fix the Porcupine add-on tonight; the service name should also be fixed. It should not display the “Keyword Spotter” part of the name in the KS list inside the voice config.

EDIT:
The PR: “[porcupineks] fix build-in keywords on windows, fix service name and add missed modified” by GiviMAD · Pull Request #12410 · openhab/openhab-addons · GitHub

11194 · July 18, 2022, 2:00pm

I’m sorry, how exactly should porcupineks be installed? There is a download section in the add-on’s source, which should unpack files to the /extracted folder, but I can’t see any files appearing in that folder.

Levin1 · December 7, 2022, 9:00pm

Hi Miguel

Thank you for your post.

I’m already playing around with ActionTemplateInterpreter and the built in interpreter and installed rustpotter and VoskSTT.
I also read a lot of posts and the docs, but I’m still a little confused about voice recognition.

Through the openhab console cli, I’m able to interpret some simulated voice commands. But my Raspberry Pi doesn’t have an analog audio input, and for testing my USB headset microphone is not recognised by openhab (it works using arecord directly on openhabian console but not in OH)

So how would you integrate microphones (one or more in maybe different rooms) in Openhab? Do I have to use the mentioned “pulseaudio”? Are there other options?

Thanks in advance for any help.

Miguel_M.A.D · December 8, 2022, 12:16pm

Hello!
Nice to see other people looking into this.
Please check out HAB Speaker (dialog processing in the browser).
Readme
Can be installed from the marketplace.
I don’t think other people have tried it so far so feedback will be really appreciated.
I’m using it in openhab 3.4.0-M5 but I think it should work on 3.3.x without problems.
Hope this helps you, let me know if you need anything else.

PS: I have a pending PR for a new version of the action template interpreter here: [actiontemplatehli] Simplify usage and readme by GiviMAD · Pull Request #13529 · openhab/openhab-addons · GitHub. I think it’s easier to use. I can move it to the marketplace if you are interested in taking a look; I’ll do it eventually anyway, because I want to add some more changes there and maybe a web UI.
Also, maybe less interesting: I have a PR for a tone synthesizer to add a customized sound to the dialog before the keyword is detected (after speech recognition starts): [audio|voice] Add actions/commands to synthesize melodies and add configurable melody to the dialog processor by GiviMAD · Pull Request #3139 · openhab/openhab-core · GitHub. But I don’t think it will get into the 3.4.0 release.

Levin1 · December 9, 2022, 8:53am

Thank you, I installed HAB Speaker and will try it.

Regarding actiontemplatehli, I only tried two easy commands for two items.
Not all of the documentation was clear to me, but these easy commands did work, and I think that after playing around a little more it is not that hard to use.

What I don’t like is that I cannot see the added custom namespace on the item in the UI. I have to “add” a new custom namespace using the same name to edit or review the configuration. Let’s assume I create a lot of templates on different items: I would have no overview of where I have configured a template and where not.

Another thing I noticed: if I change an existing custom namespace on an item, the changes do not take effect immediately. I had to restart openHAB to apply the changes.

Miguel_M.A.D · December 10, 2022, 4:28pm

I will try to move the PR version to the marketplace tomorrow; it requires a bit less configuration, and having it tested more would be great.

Yes, I agree with that; it is one of the reasons I would like to build a small UI for it. But opening an issue on the UI asking to display the custom namespaces could also be nice, because I don’t think it is something I can fix from the add-on (but maybe I’m wrong, I didn’t look in depth).

Yes, that issue is fixed in the PR version (if I’m not wrong). As said, I will make it available through the marketplace; it is more interesting for me to test that version. Anyway, thanks for reporting it.

Miguel_M.A.D · December 13, 2022, 6:43pm

I’ve published the action template interpreter into the market. (sorry for the delay)
Let me know if you find any trouble with it.
Regards!

Levin1 · December 13, 2022, 8:14pm

Thank you.
I will try this new version as soon as possible, but it can take some time.

ornostar · January 9, 2023, 1:17pm

Hi all!

@Miguel_M.A.D : First of all, I’d like to thank you. Your work satisfies many of my needs.

… if only I could get it running. So I’d like to ask for your support and also give you some feedback.

Regarding support:
I’ve installed OH 3.4 (openHABian) and got VOSK and HAB Speaker from the marketplace. Unfortunately, I didn’t get either the voice recognition or the interpretation working (as expected).

So, my setup is:
(1) HAB Speaker on Desktop PC Browser <-> (2) VOSK <-> (3) Interpreter <-> (4) Mary/Pico <-> (5) HAB Speaker on Desktop PC Browser

Now I start an audio stream via a mouse click on HAB Speaker (this seems to work). I expect to get an answer on (5). Unfortunately, (5) tells me:

- either “Unknown voice command” if I say anything (no change in about 50 tries),
- or the normal error message if I say nothing. But it comes in two parts: the first part I cannot trace back, but it seems to come directly from VOSK; the second is the message configured in the VOSK add-on (openhab → settings → vosk → error message).

=> What I would like to have instead:

- For the first case: execution of my command (or at least a way to see what has been understood).
- For the second case: this is fine for me, but it’s still strange to get a two-part error message from two origins.

I have tried different interpreters. Even the rule-based interpreter never changed my string item for voice commands. There was no difference in the outcome. A command like “Schalte das Licht in K2 an.” (Turn on the light in K2) works via chat.

Do you have any hint for me to solve this issue?

Regarding feedback:

- Somehow I didn’t see any log output from Vosk (I checked log levels DEBUG and INFO via openhab-cli and had a look into /var/log/openhab/openhab.log). Maybe I’m too dumb, but having this working would have saved me a lot of trouble.
- Deeper Android integration: I like the idea of using my Android devices as convenient microphones (offline!), either by
  - having the tablet listening the whole time (keyword spotting while openHAB is running), or
  - having a shortcut (one-click solution), e.g. a status bar integration or better.

Miguel_M.A.D · January 9, 2023, 5:10pm

Hi,
Thanks for the feedback. Let’s see if we can find out what is happening.

On my side, running “log:set DEBUG org.openhab.voice.voskstt” in the console correctly enables the Vosk debug logs, and I can see the transcriptions in the console. Can you verify?
This is an example of my vosk log:

16:34:23.352 [DEBUG] [voice.voskstt.internal.VoskSTTService] - Result: {
  "text" : "para"
}
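
A minimal sketch for anyone who wants to pull the transcription out of such a debug line in a script. It assumes the payload after `Result:` is plain JSON with a `text` field, as in the log above; the function name `transcript_from_result` is made up for this example and is not part of the add-on.

```python
import json


def transcript_from_result(raw):
    """Return the recognized text from a Vosk-style JSON result, or "" if it has none."""
    data = json.loads(raw)
    # A missing "text" key is treated as an empty transcription (assumption).
    return str(data.get("text", "")).strip()


# Payload copied from the debug line above (everything after "Result: ").
payload = '{\n  "text" : "para"\n}'
print(transcript_from_result(payload))  # para
```

An empty string here would suggest that the recognizer heard nothing usable, which is a different situation from the interpreter rejecting a valid transcription.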

I’m also running 3.4.0 version.
First, set the “Built-in Interpreter” as the default interpreter in the “System Services/Voice” options and make sure you have configured your language and region in “System Services/Regional Settings”. After doing so, I recommend checking that you can use the interpreter through the console.

openhab> voice interpret "encender lámpara habitación" # this says "turn on bedroom lamp" in Spanish
Ok. # This is working for me.

You can also check the TTS part by setting your speaker sink as default in “System Services/Audio” and running the say command in the console.

voice say "Buenos días" # Says "Good morning"
# returns no output

If all those things are working for you, I suggest you review that the model setup for Vosk is correct, or try a different STT solution to confirm that the problem is related to Vosk. Also, as you mention you are running openHABian, please verify that you’ve done the following or that the package is already present; it is easy to miss in the README.

On Linux this binary requires the package libatomic to be installed (apt install libatomic1).

Another thing that came to my mind: have you verified that the “Built-in Interpreter” supports your language? Check this please. If not, maybe you can create a PR for it.
If you don’t have language support, you can check this other add-on to build a custom interpreter, Action Template Interpreter+ (discontinued), but I think I’m going to abandon its development and look into ways to improve and customize the “Built-in Interpreter”. I have some ideas I would like to propose on the repo, but I don’t know when I will have time for it.

Another tip: if you just want to check that the transcription is accurate, you can also create a String item and set it in “Other Services/Rule Voice Interpreter”, then set the default interpreter to the “Rule-based Interpreter”. This interpreter just writes the transcription to the item.

Are you talking about HABot? I don’t have too much experience with it, but maybe if you set its interpreter as default interpreter in “System Services/Voice” it’ll work for you.

Hope some of this helps you, let me know how it goes.

About this, I have plans to try to create an Android wrapper for it, but as it’s a web interface, accessing the audio when it’s not in the foreground will not be possible. So the idea would be to turn off the screen (I think that won’t be possible) or decrease the device brightness and enable the application screen saver.

If you have any other specific feedback about HAB Speaker or find a bug, it would be awesome if you could report it in its marketplace topic to keep track of it.

Regards!

ornostar · January 9, 2023, 7:57pm

Thank you for the quick feedback. That helped: I think I had set the logging level wrongly, and now I know I’m interested in the rule-based interpreter. But Vosk will not send the recognized string to the interpreter; it only responds with “unknown voice command”.

Do you have any idea how to connect HAB Speaker to the rule-based interpreter (basically, write into the configured string item and do not respond with “unknown voice command”)?
Update, 2023-01-10: See #6 in HAB Speaker (dialog processing in the browser) - #31 by ornostar : Is it possible to use the HAB speaker in combi with the systems interpreter?

Regarding logging:
#1: Your hint worked. I can see the recognition process. Setting the log level to DEBUG globally produced a huge amount of logs. Somehow the desired entries weren’t in these logs (I assume because the 1000 lines were overwritten too fast).

Regarding Interpreter:
#2 My commands won’t work easily with the built-in interpreter, since I have multiple instances of equipment in each room.
Example: “Licht” [eng. “Light”] is used in at least two items per room, for light dimming and light switch:

Switch LichtK2SwitchItem "Licht (K2)" <slider> (gLicht, gSwitch) ["K2","Licht","Point"] {channel="knx:device:4836da5d03:LichtK2SwitchChannel"}
Dimmer LichtK2DimmerItem "Licht (K2)" <slider> (gLicht, gDimmer, gK2) ["K2","Licht","Point"] {channel="knx:device:4836da5d03:LichtK2DimmerChannel"}
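
A small sketch to spot this kind of clash across many items. It assumes, as described above, that the shared label is what makes a spoken command ambiguous. The item names and labels are the two from the definitions above; the helper `ambiguous_labels` is hypothetical and not part of openHAB.

```python
from collections import defaultdict


def ambiguous_labels(items):
    """Map each label used by more than one item to the names of those items.

    `items` is an iterable of (item_name, label) pairs.
    """
    by_label = defaultdict(list)
    for name, label in items:
        # Compare labels case-insensitively and ignore surrounding whitespace.
        by_label[label.strip().lower()].append(name)
    return {label: names for label, names in by_label.items() if len(names) > 1}


items = [
    ("LichtK2SwitchItem", "Licht (K2)"),
    ("LichtK2DimmerItem", "Licht (K2)"),
]
print(ambiguous_labels(items))
# {'licht (k2)': ['LichtK2SwitchItem', 'LichtK2DimmerItem']}
```

Giving the switch and the dimmer distinct labels would be one way to remove the clash, at the cost of longer spoken commands.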

#3 The abbreviations (in the example above, K2 for Kind2 [eng. child 2]) are not recognized by Vosk (maybe another model might help? I’ll give it a try).
Edit: Bigger model doesn’t work. User, group and permissions are set identically between models. I’ve just renamed the folder using ‘mv’ cmd.

2023-01-09 20:30:49.577 [DEBUG] [oice.voskstt.internal.VoskSTTService] - loading model
2023-01-09 20:30:49.627 [WARN ] [oice.voskstt.internal.VoskSTTService] - IOException loading model: Failed to create a model

#4 Regional settings are correctly set to Germany/German.

#5 I’ve switched to the rule-based interpreter (settings->voice). This led to a change when using the ‘say’ command in openhab-cli, but HAB Speaker still uses another interpreter (or none?!).
Edit: By “change” is meant that the string item containing the voice command was updated.

#6 Some interpreter works. I’ve checked this by sending a German command via chat (HABot). Unlike in the shell, I get a dialogue here, and it asks me whether it found the correct item (while in openhab-cli it’s a plain ‘not found’).
Edit: I’ve just realized that the interpreter in HABot is not the same as the built-in one. But due to the restrictions on abbreviations (letters) and similar item naming (#2 & #3), I’m focusing on the rule interpreter.

#7 As mentioned in #5, I’ve already changed to the rule-based interpretation, but it only works with openhab-cli. As mentioned in the very first question, I have trouble getting the rule-based interpretation working with Vosk/HAB Speaker. What do you mean by “and set it on “Other Services/Rule Voice Interpreter””?

Regarding audio output:
#8 I’m connected remotely, so I cannot use the local sink; I’ve typed
openhab:voice say(“hallo”, marytts:bits3hsmm, habspeaker::79a0-95a8-b269::sink)
I did not hear any sound, nor did I see anything in the logs. That’s currently acceptable, since I don’t want to have a dialog but just give commands and get an answer (yes/no).

Regarding openhabian and vosk
#9 model works, as mentioned above. I’ll give the biggest model a chance since abbreviations/letters aren’t recognized.

#10 lib is/was installed and is up to date.

Misc
Yeah, I meant HABot.

I’ll give my feedback regarding HAB speaker into the market place topic.

Miguel_M.A.D · January 14, 2023, 12:25pm

I had missed this last message, sorry I didn’t reply.

Would you be able to share how to do this? I looked at it in the past, but I was not able to do it.
I’m missing some words in my vosk model.

I didn’t know this was solved, some comments on the other thread make no sense then, sorry.

ornostar · January 14, 2023, 1:11pm

Hi!

I’ll write a conclusion about the mentioned issues, because I think the only issue was the user (me).

At first I didn’t see any difference in either the interpreter or the recognizer part, so I asked for help. Activating the logs (your comment #14 here) showed me that the recognizer had worked fine all along (with a few restrictions, see below). But after trying so many things I had failed to keep the thing ID and the HAB Speaker ID matching, so I still observed strange behaviour.

So we split the discussion between the HAB Speaker thread (https://community.openhab.org/t/hab-speaker-dialog-processing-in-the-browser/) and this one.

IMHO: these features and HAB Speaker work well. Thank you again. But a few constraints have to be fulfilled:

- HAB Speaker ID == thing ID
- VOSK (the model) does not recognize speech at the same level as a human.
  - With vosk-model-de-0.21, abbreviations and numbers are not correctly transcribed to text.
  - There is, for German, a punctuation model (vosk-recasepunc-de-0.21) that might help with this issue. But I’ve already stopped (since the internal interpreters didn’t work as easily as I had expected, I wrote my own rule-based one; see HAB Speaker thread #37).

Do we have a misunderstanding here?
I meant that the recognizer (the model; the small German one) works. Unfortunately, I’m still having trouble with the abbreviations and also with using the bigger model (~4GB), but I’ve also stopped trying to solve this in favor of having an interpreter solve this issue (see HAB Speaker post #37 → Synonyms variable).
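
To illustrate the synonym idea in general terms (this is not the rule from the HAB Speaker thread): a minimal sketch that rewrites spoken forms into the abbreviations used in item labels before any matching happens. The `SYNONYMS` table, its entries and the function name `apply_synonyms` are all hypothetical; what the recognizer really outputs for “K2” has to be checked in the Vosk debug log.

```python
import re

# Hypothetical table: spoken forms the recognizer might produce -> abbreviation used in item labels.
SYNONYMS = {
    "kind zwei": "K2",
    "kinderzimmer zwei": "K2",
}


def apply_synonyms(transcript, synonyms=SYNONYMS):
    """Lower-case the transcript and replace every known spoken form with its abbreviation."""
    text = transcript.lower()
    # Longest phrases first, so a short key cannot break up a longer one.
    for spoken in sorted(synonyms, key=len, reverse=True):
        pattern = r"\b" + re.escape(spoken.lower()) + r"\b"
        replacement = synonyms[spoken]
        # A function replacement avoids backslash/group interpretation in the replacement text.
        text = re.sub(pattern, lambda _match, r=replacement: r, text)
    return text


print(apply_synonyms("Schalte das Licht in Kind zwei an"))  # schalte das licht in K2 an
```

Keeping the table in one place means new mis-recognitions can be added without touching the matching logic.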

BR

Miguel_M.A.D · January 14, 2023, 2:08pm

Yes, I misunderstood that, thank you for explaining it again. I’ll ping you or open a topic if I learn more about modifying Vosk models; I have also prioritized other things for now.

El_Duderino · January 21, 2023, 1:24pm

Hello,
Thanks a lot for all the work you are putting into this!
I have recently made it my challenge to control openHAB via some voice assistant. At first I wanted to use an Android device (NsPanelPro), but I understand this is more or less impossible, as the openHAB application will run in the foreground and it’s probably impossible to have another app continuously listen to the microphone. So I discarded that approach.
I then bought a Raspberry Pi Zero 2 W and a Seeed ReSpeaker 2-Mic HAT and managed to install pulseaudio on it, so that with the pulseaudio binding on my openHAB machine (Raspberry Pi 4) I can see/use one sink and one source.
I have then installed the Porcupine add-on as well as Vosk. Now I have a feeling that even with a keyword spotter installed, openHAB is not continuously listening to the microphone source. But it seemed to me that with the openHAB console command “openhab:voice startdialog” I can at least force it to. Unfortunately, when I do that I get an error message in openhab.log:

2023-01-21 14:08:05.080 [WARN ] [.core.voice.internal.DialogProcessor] - Encountered error calling spot: ai.picovoice.porcupine.PorcupineInvalidArgumentException: Initialization failed.

I have trained a custom wake word on the picovoice page and put the .ppn file into the userdata/porcupine folder. I have also put the porcupine_params_de.pv file from the repository there as well as my language is set to de.
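
A quick sanity check for the folder described above, as a sketch only. It verifies that the files are present and nothing more; it cannot tell whether they are valid for the installed Porcupine version, so it may not explain the “Initialization failed” error. The function name and the example path are assumptions for illustration.

```python
from pathlib import Path


def porcupine_folder_problems(folder, params_file):
    """Return a list of problems found in the folder; an empty list means both files are present."""
    folder = Path(folder)
    if not folder.is_dir():
        return [f"folder not found: {folder}"]
    problems = []
    # At least one custom keyword file is expected.
    if not any(folder.glob("*.ppn")):
        problems.append("no .ppn keyword file found")
    # The language model file named by the caller must exist as a regular file.
    if not (folder / params_file).is_file():
        problems.append(f"missing language model file: {params_file}")
    return problems


if __name__ == "__main__":
    # Example path only: adjust it to your own userdata folder.
    for problem in porcupine_folder_problems("/var/lib/openhab/porcupine", "porcupine_params_de.pv"):
        print(problem)
```

Running it as the same user that runs openHAB also gives a hint about read permissions, since an unreadable folder usually shows up as missing files or as an error.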

I am grateful for any help with this!! I am also open to other ways to achieve my goal to have some device continuously listen for wake words and then interpret and execute the command (without using cloud services or Google/Alexa devices).

Thanks a lot!!