Dialog Processing with the PulseAudio Binding

Hello! Yesterday I tried your system for the wake word and nothing worked. Three hours and nothing to show for it!

I downloaded the latest Windows release of rustpotter-cli.

What I did:

  1. Recorded 50 training files of my wake word "angel":
    rustpotter-cli record angel\train\[angel]XX.wav
    and 14 training files of silence:
    rustpotter-cli record angel\train\[none]silenceXX.wav

  2. Recorded 32 test files of my wake word:
    rustpotter-cli record angel\test\[angel]XX.wav
    and 9 test files of silence and ambient sound:
    rustpotter-cli record angel\test\[none]silenceXX.wav

  3. Trained the model (what is an epoch?!):
    rustpotter-cli train --train-dir angel\train --test-dir angel\test --test-epochs 1 --epochs 10 -l 0.017 angel\angel.rpw

  4. Tested against a wav file:
    rustpotter-cli.exe test angel\angel.rpw "angel\train\[angel]01.wav"

  5. Tested with live voice:
    rustpotter-cli.exe spot angel\angel.rpw

Rustpotter can’t find my wake word. No way.

What am I doing wrong?

P.S. Ctrl-C is not a good idea on Windows when you record files from a batch (.bat or .cmd) file. Ctrl-C interrupts the whole script, not just the sound recording!

openHAB:
The model file is in this folder:
m:\servers\openhab\userdata\rustpotter\angel.rpw

I'm trying to start this rule:
startDialog("rustpotterks", "voskstt", "voicerss", "voicerss:enUS_Linda", "system,rulehli", "javasound", "enhancedjavasound", "en-US", "angel", "voice_listing")

But I get this error in the logs:
2023-11-21 13:12:10.895 [WARN ] [.core.voice.internal.DialogProcessor] - Encountered error calling spot: Unable to load wake word model: Semantic(None, "missing field 'name'")

Help me please!

I think you are using rustpotter v3, but OpenHAB 4.0.x. The add-on only started using version 3 in the 4.1.0.M2 milestone.

Oh thank you!

Now I used version 2 and the build-model command. Then I ran test-model on the wav files and it worked! Wow!

But version 3 cannot learn my word at all. Strange.

I connected the v2 word model in openHAB. There are no errors. I'll check how it works later this evening.

You have clearly put serious study into voice in openHAB.

Please tell me: I understand (I hope) that I can enable constant listening on a microphone connected to the server and respond to voice commands from it. I don't know how yet (I haven't finished it), but I'm sure I can.

But how can I do the same for a browser on a computer? It seems to me that this is not possible for a browser. I don't see a microphone in the site settings! It can output sound, but there is no microphone.

I get an error in the logs when enabling/disabling a 'speak' switch using a trigger in rules:


rule "Voice Speak"
when
    Item voice_speak received command
then
  if(ON === voice_speak.state) {
    startDialog("rustpotterks", "voskstt", "voicerss", "voicerss:enUS_Linda", "system,rulehli", "javasound", "enhancedjavasound", "en-US", "angel", "voice_listing")
    logInfo("voice_speak", "startDialog:" + voice_speak.state.toString)
  }
  else {
    stopDialog("javasound")
    logInfo("voice_speak", "stopDialog:" + voice_speak.state.toString)
  }
end

There is a file on disk:

M:\servers\openhab\userdata\tmp\nativeutils22675010575900\librustpotter_java_win_x86_64.dll

The first time the switch goes ON everything is fine, but on the second and subsequent times it logs an error. Perhaps the library is already loaded?

Sorry, I was out; it was too much to answer from the phone.

Yes, you are right; I fixed that in the v3 version :confused: It only causes errors on Windows, which is a platform I rarely use, so I didn't discover it until I was testing v3, and I didn't realize it was affecting the openHAB add-on. I think adding a boolean to avoid registering the library multiple times should do the trick; it is what the library does now. I can open a PR against the 4.0.x add-ons branch with that fix, or you can do it if you prefer; for me it would be better to avoid setting up a Windows installation right now to test it. Let me know.

You can use the record command with the '--ms' option to auto-stop the recording after some time. That's preferable when training, as the records will all have the same length, and it also helps you avoid capturing keyboard noise.
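For example, this is a sketch that stops each record after two seconds (the 2000 ms value is just an illustration):

rustpotter-cli record --ms 2000 angel\train\[angel]01.wav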

For the trained wakewords you should provide a large number of samples, but in my experience their performance is far better.

The only thing I see wrong is that the "[none]" records should not be only silence samples. (The "[none]" prefix is not required, by the way; you can just omit it, as any file name without a '[xxx]' tag is equivalent to '[none]'.) They should be a collection of different sounds that do not match the wakeword: silence, noises, music, you talking without saying the wakeword, other people... anything that should not be classified as a detection and that the speaker will be exposed to.
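As a sketch of what I mean, the training folder could mix records like these (the file names are just illustrations; anything without a '[xxx]' tag counts as '[none]'):

rustpotter-cli record --ms 2000 angel\train\[angel]05.wav
rustpotter-cli record --ms 2000 angel\train\music01.wav
rustpotter-cli record --ms 2000 angel\train\talking01.wav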

What I did (sorry if it's too much text) was record an initial training set of 150 records with a 30/70 distribution (wakeword/none) and a testing set of 30 records distributed 70/30. I trained that model a couple of times until I got a model with an accuracy of over 90% (the weight initialization is not fixed, so different executions lead to different results).

After that I used that model with the spot command and the --record-path option, with a threshold over 0.8, to capture a bunch of false positives. Basically I put a loud podcast in another language near the microphone and let it capture records for a while. Those get prefixed with the […] label, so I had to remove it from the file names (I used a script for that). Then I added the files to the training set, trained the model again, and tested it in real life: it produced far fewer false positives. I repeated that process a couple of times and got pretty good functionality after that. If at some point the model stops detecting the wakeword every time, add more positive samples to balance the numbers.

I believe that using the record feature it's possible to get to a functional personal model in less than an hour... but I would need to test that; I created mine while developing the functionality.
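As a sketch, the capture step I describe would look like this (--record-path is the option I mentioned; I'm assuming the threshold option is spelled --threshold, so check the CLI help for the exact name):

rustpotter-cli spot --record-path angel\captures --threshold 0.8 angel\angel.rpw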

And I'm not sure about the numbers. I used learning rate 0.017 and 2000 epochs, but I think that is too many epochs; I need to build some different variants when I have a moment. Also, there are different model types/sizes (names copied from whisper, but the rest has nothing to do with it; rustpotter uses a simple classification network). I have only tested the medium and large types in real life.
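For reference, with the flags used earlier in this thread, training with my numbers would look like this (a sketch; the model type is chosen with a separate train option whose exact name I'd check in the CLI help):

rustpotter-cli train --train-dir angel\train --test-dir angel\test --test-epochs 1 --epochs 2000 -l 0.017 angel\angel.rpw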

Adding voice features to openHAB has been my Saturday-morning project for a while now, haha. I started with zero knowledge about audio, but I love using voice control and open source software, so I got into it. In general this project is a masterpiece, and the reviews tend to be really meaningful; so far it has been a great experience and I have learned a lot.

By design openHAB can control the audio sources, so when you start a dialog, under the hood openHAB starts constant audio streaming from the source implementation to the keyword spotter implementation.

There is no source implementation in the UI at the moment.

I have a project in the marketplace called HABSpeaker which includes a sink and source implementation over WebSocket, with automatic dialog setup; if you can give it a try and give me feedback, that would be awesome. I had some problems with that project and left it quite buggy for a time due to several causes, so I think the people who have tested it until now didn't have a great experience with it :frowning: . But I think the audio implementation is good work, and the sink is real time and supports concurrent streaming, which is an advantage over the current one, so I hope one day it can be ported or integrated into the MainUI. Right now, though, I'm giving priority to finishing some add-ons and to some changes I want to make in the interpreter, because I'm already using it on a daily basis.

If you have an Android device, I suggest you use the Android APK; it doesn't require HTTPS (which is a browser security measure for capturing the microphone) and is the one I use most at the moment. Because I only have one speaker, when I want to send a command from another room I use the phone.

Hope I have answered all your questions, let me know if I’ve missed anything. BR.

Thanks for the detailed answer!!! (+1 000 000 → !!!)
I received answers to all the questions that no one else could answer!

Unfortunately (no, no, fortunately!) I can't leave Windows.
Last night, while testing the operation of startDialog when called from the rules or the OH console (openhab:voice startdialog), I discovered that after executing stopDialog (openhab:voice stopdialog) I get a DLL loading error in the logs, and then subsequent calls to startDialog (or openhab:voice registerdialog -> openhab:voice unregisterdialog -> openhab:voice registerdialog) fail until the server is restarted.

Fix the unload error? I'm a programmer; I won't sleep until I fix it!
For now I won't be able to test the wake word.

Both systems don't work (on Windows 10 / Server 2022, at least):

  1. Rustpotter crashes on second use; a server reboot is required.
  2. Porcupine fails checking the API key, and therefore doesn't work. I don't know what to do about this; a test application separate from openHAB works without problems with the same API key.

By the way, I don't understand why, when I call openhab:voice registerdialog (with or without parameters), the server pays no attention to the wake word and sends everything to the command analyzer (system or rulehli). I wanted to run openhab:voice registerdialog from the console, wait for the wake word, and only after it analyze the transmitted command as a string in the rules. But registerdialog transmits everything that comes into the microphone. It's very strange; either I misunderstood again or did something wrong.

I tried these commands to work with the wake word:

# openhab console
openhab:voice registerdialog --source javasound --sink enhancedjavasound --hlis system,rulehli --tts voicerss --stt voskstt --ks rustpotterks --keyword angel --listening-item voice_listing

openhab:voice unregisterdialog javasound

# openhab console
openhab:voice startdialog --source javasound --sink enhancedjavasound --hlis system,rulehli --tts voicerss --stt voskstt --ks rustpotterks --keyword angel --listening-item voice_listing

openhab:voice stopdialog javasound

// rules
startDialog("rustpotterks", "voskstt", "voicerss", "voicerss:enUS_Linda", "system,rulehli", "javasound", "enhancedjavasound", "en-US", "angel", "voice_listing")

stopDialog("javasound")

Hahaha, you are welcome.

I just opened this PR with the fix, but I'm having problems with the format checks.

You can build the add-on with mvn spotless:apply clean install -pl :org.openhab.voice.rustpotterks if you want. Or you can upgrade to OpenHAB 4.1.0.M3; for me it's working nicely.

The reason you can omit some parameters is that both methods fall back to the defaults configured in the voice settings. The registerDialog method calls startDialog under the hood, so you should not get different results; the only difference is that registerDialog stores your preferences and calls startDialog again whenever the dialog is stopped (system reboot, temporary unavailability of the sink/source...). If you register a dialog and stop it using stopDialog, it will be started again after some seconds.

Probably it needs an update to the library. The thing is that I'm not going to use that project anymore, so this weekend I'll ask for its removal, or for someone to take it over, because I don't want to keep dealing with the API key, the account... For now rustpotter is working quite well for me, so I have zero motivation to invest time in it. But maybe other people are interested in keeping it.

I never saw that behavior. The only thing that comes to mind is that the keyword spotter is misbehaving and sending spot events all the time. I suggest you enable the debug log of the rustpotter add-on to check when it's sending spot events and what their scores are.
The execution should always be spot → stt → interpreter → tts, then keep waiting for the next spot event; the exception is the listenAndAnswer method, which just does stt → interpreter → tts.
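You can enable it with the standard logging command from the openHAB console (the logger name here is my assumption, derived from the add-on id):

openhab> log:set DEBUG org.openhab.voice.rustpotterks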

Over the weekend I will test everything in 4.1.0.M3. If I get everything working as it should, I will wait for the 4.1.0 release, then implement all the logic for voice control and buy good hardware for voice (mic/speakers).
Thanks for explaining how the wake word works. I will try it, turn on the logs, and test it over the weekend.

P.S. Unfortunately, I am far from building Java applications. I'm a C++ programmer, and it's a mystery to me how to build a jar using the specified command.

So far I'm somewhat disappointed; everything [in OH] works very unreliably. But I don't want to use evil voice clouds (Google, etc.).

If you use the IntelliJ IDE, I think it handles everything for you (Java installation, making mvn available in its integrated terminal...), in case you want to try it some time.

I'm struggling with C++ these days, writing wrappers for libraries I want to integrate, like this one: https://github.com/GiviMAD/whisper-jni/blob/main/src/main/native/io_github_givimad_whisperjni_WhisperJNI.cpp. If you find something wrong there, please let me know :slight_smile: .

For me it only worked using a trained rustpotter wakeword and whisper.cpp as STT; upgrading my speaker also helped a lot, because audio quality is important. I still don't find it as accurate as the commercial alternatives, but I'm using the small or base whisper models (because I'm running on a "small" arm64 server) and I haven't spent much time training the wakeword or configuring whisper, because I'm still working on things. For me it's usable; I stopped using the Alexa integration some weeks ago.

Sorry, I disappeared for a few days, I was sick.

I tested version 4.1.0.M3 and everything works there as expected, without bugs. I'll wait until the 4.1.0 release comes out. For now, I think I'll practice with the voice system on a separate 4.1.x installation.

I didn't mean the wake word (regarding being disappointed); I'm talking about the voice system (OH) in general. (Only applicable to 4.0.4.)

Can you provide a link to the Java add-on so I can build it? The one sent earlier was for 4.1.0.M3, as I understand it, and not for 4.0.4. I will try. I'm working in Visual Studio (not VS Code). I'll install IntelliJ; I haven't used it (or the Java language) in a very long time.

Don’t worry, I couldn’t find time to respond earlier either.

Great to know it's working for you. Out of curiosity, what add-on combination did you end up using?

Yes, there are meaningful changes to the voice control feature in 4.1.0, but it's built upon the previous work and newly available community tools, so it would have been difficult or impossible to have it earlier.

Still, it's difficult to set up; a community-shared preconfigured PulseAudio image would be something cool to have, but it would be a tremendous effort. I think it's better to wait for more people to use it and give feedback.

If we are talking about the fixed rustpotter version for 4.0.4, it has been merged into the 4.0.x branch, so you can use that one.
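As a sketch, checking out that branch and building just the add-on would look like this (I'm assuming the standard openhab-addons repository; the mvn command is the one from my earlier message):

git clone -b 4.0.x https://github.com/openhab/openhab-addons.git
cd openhab-addons
mvn spotless:apply clean install -pl :org.openhab.voice.rustpotterks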

I forgot to use the reply button in the previous comment.

I have also updated the last part of the main post with other recent changes.

After keeping some distance from the voice recognition stuff, I tested again a little with OH 4.1. The crackling sound still occurs sometimes, right after restarting openHAB. When I restart it again it's mostly gone.

The voice recognition works now.
I see e.g. "Text recognized: schalte das licht im büro aus" ("turn off the light in the office"). But the built-in interpreter says that there is no object with that name. I don't know how to tag the items the right way; where is a description for that? Even if I use the rule-based interpreter I get the same error, and my Speech-Input item isn't updated. What am I doing wrong here?
Doing this via the CLI with the interpret command gave me the same result.

I need some help again, please. Or should I create a new topic for this?

Thank you for reporting that it's working for you. I don't know what the origin of the noise could be; on my side I had to comment out "load-module module-suspend-on-idle" in /etc/pulse/system.pa because it caused some problems with a new speaker, but I don't think that could be related. Does the crackling sound occur when you are not running the dialog?

The rule-based interpreter should work. If you used the defaults when creating the dialog, you must restart the dialog for it to pick up the newly selected interpreter. You can verify the interpreter in use with the voice dialogs command; check the "HLIs" section.

openhab> voice dialogs
 Source: pulseaudio:source:df54bf45b3:Jabra_Speak2_40_Echo_Cancel_Source - Sink: pulseaudio:sink:df54bf45b3:Jabra_Speak2_40_Echo_Cancel_Sink (STT: whisperstt, TTS: pipertts, HLIs: system, KS: rustpotterks, Keyword: ok casa, Dialog Group: default) Location: BedRoom

The standard interpreter is not yet documented; I'll try to open a PR for the documentation this weekend. I can try to help you if you translate the commands to English, because I have no clue about German, sorry :frowning: . As a summary, the standard interpreter has some built-in grammar which is loaded depending on your server language, and it tries to match the commands to the items using their labels and/or their parent labels.
For example, if you have a group labeled "bedroom" which contains an item labeled "light", you can use the commands "turn off the bedroom" | "turn off the bedroom light" | "turn off the light", as long as there is no more than one item compatible with the OFF command that matches those labels.
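You can also test the matching from the console without speaking, using the interpret command you mentioned (an illustrative example; the exact response depends on your items):

openhab> voice interpret turn off the bedroom light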

Hope it helps, I’ll try to upload the documentation soon.

Thank you; the problem was that the dialog wasn't updated despite restarting. After removing it manually and restarting, the correct dialog was created.


Referring to org.openhab.core.voice.internal.text.StandardInterpreterTest#allowHandleQuestionWithCustomCommand, I tried to query the time but failed.
I defined this item:

String Time { voiceSystem="wie $cmd$ ist es?\nwie viel $cmd$ ist es?", commandDescription="" [ options="spät=spät" ] }

What the interpreter does is set "spät" as my item value (the patterns are German for "what time is it?"). How can I force it to tell me the current item value instead?

No, sorry, the name of the test is confusing: read rules are not implemented, there are only write rules at the moment. My idea is to add a default read rule and custom read rules in the future, but I'm not working on that yet, as I'm working on some changes to the audio system and the pulseaudio binding, and after that I'm going to try to add optional dialog support to the UI. So I don't know if I'll manage to have it done for the next release, as it depends on the free time I have.

The current workaround would be to use an item as a trigger for a rule, and from there read the other item's value and respond using the last dialog sink. Pretty ugly, sorry. In JavaScript you can do it like this:

var { osgi, actions, items } = require("openhab");

// fetch the VoiceManager service and the context of the last started dialog
var voiceManager = osgi.getService("org.openhab.core.voice.VoiceManager");
var dialogContext = voiceManager.getLastDialogContext();
if (dialogContext) {
  var audioSinkId = dialogContext.sink().getId();
  // respond through the sink used by that dialog
  actions.Voice.say("The item is " + items.getItem("item_name").state.toString(), null, audioSinkId);
} else {
  console.warn("missing dialog context");
}

If you think the feature is a must-have, I can try to look at it sooner.

Hardware:
Raspberry Pi 4b 4GB

OS:
openHABian: Stable track 4.0.4 Release Build

Java Runtime Environment:
openjdk version "17.0.9" 2023-10-17
OpenJDK Runtime Environment (build 17.0.9+9-Debian-1deb11u1)
OpenJDK 64-Bit Server VM (build 17.0.9+9-Debian-1deb11u1, mixed mode, sharing)

Configuration:
Built-in Interpreter enabled
MaryTTS
Vosk
Rustpotter
rustpotter-cli 3.0.2

Hello!

Thanks for the great tutorial!

I was able to connect a USB conference-call speaker to a Pi Zero W and got it all connected. Unfortunately, because the Pi Zero W is armv6l, the armv7l build of rustpotter-cli wouldn't run. So I started over with a Pi 3B+ I had, made my recordings, and moved the files over to openHABian, and now I'm getting the following error and I'm at a loss.

 [WARN ] [.core.voice.internal.DialogProcessor] - Encountered error calling spot: Unable to load wake word model: Semantic(None, "missing field `enabled`")

I tried searching and couldn't find anything. Perhaps I should try to make my recordings and build the .rpw file using rustpotter-cli 2.x.x?

Thank you and Happy New Year!

Edit: OK, I rebuilt with the v2 rustpotter-cli and it's working! Thanks again!


All right, thanks. I will wait for an embedded feature. It's great what you did, and I can wait; no pressure. In general it would be nice if I could ask openHAB some things and get answers to my questions.


Finally added the docs for the functionality in 4.1.0: https://github.com/openhab/openhab-docs/blob/47bbb2e6b3a6a70a942153c2522ebd88f9e281dc/configuration/multimedia.md