HABSpeaker

Hmm so I found this:

If this study is accurate, it would seem Coqui is not that much better. Since I can run both now, I’ll do some informal testing. If Vosk really is that much better (or the same), I don’t see much value in continuing with Coqui. But it was a good learning opportunity in any case!


I see the Coqui project has better documentation on how to train your own models and offers a lot of pre-trained models, so maybe Coqui is a better alternative for some users.
So even if the comparison says Vosk performs a little better, I see a lot of value in having both available.
Let me know how your testing goes!

Using the default large English model has given so-so results; Vosk performs noticeably better, and with fewer resources.

I see the Coqui project has better documentation on how to train your own models and offers a lot of pre-trained models

I’m not very up to speed with the inner workings of model training, but I would imagine that STT accuracy might improve if the models were scoped to the specific vocabulary that openHAB expects. I would think this would be a common set of actions/verbs (on, off, brighten, dim, open, close, etc.), a common set of nouns (light, thermostat, door, TV, etc.), and then it would be great to register an ItemRegistry hook so the STT add-on could read items tagged with voice support and add their names to this vocabulary.
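A minimal sketch of what such a hook could look like against the openHAB core ItemRegistry API; the “Voice” tag, the class, and the fixed word lists are assumptions for illustration, not an existing add-on feature:

import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openhab.core.items.Item;
import org.openhab.core.items.ItemRegistry;

public class VoiceVocabularyBuilder {
    private final ItemRegistry itemRegistry;

    public VoiceVocabularyBuilder(ItemRegistry itemRegistry) {
        this.itemRegistry = itemRegistry;
    }

    /** Fixed command words plus the labels of all items tagged "Voice". */
    public List<String> buildVocabulary() {
        Stream<String> commonWords = Stream.of("on", "off", "brighten", "dim", "open", "close",
                "light", "thermostat", "door", "tv");
        Stream<String> itemLabels = itemRegistry.getItemsByTag("Voice").stream()
                .map(Item::getLabel)
                .filter(Objects::nonNull);
        return Stream.concat(commonWords, itemLabels).collect(Collectors.toList());
    }
}

An STT add-on could rebuild this list whenever the registry changes and hand it to the recognizer as a grammar or hot-word list.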

Coqui has this concept of “hot-words”, which I may be misunderstanding (the docs are not good), but I think it’s to give a higher/lower weight to certain words during detection. This had unexpected results when I tried it, so I’m assuming I am not grasping the concept correctly.
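For context, Coqui STT inherited hot-words from DeepSpeech 0.9: a word gets a positive or negative score boost during decoding (exposed in the C API as STT_AddHotWord and related calls). Through hand-built Java bindings it might look roughly like this; the Java class, method, and helper names are hypothetical, since no official Java binding is published:

// Hypothetical Java binding over Coqui STT's hot-word C API; class, method
// and helper names below are assumptions, only the concept (a per-word score
// boost applied during decoding) comes from the Coqui/DeepSpeech docs.
CoquiSttModel model = new CoquiSttModel("model.tflite");
model.enableExternalScorer("large-vocabulary.scorer");

model.addHotWord("light", 10.0f); // positive boost: favor this word
model.addHotWord("night", -5.0f); // negative boost: disfavor a confusable word

short[] audio = loadPcm16Mono16k("command.wav"); // hypothetical helper
System.out.println(model.speechToText(audio, audio.length));

One plausible reason for “unexpected results” is over-boosting: the boost shifts decoder scores rather than whitelisting words, so large values can reportedly pull unrelated audio toward the boosted words.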

It seems like both Vosk and Coqui have the ability to create custom models based on a limited vocabulary?

Have you given any thought to the possibility of using custom openHAB-specific models? And I wonder if those could be dynamically modified/generated with item names? Again, I’m pretty uneducated about the inner workings of NLP.


Hi Dan,

Nice to know. Still, I think having more than one offline STT option is interesting.

Me too. I started looking at this about a year ago and found out that the functionality for dialog processing was already defined in the core, but not exposed; there was also no STT or KS available at that moment. It seems really interesting to me because I think that, with more work and integration with the other openHAB capabilities, it can provide a really customizable voice control system. Contributing to this project also lets me improve my Java skills thanks to the maintainers’ reviews, so I’ve been contributing to it since then.

For Vosk there is the setGrammar method that you can use to define an alternative grammar at runtime. I thought it was already available in the add-on version, but it seems to have been added to the Java library 3 weeks ago: added `Recognizer.setGrammar(String)` for java (#1229) · alphacep/vosk-api@64d84b0 · GitHub.
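A minimal sketch of how that looks with the Vosk Java bindings (the model path and phrases are placeholders):

import java.io.IOException;

import org.vosk.Model;
import org.vosk.Recognizer;

public class GrammarDemo {
    public static void main(String[] args) throws IOException {
        try (Model model = new Model("path/to/vosk-model");
                Recognizer recognizer = new Recognizer(model, 16000.0f)) {
            // Restrict decoding to a phrase list at runtime; "[unk]" keeps an
            // escape hatch so out-of-vocabulary audio is reported as unknown
            // instead of being forced onto the nearest phrase.
            recognizer.setGrammar("[\"turn on the light\", \"turn off the light\", \"[unk]\"]");
            // Feed 16 kHz mono PCM as usual:
            // recognizer.acceptWaveForm(buffer, nRead);
            // String json = recognizer.getFinalResult();
        }
    }
}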

But for my personal use case it is not interesting to have to rewrite the complete grammar.
I’m using the voice recognition with a set of predefined phrases, but I’m also using it to make dynamic searches (as an example, the YouTube and Spotify integrations in this add-on send text to those services’ search engines, and I’m also using this at home to make searches from a rule against my Jellyfin server and play media on my devices, using the Jellyfin binding combined with the AndroidDebugBridge binding). So I’m not interested in rewriting the full grammar; what would be interesting for me is to expand it or to give more priority to some words/phrases. To do that for Vosk you need to recompile the model using Kaldi, and you need the source material they use to build the model, which is available for some languages but not for Spanish at the moment, so I desisted from trying it.
I think I read that they want to allow this dynamic enhancement of the grammar at some point, which would be awesome for me.

If you are interested in looking into it, here are their instructions: Model adaptation for VOSK.

As a side topic, I have opened this PR that aims to simplify the dialog setup in the next openHAB version. If you are interested in taking a look at it, in case you can see a better way to integrate this functionality, that would be nice. We can also open another topic if you are interested in chatting more about possible improvements to the dialog processing and the interpreters for the next version; maybe more people are interested in giving feedback or contributing to it.

I was hoping this is what “hot-words” in Coqui is supposed to do; it would be nice to give a positive weight to item names and common commands and nouns to improve accuracy in the STT step. I agree having a model that only understands OH commands would be limiting.

Alternatively, I think I remember the speech models can return multiple results. It would be awesome to have a result with hot-words matched and one without (or use multiple models): if we (the interpreter?) determine it’s an OH command, we use the hot-words version; if not, we go with the alternative version for a more generic search/question. I’m specifically thinking about my personal use case, where I have Amazon Echos spread across my house whose primary purpose is voice control of openHAB as well as playing music, although we occasionally ask it for the weather or trivia-like questions.
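On the multiple-results idea: Vosk, at least, can already produce N-best output via Recognizer.setMaxAlternatives; how an add-on or the interpreter would then choose between the alternatives is the open design question. A minimal sketch:

import java.io.IOException;

import org.vosk.Model;
import org.vosk.Recognizer;

public class AlternativesDemo {
    public static void main(String[] args) throws IOException {
        try (Model model = new Model("path/to/vosk-model");
                Recognizer recognizer = new Recognizer(model, 16000.0f)) {
            // With alternatives enabled, the result JSON contains an
            // "alternatives" array (text + confidence) instead of a single "text".
            recognizer.setMaxAlternatives(3);
            // ... feed audio with recognizer.acceptWaveForm(buffer, nRead) ...
            System.out.println(recognizer.getFinalResult());
        }
    }
}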

As a side topic, I have opened this PR

I’ll take a look this weekend. Thanks for the work you have put in so far; it’s much appreciated!


FYI, the only thing holding back actually submitting this as a new binding right now is the fact that building the platform-native libraries required quite a bit of hacking of Coqui’s build files, so I don’t have an automated way of doing that yet. They are also around 18 MB (per platform), so quite big to bundle. There are also the model and scorer data files, which together are around 200 MB, so I would need to build it in a way that has the bundle download these files (or have the user provide them, which is not ideal).

Hi @Miguel_M.A.D

First of all, I’d like to thank you again. You did a great job here creating useful features and giving support.

Based on our conversation here: Some new voice add-ons: PorcupineKS, GoogleSTT, WatsonSTT, VoskSTT - #16 by ornostar, I’d like to write down my feedback. Maybe it is useful for you and/or the community.

I’ve made a few observations. The errors might be caused by my own misuse!
#1 If I play around in the settings and set “hold model in RAM” to inactive while having an instance of HAB Speaker open in a browser, I receive this error on click in the browser:

2023-01-09 20:34:55.008 [INFO ] [rnal.websocket.HABSpeakerWebSocketIO] - New client connected.
2023-01-09 20:34:55.012 [WARN ] [rnal.websocket.HABSpeakerWebSocketIO] - WebSocket Error:
java.lang.IllegalStateException: speaker already registered
        at org.openhab.voice.habspeaker.internal.io.HABSpeakerIOManager.onConnected(HABSpeakerIOManager.java:103) ~[bundleFile:?]
        at org.openhab.voice.habspeaker.internal.io.internal.websocket.HABSpeakerWebSocketProtocol.addHandler(HABSpeakerWebSocketProtocol.java:176) ~[bundleFile:?]
        at org.openhab.voice.habspeaker.internal.io.internal.websocket.HABSpeakerWebSocketIO.handleCommand(HABSpeakerWebSocketIO.java:95) ~[bundleFile:?]
        at org.openhab.voice.habspeaker.internal.io.internal.websocket.HABSpeakerWebSocketIO.onWebSocketText(HABSpeakerWebSocketIO.java:333) ~[bundleFile:?]
        at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextMessage(JettyListenerEventDriver.java:296) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.message.SimpleTextMessage.messageComplete(SimpleTextMessage.java:69) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.appendMessage(AbstractEventDriver.java:67) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onTextFrame(JettyListenerEventDriver.java:235) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.incomingFrame(AbstractEventDriver.java:152) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.WebSocketSession.incomingFrame(WebSocketSession.java:326) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.AbstractExtension.nextIncomingFrame(AbstractExtension.java:148) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.PerMessageDeflateExtension.nextIncomingFrame(PerMessageDeflateExtension.java:111) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension.forwardIncoming(CompressExtension.java:169) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.PerMessageDeflateExtension.incomingFrame(PerMessageDeflateExtension.java:90) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.incomingFrame(ExtensionStack.java:202) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.Parser.notifyFrame(Parser.java:225) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.Parser.parseSingleFrame(Parser.java:259) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:459) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.onFillable(AbstractWebSocketConnection.java:440) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883) [bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034) [bundleFile:9.4.46.v20220331]
        at java.lang.Thread.run(Thread.java:829) [?:?]
2023-01-09 20:35:10.999 [INFO ] [rnal.websocket.HABSpeakerWebSocketIO] - New client connected.

Observations #2 & #3: I’m not sure at what point the websocket error occurred, but that was basically the scenario. My guess is that loading the model took too long and somehow a race condition occurred. But maybe it was just my browser refresh.

Also, in the same scenario, somehow the habspeaker source was “not available”.

2023-01-09 20:35:10.999 [INFO ] [rnal.websocket.HABSpeakerWebSocketIO] - New client connected.
2023-01-09 20:35:11.004 [WARN ] [core.audio.internal.AudioManagerImpl] - Default AudioSource service 'habspeaker::79a0-95a8-b269::source' not available!
2023-01-09 20:35:16.950 [DEBUG] [oice.voskstt.internal.VoskSTTService] - loading model
2023-01-09 20:35:21.994 [WARN ] [rnal.websocket.HABSpeakerWebSocketIO] - WebSocket Error:
org.eclipse.jetty.io.EofException: null
        at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:279) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.ssl.SslConnection.networkFlush(SslConnection.java:489) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.flush(SslConnection.java:1112) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.WriteFlusher.flush(WriteFlusher.java:422) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.WriteFlusher.write(WriteFlusher.java:277) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.io.AbstractEndPoint.write(AbstractEndPoint.java:381) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.io.FrameFlusher.flush(FrameFlusher.java:264) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.io.FrameFlusher.process(FrameFlusher.java:193) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.outgoingFrame(AbstractWebSocketConnection.java:581) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.AbstractExtension.nextOutgoingFrame(AbstractExtension.java:157) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.PerMessageDeflateExtension.nextOutgoingFrame(PerMessageDeflateExtension.java:123) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension.access$700(CompressExtension.java:45) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.deflate(CompressExtension.java:466) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension$Flusher.process(CompressExtension.java:450) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.compress.CompressExtension.outgoingFrame(CompressExtension.java:234) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.ExtensionStack$Flusher.process(ExtensionStack.java:403) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.IteratingCallback.processing(IteratingCallback.java:241) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.util.IteratingCallback.iterate(IteratingCallback.java:223) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.extensions.ExtensionStack.outgoingFrame(ExtensionStack.java:280) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.WebSocketSession.outgoingFrame(WebSocketSession.java:360) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.uncheckedSendFrame(WebSocketRemoteEndpoint.java:322) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendAsyncFrame(WebSocketRemoteEndpoint.java:243) ~[bundleFile:9.4.46.v20220331]
        at org.eclipse.jetty.websocket.common.WebSocketRemoteEndpoint.sendPing(WebSocketRemoteEndpoint.java:376) ~[bundleFile:9.4.46.v20220331]
        at org.openhab.voice.habspeaker.internal.io.internal.websocket.HABSpeakerWebSocketProtocol.pingHandlers(HABSpeakerWebSocketProtocol.java:117) ~[bundleFile:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) [?:?]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) [?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
        at java.lang.Thread.run(Thread.java:829) [?:?]
Caused by: java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.writev0(Native Method) ~[?:?]
        at sun.nio.ch.SocketDispatcher.writev(SocketDispatcher.java:51) ~[?:?]
        at sun.nio.ch.IOUtil.write(IOUtil.java:182) ~[?:?]
        at sun.nio.ch.IOUtil.write(IOUtil.java:130) ~[?:?]
        at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:493) ~[?:?]
        at java.nio.channels.SocketChannel.write(SocketChannel.java:507) ~[?:?]
        at org.eclipse.jetty.io.ChannelEndPoint.flush(ChannelEndPoint.java:273) ~[bundleFile:9.4.46.v20220331]
        ... 33 more

#4: I still don’t know why I need a separate keyword in HAB Speaker (I’ve already got one in Settings → Voice).

#5: Is it somehow possible to start keyword listening without physical interaction?

EDIT, 2023-01-10:
#6: Be aware, I’m a noob!
I guess I’ve found my issue here (expecting HAB Speaker to use the system’s interpreter). I expected HAB Speaker to use the common interpreter (HumanLanguageInterpreter.java). By investigating the code, I gather that HAB Speaker uses its own implementation of the interpreter interface (HumanLanguageInterpreter.java).
So I don’t see any way, without a code change, to have any interpreter other than the hard-coded one.

Are those files specific to a language? If so, I won’t bundle them. Maybe a download utility managed from the service configuration (something like adding a new text config with a list of recommended models as options that the add-on will download and set up in the user folder); that way manual configuration is still possible if needed. Implementing something like that for the Vosk add-on would be a good idea.

Yes, you can open a PR with your work to Coqui. Sounds like something nice for them to have. That way maybe they end up distributing the Java library through Maven.

This should only happen if there is already a connected speaker using the same id, which is not allowed. But if there is an error while unregistering the speaker, this could happen. I will try to improve it and remove the excessive log.

For points 1, 2, and 3, I think the problem is that the JVM is running out of memory, or is near to it, when loading the Vosk model.
You can log into the openHAB console and run: shell:info.
For me it shows the following under memory:

Memory
  Current heap size           186,142 kbytes
  Maximum heap size           2,000,896 kbytes

Memory requirements will change depending on the model you load. I just tried the small ones.

I cannot access that configuration from the add-on (at least I don’t know how); it’s not exposed. I know it feels weird, but as it would require changes to the core, I’ll keep it like this for now.

Not in the web version. It’s a browser security rule to require user interaction before starting audio capture. The Electron version of the application does not require this.

Don’t worry. No, there is no hardcoded interpreter. The dialog internally accepts a list of interpreters; it tries to interpret the text in order, and if one throws an InterpretationException, it goes on to the next. So what I’m doing there is prepending an interpreter that makes the voice commands defined in “Other Services/HAB Speaker” work. So that shouldn’t be a problem if you have nothing configured there, unless there is a bug I’m missing. For me it seems to be working; I’m using the Action Template Interpreter fine.

            HumanLanguageInterpreter hli = !speakerConfig.hli.isBlank() ? voiceManager.getHLI(speakerConfig.hli)
                    : voiceManager.getHLI();
            if (hli != null) {
                hlis = List.of(speakerLanguageInterpreter, hli);
            }
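Paraphrased, the fallback behaviour described above amounts to this (a sketch of the logic, not the actual core code):

import java.util.List;
import java.util.Locale;

import org.openhab.core.voice.text.HumanLanguageInterpreter;
import org.openhab.core.voice.text.InterpretationException;

class InterpreterChain {
    /** Tries each interpreter in order; the first one that understands wins. */
    static String interpret(List<HumanLanguageInterpreter> hlis, Locale locale, String text)
            throws InterpretationException {
        InterpretationException last = new InterpretationException("Unknown voice command");
        for (HumanLanguageInterpreter hli : hlis) {
            try {
                return hli.interpret(locale, text);
            } catch (InterpretationException e) {
                last = e; // this interpreter did not understand, try the next one
            }
        }
        throw last;
    }
}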

Hi again!

Regarding memory, I cannot tell. Somehow, independent of the model, the heap size changes all the time from 170m to 260m and restarts at 170.

Memory
  Current heap size           190,695 kbytes
  Maximum heap size           316,800 kbytes

Neither with

log:set TRACE org.openhab.voice.voskstt
log:tail

nor with

tail -f /var/log/openhab/openhab.log

was there any output from this namespace.
I did not change the heap size (/etc/default/openhab), since this requires a restart, and so far I am not aiming to “improve” the recognition, but the interpreter. Additionally, this would clear my item state history.
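(For reference: on package-based installs, the heap limit is usually raised via EXTRA_JAVA_OPTS in /etc/default/openhab followed by a service restart; the exact file, variable, and values may differ per install method. The line below is only an example.)

EXTRA_JAVA_OPTS="-Xms192m -Xmx1024m"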

If you are interested in any logs, I can increase it and deliver the required logs.

The explanation regarding the keyword is plausible. So it would be helpful to change the core at the next opportunity. Do you know how to request such a change, plus the corresponding change on your side?

So I’ll experiment with the Electron version as soon as I’ve got my desired behaviour working.

What still makes me wonder is that I always receive “Interpretation exception: Unknown voice command” from the interpreter.

  • I’ve got 0 phrases defined (in Settings → HAB Speaker)
  • I’ve tried all interpreters in Settings → Voice
  • I’ve tried all of them (including none) in my HAB Speaker thing

With the rule-based interpreter, I expect at least a change in the voice command string item (holding the recognized string). But it never changes via HAB Speaker; openhab-cli works to alter the string (and trigger the rule, of course).
So, besides the fact that I think the phrase-based interpreter is basically rule-based, I still see something behaving strangely. I see your point regarding the list, but as long as I always receive the same output (“unknown voice command”), it is, at least for me, not working as expected.
Maybe

line 184: if (thingHandler != null)

the thingHandler IS null?!

That would lead to a call of

line 207: voiceManager.startDialog(hsKS, stt, tts, voice, hlis, source, sink, null, null, listeningItem);

having hlis only

line 183: List<HumanLanguageInterpreter> hlis = List.of(speakerLanguageInterpreter);

Log:

22:06:15.082 [DEBUG] [websocket.HABSpeakerWebSocketProtocol] - Pinging 1 clients...
22:06:15.187 [DEBUG] [ernal.websocket.HABSpeakerWebSocketIO] - Handling command ON_SPOT
22:06:15.190 [DEBUG] [b.core.voice.internal.DialogProcessor] - KSpottedEvent event received
22:06:15.308 [DEBUG] [l.audio.internal.ConvertedInputStream] - Duration of input stream : 1300ms
22:06:15.310 [DEBUG] [l.audio.internal.ConvertedInputStream] - Sound is not in the target format. Trying to reencode it
22:06:15.333 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - Sleep time to let the system play sound : 1282ms
22:06:15.825 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Doorbell_Image' changed from raw type (image/jpeg): 118697 bytes to raw type (image/jpeg): 118946 bytes
22:06:16.617 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - ConvertedAudioStream 79a0-95a8-b263 closed
22:06:16.624 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - AudioStream 79a0-95a8-b263 closed
22:06:16.629 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - OutputStream 79a0-95a8-b263 closed
22:06:16.635 [DEBUG] [ernal.websocket.HABSpeakerWebSocketIO] - Send start listening 79a0-95a8-b263
22:06:16.738 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:16.818 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:16.907 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:16.926 [DEBUG] [b.core.voice.internal.DialogProcessor] - RecognitionStartEvent event received
22:06:16.931 [INFO ] [openhab.event.ItemCommandEvent       ] - Item 'VoiceRecognitionActive' received command ON
22:06:16.934 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'VoiceRecognitionActive' changed from OFF to ON
22:06:16.997 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.077 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.168 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.247 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.337 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.418 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.508 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.776 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.778 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.781 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.847 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.927 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:17.928 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Doorbell_Image' changed from raw type (image/jpeg): 118946 bytes to raw type (image/jpeg): 118973 bytes
22:06:18.018 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.097 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.187 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.277 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.358 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.451 [WARN ] [files.JavaScriptTransformationProfile] - Could not transform state '2' with function 'percentageCalculator.js' and format '%s'
22:06:18.456 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.528 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.630 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.697 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.788 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.869 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:18.957 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.037 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.369 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.374 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.378 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.382 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.468 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.558 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.943 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.947 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.950 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.953 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:19.981 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.032 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Doorbell_Image' changed from raw type (image/jpeg): 118973 bytes to raw type (image/jpeg): 118746 bytes
22:06:20.068 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.413 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.416 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.419 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.421 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.487 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.577 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.706 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.747 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.838 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:20.917 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.007 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.088 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.305 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.308 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.347 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.427 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.517 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.598 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.687 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.768 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.857 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:21.938 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:22.028 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:22.118 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:22.182 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Doorbell_Image' changed from raw type (image/jpeg): 118746 bytes to raw type (image/jpeg): 118728 bytes
22:06:22.190 [DEBUG] [b.core.voice.internal.DialogProcessor] - RecognitionStopEvent event received
22:06:22.193 [DEBUG] [b.core.voice.internal.DialogProcessor] - SpeechRecognitionEvent event received
22:06:22.195 [DEBUG] [b.core.voice.internal.DialogProcessor] - Text recognized: fest dass es ein test
22:06:22.198 [DEBUG] [b.core.voice.internal.DialogProcessor] - Interpretation exception: Unknown voice command
22:06:22.200 [INFO ] [openhab.event.ItemCommandEvent       ] - Item 'VoiceRecognitionActive' received command OFF
22:06:22.201 [INFO ] [marytts.R 1                          ] - New request (input type "TEXT", output type "AUDIO", voice "bits1-hsmm", audio "WAVE")
22:06:22.204 [INFO ] [marytts.R 1                          ] - Handling request using the following modules:
22:06:22.206 [INFO ] [marytts.R 1                          ] - - TextToMaryXML (marytts.modules.TextToMaryXML)
22:06:22.208 [INFO ] [marytts.R 1                          ] - Next module: TextToMaryXML
22:06:22.211 [INFO ] [marytts.R 1                          ] - Handling request using the following modules:
22:06:22.212 [INFO ] [marytts.R 1                          ] - - JTokeniser (marytts.language.de.JTokeniser)
22:06:22.214 [INFO ] [marytts.R 1                          ] - - Preprocess (marytts.language.de.Preprocess)
22:06:22.215 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'VoiceRecognitionActive' changed from ON to OFF
22:06:22.215 [INFO ] [marytts.R 1                          ] - - OpenNLPPosTagger (marytts.modules.OpenNLPPosTagger)
22:06:22.218 [INFO ] [marytts.R 1                          ] - - JPhonemiser_de (marytts.language.de.JPhonemiser)
22:06:22.220 [INFO ] [marytts.R 1                          ] - - Prosody (marytts.language.de.Prosody)
22:06:22.221 [INFO ] [marytts.R 1                          ] - - PronunciationModel (marytts.language.de.Postlex)
22:06:22.223 [INFO ] [marytts.R 1                          ] - - AcousticModeller (marytts.modules.AcousticModeller)
22:06:22.224 [INFO ] [marytts.R 1                          ] - - Synthesis (marytts.modules.Synthesis)
22:06:22.225 [INFO ] [marytts.R 1                          ] - Next module: JTokeniser
22:06:22.227 [INFO ] [marytts.R 1                          ] - Next module: Preprocess
22:06:22.229 [INFO ] [marytts.Preprocess                   ] - Expanding say-as elements...
22:06:22.230 [INFO ] [marytts.Preprocess                   ] - Matching and expanding patterns...
22:06:22.232 [INFO ] [marytts.Preprocess                   ] - Done.
22:06:22.234 [INFO ] [marytts.R 1                          ] - Next module: OpenNLPPosTagger
22:06:22.236 [INFO ] [marytts.R 1                          ] - Next module: JPhonemiser_de
22:06:22.242 [INFO ] [marytts.R 1                          ] - Next module: Prosody
22:06:22.245 [INFO ] [marytts.R 1                          ] - Next module: PronunciationModel
22:06:22.248 [INFO ] [marytts.R 1                          ] - Next module: AcousticModeller
22:06:22.261 [INFO ] [marytts.ParameterGeneration          ] - Parameter generation for LF0:
22:06:22.262 [INFO ] [marytts.PStream                      ] - Global variance optimization
22:06:22.264 [INFO ] [marytts.PStream                      ] -    optimization stopped by reaching max number of iterations (no global variance applied)
22:06:22.266 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (0)  number of iterations=1
22:06:22.268 [INFO ] [marytts.R 1                          ] - Next module: Synthesis
22:06:22.270 [INFO ] [marytts.HMMSynthesizer               ] - Synthesizing one sentence.
22:06:22.272 [INFO ] [marytts.HTSEngine                    ] - Using prosody from acoustparams.
22:06:22.275 [INFO ] [marytts.HTSEngine                    ] - Number of models in sentence numModel=19  Total number of states numState=95
22:06:22.276 [INFO ] [marytts.HTSEngine                    ] - Total number of frames=392  Number of voiced frames=181
22:06:22.281 [INFO ] [marytts.ParameterGeneration          ] - Parameter generation for MGC:
22:06:22.282 [INFO ] [marytts.PStream                      ] - Global variance optimization
22:06:22.284 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (0)  number of iterations=2
22:06:22.288 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (1)  number of iterations=27
22:06:22.291 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (2)  number of iterations=17
22:06:22.295 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (3)  number of iterations=22
22:06:22.298 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (4)  number of iterations=20
22:06:22.301 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (5)  number of iterations=16
22:06:22.305 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (6)  number of iterations=23
22:06:22.309 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (7)  number of iterations=22
22:06:22.311 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (8)  number of iterations=9
22:06:22.314 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (9)  number of iterations=14
22:06:22.318 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (10)  number of iterations=16
22:06:22.320 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (11)  number of iterations=8
22:06:22.323 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (12)  number of iterations=10
22:06:22.326 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (13)  number of iterations=4
22:06:22.329 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (14)  number of iterations=9
22:06:22.331 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (15)  number of iterations=3
22:06:22.333 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (16)  number of iterations=2
22:06:22.335 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (17)  number of iterations=2
22:06:22.339 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (18)  number of iterations=16
22:06:22.341 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (19)  number of iterations=7
22:06:22.344 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (20)  number of iterations=11
22:06:22.347 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (21)  number of iterations=3
22:06:22.349 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (22)  number of iterations=3
22:06:22.351 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (23)  number of iterations=8
22:06:22.355 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (24)  number of iterations=30
22:06:22.357 [INFO ] [marytts.ParameterGeneration          ] - Using f0 from maryXML acoustparams
22:06:22.360 [INFO ] [marytts.PStream                      ] - Global variance optimization
22:06:22.366 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (0)  number of iterations=78
22:06:22.368 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (1)  number of iterations=3
22:06:22.371 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (2)  number of iterations=5
22:06:22.373 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (3)  number of iterations=3
22:06:22.377 [INFO ] [marytts.PStream                      ] - Gradient GV optimization for feature: (4)  number of iterations=37
22:06:22.379 [INFO ] [marytts.R 1                          ] - Request processed in 176 ms.
22:06:22.381 [INFO ] [marytts.R 1                          ] -    TextToMaryXML took 2 ms
22:06:22.382 [INFO ] [marytts.R 1                          ] -    JTokeniser took 2 ms
22:06:22.384 [INFO ] [marytts.R 1                          ] -    Preprocess took 7 ms
22:06:22.385 [INFO ] [marytts.R 1                          ] -    OpenNLPPosTagger took 2 ms
22:06:22.387 [INFO ] [marytts.R 1                          ] -    JPhonemiser_de took 6 ms
22:06:22.388 [INFO ] [marytts.R 1                          ] -    Prosody took 3 ms
22:06:22.390 [INFO ] [marytts.R 1                          ] -    PronunciationModel took 3 ms
22:06:22.391 [INFO ] [marytts.R 1                          ] -    AcousticModeller took 20 ms
22:06:22.392 [INFO ] [marytts.R 1                          ] -    Synthesis took 111 ms
22:06:22.637 [DEBUG] [l.audio.internal.ConvertedInputStream] - Duration of input stream : 1961ms
22:06:22.661 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - Sleep time to let the system play sound : 1939ms
22:06:23.126 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:24.305 [INFO ] [openhab.event.ItemStateChangedEvent  ] - Item 'Doorbell_Image' changed from raw type (image/jpeg): 118728 bytes to raw type (image/jpeg): 118637 bytes
22:06:24.602 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - ConvertedAudioStream 79a0-95a8-b263 closed
22:06:24.605 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - AudioStream 79a0-95a8-b263 closed
22:06:24.608 [DEBUG] [er.internal.audio.HABSpeakerAudioSink] - OutputStream 79a0-95a8-b263 closed
22:06:24.628 [DEBUG] [ernal.websocket.HABSpeakerWebSocketIO] - Send stop listening 79a0-95a8-b263
22:06:25.131 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.136 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.142 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.147 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.152 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.157 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.162 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.166 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.171 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.175 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.180 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.184 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.188 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.193 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.198 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.202 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.206 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.211 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.215 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.219 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.224 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.228 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.232 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.235 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.237 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.239 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.242 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730
22:06:25.244 [TRACE] [ernal.websocket.HABSpeakerWebSocketIO] - Received binary data of length 2730

Another finding is that the HAB Speaker thing is shown as offline in my list of things.

Have you tried enabling the org.openhab.core.voice debug logs? Maybe you can see more info there.

I don’t think Vosk can load the model with that maximum heap size. I recommend verifying that the add-on works with a cloud STT engine like WatsonSTT or GoogleSTT if you don’t want to change the heap size for now. Also, the next time you reboot you can check the logs in case Vosk logs some initialization error; I’m curious what the problem with it is.

The speaker should be online whenever there is a connection using that id. Your speaker may have changed its id (it’s in the browser’s local storage, so it will change if you change your domain or delete that storage). Check that the speaker you are using still has the same id.
I recommend using something that makes sense to you instead of the autogenerated id. I use something like givipc, givilaptop… and I assign the same id to the Electron app and the web client per device; this way the speaker is detected as online whenever I’m using the app or the web.

I think you are right on this one. This logic seems to be wrong for the case of a speaker that is not registered in openHAB as a thing (this is the case when thingHandler == null). I’ll look at it. Really appreciated!

Maybe there is another way to load the configuration of another component by its id; I’ll start by asking about this in the forum.

I have just added a new version with a fix for the interpreters configuration.

After thinking about it again, maybe Vosk is actually working if the interpreter is called; I thought the model was held on the Java heap, but maybe that is not the case. Let me know if it solves something for you.

I have also added a debug log that prints the text to interpret in the speaker interpreter, in case it helps.

Regards!

Hi Miguel,

I appreciate your help. Thanks a lot.

I’ll summarize what I’ve understood/figured out so far (at least for the community):

  1. I’ve increased the heap size. The current heap size still varies, but does not come close to the maximum. See examples below.
  2. VOSK worked correctly without changes besides the manual + model download (actually this item refers to Some new voice add-ons: PorcupineKS, GoogleSTT, WatsonSTT, VoskSTT).
  3. Apparently the ID on thing creation is crucial for having the system interpreter working (I have not checked your fix yet!). Otherwise only the add-on-internal interpreter runs.
    • HAB Speaker works with the “offline” thing, including STT and TTS. But, as said above, with the interpreter issue.

… and what is still open for me to check:

  1. Get the Electron app running on Android. This will take some time for me to check.
    • including keyword listening (as I understood it, I need to use the one in HAB Speaker. Could be advantageous if multiple microphones are in range).
  2. Having a rule interpreter implemented by myself. I’ll share my working, but still in-progress, code here, just to give you an impression of what I’m implementing on my side.
    • the interpretation is actually split into two parts:
      • a “special command” part (like scenes or whatever)
      • a “regular command” part (I have created all items semi-automatically, so I can refer to the name). But my long-term goal is to access, via the interpreted location, the semantic model and, through it, the equipment (I hope that is the correct term for the object in the semantic model; I’ll read up on it later).
  3. I’ve got kind of a headache because I’d like to give audio output (TTS) on the triggering HAB Speaker. Right now I do not see any way. Maybe some hack with an item “receiving microphone” will do it. But I’m far from a realistic approach.

Regarding heap size.

Memory
  Current heap size           334,576 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           322,660 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           317,610 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           257,288 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           277,065 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           349,327 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           314,163 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           346,130 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           272,470 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes
~few sec. later
Memory
  Current heap size           323,345 kbytes
  Maximum heap size           1,013,632 kbytes
  Committed heap size         388,220 kbytes

Regarding the interpretation rule:

var logger = Java.type("org.slf4j.LoggerFactory").getLogger("myScript");
var location = "";
var value = "";
var property = "";
//var command = items["Sprachbefehl"].toString().toLowerCase();
//var command = "Musik an";
var command = event.itemState.toString().toLowerCase();
logger.info("Sprachbefehl ausführen-Regel wird aufgerufen");
//logger.info("Sprachbefehl: event ist " + event);
var lightKeyWords= ["Licht", "Light"];
var synonymsLocation = JSON.parse('[' +
  '{"meaning": "K2", "words":["Kind2", "K2", "Südost"]},' +
  '{"meaning": "Wohnen", "words":["Wohn", "Wohnen"]},' +
  '{"meaning": "K1", "words":["Kind1", "K1", "Südwest"]}' +
']');
var synonymsType = JSON.parse('[' +
  '{"meaning": "Licht", "words":[]}' +
']');
var synonymsProperty = JSON.parse('[' +
  '{"meaning": "An", "words":["On", "eins", "1", "Hell"]}' +
']');

var commandAsJSON = { 
  meaning : "",  
  word : [] 
};

// concat returns a new array, so the result must be assigned
var synonyms = synonymsLocation.concat(synonymsType, synonymsProperty);

if (!specialCommandExecuted(command)){
  evaluateCommand(synonyms, command);
}

function evaluateCommand(synonyms, command){
  logger.info("JSON.stringify(synonyms): " + JSON.stringify(synonyms));

  var newCommand = updateCommandWithSynonymMeaning(synonyms, command);
  var newCommandAsJSON = convertCommandToJSON(newCommand);
  logger.info("newCommand: " + newCommand);
  location = findLocation(newCommand);
  property = findProperty(newCommand);
  value = findValue(newCommand);
}


function findLocation(cmd){
  var retVal = "";
  for (var k = 0, len = cmd.length; k < len; k++){
    logger.info("cmd: " + JSON.stringify(cmd));
    
  }
  return retVal;
};

function findProperty(cmd){
  var retVal = "";
  return retVal;
};

function findValue(cmd){
  var retVal = "";
  return retVal;
};

function getSynonym(synonym, word){
  var retVal = "";
  return retVal;
}

function updateCommandWithSynonymMeaning(synonymlist, command){
  var retVal = command;
  for (var i = 0, len = synonymlist.length; i < len; i++) {
    for (var j = 0, l = synonymlist[i].words.length; j < l; j++){
      // ToDo: Check if a part of a word is replaced or a standalone word.
      if (retVal.search(synonymlist[i].words[j]) > -1){
        //logger.info("synonymlist[i].words[j] ist "+ synonymlist[i].words[j] +" ersetzen mit: " + synonymlist[i].meaning);
        retVal = retVal.replace(synonymlist[i].words[j], synonymlist[i].meaning);
      }
    }
  }
  return retVal;
};

function specialCommandExecuted(command){
  logger.info("specialCommandExecuted wird ausgeführt mit: " + command);
  // command was lowercased above, so search for lowercase words
  // (String.search takes a single pattern argument)
  logger.info("command.search('musik'): " + command.search("musik"));
  logger.info("command.search('an'): " + command.search("an"));
  if (command.search("musik") != -1 && command.search("an") != -1) {
    switchMusic("ON");
    return true;
  } else if (command.search("musik") != -1 && command.search("aus") != -1) {
    switchMusic("OFF");
    return true;
  }
  
  return false;
}

function switchMusic(state){
    events.sendCommand(ir.getItem("SqueezePlayer1Playpause"), state);
    /*events.sendCommand(ir.getItem("SqueezePlayer2Playpause"), state);
    events.sendCommand(ir.getItem("SqueezePlayer3Playpause"), state);
    events.sendCommand(ir.getItem("SqueezePlayer4Playpause"), state);*/
}

function convertCommandToJSON(command){
  var retVal = JSON.parse('[]');
  //logger.info("retVal.length: " + retVal.length);
    retVal = {"type": "location", "word": findLocation(command)};
  /*var tmp = command.split(' ');
  for (var k = 0, len = tmp.length; k < len; k++){
    logger.info("retValtmp: " + JSON.stringify(retValtmp));
    retVal.push(retValtmp);
  }
  logger.info("JSON.stringify(retVal): " + JSON.stringify(retVal));*/
  return retVal;
}


Thank you for summarizing everything!

I hope this doesn’t apply anymore after the fix.
Either way, I will add another section to the readme to explain that openHAB recognizes the speaker based on its id and does not allow multiple connections under the same id; I think it’s not explained anywhere.

Electron only targets desktop environments (Windows/macOS/Linux).
I will try to make an Android/iOS wrapper soon using Capacitor; I’ll probably give it a try this week, but I’m not sure how long it could take.

Intelligent solution, thank you for sharing it. Another similar workaround could be to apply ‘updateCommandWithSynonymMeaning’, pass the result to an interpreter, and say its response.

After seeing this, I think that having a synonym-replace solution like this built into the dialog processing flow could be a great addition. I had already thought about it for supporting number parsing, but now I think something just like this would do the trick and would be easy to add to the voice configuration in a “key=value\n” format. WDYT?
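A sketch of how such a “key=value\n” configuration could be parsed and applied; the class and the whole-word semantics are only an illustration of this proposal, nothing that exists in openHAB:

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SynonymReplacer {
    private final Map<String, String> synonyms = new LinkedHashMap<>();

    /** Parses lines like "kind2=K2"; one replacement per line. */
    public SynonymReplacer(String config) {
        for (String line : config.split("\n")) {
            String[] parts = line.split("=", 2);
            if (parts.length == 2 && !parts[0].isBlank()) {
                synonyms.put(parts[0].trim(), parts[1].trim());
            }
        }
    }

    /** Replaces whole words only, so "kind2" matches but "unkind2" does not. */
    public String apply(String text) {
        String result = text;
        for (Map.Entry<String, String> e : synonyms.entrySet()) {
            Pattern p = Pattern.compile("(?iU)\\b" + Pattern.quote(e.getKey()) + "\\b");
            result = p.matcher(result).replaceAll(Matcher.quoteReplacement(e.getValue()));
        }
        return result;
    }
}

For example, new SynonymReplacer("kind2=K2\nsüdost=K2").apply("mach licht im kind2 an") would yield “mach licht im K2 an”.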

Hi!

I’ve just read this:

That might be a solution and quite helpful (if the interpreter delivers good results; as we know, this depends on the speaker, on the openHAB item config, etc.).

I’ve read Multimedia | openHAB and tried it myself. Unfortunately, only DSL rules are mentioned in the docs, and I receive an error (interpret is not known):

2023-01-14 23:18:30.445 [ERROR] [internal.handler.ScriptActionHandler] - Script execution of rule with UID 'cb71c69296' failed: ReferenceError: "interpret" is not defined in <eval> at line number 68

So, at least for today I’ll stop my activities on this one.

I’ve got the script below working (basically the one above, updated), so both specialCommandExecuted and also executeCommandBySemanticModel. Not completely, but working.
I’m convinced that a mapping table, as you suggested, might be a good addition. What’s important is that the keys are identical to what is in the semantic model.

Edit, 2023-01-15:

=> That is something that would be really helpful for me, including the keyword spotter. I’m looking forward to feedback on this one.

Here, enjoy my script (@all):
(Edit, 2023-01-15: The script is not finished yet. I’ll post significant updates after I have the HAB Speaker Android port working without interaction)

/*
Written by Ornostar, 2023-01-15. V0.9.0
This script will take a text command from an item, interpret it, and create a result.

There are currently some important things (Todos! & nice to know):
1. you need to program all complicated or non-semantic/slang commands by yourself. See function specialCommandExecuted(.)
2. the evaluation of the semantic model is currently not complete.
  a) see evaluateCommand(.): The type will not be evaluated. If you have 2 different items with different types but the same tags (like light dimming + temperature), you'll always get one error.
3. You need to configure the synonyms
4. Of course, you are responsible for delivering the string to interpret. For example, look here:
  a) see https://community.openhab.org/t/some-new-voice-add-ons-porcupineks-googlestt-watsonstt-voskstt/133500
  b) see https://community.openhab.org/t/hab-speaker-dialog-processing-in-the-browser/140655
  c) create item for voice command and configuration of using this (here + interpreter settings in openhab->settings)
5. You need to configure the items (currently correct tagging with the synonymsType defined below)
6. relative changes are not implemented (higher, lower, brighter, darker, louder... )
7. Some interpretation issues I have (see above) might be solved by using the built-in interpreter.
  Thanks @Miguel for proposing this solution. I haven't tried it so far, but it could work. See https://www.openhab.org/docs/configuration/multimedia.html.
*/

//import org.openhab.model.script.actions.*;
  
var logger = Java.type("org.slf4j.LoggerFactory").getLogger("myScript");
var Voice = Java.type("org.openhab.core.model.script.actions.Voice");

/*
input parameter
*/
//var command = items["Sprachbefehl"].toString().toLowerCase();
var command = ("Mach viel Licht k2 aus").toLowerCase(); // hard-coded test command
//var command = event.itemState.toString().toLowerCase();

logger.info("Voice command rule is executed with command: " + command);
//logger.info("Voice command: event is " + event);

/*
Configuration parameters. Synonyms have a "meaning", which is the term known in openHAB, and an array of words that stand for that meaning (this script will replace these words by the meaning).
important: the recognized type (synonymsType) is used as "tag" on the items
get your tts and sink via the API (Developer Tools -> API Explorer -> GET on those resources) or via openhab-cli (get sinks, voices)
*/
var PercentType = Java.type("org.openhab.core.library.types.PercentType");
var volume = new PercentType(80);
var sink = "habspeaker::a4140789f5::sink"; // hard-coded to the thing I use myself, this won't work on your side
var tts = "marytts:bits1hsmm"; // define this

// define these
var knownMisunderstandings = JSON.parse('[' +
  '{"meaning": "Licht in K2 an", "words":["Licht im katz feier", "Licht in katz feier", "Licht in kanns feier", "Licht in ganzz bayern", "Licht sind katz feier"]}' +
']');
var synonymsLocation = JSON.parse('[' +
  '{"meaning": "K2", "words":["kind2", "k2", "südost", "kampf zweier", "kanns feiern", "südosten", "süd ost", "ca zwei", "katz feier"]},' +
  '{"meaning": "Wohnen", "words":["wohn", "wohnen"]},' +
  '{"meaning": "K1", "words":["kind1", "K1", "südwest"]},' +
  '{"meaning": "Bad", "words":["bad", "badezimmer"]}' +
']');
var synonymsType = JSON.parse('[' +
  '{"meaning": "Light", "words":["licht", "nicht"]},' +
  '{"meaning": "Rollershutter", "words":["Rolläden", "Beschattung", "Rollos"]}' +
']');
var synonymsValue = JSON.parse('[' +
  '{"meaning": "ON", "words":["an", "eins", "1", "hell", "ein"]},' +
  '{"meaning": "OFF", "words":["aus", "0", "1", "dunkel"]},' +
  '{"meaning": "10", "words":["etwas", "Bisschen"]},' +
  '{"meaning": "80", "words":["viel"]}' +
']');

/*
a small initialization
*/
var synonyms;
var additionalLocationTagsToExtendSynonymsBySemanticModel = ["Location", 
"Location_Indoor",
"Location_Indoor_Room_Entry" ,
"Location_Indoor_Room_DiningRoom" ,
"Location_Indoor_Room" ,
"Location_Indoor_Room_Office" ,
"Location_Indoor_Room_BoilerRoom" ,
"Location_Indoor_Room_Kitchen" ,
"Location_Indoor_Room_Bathroom" ,
"Location_Indoor_Room_LivingRoom" ,
"Location_Indoor_Corridor" ,
"Location_Indoor_Room_Bedroom" ,
"Location_Outdoor_Carport" ,
"Location_Outdoor_Garden" ,
"Location_Indoor" ,
"Location_Indoor_Floor_GroundFloor" ,
"Location_Indoor_Floor_SecondFloor" ,
"Location_Indoor_Building"];
synonymsLocation = extendConfiguredSynonymsBySemanticModel(additionalLocationTagsToExtendSynonymsBySemanticModel, synonymsLocation);


synonyms = knownMisunderstandings;
synonyms = synonyms.concat(synonymsLocation);
synonyms = synonyms.concat(synonymsType);
synonyms = synonyms.concat(synonymsValue);

/*
begin.
*/

// preparation of the command: substitute words with their meanings
command = updateCommandWithSynonymMeaning(synonyms, command);
//interpret(command, "system", null); // "system" is the built-in interpreter. See <protocol>://<your openHAB IP>:<port>/developer/api-explorer
// try to find some fixed special commands first;
// if not successful, go into the complete analysis (what is location, type, value, other; the last is not used) and trigger the command (not sure: either via the semantic model or via a known naming schema of the items)
if (!specialCommandExecuted(command)){
  evaluateCommand(command);
}

function evaluateCommand(command){
  //logger.info("command: " + command);
  var newCommandAsJSON = convertCommandToJSON(command);
  /*if (newCommandAsJSON.locations.length > 1 || newCommandAsJSON.type.length > 1){
   //ToDo: run through reasonable location/type combinations and call executeCommandBySemanticModel(.)
   // 1. align the first location and the first type;
   // afterwards align the latter (if a location comes after a type, align the next type to the same location / if the next is a type, align it to the next type ???)
     }*/
  executeCommandBySemanticModel(newCommandAsJSON);
  //logger.info("newCommandAsJSON: " + JSON.stringify(newCommandAsJSON));
};

function executeCommandBySemanticModel(cmdAsJSON){
  var allItemsByTag;
  // ToDo: identify the type correctly.
  // Problem: the value is ambiguous: "10" can be a Dimmer or a Number (e.g. light temperature, light brightness) or a Rollershutter.
  // The combination of light and value does not identify it either, because a light can be both.
  var type = "";
  if (!isNaN(cmdAsJSON.value)){
    type = "Number"; // rough heuristic, see ToDo above
  }

  if (type == ""){
    allItemsByTag = itemRegistry.getItemsByTag([cmdAsJSON.locations, cmdAsJSON.type]);
  } else {
    allItemsByTag = itemRegistry.getItemsByTagAndType(type, [cmdAsJSON.locations, cmdAsJSON.type]);
  }
  var message = [];

  for (var i in allItemsByTag) {
    var state = allItemsByTag[i].getState();
    var name = allItemsByTag[i].getName();
    logger.info("name: " + name + " and state: " + state);
    try {
      events.sendCommand(allItemsByTag[i].getName(), cmdAsJSON.value);

      if(message.indexOf([cmdAsJSON.type, cmdAsJSON.locations, cmdAsJSON.value].join("|")) == -1) {
        message.push([cmdAsJSON.type, cmdAsJSON.locations, cmdAsJSON.value].join("|"));
      }
      logger.info("Item command was sent for " + allItemsByTag[i].getName() + " with cmdAsJSON.value: " + cmdAsJSON.value);
    } catch (e) {
      logger.info("Sending item command failed for " + allItemsByTag[i].getName() + ": " + e);
    }
  }
  for (var i in message){
    //logger.info("message[i]: " + message[i]);
    sayVoiceResponse(message[i].split("|")[0], message[i].split("|")[1], message[i].split("|")[2]);
  }
};


function extendConfiguredSynonymsBySemanticModel(itemTagsForItemSearch, synonyms){
  var retVal = synonyms;
  for (var j in itemTagsForItemSearch){
    var allItemsByTag = itemRegistry.getItemsByTag(itemTagsForItemSearch[j]);
    for (var i in allItemsByTag) {
      /*logger.info("allItemsByTag[i].getLabel: " + allItemsByTag[i].getLabel());
      logger.info("allItemsByTag[i].getName: " + allItemsByTag[i].getName());
      logger.info("allItemsByTag[i].getGroupNames: " + allItemsByTag[i].getGroupNames());
      logger.info("allItemsByTag[i].getState: " + allItemsByTag[i].getState());
      logger.info("allItemsByTag[i].getTags: " + allItemsByTag[i].getTags());
      logger.info("allItemsByTag[i].getType: " + allItemsByTag[i].getType());*/
      var alreadyAvailable = findIndexInSynonymsByMeaning(allItemsByTag[i].getLabel(), synonyms);
      //logger.info("searched for: " + allItemsByTag[i].getLabel() + " and found index is: " + alreadyAvailable);
      if (alreadyAvailable != -1){
        //logger.info("retVal[alreadyAvailable].words before: " + retVal[alreadyAvailable].words);
        retVal[alreadyAvailable].words = retVal[alreadyAvailable].words.concat([allItemsByTag[i].getLabel(), allItemsByTag[i].getName()]);
        //logger.info("retVal[alreadyAvailable].words after: " + retVal[alreadyAvailable].words);
      } else {
        //logger.info("retVal before: " + JSON.stringify(retVal));
        retVal = retVal.concat(JSON.parse('[' +
          '{"meaning": "' + allItemsByTag[i].getLabel() + '", "words":["' + allItemsByTag[i].getLabel() +'","'+ allItemsByTag[i].getName() +'"]}' +
        ']'));
        //logger.info("retVal after: " + JSON.stringify(retVal));
      }
    }
  }
  return retVal;
}

function findIndexInSynonymsByMeaning(searchString, synonyms){
  for (var k = 0, len = synonyms.length; k < len; k++){
    if (synonyms[k].meaning == searchString){
      return k;
    }
  }
  return -1;
}

function assignWordsToSynonymGroup(cmd, synonymGroupToCheck){
  var retVal = new Array();
  for (var k = 0, len = synonymGroupToCheck.length; k < len; k++){
    var searchIndex = cmd.search(synonymGroupToCheck[k].meaning);
    //logger.info("Suchbegriff: " + synonymGroupToCheck[k].meaning.toLowerCase() + ". Gefunden: " + searchIndex);
    if (searchIndex != -1){
      retVal[retVal.length] = synonymGroupToCheck[k].meaning;
    }
  }
  return retVal;
};

function findLocation(cmd){
  return assignWordsToSynonymGroup(cmd, synonymsLocation);
};

function findType(cmd){
  return assignWordsToSynonymGroup(cmd, synonymsType);
};

function findValue(cmd){
  var values = assignWordsToSynonymGroup(cmd, synonymsValue);
  // if both a word and a number match, keep only the number (grammatical reasons: e.g. "turn sth 10% on" would also match "on")
  for (var i = 0, len = values.length; i < len; i++) {
    if (!isNaN(values[i])){
      /*logger.info("values[i]: " + values[i] + " is a number");
      logger.info("i: " + i);
      logger.info("values.slice(i,i+1): " + values.slice(i,i+1));*/
      return values.slice(i,i+1);
    }
  }
  return values;
};


function updateCommandWithSynonymMeaning(synonymlist, command){
  var retVal = command.toLowerCase();
  for (var i = 0, len = synonymlist.length; i < len; i++) {
    //logger.info("retVal.toLowerCase(): " +  retVal.toLowerCase());
    //logger.info("synonymlist[i].meaning: " +  synonymlist[i].meaning);
    if (retVal.search(synonymlist[i].meaning.toLowerCase()) > -1){
      retVal = retVal.replace(synonymlist[i].meaning.toLowerCase(), synonymlist[i].meaning);
    }
    for (var j = 0, l = synonymlist[i].words.length; j < l; j++){
      // ToDo: check whether a standalone word or only part of a word is replaced.
      //logger.info("word to lower case: " + synonymlist[i].words[j]);
      if (retVal.search(synonymlist[i].words[j].toLowerCase()) > -1){
        // search and replace the lower-cased word, otherwise capitalized entries (e.g. "Bisschen") never match
        retVal = retVal.replace(synonymlist[i].words[j].toLowerCase(), synonymlist[i].meaning);
      }
    }
  }
  //logger.info("updateCommandWithSynonymMeaning: " + command + " became " + retVal);
  return retVal;
};

function convertCommandToJSON(command){
  var retVal = JSON.parse('{' +
  '"locations": [],' +
  '"value": [],' +
  '"type": [],' +
  '"other": []' +
'}');
  retVal.locations.push(findLocation(command));
  retVal.type.push(findType(command));
  retVal.value.push(findValue(command));
  logger.info("retVal: " + JSON.stringify(retVal));
  return retVal;
};

function sayVoiceResponse(type, loc, val){
  // the response is spoken in German to match the German TTS voice configured above
  var message = "Ich setze " + type;
  if (loc != null){
    message = message + " in " + loc;
  }
  message = message + " auf " + val + ".";
  Voice.say(message, tts, sink, volume);
};

function specialCommandExecuted(command){
  logger.info("specialCommandExecuted runs with: " + command);
  if (command.search("musik") != -1 && command.search("an") != -1) {
    switchMusic("ON", null);
    sayVoiceResponse("Musik", null, "ON");
    return true;
  } else if (command.search("musik") != -1 && command.search("aus") != -1) {
    switchMusic("OFF", null);
    sayVoiceResponse("Musik", null, "OFF");
    return true;
  } /*else if (command.search("Light im katz feier") != -1 || command.search("Light in katz feier") != -1 || command.search("Light in kanns feier") != -1 || command.search("Light in gONz bayern") != -1 || command.search("Light sind katz feier") != -1)
  {
    events.sendCommand("LichtK2SwitchItem", "ON");
    return true;
  }*/
  return false;
}

function switchMusic(state, location){
  if (location == null){
    events.sendCommand(ir.getItem("SqueezePlayer1Playpause"), state);
    events.sendCommand(ir.getItem("SqueezePlayer2Playpause"), state);
    events.sendCommand(ir.getItem("SqueezePlayer3Playpause"), state);
    events.sendCommand(ir.getItem("SqueezePlayer4Playpause"), state);
    return;
  } else {
    switch(location) {
      case "alles":
        events.sendCommand(ir.getItem("SqueezePlayer1Playpause"), state);
        events.sendCommand(ir.getItem("SqueezePlayer2Playpause"), state);
        events.sendCommand(ir.getItem("SqueezePlayer3Playpause"), state);
        events.sendCommand(ir.getItem("SqueezePlayer4Playpause"), state);
        break;
      case "wohn":
        logger.info("case wohn in switchMusic not implemented yet. location: " + location);
        break;
      case "küche":
        logger.info("case küche in switchMusic not implemented yet. location: " + location);
        break;
      case "ess":
        logger.info("case ess in switchMusic not implemented yet. location: " + location);
        break;
      case "wc":
        logger.info("case wc in switchMusic not implemented yet. location: " + location);
        break;
      case "draußen":
        logger.info("case draußen in switchMusic not implemented yet. location: " + location);
        break;
      default:
        logger.info("case default in switchMusic not implemented yet. location: " + location);
        break;
    } 
  }
};
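A note on the ToDo in updateCommandWithSynonymMeaning: a plain String.replace also matches inside longer words ("an" inside "ganz" is how the "gONz bayern" above came about). A possible whole-word variant, just a sketch I haven't wired in:

// replace only standalone words; regex metacharacters are escaped first
function replaceWholeWord(text, word, meaning) {
  var escaped = word.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return text.replace(new RegExp('\\b' + escaped + '\\b', 'g'), meaning);
}
// caveat: \b only knows ASCII word characters, so words starting or ending
// with umlauts or ß (e.g. "über") would still need special handling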


The built-in keyword spotter, rustpotter, was made by myself and doesn’t behave as well as I would like, so don’t expect too much of it. I’m planning to dedicate some time to improving it, probably after looking into the mobile wrapper; above all I want to see if it can perform better in the presence of noise.

The last version removed the previously mentioned dedicated keyword for rustpotter-web. Apart from that, I was also able to remove the ‘secure’ configuration, which I think was also difficult to understand, and to mimic the API security’s implicit user role and trusted-networks functionality, so the setup difficulty has been reduced a little.

The latest version fixes a connection error in Chrome.

It should also improve the UI performance in modern browsers, as it avoids passing audio data through the main thread by taking advantage of worklets and message channels.
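To illustrate the idea (a simplified sketch, not the actual add-on code; worker and micStream are assumed to exist, in an async context): an AudioWorkletProcessor captures frames on the audio rendering thread and posts them through a transferred MessagePort, so the samples can flow straight to a Web Worker without a main-thread hop.

// capture-processor.js: runs on the audio rendering thread
class CaptureProcessor extends AudioWorkletProcessor {
  constructor() {
    super();
    this.target = this.port; // fallback: the node's own port
    // the page can transfer one end of a MessageChannel here,
    // whose other end was handed to a Web Worker
    this.port.onmessage = (e) => { this.target = e.data.port; };
  }
  process(inputs) {
    var channel = inputs[0][0];
    if (channel) this.target.postMessage(channel.slice(0)); // copy, the buffer is reused
    return true; // keep the processor alive
  }
}
registerProcessor('capture-processor', CaptureProcessor);

// main thread: wire microphone -> worklet and route the frames to a worker
const ctx = new AudioContext();
await ctx.audioWorklet.addModule('capture-processor.js');
const node = new AudioWorkletNode(ctx, 'capture-processor');
const channel = new MessageChannel();
worker.postMessage({ audioPort: channel.port1 }, [channel.port1]);
node.port.postMessage({ port: channel.port2 }, [channel.port2]);
ctx.createMediaStreamSource(micStream).connect(node);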

@ornostar, I have linked the Android apk in the main post, in case you can give it a try and let me know how it goes :grin:. I was only able to try it on my old Nexus 5X, where it seems to work.

Regards!

I think the bugs are fixed and I’m going to remove the beta tag, probably when openHAB 4.1.0 is released.

It would be great if someone has a moment to try it.

I have reviewed the website on different platforms as well as the different installers available, and everything seems to work correctly.

Best regards.

Edit: In fact I’m going to create another beta, because I realized the option for adding a new HAB Speaker thing was missing in the MainUI; I was always going through the URL.

I’ll try to share a video about basic usage.

Actually, I made things worse in beta 29; I had to change the add-on type back to a binding and change the URI to solve the problems. I also realized that the Android apk was incorrectly generated in the CI and have fixed it.

It does not work on Firefox for Android, as already mentioned in the thread; I’m looking into it.