HAB Speaker (dialog processing in the browser)


HAB Speaker is a project designed to facilitate the use of the openHAB dialog processing capabilities.

This add-on consists of a web UI that uses the browser audio API and a WebSocket connection to enable dialog processing against your openHAB server.

Using this UI in a browser comes with two requirements: capturing microphone audio requires a user interaction, and the page must be served over https. To avoid these requirements you can install HABSpeaker as a desktop or mobile app (Electron and Capacitor were used to bring the web UI to those platforms).
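As a rough illustration of those two browser requirements, here is a minimal sketch (not code from the project). The `CaptureEnvironment` shape and the `captureBlockers` function are hypothetical names made up for this example; in a real page the first flag would mirror `window.isSecureContext` and the second would be set inside a click or tap handler.

```typescript
// Hypothetical description of what the page knows about its environment.
interface CaptureEnvironment {
  isSecureContext: boolean; // in a browser this would mirror window.isSecureContext
  userHasInteracted: boolean; // e.g. set to true inside a click/tap handler
}

// Returns the reasons why microphone capture cannot start yet (empty when ready).
function captureBlockers(env: CaptureEnvironment): string[] {
  const blockers: string[] = [];
  if (!env.isSecureContext) {
    blockers.push("page must be served over https");
  }
  if (!env.userHasInteracted) {
    blockers.push("waiting for a user interaction (click or tap)");
  }
  return blockers;
}

// In a browser, once the list is empty, capture would typically be requested with:
//   navigator.mediaDevices.getUserMedia({ audio: true })
```

The desktop and mobile wrappers exist so that neither condition has to be met by the user.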

Here you can find the desktop installers.
Here you can find the mobile installers (only the Android APK is available at the moment; I will try to distribute a demo through the official stores).

Here you can find the project readme, which summarizes its installation and functionality.

Changelog

Version 3.4.0 - BETA19

  • Support screen saver brightness dim on mobile.
  • Support prevent sleep on mobile, desktop and browser.
  • Auto start the speaker at app launch on mobile.
  • Fix read of json media files.

Version 3.4.0 - BETA18

  • Fix race error on initialization that breaks audio streaming.
  • Improve local settings configuration on all platforms.
  • Change capacitor audio permissions library.
  • Allow any cors domain on ui endpoints (needed for capacitor).

Version 3.4.0 - BETA17

  • Fix speaker connection on chrome when implicit user role is enabled.
  • Use audio worklet if available instead of deprecated script processor.
  • Use message channels to transfer audio data (to avoid passing it through the main thread).
  • Changes to build for android/ios using capacitor.

Version 3.4.0 - BETA16

  • Remove websocket secure option and mimic implicit user role logic.
  • Remove the keyword config for rustpotter web.
  • Add new option to customize the keyword by speaker.
  • New electron preload animation.

Version 3.4.0 - BETA15

  • Fix interpreters configuration for unregistered speakers. (Thanks to ornostar).
  • Improve electron app Librespot integration.
  • Fix incorrect Spotify volume on start.
  • Add debug logs to ws auth.

Version 3.4.0 - BETA14

  • Disconnect speaker on ping fail.
  • Improve duplicate speaker message.
  • Media control voice commands fall back to a speaker that is playing media.
  • Spotify integration fixes.
  • Add electron app installers to readme.

Version 3.4.0 - BETA13

  • Add voice commands to transfer media.
  • Fix YouTube Player not loading after first close.

Version 3.4.0 - BETA12

  • Add support to run as an electron app.
  • Basic scripts to build the electron app (windows/linux/macOS).
  • Fix Spotify init on web when authentication is required.
  • Change WebSocket auth method (send token in Sec-WebSocket-Protocol header).
  • Update readme with electron app details.
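For readers wondering about the WebSocket auth change in this release: browsers do not let a page set arbitrary headers on a WebSocket handshake, but the second argument of the `WebSocket` constructor is sent as `Sec-WebSocket-Protocol`. The sketch below only illustrates that general technique. The exact value format HAB Speaker expects is not described in this post, so `buildAuthProtocols` and the URL in the comment are hypothetical.

```typescript
// Each Sec-WebSocket-Protocol entry must be a valid HTTP token (no spaces or separators).
const HTTP_TOKEN = /^[!#$%&'*+\-.^_`|~0-9A-Za-z]+$/;

// Hypothetical helper: wraps an access token as a subprotocol list.
// The real add-on may encode or prefix the token differently.
function buildAuthProtocols(accessToken: string): string[] {
  if (!HTTP_TOKEN.test(accessToken)) {
    throw new Error("token contains characters not allowed in Sec-WebSocket-Protocol");
  }
  return [accessToken];
}

// In a browser (hypothetical URL, shown as a comment so the sketch runs anywhere):
//   const socket = new WebSocket("wss://openhab.example/speaker/ws", buildAuthProtocols(token));
```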

Version 3.4.0 - BETA11

  • Add media files and web player phrases.
  • Fix login redirect.
  • Support client keyword spotting.

Version 3.4.0 - BETA10

  • Fix playing live video url.
  • Add speaker location.
  • Avoid audio format conversion when drop-in.
  • Use 16000Hz to send audio to client and avoid client audio resampling on supported browsers (chrome).
  • Drop-in small refactor.
  • Remove restore previous volume.
  • Add start drop-in phrase.
  • Fix spotify reconnection.

Version 3.4.0 - BETA9

  • Continue UI code cleaning/refactor.
  • Update UI dependencies.
  • Implement media fast-forward/rewind in the server using seek.
  • Fix watchOnYouTubePhrase config label.
  • Lower media volume while dialog is active.
  • Improve disable screensaver when playing video media.
  • Add next/previous media voice commands.
  • Improve YouTube search.
  • Add basic Spotify player ui.

Version 3.4.0 - BETA8

  • Code refactor
  • Start typescript migration
  • Add media providers (initial draft)
  • Readme updates

Version 3.4.0 - BETA7

  • UI screen saver
  • Fix stereo sink using channel 0 data for channel 1
  • Fix speaker stt and tts configs

Version 3.4.0 - BETA6

  • Drop-in support (speaker to speaker communication)
  • Keep microphone stream active (fixes speaker operation when returning from background on mobile)
  • Allow speaker voice control (requires service configuration) (only “stop drop-in” phrase is implemented) (documentation pending)
  • UI sink implementation fixes

Version 3.4.0 - BETA5

  • UI viewport block zoom
  • UI double max volume level

Version 3.4.0 - BETA4

  • Add speaker voice configurations.
  • Support server keyword spotting.
  • Add spot channel.
  • Fix concurrent modification exception on bundle stop.

Version 3.4.0 - BETA3

  • Sink waits for audio to finish playing, like other openHAB sinks
  • Add listeningItem speaker configuration
  • Update readme

Version 3.4.0 - BETA2

  • Sink mp3 support
  • Sink use stereo audio (speaker configuration)
  • Sink volume control fixes
  • Fix ui login (authenticate against the login page instead of relying on the main ui)
  • Use thing label as sink/source label.

Version 3.4.0 - BETA1

  • A speaker is now a thing and can be discovered.
  • Implement sink volume support.
  • Remove local configs except the speaker id.

Version 3.4.0 - BETA0

  • initial release

Status and future development

Basic dialog processing seems to work on all platforms.

So far I have only tested this on:

  • Web version: Safari, Chrome.
  • Desktop version: Windows, MacOS.
  • Mobile version: iOS.

Media capabilities are broken on iOS and cause glitches in the browser, I think due to bugs in its Web Audio API support; I still need to investigate and report this.

These are some things I would like to add for the final version:

  • Update README gif, and clarify speaker states with images.
  • Create own icons (current icons are copied from habot).
  • Basic styles improvements and allow speaker colors customization.
  • Basic mobile application wrapper.
  • Basic desktop wrapper using electron.
  • Improve the settings page design.
  • Media playback capabilities.
  • Support keyword spotting on the client.
  • UI web screen saver (to prevent pixel damage).
  • Speaker voice commands.
  • Basic drop-in support.
  • Implement sink volume.
  • Enable mp3 support for the sink (converted on the server to wav).
  • Support keyword spotting on the server.
  • Add stereo support to the sink.
  • Improve the authentication mechanism.

About my personal case and the motivations for this project:

I have been using the dialog processing capabilities of openHAB for some time. I have a couple of speakers at home using the pulseaudio binding, next to my Echo devices, which are what I mostly use nowadays to interact with openHAB. I only use the pulseaudio speakers to process a couple of custom phrases that control my TVs, but my goal is to remove the Echo devices in the future (the only things I use on them are the openHAB skill and the Spotify integration).

As I only have a couple of speakers set up at home, I often miss being able to use my mobile or laptop to speak to openHAB, and that was the main motivation for starting this project.

Also, while developing add-ons for dialog processing in the past, I felt that testing the openHAB dialog capabilities can require a lot of configuration depending on your setup, because you need audio capabilities on your openHAB host (to use the system sink/source) or a remote device set up using the pulseaudio binding, which takes some time. So I also see this add-on as a way to test the dialog processing capabilities, with a quicker and easier setup, using capable audio devices that you already own.

I really enjoyed developing this add-on. It allowed me to learn some things about the Web Audio API, and it’s always interesting to develop anything related to openHAB. I hope some of you find it useful and that it motivates more people to think about how the dialog processing support can be improved in future versions.

Regards!

Resources

JAR bundle
Source and Documentation


Interesting, thanks !

I plan to write a client (not a web one, but still) with this kind of audio capability.
I want to provide intercom capability over SIP to my Raspberry Pis all over the house, and that kind of speaker functionality for openHAB is definitely on my todo list.

I will monitor this and maybe some day steal some code :sweat_smile:

I’ll probably also make a native client for this once I have finished with the web one. I actually discovered the SIP widget today, really cool. I have to read more about that protocol, it seems interesting.


awesome contribution Miguel :+1:


This sounds great, thank you.

Just for clarification: is it possible to start “recording” audio interactively so that it does not stop until I stop recording manually? Will it always listen for “wake words”, or do I need to start “recording” after each command?
What about multiple devices, or multiple audio sources? As far as I know, openHAB will only listen to one audio source, for example for wake word detection.

I’m not sure that I’m understanding the question. This project manages the dialog creation by itself so you don’t need to create one (registers a dialog per connected client). By default it uses a phantom keyword spotter that is triggered by clicking a button on the web interface, but you can also activate the keyword spotting on the thing config if you register the speaker in openhab using the discovery service, it’s all on the readme.

No, you can have multiple dialogs running with keyword spotting; there is no such limitation.

Sorry for my unclear question and thank you for the clarification.
I just installed the add-on and am trying to get it running, but I get an error during capturing.
I think my vosk STT is not working correctly; I’m trying to fix it.

Update:
Got it to work :slight_smile: My vosk STT model was not correctly named. I’m able to perform voice commands through your binding now.
However, it does not seem to wait for the magic word before listening for a command. It seems to start listening whenever there is enough noise, rather than when the magic word is spoken.

Great that you managed to set it up!
Which keyword spotting are you using?

Currently I have rustpotter installed, using the built-in German magic word “guten morgen” for testing, but in the future I want to build my own magic word using the rustpotter-cli.

Another question:
What possibilities do you see for setting up and placing your “web-speaker” around the home? For example, using an RPi Zero with a ReSpeaker and configuring the websocket through a web browser? Any other ideas/possibilities?

Yes, I’m facing the same issue with loud noises. Rustpotter is a project I developed with information I found around; I have little to no knowledge of sound analysis, but at the time I couldn’t find a good open-source alternative, so I decided to give it a try. There are a couple of improvements I would like to try there; dynamically changing the audio gain when too much noise is detected might solve that problem :crossed_fingers:. But lately I have been more motivated to improve other tools and finish a stable version of this add-on.

I think that for persistent use of this web interface it would be interesting to wrap it in a native application, mostly to take control of the device screen and prevent the system from sleeping (and to remove the need for user interaction before using the Web Audio API).
I think that having mobile applications would be nice so that old mobiles/tablets can be reused; I’m looking into using the Capacitor project for this.
For desktop I was thinking of using Electron. Looking for a Raspbian distribution that targets a single Electron application could also be interesting.

But these are all just plans; I’m not sure I’ll go that far with it. At the moment I don’t have any speaker set up, but I think that if you keep the system/screen awake and the page is running in the foreground (top application) it should work fine. Let me know how it goes for you.

I suggest you verify that the recordings are free of noise.

You can try this tool Rustpotter Build Model Demo!
I need to add something there to decrease the microphone gain; in my tests right now it records too much background noise, so I suggest you lower the microphone input level on the system for now (not sure if it will do the trick). But it’s quicker to use than the rustpotter-cli, and you can easily test the results on the demo (Rustpotter Web Demo!) by selecting ‘from file’ on the wake word selector.

As rustpotter works on the browser thanks to web assembly, it will be integrated into the habspeaker interface to offload the openHAB server. But don’t expect it too soon.

I guess that mobiles/tablets will not work very well for speech recognition if there is some distance between the person speaking and the device, but that is only a guess, not real-life experience.
In my opinion a tablet also consumes too much energy just for speech recognition.
That’s why I was thinking about a Raspberry Pi Zero or something like that. But I don’t know if there is any way to start HAB Speaker without a screen attached, for example through a CLI.

Yes, I started to write and forgot to answer that. I think it should be possible: once the project is wrapped into an Electron app, it will no longer require user interaction to start capturing audio (I think Electron removes this by default). It should be possible to run it in headless mode and read the speaker id from a configuration file so that no user interaction is required. I’m not sure about the performance, but fortunately (because I think they are out of stock) I have a Raspberry Pi Zero W, so I will try that at some point.

I think I can give it a try in a couple of weeks, after reviewing the current functionality and finishing the media providers. I will start with the desktop distribution, as I have no preference at the moment and it will probably be the easiest.

I forgot to mention that you can currently set up a speaker on a Raspberry Pi using the pulseaudio binding. It requires some configuration but it works fine; I ran two speakers for days without issues, one of them on the Zero W model. The problem is that the core is not ready to handle the dialog creation by itself, so you have to do it through rules or the CLI, and the dialog will not be recreated if the server is restarted. I’m planning to open a proposal for the next release to add this functionality; for now, using the rule engine you can build a script that sets up the speakers. What I did was link a switch item to a rule that turns the speakers on/off, but as there is a trigger for system initialization, maybe it can be automated.

Yes, Raspberrys are still out of stock. It will take some time to get one.
I’m looking forward to your further progress with the wrapping into an electron app.

Yes I read about the pulseaudio binding but it doesn’t look easy as you mentioned.

I just tried your model demo, which seems to work. I decreased the input volume of my microphone and created a model for “hey openhab”. However, it still triggers on whatever it wants and starts to listen. I noticed that this is also the case in the web demo. The score is always around 0.5xxx, no matter how many words/expressions match.

Normally I use a threshold around 0.6 with the personal models. I just recorded “hey openhab” using the web tool and it seems to work ok with a threshold of 0.61.
Still need to take a deeper look into some things in the library, will ping you if I find a way to improve the results.
Regards!

Yes, using a threshold of 0.6 and an average threshold of 0.3 seems to work a lot better, thank you.
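To make the effect of the two values easier to picture, here is a deliberately simplified sketch. It is not rustpotter's actual code or API: it assumes each audio window produces a cheap score against an averaged template plus a full score against the recorded samples, and the names `SpotterThresholds` and `isDetection` are invented for the illustration.

```typescript
// Hypothetical settings object; the field names are illustrative only.
interface SpotterThresholds {
  threshold: number; // minimum full score required to report a detection
  averagedThreshold: number; // minimum score against the averaged template
}

// Two-stage check: reject early on the averaged score, then apply the main threshold.
function isDetection(averagedScore: number, score: number, cfg: SpotterThresholds): boolean {
  if (averagedScore < cfg.averagedThreshold) {
    return false; // not even close to the averaged template
  }
  return score >= cfg.threshold;
}
```

Under these assumptions, a window scoring around 0.5 (like the false triggers described above) is rejected with a threshold of 0.6, while a clean "hey openhab" scoring above 0.6 still passes.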

Hi Miguel,
thanks for providing this extension. After reading about it in the openHAB release notes I felt motivated to experiment with voice commands over the holidays.
I upgraded my openHAB installation to 3.4 and set up Rustpotter, Mimic and Vosk. After that I installed HAB Speaker and started experimenting. I use my mobile phone to connect to my openHAB installation. Since I am running openHAB on Docker behind an nginx proxy, I had to add the https configuration. In addition, I had to add extra configuration to get the WebSockets to work. After that I could connect with my phone.

If somebody also uses a configuration like that - here is my nginx config for https and a self signed certificate:

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name openhab.home;

    include /etc/nginx/conf.d/ssl-params.conf;
    ssl_certificate /etc/nginx/ssl/nginx-selfsigned.crt;
    ssl_certificate_key /etc/nginx/ssl/nginx-selfsigned.key;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://docker.home:8081;
        # required for WS to work
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
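One detail that follows from a setup like this: when the page is served over https, the client has to open its WebSocket with the `wss` scheme through the same proxy, since browsers generally refuse a plain `ws` connection from an https page. The helper below is a small sketch of that mapping. `toWebSocketUrl` is a made-up name and `/example/ws` is a placeholder path, not the add-on's real endpoint.

```typescript
// Maps a page origin to the matching WebSocket URL: https -> wss, http -> ws.
function toWebSocketUrl(pageOrigin: string, path: string): string {
  const match = /^(https?):\/\/([^/]+)/.exec(pageOrigin);
  if (!match) {
    throw new Error(`not an http(s) origin: ${pageOrigin}`);
  }
  const scheme = match[1] === "https" ? "wss" : "ws";
  return `${scheme}://${match[2]}${path}`;
}

// Example with the server_name from the config above and a placeholder path.
console.log(toWebSocketUrl("https://openhab.home", "/example/ws"));
```

The `Upgrade` and `Connection` headers in the `location` block are what allow that `wss` request to be upgraded after it passes through nginx.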

Here is my current issue:

As said above I run Openhab on docker at a small home server. (Quad core Celeron J3455, 8GB RAM)
I get some sound indications when I connect, and the logs show:

2022-12-27 16:50:33.111 [INFO ] [rnal.websocket.HABSpeakerWebSocketIO] - New client connected.

The HAB Speaker thing also comes online in the UI, but then nothing more happens. After my phone is connected, the memory load jumps from ~8% to ~50% within a few seconds and the machine gets stuck. After around 60 seconds the overall memory usage increases to ~70% and the container gets killed by the OOM killer (dmesg output):

[167852.218581] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-04b0c076dfa4d743a899d5a8e22381be45da7a35caf69f735fa42830aca20d68.scope,task=java,pid=719063,uid=1000
[167852.218982] Out of memory: Killed process 719063 (java) total-vm:20249808kB, anon-rss:5709500kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21972kB oom_score_adj:0

It seems there is some kind of memory leak, or the voice data cannot be processed by openHAB. I don’t see any other indication or problem. I currently don’t know how to debug this in more detail, but maybe you (or somebody else) know how to proceed from here to get this to work.

Another minor feedback
It took me 30 minutes to understand the state machine of HAB speaker. I did not understand the visual effects in the first place. Maybe it makes sense to add a small written indication on the webpage to indicate the current status (disconnected, connecting, connected, recording).
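The suggestion could be sketched roughly as follows. The four state names come from the list above; the `SpeakerState` type, the label texts and `describeState` are hypothetical and not taken from the HAB Speaker source.

```typescript
// The four states named in the feedback above.
type SpeakerState = "disconnected" | "connecting" | "connected" | "recording";

// Hypothetical label texts that could be shown next to the visual effects.
const STATE_LABELS: Record<SpeakerState, string> = {
  disconnected: "Disconnected",
  connecting: "Connecting...",
  connected: "Connected, waiting for a command",
  recording: "Recording",
};

// Returns the text to display for the current state.
function describeState(state: SpeakerState): string {
  return STATE_LABELS[state];
}
```

Typing the map as `Record<SpeakerState, string>` makes the compiler complain if a state is added later without a label.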

I use an Android phone (OnePlus 6T). I could only connect using the Chrome browser; Firefox for Android did not work. It seems that when I clicked the microphone button the connection was not established.


Hi,
I have been using this awesome project https://nginxproxymanager.com for a while, in case you find it interesting. Maybe it’s worth mentioning in the post as an easy way to configure https. Either way, thank you for sharing your nginx configuration.

Did you face any issue when using the speaker without rustpotter? In the last version I added rustpotter in the client; can you test whether you face any issues when running like that? It seems more like a rustpotter issue than something related to the speaker.

Thank you for the feedback, maybe the readme needs also a clearer explanation of those states.
I’ll look into adding the option for displaying a state change message.

I have not done any testing on Android devices :frowning: (just Safari, Chrome (not too much) and Safari on iOS). Maybe the leak is also caused by the speaker not working properly in the Android browser (if you can double-check with a desktop browser, that would be great). I have just a couple more ideas to implement for now; then I will start testing more thoroughly on all the devices I can get and try to make it as stable as possible.

Also, I never tried Mimic with the speaker but I don’t think the problem is related to that.
For information I’m testing it with rustpotter (server or client), vosk and voiceRSS or PicoTTS and openHAB running on a raspberry pi 4 or my Mac.

You’re welcome, hope something on this response helps you, let me know if you find anything else about the issue cause.

Regards!