HABSpeaker

HABSpeaker is a project designed to facilitate the use of openHAB's dialog processing capabilities.

This add-on consists of a web UI that uses several web features (WebWorkers, AudioWorklets and WebSockets) to enable dialog processing against openHAB.

It registers an AudioSink, an AudioSource and a DialogProcessor instance that are bound to the WebSocket connection established between the server and the web UI.
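
To give an idea of the plumbing involved, the browser side boils down to capturing microphone audio in an AudioWorklet and forwarding the PCM chunks over the WebSocket. Here is a rough sketch of that pattern (not the actual add-on code; the endpoint URL and the "capture-processor" worklet module are made up):

// Illustrative sketch only: the endpoint and worklet module name are invented.
async function streamMicToServer(serverUrl: string): Promise<void> {
  // WebSocket carrying raw PCM chunks to the server-side audio source.
  const ws = new WebSocket(serverUrl);
  ws.binaryType = "arraybuffer";

  // Microphone capture (requires https and a prior user gesture, see the next paragraph).
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext({ sampleRate: 16000 });

  // The worklet is assumed to post Float32Array chunks back on its port.
  await ctx.audioWorklet.addModule("capture-processor.js");
  const node = new AudioWorkletNode(ctx, "capture-processor");
  node.port.onmessage = (e: MessageEvent<Float32Array>) => {
    if (ws.readyState === WebSocket.OPEN) ws.send(e.data.buffer);
  };

  ctx.createMediaStreamSource(stream).connect(node);
}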

Be aware that capturing microphone audio in a browser requires an initial user interaction and serving the page over https; here you can find the related documentation about setting up https on your instance. Personally I run openHAB from the Docker image and use this open source project, also installed as a Docker container, to manage my domains and certificates: Nginx Proxy Manager. If you want to give it a quick try without setting up https, in Chrome you can disable the security check for a local origin (via the unsafely-treat-insecure-origin-as-secure entry in chrome://flags).
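
For reference, the user-gesture requirement means capture has to be started from something like a click handler, roughly as in this minimal sketch (the button id is invented):

// getUserMedia only resolves on a secure (https) origin and, in practice,
// after a user gesture such as clicking the speaker button.
document.getElementById("mic-button")?.addEventListener("click", async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
    console.log("Microphone granted, tracks:", stream.getAudioTracks().length);
  } catch (err) {
    // Fails on plain http origins or when the user denies permission.
    console.error("Microphone access failed:", err);
  }
});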

This project does not work with the myOpenHAB Cloud Service for now.

You can install HABSpeaker as a desktop or mobile app (Electron and Capacitor were used to bring the web UI to those platforms). Those apps work over http and do not require the initial user interaction to launch the connection; other than that, it is the same as using the UI installed as a WebApp.

Here you can find the web UI packaged as an ElectronJS app.
Here you can find the web UI packaged as a mobile app (only the Android APK is available at the moment; the iOS app also works, but I don't have a viable way of distributing it, as I don't have a paid Apple developer account).

Here you can find the project readme, which summarizes its installation and functionality.

Changelog

Version 4.0.x - BETA31

  • Change the client resample dependency to reduce size.
  • Fix WebWorker initialization inside a promise.

Version 4.0.x - BETA30

  • Change the add-on type and package URI to fix navigation problems from the MainUI to the add-on menus.

Version 4.0.x - BETA29

  • Fix navigation when adding as a new thing.
  • Fix secondary color option name.

Version 4.0.x - BETA28

  • Fix electron/capacitor integrations.
  • Implement a "suspend on hide" option, recommended on mobile to suspend the speaker while in the background.
  • Minor fixes.
  • Remove mp3 support from sink.

You can find other releases here: Releases · GiviMAD/openhab-addons · GitHub

Status and future development

I think the current one is the last beta version, as I have retested all the platforms and I think it's working correctly.

Also there are two more things I would like to accomplish:

  • Make it work with the cloud connector.
  • Allow integration with the MainUI; I would like to mimic the way HABot is integrated as a widget.

So far I have tested it on:

  • Web: Chrome, Firefox, Safari.
  • Desktop: Windows, MacOS.
  • Mobile: iOS, Android.

Regards!

Resources

JAR bundle
Source and Documentation

Interesting, thanks!

I plan to build a client (not a web one, but still) with this kind of audio capability.
I want to provide intercom capability over SIP to my Raspberry Pis all over the house, and that kind of speaker functionality for openHAB is totally on my todo list.

I will monitor this and maybe some day steal some code :sweat_smile:

I'll probably also make a native client for this once I'm finished with the web version. I actually discovered the SIP widget today, really cool. I have to read more about that protocol, it seems interesting.

awesome contribution Miguel :+1:

This sounds great, thank you.

Just for clarification: is it possible to start "recording" audio interactively so that it will not stop until I stop recording manually? Will it always listen for "wake words", or is it necessary to start "recording" after each command?
What about multiple devices, or rather multiple audio sources? openHAB will only listen to one audio source as far as I know, for example for wake word detection.

I'm not sure I'm understanding the question. This project manages the dialog creation by itself, so you don't need to create one (it registers a dialog per connected client). By default it uses a phantom keyword spotter that is triggered by clicking a button on the web interface, but you can also activate keyword spotting in the thing config if you register the speaker in openHAB using the discovery service; it's all in the readme.

No, you can have multiple dialogs running with keyword spotting; there is no such limitation.

Sorry for my unclear question, and thank you for the clarification.
I just installed the add-on and am trying to get it running, but I get an error during capturing.
I think my vosk STT is not working correctly; I'm just trying to fix it.

Update:
Got it to work :slight_smile: my vosk STT model was not correctly named. I'm able to perform voice commands through your binding now.
However, it seems it does not wait for the magic word before spotting a command. It seems it starts listening when there is enough noise instead of when the magic word is spoken.

Great that you managed to set it up!
Which keyword spotter are you using?

Currently I have rustpotter installed, using the built-in German magic word "guten morgen" for testing, but in the future I want to build my own magic word using the rustpotter-cli.

Another question:
What possibilities do you see for using/setting up/placing your "web speaker" around the home? For example, using an RPi Zero with a ReSpeaker and configuring the WebSocket through a web browser? Any other ideas/possibilities?

Yes, I'm facing the same issue with loud noises. Rustpotter is a project I developed with information I found around; I have little to no knowledge about sound analysis, but at the moment I couldn't find a good open-source alternative, so I decided to give it a try. There are a couple of improvements I would like to try there; dynamically changing the audio gain when detecting too much noise is something that could maybe solve that problem :crossed_fingers:. But lately I have been more motivated to improve other tools and finish a stable version of this add-on.

I think that for persistent use of this web interface it will be interesting to wrap it into a native application, mostly to try to take control over the device screen and prevent the system from sleeping (and to remove the necessity of user interaction to use the Web Audio API).
I think that having mobile applications would be nice so old mobiles/tablets can be reused; I'm looking at using the Capacitor project for this.
For desktop I was thinking of using Electron. And I think that looking for a Raspbian distribution that targets a single Electron application could also be something interesting to look into.
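
Part of this can already be approximated in the browser with the Screen Wake Lock API; a rough sketch (browser support varies and it only works on secure origins):

// Keep the screen awake while the speaker page is visible.
let wakeLock: WakeLockSentinel | null = null;

async function keepScreenAwake(): Promise<void> {
  if (!("wakeLock" in navigator)) return; // unsupported browser
  wakeLock = await navigator.wakeLock.request("screen");
  // The lock is released automatically when the page is hidden,
  // so re-acquire it when the page becomes visible again.
  document.addEventListener("visibilitychange", async () => {
    if (document.visibilityState === "visible") {
      wakeLock = await navigator.wakeLock.request("screen");
    }
  });
}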

But all of these are just plans; I'm not sure I'll go that far with it. At the moment I don't have any speaker set up, but I think that if you keep the system/screen awake and it's running in the foreground (top application) it should work fine. Let me know how it goes for you.

I suggest you verify the recordings are free of noise.

You can try this tool: Rustpotter Build Model Demo!
I need to add something there to decrease the microphone gain; in my tests right now it records too much background noise, so I suggest you lower the microphone input on the system for now (not sure if that will do the trick). But it's quicker to use than the rustpotter-cli and you can easily test the results in the demo (Rustpotter Web Demo!) by selecting 'from file' in the wake word selector.
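
What I have in mind for lowering the gain is roughly to put a Web Audio GainNode in front of the capture chain, something like this sketch (not something the tool does today):

// Attenuate the microphone signal before it reaches any downstream processing.
async function captureWithGain(gain: number): Promise<AudioNode> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
  const ctx = new AudioContext();
  const gainNode = ctx.createGain();
  gainNode.gain.value = gain; // e.g. 0.5 halves the input amplitude
  ctx.createMediaStreamSource(stream).connect(gainNode);
  return gainNode; // connect this to an AudioWorkletNode or recorder
}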

As rustpotter works in the browser thanks to WebAssembly, it will be integrated into the HABSpeaker interface to offload the openHAB server. But don't expect it too soon.

I guess that mobiles/tablets will not work very well for speech recognition if there is some distance between the speaking person and the device, but that's only a guess, no real-life experience.
In my opinion a tablet also consumes too much energy just for speech recognition.
That's why I was thinking about a Raspberry Pi Zero or something like that. But I don't know if there is any possibility to start HAB Speaker without a screen attached, for example through the CLI or whatever.

Yes, I started to write and forgot to answer that. I think it should be possible: once the project is wrapped into an Electron app, it will not have the limitation of requiring user interaction to start capturing audio (I think Electron removes this by default). It should be possible to run it in headless mode and read the speaker id from a configuration file so no user interaction is required. I'm not sure about the performance, but fortunately (because I think they are out of stock) I have a Raspberry Pi Zero W, so I will try that at some point.
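
The Electron main process side could be something like this sketch (the config file name, its shape and the speakerId query parameter are all invented for illustration):

import { app, BrowserWindow } from "electron";
import { readFileSync } from "fs";
import { join } from "path";

app.whenReady().then(() => {
  // Hypothetical config file holding the server URL and speaker id.
  const config = JSON.parse(
    readFileSync(join(app.getPath("userData"), "habspeaker.json"), "utf-8")
  ) as { serverUrl: string; speakerId: string };

  // show: false keeps the window hidden; audio capture still works in the
  // renderer, and Electron grants microphone access without a user gesture.
  const win = new BrowserWindow({ show: false });
  win.loadURL(`${config.serverUrl}?speakerId=${config.speakerId}`);
});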

I think I can give it a try in a couple of weeks, after reviewing the current functionalities and finishing the media providers. I will start with the desktop distribution, as I have no current preference and it will probably be the easiest.

Forgot to mention that you can currently set up a speaker on a Raspberry Pi using the pulseaudio binding. It requires some configuration but it works fine; I was running two speakers for days without issues, one of them on the Zero W model. The problem is that the core is not ready to handle the dialog creation by itself, so you have to do it through rules or the CLI, and it will not be recreated if the server is restarted. I'm planning to open a proposal for the next release to add this functionality; for now, using the rule engine you can build a script that sets up the speakers. What I did was link a switch item to a rule that turns the speakers on/off, but as there is a trigger for system initialization, maybe it can be automated.

Yes, Raspberry Pis are still out of stock. It will take some time to get one.
I'm looking forward to your further progress with wrapping it into an Electron app.

Yes, I read about the pulseaudio binding, but as you mentioned it doesn't look easy.

I just tried your model demo, which seems to work. I decreased the input volume of my microphone and created a model for "hey openhab". However, it still spots whatever it wants and starts to listen. I noticed that this is also the case in the web demo. The score is always around 0.5xxx, no matter how many words/expressions match.

Normally I use a threshold of around 0.6 with the personal models. I just recorded "hey openhab" using the web tool and it seems to work OK with a threshold of 0.61.
I still need to take a deeper look into some things in the library; I will ping you if I find a way to improve the results.
Regards!

Yes, using a threshold of 0.6 and an average threshold of 0.3 seems to work a lot better, thank you.

Hi Miguel,
thanks for providing this extension. After reading about it in the openHAB release notes I felt motivated to experiment with voice commands over the holidays.
I upgraded my openHAB installation to 3.4 and set up Rustpotter, Mimic and Vosk. After that I installed HAB Speaker and started experimenting. I use my mobile phone to connect to my openHAB installation. Since I run openHAB on Docker behind an nginx proxy, I had to add the https configuration. In addition, I had to add extra configuration to get the WebSockets to work. After that I could connect with my phone.

If somebody also uses a configuration like that, here is my nginx config for https with a self-signed certificate:

server {
    listen 443 ssl;
    listen [::]:443 ssl;
    server_name openhab.home;
    
    include /etc/nginx/conf.d/ssl-params.conf;
    ssl_certificate /etc/nginx/ssl/nginx-selfsigned.crt;
    ssl_certificate_key /etc/nginx/ssl/nginx-selfsigned.key;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_pass http://docker.home:8081;
        # required for WebSockets to work
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}

Here is my current issue:

As said above, I run openHAB on Docker on a small home server (quad-core Celeron J3455, 8 GB RAM).
I get some sound indications when I connect, and the logs show:

2022-12-27 16:50:33.111 [INFO ] [rnal.websocket.HABSpeakerWebSocketIO] - New client connected.

The HAB Speaker thing also comes online in the UI. But then nothing more happens. After my phone connects, the memory load jumps from ~8% to ~50% within seconds and the machine gets stuck. After around 60 seconds the memory usage increases to ~70% overall and the container gets killed by the OOM killer (dmesg output):

[167852.218581] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=user.slice,mems_allowed=0,global_oom,task_memcg=/system.slice/docker-04b0c076dfa4d743a899d5a8e22381be45da7a35caf69f735fa42830aca20d68.scope,task=java,pid=719063,uid=1000
[167852.218982] Out of memory: Killed process 719063 (java) total-vm:20249808kB, anon-rss:5709500kB, file-rss:0kB, shmem-rss:0kB, UID:1000 pgtables:21972kB oom_score_adj:0

It seems there is some kind of memory leak, or the voice data cannot be processed by openHAB. I don't see any indication of another problem. I currently don't know how to debug this in more detail, but maybe you (or somebody else) know how to proceed from here to get this to work.

Another minor piece of feedback:
It took me 30 minutes to understand the state machine of HAB Speaker; I did not understand the visual effects at first. Maybe it makes sense to add a small written indication on the webpage showing the current status (disconnected, connecting, connected, recording).

I use an Android phone (OnePlus 6T). I could only use the Chrome browser to connect; Firefox for Android did not work. It seems that when I clicked on the microphone button the connection was not established.

Hi,
For a while now I have been using this awesome project, https://nginxproxymanager.com, in case you find it interesting. Maybe it's worth mentioning it in the post as an easy way to configure https. Either way, thank you for sharing your nginx configuration.

Did you face any issues when using the speaker without rustpotter? In the latest version I added rustpotter on the client; can you test whether you face any issues running like that? It seems more like a rustpotter issue than something related to the speaker.

Thank you for the feedback; maybe the readme also needs a clearer explanation of those states.
I'll look into adding an option to display a state change message.
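
Wiring such a message to the connection events could be as simple as this sketch (the element id, endpoint and state names are invented):

type SpeakerState = "disconnected" | "connecting" | "connected" | "recording";

// Show the current speaker state as plain text on the page.
function showState(state: SpeakerState): void {
  const label = document.getElementById("state-label");
  if (label) label.textContent = state;
}

const ws = new WebSocket("wss://openhab.home/habspeaker");
showState("connecting");
ws.onopen = () => showState("connected");
ws.onclose = () => showState("disconnected");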

I have not done any testing on Android devices :frowning: (just Safari, Chrome (not too much), and Safari on iOS). Maybe the leak could also be caused by the speaker not working correctly in the Android browser (if you can double-check with a desktop browser, that would be great). I have just a couple more ideas to implement for now; then I will start testing it more deeply on all the devices I can get and try to make it as stable as possible.

Also, I have never tried Mimic with the speaker, but I don't think the problem is related to that.
For information, I'm testing it with rustpotter (server or client), Vosk, and VoiceRSS or PicoTTS, with openHAB running on a Raspberry Pi 4 or my Mac.

You're welcome; I hope something in this response helps you. Let me know if you find anything else about the cause of the issue.

Regards!