HAB Speaker is a project designed to facilitate the use of the openHAB dialog processing capabilities.
This add-on consists of a web UI that uses the browser audio API and a WebSocket connection to enable dialog processing against your openHAB server.
Using this UI in a browser comes with two requirements: capturing microphone audio requires a user interaction, and the page must be served over https. To avoid these requirements you can install HABSpeaker as a desktop or mobile app (Electron and Capacitor were used to bring the web UI to those platforms).
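The two browser requirements can be sketched as a small pre-flight check. This is only an illustration, not the add-on's code: `CaptureEnv` and `captureBlockers` are hypothetical names, and the browser values are passed in as plain booleans so the logic can run outside a browser.

```typescript
// Minimal view of the browser environment, passed in so the check is easy to test.
interface CaptureEnv {
  isSecureContext: boolean;   // window.isSecureContext in a browser
  hasMediaDevices: boolean;   // navigator.mediaDevices !== undefined
  userHasInteracted: boolean; // set to true inside a click/tap handler
}

// Returns the reasons (if any) why microphone capture cannot start yet.
function captureBlockers(env: CaptureEnv): string[] {
  const blockers: string[] = [];
  if (!env.isSecureContext) {
    blockers.push("page is not served over https (or localhost)");
  }
  if (!env.hasMediaDevices) {
    blockers.push("browser does not expose navigator.mediaDevices");
  }
  if (!env.userHasInteracted) {
    blockers.push("waiting for a user click or tap");
  }
  return blockers;
}

// Browser usage (assumed wiring, not executed here):
// startButton.addEventListener("click", async () => {
//   const blockers = captureBlockers({
//     isSecureContext: window.isSecureContext,
//     hasMediaDevices: navigator.mediaDevices !== undefined,
//     userHasInteracted: true,
//   });
//   if (blockers.length === 0) {
//     const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
//   }
// });
```

Running the check inside the click handler means the user interaction has already happened by the time the microphone is requested.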
Here you can find the desktop installers.
Here you can find the mobile installers (only the Android APK is available at the moment; I will try to distribute a demo through the official stores).
Here you can find the project readme, which summarizes its installation and functionality.
Changelog
Version 3.4.0 - BETA19
- Support screen saver brightness dim on mobile.
- Support prevent sleep on mobile, desktop and browser.
- Auto start the speaker at app launch on mobile.
- Fix read of json media files.
Version 3.4.0 - BETA18
- Fix race error on initialization that breaks audio streaming.
- Improve local settings configuration on all platforms.
- Change capacitor audio permissions library.
- Allow any cors domain on ui endpoints (needed for capacitor).
Version 3.4.0 - BETA17
- Fix speaker connection on chrome when implicit user role.
- Use audio worklet if available instead of deprecated script processor.
- Use message channels to transfer audio data (to avoid passing it through the main thread).
- Changes to build for android/ios using capacitor.
Version 3.4.0 - BETA16
- Remove websocket secure option and mimic implicit user role logic.
- Remove the keyword config for rustpotter web.
- Add new option to customize the keyword by speaker.
- New electron preload animation.
Version 3.4.0 - BETA15
- Fix interpreters configuration for unregistered speakers. (Thanks to ornostar).
- Improve electron app Librespot integration.
- Fix incorrect Spotify volume on start.
- Add debug logs to ws auth.
Version 3.4.0 - BETA14
- Disconnect speaker on ping fail.
- Improve duplicate speaker message.
- Media control voice commands fallback to a speaker that is playing media.
- Spotify integration fixes.
- Add electron app installers to readme.
Version 3.4.0 - BETA13
- Add voice commands to transfer media.
- Fix YouTube Player not loading after first close.
Version 3.4.0 - BETA12
- Add support to run as an electron app.
- Basic scripts to build the electron app (windows/linux/macOS).
- Fix Spotify initialization on web when authentication is required.
- Change WebSocket auth method (send token in Sec-WebSocket-Protocol header).
- Update readme with electron app details.
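Browsers do not let a page set custom headers on a WebSocket handshake, so a common workaround is to pass the token as one of the requested subprotocols, which the browser sends in the Sec-WebSocket-Protocol header. A minimal sketch of that idea follows; the protocol names, the hex encoding, and the URL are assumptions for illustration, not the values this add-on actually uses.

```typescript
// Hypothetical subprotocol names; the add-on's real values are not given in this post.
const APP_PROTOCOL = "habspeaker.example";
const TOKEN_PREFIX = "habspeaker.token.";

// Hex-encode the token so the value only contains characters that are
// valid in a subprotocol name. Assumes an ASCII token for brevity.
function hexEncode(value: string): string {
  let out = "";
  for (let i = 0; i < value.length; i++) {
    const code = value.charCodeAt(i);
    out += (code < 16 ? "0" : "") + code.toString(16);
  }
  return out;
}

// Builds the list of subprotocols to request when opening the socket.
function authProtocols(accessToken: string): string[] {
  return [APP_PROTOCOL, TOKEN_PREFIX + hexEncode(accessToken)];
}

// Browser usage (assumed URL, not executed here):
// const ws = new WebSocket("wss://openhab.example/ws", authProtocols(token));
```

The server side would read the header, decode the token, and normally answer with one of the offered subprotocols so the browser accepts the handshake.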
Version 3.4.0 - BETA11
- Add media files and web player phrases.
- Fix login redirect.
- Support client keyword spotting.
Version 3.4.0 - BETA10
- Fix playing live video url.
- Add speaker location.
- Avoid audio format conversion when drop-in.
- Use 16000Hz to send audio to client and avoid client audio resampling on supported browsers (chrome).
- Drop-in small refactor.
- Remove restore previous volume.
- Add start drop-in phrase.
- Fix spotify reconnection.
Version 3.4.0 - BETA9
- Continue UI code cleaning/refactor.
- Update UI dependencies.
- Implement media fast-forward/rewind in the server using seek.
- Fix watchOnYouTubePhrase config label.
- Lower media volume while dialog is active.
- Improve disable screensaver when playing video media.
- Add next/previous media voice commands.
- Improve YouTube search.
- Add basic Spotify player ui.
Version 3.4.0 - BETA8
- Code refactor
- Start typescript migration
- Add media providers (initial draft)
- Readme updates
Version 3.4.0 - BETA7
- UI screen saver
- Fix stereo sink using channel 0 data for channel 1
- Fix speaker stt and tts configs
Version 3.4.0 - BETA6
- Drop-in support (speaker to speaker communication)
- Keep microphone stream active (fixes speaker operation when returning from background on mobile)
- Allow speaker voice control (requires service configuration) (only the "stop drop-in" phrase is implemented) (documentation pending)
- UI sink implementation fixes
Version 3.4.0 - BETA5
- UI viewport block zoom
- UI double max volume level
Version 3.4.0 - BETA4
- Add speaker voice configurations.
- Support server keyword spotting.
- Add spot channel.
- Fix concurrent modification exception on bundle stop.
Version 3.4.0 - BETA3
- Sink waits for audio to be played, like other openHAB sinks
- Add listeningItem speaker configuration
- Update readme
Version 3.4.0 - BETA2
- Sink mp3 support
- Sink use stereo audio (speaker configuration)
- Sink volume control fixes
- Fix ui login (authenticate against the login page instead of relying on the main ui)
- Use thing label as sink/source label.
Version 3.4.0 - BETA1
- A speaker is now a thing and can be discovered.
- Implement sink volume support.
- Remove local configs except the speaker id.
Version 3.4.0 - BETA0
- initial release
Status and future development
Basic dialog processing seems to work on all platforms.
So far I have only tested this on:
- Web version: Safari, Chrome.
- Desktop version: Windows, macOS.
- Mobile version: iOS.
Media capabilities are broken on iOS and cause glitches in the browser, I think due to bugs in its Web Audio API support; this is pending investigation and a report.
These are some things I would like to add for the final version:
- Update README gif, and clarify speaker states with images.
- Create own icons (current icons are copied from habot).
- Basic styles improvements and allow speaker colors customization.
- Basic mobile application wrapper.
- Basic desktop wrapper using electron.
- Improve the settings page design.
- Media playback capabilities.
- Support keyword spotting on the client.
- UI web screen saver (to prevent pixel damage).
- Speaker voice commands.
- Basic drop-in support.
- Implement sink volume.
- Enable mp3 support for the sink (converted on the server to wav).
- Support keyword spotting on the server.
- Add stereo support to the sink.
- Improve the authentication mechanism.
About my personal case and the motivations for this project:
I've been using the dialog processing capabilities of openHAB for some time, and I have a couple of speakers at home using the pulseaudio binding next to my Echo devices, which are what I mostly use nowadays to interact with openHAB. I currently only use the pulseaudio speakers to process a couple of custom phrases that control my TVs, but my goal is to remove the Echo devices in the future (the only things I use on them are the openHAB skill and the Spotify integration).
As I only have a couple of speakers set up at home, I often miss being able to use my mobile or laptop to speak to openHAB, and that was the main motivation for starting this project.
Also, while developing add-ons for dialog processing in the past, I felt that testing the openHAB dialog capabilities can require a lot of configuration depending on your setup, because you need either audio capabilities on your openHAB host (to use the system sink/source) or a remote device set up through the pulseaudio binding, which takes some time. So I also see this add-on as a way to test the dialog processing capabilities with a quicker and easier setup, using capable audio devices that you already own.
I really enjoyed developing this add-on. It allowed me to learn some things about the Web Audio API, and it's always interesting to develop anything related to openHAB. I hope some of you find it useful and that it motivates more people to think about how the dialog processing support can be improved in future versions.
Regards!