With https://github.com/openhab/openhab-webui/pull/2285 being merged, we now have an easy way to access (especially during development/testing) openHAB’s dialog system.
I am currently working on modernising the Piper JNI wrapper used by the Piper TTS voice add-on; the next thing will probably be to look into modernising the Whisper JNI wrapper used by the Whisper STT voice add-on as well. In their current state, those are already really good and provide a solid foundation for TTS and STT.
The only thing currently missing is a good human language interpreter (HLI), but there are plans, and work in progress, to implement one based on LLMs.
Once that’s finished, we need hardware. You can either buy hardware yourself and create your own firmware based on our PCM Audio WebSocket (supported since openHAB 5.1; a documentation PR is already open).
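To give a rough idea of what such firmware would stream: the WebSocket carries raw PCM audio. The snippet below is only an illustration, not the actual openHAB API; the sample rate, frame size, and sine-wave "microphone" are assumptions standing in for real captured audio.

```python
import math
import struct

# Assumed audio format for illustration: 16 kHz, mono, 16-bit signed little-endian PCM.
SAMPLE_RATE = 16_000
FRAME_MS = 20  # one binary WebSocket message per 20 ms of audio
SAMPLES_PER_FRAME = SAMPLE_RATE * FRAME_MS // 1000  # 320 samples

def pcm_frames(freq_hz: float = 440.0, seconds: float = 0.1):
    """Yield 20 ms chunks of a sine tone as raw PCM bytes (stand-in for mic input)."""
    total = int(SAMPLE_RATE * seconds)
    for start in range(0, total, SAMPLES_PER_FRAME):
        samples = [
            int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
            for n in range(start, min(start + SAMPLES_PER_FRAME, total))
        ]
        # Pack as little-endian signed 16-bit integers.
        yield struct.pack("<%dh" % len(samples), *samples)

frames = list(pcm_frames())
# Each full 20 ms frame is 320 samples * 2 bytes = 640 bytes.
```

A real satellite would read these frames from its microphone and push them over the WebSocket connection with whatever client library its platform offers; the exact endpoint and handshake are what the pending documentation PR describes.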
Though the better option would be to have the (unofficial) ESPHome binding support the ESPHome Voice protocol, which would allow you to buy a Home Assistant Voice device.
Pine64 is also working on cheap hardware, the PineVox (they told me around $30), but it isn’t finished yet (they said it will likely take about three more months). IIRC, at the moment it only supports HA’s Wyoming protocol (whose voice-satellite part is kind of deprecated in favour of ESPHome Voice), but it could be made compatible with openHAB in several ways: if the PineVox used ESPHome Voice, we would again only need to support that protocol; alternatively, we could add Wyoming satellite support, or extend the PineVox with our WebSocket API.
If I understand correctly, the FutureProofHomes hardware stack is built on something like this with an added ESP32: a ReSpeaker Mic Array v2.0 (Seeed Studio), which then triggers a (local) LLM, which in turn can call endpoints on a smart home system like openHAB.
Do you have more information on the PineVox? To me it seems like an abandoned track?
I had a quick look at FutureProofHomes and it seems (from the website; I haven’t looked at the code) that their devices are ESPHome-compatible, so we could tick many hardware-support boxes by supporting the ESPHome Voice protocol.
Not yet; I’m waiting for the release and focussing on the core tasks at the moment. But we talked to them a bit at FOSDEM, and it’s in development, just not finished yet.
Yes, indeed. The Satellite1 DevKit is itself ESPHome-based and supports all of it. In HA it appears automatically as an ESPHome device. If OH also supported the ESPHome Voice protocol, it would work out of the box, I think.
As I understand it, you can use the DevKit to build your own smart speaker (e.g. with ceiling speakers) or use the 3D-printed enclosures provided by FutureProofHomes themselves.
For me, this finally sounds like a way to avoid having to feed cloud services like Amazon. I just wish the reaction time were a bit faster; the longer processing and thinking time compared to what you see on Echo devices really is a drawback.
I think that really depends on the hardware you are running on and on where the LLM you use runs. Cloud-based LLMs have good inference speed but add network latency; local LLMs depend on your hardware. BTW, reaction time is also one reason for modernising the JNI wrappers. Especially for Whisper, using a newer version of whisper.cpp can really improve performance depending on your platform, as support for hardware acceleration is added over time. For example, on Apple Silicon the latest whisper.cpp can accelerate using Metal and Core ML.
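To make the latency discussion a bit more concrete: the perceived reaction time is roughly the sum of the pipeline stages (STT, HLI, TTS), so the slowest stage (usually LLM or STT inference) dominates. A minimal sketch of measuring that breakdown, where all three stage functions are stubs rather than real openHAB or whisper.cpp calls:

```python
import time

def measure(fn, *args):
    """Run one pipeline stage and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start

# Stubs standing in for the real STT / HLI / TTS add-ons.
def stt(audio: bytes) -> str:
    return "turn on the kitchen light"

def hli(text: str) -> dict:
    return {"item": "KitchenLight", "command": "ON"}

def tts(reply: str) -> bytes:
    return b"\x00" * 640  # fake PCM answer

audio = b"\x00" * 32000  # 1 s of silence at 16 kHz / 16-bit
text, t_stt = measure(stt, audio)
intent, t_hli = measure(hli, text)
speech, t_tts = measure(tts, "Done")
total = t_stt + t_hli + t_tts  # the reaction time the user actually feels
```

With real back-ends plugged in, a breakdown like this shows where hardware acceleration (e.g. Metal/Core ML for whisper.cpp) actually buys you response time.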