My name is Kristian Kielhofner and I am the creator of Willow:
Willow is completely open source firmware that provides Amazon Echo/Google Home-competitive wake word and speech recognition on the commercially available ESP BOX hardware from our friends at Espressif.
It’s truly “Alexa/Echo competitive”. Wake word detection, voice activity detection, echo cancellation, automatic gain control, and high-quality audio for $50 mean that with Willow and the support of the openHAB action template interpreter there are no compromises on looks, quality, accuracy, speed, or cost. Seriously - wake word and speech recognition from 30 ft away.
It’s cheap. With a touch LCD display, dual microphones, speaker, enclosure, buttons, etc it can be purchased for $50 all-in.
It’s ready to go. Take it out of the box, flash with Willow, put it somewhere.
It’s not ugly. No mess of wires or throwing together random components.
It’s not creepy. Voice is only ever sent to a self-hosted, highly optimized Willow Inference Server (or any server provided by openHAB). Additionally, commands can be recognized locally on the ESP BOX with the on-device speech recognition module.
It doesn’t hassle or try to sell you. If I hear “Did you know?” one more time from Alexa I think I’m going to lose it.
It’s open source. Did I mention it’s open source?
It still does cool maker stuff. With 16 GPIOs exposed on the back of the enclosure there are all kinds of interesting possibilities.
It’s modular. Our first integration was to Home Assistant but we aim to be the best voice user interface in the world - across any platform (including openHAB, of course)!
We’re very interested in developing an openHAB integration module for Willow. From the looks of the documentation it should be pretty straightforward to interact with the openHAB action template interpreter API with no software development needed on the openHAB side.
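To make the “no development needed on the openHAB side” point concrete, here is a minimal sketch of what sending a Willow transcript to an openHAB human-language interpreter over the REST API might look like. The endpoint shape (`/rest/voice/interpreters/{id}`), the interpreter id `actiontemplatehli`, and the token are assumptions - verify them against your openHAB REST API docs before relying on this.

```python
from urllib.parse import quote

def build_interpret_request(base_url: str, interpreter_id: str, text: str, token: str):
    """Build (url, headers, body) for POSTing a speech transcript to an
    openHAB human-language interpreter via the REST API.

    Endpoint path and interpreter id are assumptions; check the openHAB
    REST API documentation for your installation."""
    url = f"{base_url.rstrip('/')}/rest/voice/interpreters/{quote(interpreter_id)}"
    headers = {
        "Content-Type": "text/plain",        # interpreters take a plain-text transcript
        "Authorization": f"Bearer {token}",  # openHAB API token
    }
    return url, headers, text

# Example: what Willow would send after transcribing a command
url, headers, body = build_interpret_request(
    "http://openhab.local:8080", "actiontemplatehli",
    "turn on the kitchen lights", "oh.willow.token")
```

The actual POST would be a few more lines with `urllib.request` or any HTTP client; the point is that the whole integration is one plain-text request per utterance.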
Is this something the openHAB community is interested in?
I would say yes. I know a lot of people used Mycroft in the past and a number of users have set up other ones as well.
Indeed I would expect integration would be possible with no changes to OH, just add-ons that implement the right interfaces in the right ways.
I would expect even just a simple add-on would be sufficient for this. There’s no cloud so there shouldn’t be any changes needed to the cloud server. But of course I’ve not looked into that kind of integration myself. Dragons may live here.
In addition to Willow sending commands into OH and retrieving information on command, the ability of OH to send TTS for announcements would be important. The Chromecast add-on is one you can look to for examples of how to do that (essentially the devices need to appear in OH as an audio sink) - when Willow supports that, of course.
This is a pretty cool project! Thanks for bringing attention to it to the OH community!
Wake is more accurate (truly commercial/Alexa grade).
Speech recognition is more accurate (same).
The “just plug it in” hardware is $50, not $400.
From what I’m reading in the openHAB docs we shouldn’t need any new development on the OH side. More than anything I’m here to introduce myself, gauge interest, and see what kind of cool ideas for us you may have!
So, it would be pretty easy for me to unplug Google in favour of Willow, but only if it’s easy to set Willow up in the first place. Looking at your GitHub, it’s clear that you already know the process is a little daunting right now. The “multiple devices responding” issue is also a concern, but that could be managed by using different wake words for different Willows.
If possible, I’d suggest that Willow be able to handle timers natively, instead of farming out commands to another service. That would give it standalone value (even just as a kitchen timer) and get you the instant “this is awesome” reaction when users first try it out. That’s very rewarding.
I’m not a big fan of touchscreen interfaces, but other people are. So, I’d suggest that it be able to do some of the things that are possible with the HABPanel app. Specifically:
Display any webpage URL as the default UI
This would enable users to employ UIs they’ve built in HA/openHAB/etc.
React to commands from HA/openHAB/etc.
I have an Android tablet on my wall that only turns on its display when I’m at home and awake. At all other times, openHAB turns it off since I’m obviously not looking at it.
Yes, setting up Willow is involved (to say the least). We’re currently targeting very early adopters (developers, really). Willow is roughly one month old and the initial very soft release was on Monday.
We expect to have a “click click go” setup interface ready for users in the next month or two.
For multiple Willows responding, it’s actually pretty easy to solve compared to what we had to do to get this far. It will be included in the initial user release I mentioned because (practically speaking) it’s a requirement.
Meanwhile, I have my first ever OH install up and running with the voice interpreter API controlling some devices so Willow isn’t far away!
OH, HA, etc. can be made aware of interior layout, room names/areas/zones, and so on. Why address a specific Willow device with a different word when we can automatically know the area of the device you’re speaking to/near?
So, for example, if you’re in the bedroom and say “Hi Willow, turn off the lights”, it turns off the lights in the bedroom because it knows you were addressing the Willow in that area. Same for other rooms. Of course you can still override with “Hi Willow, turn off the lights in the bedroom”, but we believe this is truly unique functionality only we can enable with the power of systems like OH.
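The bedroom-lights scenario above can be sketched in a few lines. Everything here is hypothetical - the device ids, area names, and the simple “does the command already name an area?” heuristic are illustrative assumptions, not Willow’s actual implementation:

```python
# Hypothetical sketch: qualify an ambiguous spoken command with the area
# of the Willow device that heard it. Device ids and area names are
# illustrative; a real system would pull these from OH's semantic model.
DEVICE_AREAS = {"willow-bedroom": "bedroom", "willow-kitchen": "kitchen"}
KNOWN_AREAS = set(DEVICE_AREAS.values()) | {"sun room", "living room"}

def qualify_command(device_id: str, command: str) -> str:
    """If the command names no known area, append the speaking device's
    area so the home automation platform can target the right items."""
    lowered = command.lower()
    if any(area in lowered for area in KNOWN_AREAS):
        return command  # already unambiguous, e.g. "...in the bedroom"
    area = DEVICE_AREAS.get(device_id)
    return f"{command} in the {area}" if area else command
```

With this, “turn off the lights” heard by the bedroom Willow becomes “turn off the lights in the bedroom”, while an explicit “turn off the lights in the kitchen” passes through untouched from any device.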
I think this always sounds good conceptually, but tends to break down in practice. It’s really going to be a question of how good Willow is at picking the right device to respond. If you can’t do this with 99% accuracy, then I’m always going to wonder what will happen when I say, “turn the lights on”. Will it be my office? My bedroom? My front hall? Does Willow know that I’m leaving the bedroom and going to the living room? Or going the other way? There are just so many variables.
Automation is only magical when it works so well that you don’t have to think about it. With this in mind, I generally suggest that people only automate things they want to happen 99% of the time. Any less than that, and automation can quickly turn into “the system has a mind of its own”…which is super frustrating.
This brings me back to using different wake words, which would take the guesswork out of it and guarantee that I’m targeting the right Willow.
As long as you make it possible for OH to know which Willow is sending the command, users will have a lot of flexibility to decide how they use it. I’d agree with you that this is unique functionality, but I’d be careful about overselling the magic.
Yes, conceptually I think that would be the way you would want it to work. So if I’m in the kitchen and I say “open the blinds”, it will open only the blinds in the kitchen. Similarly, if I’m in the sun room and I say “open the blinds” it’ll open only the blinds in the sun room. That would be so sweet!
We can read the amplitude of the incoming audio from the audio processing interface, drop it in a multicast packet, and only the Willow with the highest amplitude keeps the wake session open (all others silently stop and exit). It should be fast enough to delay waking the LCD until true source resolution (master election, if you will) without user-perceptible delay.
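The election step itself is tiny once the amplitude reports have been exchanged over multicast. A sketch of the decision logic, with made-up device names and dB-style amplitudes (higher, i.e. less negative, is louder); the transport and timing window are left out:

```python
def elect_winner(reports):
    """Given (device_id, amplitude) pairs collected from the multicast
    group during the election window, return the device that keeps the
    wake session open. Ties break on device id so every node reaches the
    same verdict independently, with no coordinator needed."""
    return max(reports, key=lambda r: (r[1], r[0]))[0]

# Illustrative reports gathered during one wake event
reports = [("willow-kitchen", -42.0), ("willow-bedroom", -18.5), ("willow-hall", -31.2)]
```

Because every device sees the same set of reports and applies the same deterministic rule, they all agree on the winner without any further round-trips.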
Alexa has had this for years and it works very well. Addressing a specific device in any way is a step backwards.
Just to clarify, I’m not talking about this as a technology problem. It’s more about:
the wide variety of physical spaces and how/where devices are placed in them
the unpredictability of humans
That’s why I focused on transitioning between spaces. Those are very common scenarios (for me at least), and much harder to handle. There will be occasions when it’s not at all clear which Willow is being addressed, and it won’t be Willow’s fault at all.
Multiple Willow devices within “earshot” of each other. This is the already-been-done scenario I was describing. Let’s call it “Alexa did it first”.
More powerful platforms like OH have concepts of areas, zones, etc. where things/items are associated with an area. We would be associating a Willow with an area just like you would a light.
So now that we have #1 “this is the Willow I’m closest to” worked out, if the bedroom Willow won the wake election and the speech command is ambiguous like “turn on the lights” I think it’s pretty reasonable to try to help the user out and turn on the lights in the area we know them to be in relative to any other Willow devices.
The only alternative is erroring out and essentially ignoring the command the user directed at you after wake. Don’t get me started on when Alexa gets confused and tries to have a conversation with you: “I’m sorry, I didn’t get that. Which light did you mean?”…
This isn’t playing catch up with Alexa anymore, this is genuinely new functionality that Alexa can’t do because she doesn’t have the power of OH.
Of course in all of this if the command isn’t ambiguous like “turn on the bedroom lights” you can say that from anywhere and that’s exactly what it will do.
The vast majority of the logic for #2 would actually be in OH so it would be completely under your control without us even having to expose any additional configuration in Willow. Don’t configure it and/or don’t use ambiguous commands.
In addition to the privacy/self-hosting aspects, this is easily my next favorite part of Willow - not imposing things on users. That’s one of the biggest issues I see with Alexa - she’s opinionated, she’s stuck in her ways, and she’s only getting worse.
Well, at least in my home quite often the wrong Alexa reacts - not the closest one. I have to add that it only happens with Echos of different hardware, maybe because of different microphones. It’s quite annoying, and the only way to stop it is to give them different wake words. I wish Alexa had a calibration function to avoid this.
So maybe this is a function where Willow could be better. So I’m really looking forward to this gadget.
That’s a really good point and a great example of one of the issues with far-field voice.
Many (all?) open source speech efforts that have come before Willow operated under the assumption that a “microphone is a microphone” and you can just throw together random hardware components (Raspberry Pi and pick a mic) and have an Alexa experience.
You say otherwise, and you’re right! There is a tremendous amount of audio and acoustic engineering that goes into these things, down to the specifications of the plastic of the enclosure, the microphones, other internal components (in the “cavity”) - even the specification of the microphone holes in the enclosure comes into play. It’s hard, to the point where Amazon can’t even get it quite right between different hardware revisions (though I’ve still always had a very good experience).
I love the ESP BOX because get this - Espressif actually had it tested and qualified by Amazon themselves for wake word detection, audio quality, etc as an Alexa enabled device:
So when we say Alexa quality, we really mean it - and Amazon agrees!
Our goal is to best Echo+Alexa in every way possible while being open source, private, and trustworthy. So we’ll certainly try!
Since others have mentioned it I’ll list my current GA and Alexa use cases as just another data point.
Our household is maybe a little different from most here in that we have tons of Nest Home Hubs and speakers around, but we almost never use them for home automation - not even the TTS announcements I brought up previously. I had those for a while, but my SO found them to be creepy.
About the only thing we do on occasion from a home automation perspective is trigger the garage door opener from a tile in the Home app (though that changed on Android 13 and I never went back to fix it) or turn all the lights on/off.
Our house is not a Star Trek house. “Computer, earl grey, hot”.
But we do play music and will sync some or all the speakers in the house, especially on cleaning days. My 10-year-old watches YT videos on the kitchen screen (mostly Minecraft stuff and Mark Rober, the latter of which I highly recommend), plays his favorite music, and sends broadcast messages throughout the house. And of course to fact check his parents in real-time.
All the Hubs show the highlights from our latest and favorite photos (wish it did videos too) and show the weather and such.
In the kitchen we use the Hub as a cookbook.
The grandparents use theirs to video call us.
When my internet goes offline, the Hubs are usually the first to know and change their screen to tell us, making them excellent visual Internet status indicators.
For the most part what we love about them is they fade into the background when not needed but respond well (usually) when called upon.
I have one Alexa Echo Show, given to me for free because Amazon ended the program that allowed our printer to order toner directly when it got low; it now has to go through the Echo. I find the Echo to be way busier and pushier than the Hubs, showing lots of current news, pushing Echo features, and such. It definitely demands way too much attention, so here it sits on my desk, turned around. The only reasons I keep it are that the printer’s ability to order toner has been a lifesaver, and that through the Amazon Control binding I can do minimal interaction with my thermostat and outdoor low-voltage lighting, which support Alexa but have no openHAB binding.
To summarize from this I’d like to see/use Willow to:
play music based on a voice search
play videos based on a voice search
sync speakers in groups (stereo, danger though as here be patents)
adhoc internet searches
recording and sending a voice message to another device in the house or all devices in the house
rotating photo display, automatically updated (for going cloudless, Nextcloud or PhotoPrism integration would be cool)
weather (obtained from OH?)
video streams pushed from IP cameras in cases where an event was detected (e.g. someone rings the doorbell, show the doorbell camera feed). My account got borked when I moved from Nest accounts to Google accounts so this has never worked for me but it would be nice. OH has the IP Camera binding which could be a source for this.
I’d say that this implies an ability to push data to the Willow, not just receive queries from the Willow.
Would this reasoning take place on the Willow or on the openHAB side? If on the openHAB side I’d recommend HABot as the reasoner. It’s semantic model aware and supports the NLP to interpret commands like these. Willow would just need to add “in the location” to the end of what ever the person said based on the room the Willow knows it’s in. Some users are using HABot with chat (Nextcloud Talk, Telegram, etc.) to carry on two way conversations with openHAB.
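For the HABot route, the per-utterance request would look much like the interpreter case: Willow appends the location and posts plain text to HABot’s chat endpoint. A sketch under assumptions - the `/rest/habot/chat` path and header usage are my guesses from the HABot docs and should be verified:

```python
def build_habot_request(base_url: str, text: str, language: str = "en"):
    """Build (url, headers, body) for sending a natural-language sentence
    to the HABot add-on's chat endpoint. The path and headers are
    assumptions; check the HABot documentation for your setup."""
    url = f"{base_url.rstrip('/')}/rest/habot/chat"
    headers = {
        "Content-Type": "text/plain",   # HABot chats in plain text
        "Accept-Language": language,    # picks the NLP language model
    }
    return url, headers, text

# Willow would send the already-location-qualified sentence
url, headers, body = build_habot_request(
    "http://openhab.local:8080", "turn off the lights in the bedroom")
```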
“The garage door has been open for a long time”
“Please close the garage door”
"The garage door is now closed
I’ve had the same experience with Google devices, though I get a nice little toast on my phone asking if the right device answered. Not sure it changes anything. Most of the time it gets it right though.