API.AI Binding, anyone working on one?

Hi all,

I've been looking at the API.AI service to see how Google Home could be integrated with openHAB. Yes, I understand that one could use a service like IFTTT to integrate the two, but it is somewhat limited, and for various reasons it's a service that I have started to dislike.

Anyways, before spending more time researching this, I was wondering if anyone has looked into this topic or has possibly already started working on one. I thought I would throw it out there before starting work on a binding. I dislike re-inventing the wheel as much as the next guy.

I understand that Eclipse SmartHome also has a speech-to-text/text-to-speech interface, and I am not sure if and how that could be leveraged.

Any ideas, insights or even reasons why such a binding is a bad idea are very much welcome.

Also, please let me know if you know about other services that allow Google Home to converse with openHAB.

Best Regards,
Cato

Good question.
My perfect setup would be:

  1. select Amazon Echo as the input microphone in openHAB
  2. already known strings are executed instantly
  3. unknown voice strings are redirected to api.ai
  4. api.ai sends the most probable intent back to OH
  5. openHAB asks back via Echo if the command is missing information

Instead of Echo you could choose Google Home or any other supported microphone or smartphone.
Instead of api.ai you could choose wit.ai or any other AI engine.
Instead of Echo output you could choose any other speaker.

I know that some of these requirements will not work this way at the moment!
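
Just to illustrate the idea, here is a rough sketch of that flow in purely hypothetical Java - none of these classes or helpers exist in openHAB today, it is only meant to show where the pieces would sit:

```java
import java.util.Map;
import java.util.Set;

// Purely hypothetical sketch of the flow above - none of these classes or helpers exist in openHAB today.
public class VoiceCommandRouter {

    /** Minimal stand-in for whatever an NLU engine (api.ai, wit.ai, ...) would return. */
    public interface NluResult {
        boolean isIncomplete();          // true if required intent parameters are missing
        String getFollowUpQuestion();    // e.g. "Which room?"
        String getIntent();              // e.g. "switch-light"
        Map<String, String> getParameters();
    }

    private final Set<String> knownCommands;

    public VoiceCommandRouter(Set<String> knownCommands) {
        this.knownCommands = knownCommands;
    }

    public void handleVoiceInput(String transcribedText) {
        // 2. already known strings are executed immediately
        if (knownCommands.contains(transcribedText)) {
            executeKnownCommand(transcribedText);
            return;
        }
        // 3. unknown strings are forwarded to the external NLU engine
        NluResult result = queryNluEngine(transcribedText);
        if (result.isIncomplete()) {
            // 5. ask back through the configured speaker if information is missing
            askBack(result.getFollowUpQuestion());
        } else {
            // 4. otherwise map the returned intent onto an openHAB item command
            executeIntent(result.getIntent(), result.getParameters());
        }
    }

    // These would need real implementations (event bus, HTTP client, TTS sink, ...):
    private void executeKnownCommand(String text) { /* send the mapped command to the item */ }
    private void askBack(String question) { /* speak the question via the chosen audio sink */ }
    private void executeIntent(String intent, Map<String, String> parameters) { /* map intent to item commands */ }
    private NluResult queryNluEngine(String text) { throw new UnsupportedOperationException("not implemented"); }
}
```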

Interesting that you posted this, as it was just yesterday that I was looking at the Google Home Actions page and noticed they mention Api.ai integration.

I started experimenting with Api.ai and Wit.ai back in 2015; you can find the post here.

However I stopped investigating when @Kai mentioned there was ongoing work in ESH for text-to-speech, speech-to-text and intent-to-action functionality. I have kept an eye on these things but haven’t been involved in the development and reference implementations.

Given my understanding of OH, I don’t think a “binding” is the right place for them, but I could see Api.ai, Wit.ai etc. becoming implementations of the STT and intent-to-action interfaces instead.

I've been looking into similar things too. I played around with sending open-ended text messages, which gave me more flexibility, and I would love to have this hooked up to speech too. How does api.ai differ from wit.ai, which is what I was using in the past? When I was using wit.ai I couldn't get it to do what I wanted, which was basic multitasking: “turn off the kitchen and bathroom light”. Would api.ai allow for this type of chained command?

I also think, like Daniel, that having this as an intent-to-action implementation is better than having it built in. I would probably use MQTT or something, but that might be too laggy.

I haven't used either service in about a year, so I don't know how they've moved on, but Api.ai allowed for the following, depending on how you structured your intents:

Me: "Turn off the light"
Agent: Which room?"
Me: "Bedroom"
Agent: “OK turning off the bedroom light”

My intent required an action (on/off), a room location and a device (light), and if certain bits were missing it would ask for them.

It also allowed conversation continuation, so following on from the above:

Me: "OK turn it back on"
Agent: “OK turning the bedroom light on”

Where it still kept the room and device but was given a new action.
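
For what it's worth, that slot-filling behaviour is also visible directly in the v1 REST API: you post the query together with a sessionId, and the response tells you via an actionIncomplete flag whether parameters are still missing, plus the follow-up question to speak. Here is a quick sketch from memory (the endpoint and field names may have changed since I last used it, and the token is obviously a placeholder):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/**
 * Minimal api.ai v1 /query call - sketch only, endpoint and field names quoted from memory.
 * Reusing the same sessionId across calls is what keeps the "turn it back on" context alive.
 */
public class ApiAiQueryExample {

    public static void main(String[] args) throws Exception {
        String clientAccessToken = "YOUR_CLIENT_ACCESS_TOKEN"; // placeholder
        String sessionId = "openhab-demo-session";             // keep constant for follow-up queries

        String body = "{\"query\":\"Turn off the light\",\"lang\":\"en\",\"sessionId\":\"" + sessionId + "\"}";

        HttpURLConnection conn =
                (HttpURLConnection) new URL("https://api.api.ai/v1/query?v=20150910").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Authorization", "Bearer " + clientAccessToken);
        conn.setRequestProperty("Content-Type", "application/json; charset=utf-8");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes(StandardCharsets.UTF_8));
        }

        // The response JSON should contain result.actionIncomplete (true while parameters such as
        // the room are still missing) and result.fulfillment.speech (e.g. "Which room?"),
        // which is exactly the back-and-forth shown above.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            in.lines().forEach(System.out::println);
        }
    }
}
```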

Wit.ai may have caught up but I really have no idea.

Right, there was a team from the Mozilla Foundation that worked on these topics last year, and @hkuhn was mainly involved from the openHAB side as well.
The result is briefly described here: http://docs.openhab.org/concepts/audio.html
So I think api.ai support would best be implemented as a “Human Language Interpreter” (HLI).
Note that ESH/openHAB2 already comes with a very basic HLI implementation - something like api.ai would of course be much more powerful, so it would be great if someone started working on that.
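
To give you an idea of the shape, such an interpreter essentially only has to implement the HumanLanguageInterpreter interface from ESH and do its work in interpret() - roughly like the skeleton below (signatures quoted from memory, so please double-check against the current ESH source; the actual api.ai call is left out):

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Set;

import org.eclipse.smarthome.core.voice.text.HumanLanguageInterpreter;
import org.eclipse.smarthome.core.voice.text.InterpretationException;

/**
 * Skeleton of an api.ai based HLI - method signatures from memory, check the current ESH source.
 * It would be registered as an OSGi service so that it can be selected as the system interpreter.
 */
public class ApiAiInterpreter implements HumanLanguageInterpreter {

    @Override
    public String getId() {
        return "apiai";
    }

    @Override
    public String getLabel(Locale locale) {
        return "api.ai Interpreter";
    }

    @Override
    public String interpret(Locale locale, String text) throws InterpretationException {
        // 1. send 'text' to the api.ai /query endpoint
        // 2. map the returned intent + parameters onto item commands (event bus)
        // 3. return the spoken reply (e.g. result.fulfillment.speech) so it can be passed on to TTS
        throw new InterpretationException("api.ai call not implemented yet");
    }

    @Override
    public String getGrammar(Locale locale, String format) {
        return null; // the grammar lives on the api.ai side, not in openHAB
    }

    @Override
    public Set<Locale> getSupportedLocales() {
        return Collections.singleton(Locale.ENGLISH);
    }

    @Override
    public Set<String> getSupportedGrammarFormats() {
        return Collections.emptySet();
    }
}
```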

There isn’t really much available yet to support dialogs, but the class that is intended to do this (and which would need to be extended for that) is the DialogProcessor (which is independent of any specific HLI implementation).

I hope this gives you a rough idea about where to start looking!

@Kai how do you use the DialogProcessor? I couldn’t find any console commands or anything in the REST API.

You are right, all of that is still missing - I never added it, as I was still lacking an STT service that could be used for such dialogs… Feel free to create PRs to add access to the DialogProcessor.

I’ll probably need quite a bit of support given the number of classes the DialogProcessor is trying to orchestrate.

I've never used the audio sources - I assume that would be a logical place to start experimenting.
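
For a first experiment, something along these lines is what I had in mind for reading from an AudioSource - signatures from memory and untested, and how one actually obtains the AudioSource instance (OSGi service injection?) I still need to figure out:

```java
import java.io.IOException;
import java.util.Locale;

import org.eclipse.smarthome.core.audio.AudioException;
import org.eclipse.smarthome.core.audio.AudioFormat;
import org.eclipse.smarthome.core.audio.AudioSource;
import org.eclipse.smarthome.core.audio.AudioStream;

/**
 * First experiment with an ESH AudioSource - sketch only, signatures quoted from memory.
 * How the AudioSource instance is obtained (OSGi service reference etc.) is left open here.
 */
public class AudioSourceExperiment {

    public void dumpSomeAudio(AudioSource source) throws AudioException, IOException {
        System.out.println("Using source: " + source.getLabel(Locale.ENGLISH));

        // Just take whatever format the source offers first
        AudioFormat format = source.getSupportedFormats().iterator().next();

        // AudioStream extends InputStream, so it could be fed to an STT service or written to a file
        try (AudioStream stream = source.getInputStream(format)) {
            byte[] buffer = new byte[4096];
            int read = stream.read(buffer);
            System.out.println("Read " + read + " bytes of audio in format " + format);
        }
    }
}
```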