Hi community
I’m hoping this post will generate some discussion regarding setting up openHAB for voice control.
Over the last week I’ve been experimenting with integrating the voice/text processing capabilities from wit.ai with openHAB, and while it works for my simple setup, I’m looking for ways to make it more scalable and robust.
If you’re not already familiar with wit.ai, it might be worth taking a few minutes to get your head around how it works.
Here’s a quick run-through of my progress so far:
As some of you may know, when you use the voice controls in the mobile apps (or at least on Android), the app sends a transcribed version of what you said to the item “VoiceCommand”. I’ve got a rule set up so that when the VoiceCommand item is updated, it sends the new data off to wit.ai to be processed, and I receive a JSON-formatted message back with the probable meaning. Depending on what was said, and how you set up your wit.ai rules (intents), you may or may not receive things like device, location, state, amount, etc. Once the response is received, I extract the important parts from the message using the JSONPATH transformation.
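For anyone curious, the sending side of my rule looks roughly like this. It’s just a minimal sketch in the rules DSL, assuming curl is available on the host and MY_WIT_TOKEN is a placeholder for your wit.ai server access token (you may also want to pin the API version with wit.ai’s v query parameter):
import java.net.URLEncoder

rule "Send VoiceCommand to wit.ai"
when
    Item VoiceCommand received update
then
    // URL-encode the transcribed phrase before building the query string
    val phrase = URLEncoder::encode(VoiceCommand.state.toString, "UTF-8")

    // Call the wit.ai /message endpoint via curl; executeCommandLine expects
    // arguments separated by "@@", and passing the 5000 ms timeout makes it
    // return the command's output as a String
    val json = executeCommandLine("curl@@-s@@-H@@Authorization: Bearer MY_WIT_TOKEN@@https://api.wit.ai/message?q=" + phrase, 5000)

    logInfo("wit.ai", "Response: " + json)
end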
Hopefully you’re still with me! Here’s a quick example.
- I say “turn the light on” in the openHAB Android app.
- The VoiceCommand item is updated to “turn the light on”.
- The rule that triggers on this update sends the phrase to wit.ai and receives the response, which looks as follows:
{
"msg_id" : "2ca95ad3-610e-4d8c-99c8-b3fc2a9d71a3",
"_text" : "turn the light on",
"outcomes" : [ {
"_text" : "turn the light on",
"intent" : "command_toggle",
"entities" : {
"device" : [ {
"suggested" : true,
"value" : "light",
"type" : "value"
} ],
"on_off" : [ {
"value" : "on"
} ]
},
"confidence" : 0.981
} ]
}
Using JSONPATH transformation I’m able to extract the intent (“command_toggle”), device (“light”) and the state (“on”) and check how confident the service is that the response is an accurate interpretation of what I said.
- Assuming I only have one light item, called “light”, I can easily build a rule to handle this and turn the light on or off with something like sendCommand(device, state); a rough sketch follows below.
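To make that concrete, here’s roughly what the extraction and dispatch look like in my rule. Again, a sketch only: it assumes the JSONPATH transformation add-on is installed, json holds the wit.ai response from the snippet above, and the 0.8 confidence threshold is just a number I picked:
// Pull the interesting parts out of the wit.ai response
val intent = transform("JSONPATH", "$.outcomes[0].intent", json)
val device = transform("JSONPATH", "$.outcomes[0].entities.device[0].value", json)
val onOff  = transform("JSONPATH", "$.outcomes[0].entities.on_off[0].value", json)
val confidence = Double::parseDouble(transform("JSONPATH", "$.outcomes[0].confidence", json))

// Only act when wit.ai is reasonably sure it understood the phrase
if (intent == "command_toggle" && confidence > 0.8) {
    // With a single item named "light", this sends ON or OFF to it
    sendCommand(device, onOff.toUpperCase)
}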
I’m now trying to progress this setup, and that’s where I’m looking for ideas on how best to approach it. Obviously, if I have more than one light and my phrase is “turn the bedroom light on”, wit.ai is intelligent enough to respond with the following:
{
"msg_id" : "c8b73118-590a-4d03-8103-0b60b208a549",
"_text" : "turn the bedroom light on",
"outcomes" : [ {
"_text" : "turn the bedroom light on",
"intent" : "command_toggle",
"entities" : {
"device" : [ {
"suggested" : true,
"value" : "light",
"type" : "value"
} ],
"on_off" : [ {
"value" : "on"
} ],
"location" : [ {
"suggested" : true,
"value" : "bedroom",
"type" : "value"
} ]
},
"confidence" : 0.887
} ]
}
You can see it’s added the location. OK, maybe I could rename my items to something like “bedroom_light” and it would work for multiple lights if I concatenate the location and the device name, but I’m not convinced this is an elegant solution, so tips/ideas on how to structure the rules or items are appreciated.
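For completeness, the concatenation approach I just described would look something like this in my current rule (a sketch continuing from the snippet above, and exactly the part I’d like to replace with something more elegant):
// Extract the optional location entity
val location = transform("JSONPATH", "$.outcomes[0].entities.location[0].value", json)

// Naive dispatch: "bedroom" + "light" becomes the item name "bedroom_light".
// If no location entity is present, the transform may hand back the whole
// JSON string unchanged, hence the crude sanity check before using it.
val itemName = if (location !== null && !location.startsWith("{")) location + "_" + device else device

sendCommand(itemName, onOff.toUpperCase)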
Has anyone else tried to integrate voice command processing into their setup? What are your experiences or lessons learnt so far?