Voice Control (wit.ai) - My progress/setup + Tips/advice needed

@Nicholas_Waterton you mentioned you tried natural language processing but it didn’t work out well - what did you try?

Later on I’ll post my complete setup, which is functional for turning items or groups on or off as required, and I’m hoping to extend it to other commands.

There are still things I think could be improved though (maybe writing a binding to communicate with wit.ai instead of using a script).

@Kai are we able to use the REST API through my.openhab?

Yes, simply at https://my.openhab.org/rest/
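
For example, from Python (a quick sketch; this assumes my.openhab accepts the same username/password as the website via HTTP basic auth, so treat that as an assumption and adjust to whatever auth you have configured):

import requests

# Placeholder credentials - use your own my.openhab account details
response = requests.get("https://my.openhab.org/rest/items",
                        auth=("you@example.com", "your-password"))
print(response.status_code, response.text[:200])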

I have a server written in Python that I use for general things. I was sending the STT text there, processing it, and sending it on to openHAB.

So I tried a few Python libraries (and some online tests). NLTK was one (I think; it was a while back), and there was a Python fuzzy string matching library which actually worked the best: fuzzywuzzy (see https://github.com/seatgeek/fuzzywuzzy).

I found that NLTK was way too complicated, involved a “training database” most of the time, and was very focused on the mathematics of language processing (rather than figuring out what was meant vs what was said). What you got back was a list (or dictionary) of key words - which is what I was starting with anyway.

Really all I wanted was the noun and the verb. I decided that as I had a very limited subset of targets (ie lights and such) and a limited range of actions (ON, OFF, numbers etc), all I had to do was identify the key words and ignore the rest.
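
For anyone curious, the keyword matching part only takes a few lines with fuzzywuzzy. A rough sketch (the target/action lists and the score threshold here are just placeholders, not my actual setup):

from fuzzywuzzy import process

# Placeholder vocabularies - in practice these came from my own list of targets
TARGETS = ["bedroom light", "living room light", "kitchen light", "bedroom tv"]
ACTIONS = ["on", "off"]

def parse(text):
    # Best fuzzy match for the target noun and the action verb (scores are 0-100)
    target, target_score = process.extractOne(text, TARGETS)
    action, action_score = process.extractOne(text, ACTIONS)
    if target_score > 60 and action_score > 60:  # threshold picked arbitrarily
        return target, action
    return None, None

print(parse("please turn the bedroom lights off"))  # expect something like ('bedroom light', 'off')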

I also wanted to get rid of the send/receive-from-python-server thing (too many links make things fragile), so I had to implement it in openHAB’s “limited subset of a language similar to Java”, which I know nothing about and which has no debugging tools. How hard could it be?

My setup has progressed a bit so here’s an update…
It’s probably a bit of a wall of text so I can’t cover everything in detail but if you have any questions or queries let me know.

wit.ai:

Initially I had one wit.ai intent, “command_toggle”, shown in my earlier posts. I’ve now got three set up.

  1. “command_toggle” - primarily this is used to turn my lights on and off but this may get replaced with a more powerful intent that can handle dimming.

  2. “set_tv_source” - given a room location (living room, bedroom) and a source (film/movie, sky/tv/cable, playstation, off) I use this to switch on/off the appropriate tv and ensure the source is set correctly.

  3. “set_channel” - I use this to change tv/sky channel e.g. “change sky to sky sports” or “change channel to bbc one”

openHAB:

I was struggling to find a good way to map the intent response from wit.ai back to openHAB items. After a few suggestions by forum members I decided to create a hierarchical group structure for my items. So far I’ve got one set of groups that represents an item’s location and another that represents the type of item. Below is a simplified snippet from my item file that helps to explain:

// locations
Group location
Group house        (location)
Group upstairs     (house)
Group downstairs   (house)
Group bedroom      (upstairs)
Group bathroom     (upstairs)
Group living_room  (downstairs)
Group kitchen      (downstairs)

// devices
Group devices      
Group lights        (devices)
Group tvs           (devices)

// items   
String bedroom_tv         (bedroom, tvs)         // proxy item
String bedroom_light      (bedroom, lights)      // proxy item
String bedroom_lamp       (bedroom, lights)      // proxy item
String living_room_tv     (living_room, tvs)     // proxy item
String living_room_light  (living_room, lights)  // proxy item

Taking ‘bedroom’ as an example, you can see that it belongs to the ‘upstairs’ group which itself belongs to the ‘house’ group.

Using this structure I can send a command easily to items in the ‘bedroom’ group, or if I need to target a higher level I could send a command to the ‘house’ group and things in the ‘bedroom’ group would also receive the command.
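
As an aside, this also makes the REST side trivial - one plain-text command to the group item is enough to hit everything below it (a sketch; adjust the host and item name to your own setup):

import requests

# Sending OFF to the 'bedroom' group propagates the command to all of its members
requests.post("http://localhost:8080/rest/items/bedroom",
              data="OFF", headers={"Content-Type": "text/plain"})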

Combining wit.ai with my items:

I’m still trying to refine and enhance both the wit.ai intents and my openHAB config to make this easier but for now my wit.ai intents usually contain the location and item in the response.

e.g. “turn the bedroom lights off” - wit returns the intent as “command_toggle” with location as bedroom and device as lights
e.g. “i want to watch a movie in the living room” - wit returns intent as “set_tv_source” with the location as living room and the source as movie
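
To give an idea of what the rules have to work with, the parsed result for the first example looks roughly like this (I’ve simplified the shape of the old wit.ai /message response from memory, so treat the exact field names as an assumption):

# Roughly what comes back for "turn the bedroom lights off"
wit_outcome = {
    "intent": "command_toggle",
    "entities": {
        "location": [{"value": "bedroom"}],
        "device": [{"value": "lights"}],
        "on_off": [{"value": "off"}],   # assumed name for the on/off state entity
    },
    "confidence": 0.92,
}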

I’ve created a lambda function to process each different intent, and these lambda functions send high-level commands to the proxy items. I then have separate rule files which map the commands sent to the proxy items to the physical item commands.

The logic in the lambda functions tries to identify which item or items are the target of the command. In general I concatenate the location and device with an underscore to create an item name (e.g. bedroom and tv becomes bedroom_tv). If I get an exact match I send the command to that item. If I can’t find an exact match I fall back to interrogating my group hierarchy. So if I say “turn off the bedroom lights”, the location is bedroom and the device is lights, and I check for items that belong to both groups; from the example item file above there are two items (bedroom_light, bedroom_lamp) that fit this description, so I send the command to both proxy items. Using proxy items lets me send the same high-level command to both while having different final commands go to the physical devices, which is helpful if I want different behaviour for each, or if they have completely different interfaces and end commands.
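
To make that concrete, in Python-ish pseudocode the matching boils down to something like this (the group membership data here is just a stand-in for my real item groups):

def resolve_targets(location, device, groups, all_items):
    exact = "%s_%s" % (location, device)          # e.g. "bedroom" + "tv" -> "bedroom_tv"
    if exact in all_items:
        return [exact]                            # exact proxy item exists, use it
    # Fall back to the group hierarchy: items in both the location group and the device group
    return sorted(groups.get(location, set()) & groups.get(device, set()))

groups = {
    "bedroom": {"bedroom_tv", "bedroom_light", "bedroom_lamp"},
    "lights": {"bedroom_light", "bedroom_lamp", "living_room_light"},
    "tvs": {"bedroom_tv", "living_room_tv"},
}
all_items = set().union(*groups.values())

print(resolve_targets("bedroom", "tv", groups, all_items))      # ['bedroom_tv'] - exact match
print(resolve_targets("bedroom", "lights", groups, all_items))  # ['bedroom_lamp', 'bedroom_light']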

Things I’m yet to do/investigate:

  • Try and root my S6 so I can install Xposed Google Now API so that I can use my Moto 360 mic instead of having to use my phone.
  • See if there’s any better way to construct/query/deconstruct JSON messages (I’m using JSONPath transformation at present)
  • Try and store context information that can be used to enrich the intent processing. e.g. if openHAB knows I’m currently in the living room then can I say “i want to watch sky” instead of “i want to watch sky in the living room” (I’m not sure if this comes under presence detection and the whole presence detection debate)

Hi guys,

I’m also interested in getting a voice control working with openhab, so I have read this thread with interest. I hate doing manual work, so I’m really looking for some way of automating setting up all my wit.ai rules and associated control of openhab.

For me the easiest solution is to use Python. It should be relatively easy to build a python script that is called with the recognised text as a parameter. This script can fire the text off to wit.ai, and use the resulting response to call the appropriate rest address.

As a start I have created a simple python script to build possible voice commands from all the Switch items in my items files. It simply places “switch on”, “switch off” and so on in front of the item labels. This leaves me free to have whatever item names I want in my files, and descriptive labels that are close to what I would use in real life. Hopefully wit.ai will be able to understand these descriptions even if I don’t say every word of a long label.

The commands are loaded into wit.ai, and I just have to go through them and validate them afterwards. This can be expanded by adding support for groups as well, and also for requesting values of Number items and such :slight_smile:

Maybe some of you have done this before, but I thought it was worth spending 30 minutes to see if I could make this work :-). As far as I can see, it does, so hopefully I get to spend some more time on it.

I am aware that the quality of the script itself is not very nice, but I chose speed ahead of beauty :slight_smile:

import requests
import json

token = ""  # insert your wit.ai access token here
commands = ["switch on", "switch off", "turn on", "turn off"]

session = requests.Session()
session.headers.update({'Authorization': "Bearer " + token,
                        'Accept': 'application/vnd.wit.20141022+json'})

def getAllSwitches():
    # Fetch all Switch items from the local openHAB REST API
    url = "http://192.168.1.2:8080/rest/items?recursive=false&type=Switch"
    response = requests.get(url)
    data = json.loads(response.text)
    print(data)
    for item in data:
        for command in commands:
            try:
                query = command + " " + item["label"]
                print(query)
                sendQuery(query)
            except KeyError:
                print("Item has no label: %s" % item)
                break  # no label, so skip the remaining commands for this item

def sendQuery(query):
    try:
        # Let requests URL-encode the query text; auth comes from the session headers
        response = session.get("https://api.wit.ai/message", params={"q": query})
        print("Response: {}".format(response.text))
    except requests.exceptions.RequestException:
        print("Connection error")

if __name__ == "__main__":
    getAllSwitches()
    session.close()

@frankose I’m really glad you posted this… :grinning:

Part of the reason I’ve been a bit quiet in this thread recently is that I was looking for good ways to “personalise” the voice engine (and then I got distracted writing some Java code for the Orbivo S20 sockets, real life priorities etc.). I settled on a very similar idea: either automatically populate the intents themselves (“switch on item”) for each item, or create a generic intent and populate the list of possible items from the local item file.

I’ve also spent a bit of time testing out api.ai as an alternative to wit.ai as there are a few differences in the current capabilities.

If you, or anyone else, haven’t managed to knock this out by the time I’m less busy, then hopefully I’ll pick up from there.

I’m glad to hear that we are thinking along similar lines. I have continued working on this, and come up with the following:

I’ve created a separate group called “room” to which I add every group in my items files that I want to address as a “room”. Secondly, I created a group “device” which I add to all the device groups (lights, doors, temperatures, humidity, et cetera). Finally I added an action list (in python) that defines the actions I want to perform (switch switches, control dimmers, or get the value of any item). This list looks like this:

actions = {
"switch": {"items": ["Switch"], "expressions": ["switch", "turn"]},
"dimmer": {"items": ["Dimmer"], "expressions": ["dim", "brighten"]},
"get": {"items": ["Number", "Switch", "Dimmer", "Contact"], "expressions": ["value of", "is", "what"]}
}

I’m working on a python script that takes all this information and sets up the necessary entities in wit.ai. It is mostly working. Everything is pulled from the REST API, except for the actions list I just mentioned.
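
The entity values themselves come straight out of the item list. Roughly like this (a sketch, assuming the /rest/items JSON exposes a groupNames field on each item, which is what I see on OH2 - adjust if your version differs):

import requests

OPENHAB = "http://localhost:8080"   # adjust to your server

def members_of(parent_group):
    # Names of all items/groups that list parent_group in their groupNames
    items = requests.get(OPENHAB + "/rest/items?recursive=false").json()
    return [item["name"] for item in items if parent_group in item.get("groupNames", [])]

rooms = members_of("room")      # every group I tagged as a room
devices = members_of("device")  # every device-type group (lights, doors, ...)
print(rooms, devices)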

The plan is that any sentence I say to wit.ai will contain an action, a room (including “all” which is defined as a room in my items file), and a device. I guess I should be able to express this as a single intent, or do I need several intents?

Anyway, the response I get from wit.ai should contain all these elements. The room and the device specify which groups I need to search, and the action element can use the dictionary defined above to figure out which types of items it should search for. Switch and dimmer are obvious, while get is a bit more complex. Basically it should send me a notification containing the string values (label with the value inserted) of everything that matches the request. With some Tasker voodoo it should be possible to have my Android devices read the responses aloud :slight_smile:
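
For the “get” case the same group lookup just formats label and state instead of sending a command. A rough sketch, reusing the actions dictionary from above and again assuming each item in /rest/items carries label, state, type and groupNames:

def read_values(room, device, items):
    # items: the parsed /rest/items list; returns "label: state" strings for matching items
    lines = []
    for item in items:
        in_groups = item.get("groupNames", [])
        if room in in_groups and device in in_groups and item.get("type") in actions["get"]["items"]:
            lines.append("%s: %s" % (item.get("label", item["name"]), item.get("state")))
    return lines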

This means that I should be able to control any kind of device in any room, and get their current values. It does not allow individual control of items, for instance if there are multiple lights within a room. I think it should be possible to build this in quite easily, but I am not quite there yet. Apart from defining the room and device groups in the items file, this requires no modifications of item definitions, assuming you have a relatively logical structure :-). The python script should also add the necessary intent as well as training sentences, so the only work required would be to validate the results.

For the openhab side it should be as easy as calling the script with the recognised text, and the script will control everything using rest.

Does anyone know if it is possible to use the rest API to trigger my.openhab notifications?

If anyone is interested I guess I could put this up on GitHub once I get the basic control functionality working.

Hi guys,

I also made a rule to control openHAB by voice (mostly using HABdroid).
It’s quite the same as a few made here, but with the addition of transform file maps.
That makes it quite easy and fast to fit it for your needs. It’s just a small part of your project but it might help.

The files and explanations can be found here.

Hope it helps.

Wow, I like that. I’ve been thinking of implementing voice control for a while and was probably going to go the Wit.ai route, but this seems easier and more flexible. I like that it’s a wholly local solution too.

Can you explain that rule step by step for beginners (like me :slight_smile:) - maybe in a new post? How should I lay out Items, Groups etc.? I’m looking for a voice engine, and your work looks like the easiest way to me.

Your rule looks like a very good option for quickly getting voice control up and running.

Do you have any experience regarding how tolerant your rule is of different ways of expressing the intent? I guess this is the one thing in favour of a wit.ai approach: it should be possible to support both “switch on the living room light” and “turn on the light in the living room” without having to do any complex parsing in your own application. Still, your approach is currently much easier to configure than what I currently have for wit.ai.

As a comment to my own post above, I think it would be wise to replace the “action” concept with separate intents for each action and separate them that way. I suspect this is easier for the wit.ai engine.

@frankose it might be worth taking a look at the api.ai service as (at least to me) the API is a bit friendlier.

Re what you said about separate intents for each action, that’s the direction I’m headed in. At a high level, I plan to create generic intents (maybe one per item type, e.g. switch, dimmer, but this needs more thought) and then pass a list of context information (devices, rooms) with each query, so that whenever the engine resolves the query it does so using the user’s specific context data.

Here’s a (pseudocode) example:

Intent: Switch on @rooms @devices

Query: Switch on kitchen lights
Context: {rooms: {kitchen, bedroom, living}, devices: {lights, heating}}

Response: {action: switch_on, room: kitchen, device: lights}
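
In api.ai terms that query would go out something like the sketch below. The endpoint and version parameter are from their docs, but the exact field names for the per-request entities are my assumption from reading the v1 API, so don’t take them as gospel:

import requests

API_AI_TOKEN = "YOUR_ACCESS_TOKEN"   # placeholder

payload = {
    "query": "Switch on kitchen lights",
    "lang": "en",
    "sessionId": "openhab-voice",    # arbitrary session id, used to keep context between queries
    # Per-request entity values, built from the openHAB rooms/devices groups (assumed field names)
    "entities": [
        {"name": "rooms", "entries": [{"value": "kitchen", "synonyms": ["kitchen"]},
                                      {"value": "bedroom", "synonyms": ["bedroom"]}]},
        {"name": "devices", "entries": [{"value": "lights", "synonyms": ["lights", "lamps"]}]},
    ],
}
response = requests.post("https://api.api.ai/v1/query?v=20150910",
                         headers={"Authorization": "Bearer " + API_AI_TOKEN,
                                  "Content-Type": "application/json; charset=utf-8"},
                         json=payload)
print(response.json())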

Thanks, I will take a look at it, although the API itself is not a big issue. The major drawback I am seeing at the moment is that it is not possible to apply the training through the API. This has to be done using the tedious method of highlighting elements of the test phrases and categorising them as different intents using the web interface. This is not a very scalable or shareable solution.

For the moment I’m looking at one intent for switches, one for dimmers, and one for querying the value of any item. The last one is not a high priority at the moment.

However, unless I find a good method of training without having to spend hours using the web interface I might just revert to the simple yet powerful rules posted recently.

I would say this is a drawback of the wit.ai API. In contrast, the api.ai API allows you to create intents (with variable placeholders!) by sending a JSON POST message, so you don’t need to do all that highlighting of test phrases.

It definitely seems a lot more receptive to programmatically setting up/customising the engine.

Below is an example API call for creating intents:

POST https://api.api.ai/v1/intents?v=20150910

Headers:
Authorization: Bearer YOUR_ACCESS_TOKEN
ocp-apim-subscription-key: YOUR_SUBSCRIPTION_KEY
Content-Type: application/json; charset=utf-8

POST body:
{
  "name": "turn on/off @appliance",
  "contexts": [],
  "templates": [
    "turn @onOff @appliance",
    "set @appliance @onOff"
  ],
  "responses": [
    {
      "action": "setAppliance",
      "affectedContexts": [
        "house"
      ],
      "parameters": [
        {
          "name": "state",
          "value": "@onOff"
        },
        {
          "name": "appliance",
          "value": "@appliance"
        }
      ]
    }
  ]
}
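
From Python that’s a single requests call. A sketch using the same body as above (tokens are placeholders):

import requests

API_AI_DEV_TOKEN = "YOUR_ACCESS_TOKEN"        # developer access token (placeholder)
SUBSCRIPTION_KEY = "YOUR_SUBSCRIPTION_KEY"    # placeholder

intent = {
    "name": "turn on/off @appliance",
    "contexts": [],
    "templates": ["turn @onOff @appliance", "set @appliance @onOff"],
    "responses": [{
        "action": "setAppliance",
        "affectedContexts": ["house"],
        "parameters": [{"name": "state", "value": "@onOff"},
                       {"name": "appliance", "value": "@appliance"}],
    }],
}

response = requests.post("https://api.api.ai/v1/intents?v=20150910",
                         headers={"Authorization": "Bearer " + API_AI_DEV_TOKEN,
                                  "ocp-apim-subscription-key": SUBSCRIPTION_KEY,
                                  "Content-Type": "application/json; charset=utf-8"},
                         json=intent)
print(response.status_code, response.text)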

I agree, but looking at the examples in the API docs it looks like it is not as flexible and adaptive as wit.ai? Does it have the same kind of learning behaviour and the ability to extrapolate to similar utterances?

I guess my question is whether it is worth it to define the intents and templates (a template seems more strict than the wit.ai intent concept) rather than using the simple rule-based approach posted earlier? I guess it will be easier to expand to more complex scenarios, but I guess most people’s needs are just simple voice control of devices?

It should be quite easy to adapt my existing entity and intent building code to api.ai instead, so maybe I will try. But I’m not entirely convinced that it is worth the hassle.

Yes, you can use the web UI to feed it learning data. As to which engine is better at working out similar utterances, it’s hard to say. When I was doing some investigation they seemed about the same to me, but I guess depending on the use case one will be better than the other.

I’d very much urge you to try it out and see how you get on. One of the plus points I really appreciated is context (which when configured stays on for 5 queries by default but this can be changed/deleted).

Example conversation:

Query: “Switch on the bedroom lamp.”

Response: {room: bedroom, device: lamp, state: on, context: {room: bedroom, device: lamp}}

Query: “OK I’m done, switch it off.”

Response: {room: bedroom, device: lamp, state: off, context:{room:bedroom, device: lamp}} // you don’t need to mention the room again or the device as this hasn’t changed and is still in the context.

Thanks, I’ll give it a shot. I wonder if it is worthwhile to try to use their existing smart home domain, and just do some mapping in my own scripts…

Unfortunately I didn’t find the existing domains flexible enough as you can’t customise those intents. I don’t think I tried passing in custom entities lists as part of the query though; if it works, that might get around some of the limitations I found.

I have played with this for a few hours now, and it seems to work pretty well. I have set up two intents for controlling OH2. One deals with regular light switching, and the other controls dimmers. The control is group based, as discussed earlier.

I set up a small python script that gets the spoken text from openHAB, sends it to api.ai, and then controls openHAB through REST based on the response.

This has allowed me to successfully control all my light switches and dimmers in all the rooms in my house, and I’m very happy :slight_smile:

The next step is to achieve more granular control (I’m not exactly sure how to do that yet), and to set up an intent for querying the state of any combination of room and device (e.g. the temperature in the kitchen). Feedback should be provided through a notification to myself; I am also working on getting it blasted through my Sonos system via the Google Translate service.

All in all, good times :slight_smile:

Glad to hear it went well! :smile:

Would you mind sharing your python script or at least the logic behind it? Would love to hear your take on some of the things I tried.