REST API, multiple users, voice control

Mike,

Once I saw the magic mirrors online I knew that I wanted my house A.I. in a mirror in every room. I think I’ve proven to myself that the concept is pretty doable and will work (the bigger question is why would any sane person do this?).

I have been following the Echo threads and it’s pretty exciting! I started on my project before the Echo threads matured, and I was not really happy that with Alexa you had to begin everything with “tell x to turn on y”. Another problem I’ve read about, but don’t know if it’s true, is that having a truly “native” app is not possible. If you want to speak to Alexa directly you have to use a push-to-talk feature, as that is the only way to legally use the API. Obviously I can’t have that with mirrors.

I do have a ton of questions about how you’re doing speech to intent. That is fascinating stuff to me. I may have to buy an Echo and test it out. I guess I should have bought one when they were on sale back in November.

One of my biggest problems with the lack of intent is my own memory. Since it’s a pretty dumb solution I have to remember all the commands word for word. I program in about five commands for each item (turn on the light, please turn on the light, turn the bedroom light on, etc.) but still can’t remember them. lol. The worst was the first few days with the alarm clock. At 6 am and half asleep, I could not remember what I had to say to turn it off. It was “snooze” or “snooze please”. “Snooze” is a weird word to me anyway, so I don’t know if I want to use it. It sounds funny to me. Maybe that is why I could not remember it. lol. I may just come up with a cool code word to mean “snooze”.

Anyways, so now, when deciding what commands to use, I stop thinking about it for a while and leave the room. I’ll do something to take my mind off of it. Then I walk back into the room and write down the first five commands or so I can think of off the top of my head. Just guesses that I think would work. If they are spontaneous to me, I reason that they are the first ones I will try when I can’t remember. Hope that made sense.

[edit: added Jasper thoughts]
I had looked at maybe using Jasper but decided against it because of its current lack of support for a cheap beamforming microphone. The only two mics I could find that were viable and affordable, but also had beamforming and noise cancellation, were the Kinect and the Echo. I ruled out the Echo because of the limitations above and because you had not yet created the skill! So that left me with the Kinect. To get the full benefit of the noise-cancelling and beamforming features of the Kinect, you have to use the Kinect SDK, which is Windows only. Hence, I decided to use Windows and the Kinect. That decision made Jarvis and mega-voice.command my current best choice (again, because you had not yet created the Echo skill :wink: ).
But wait, you say, Jasper has support for the Kinect. Ah yes, I did check into that. It turns out that Jasper does have basic support for the Kinect, but it does not support its beamforming features and treats it like a regular mic, since Jasper does not use the Kinect SDK. Through my own tests at home, having a beamforming mic in the room was key to making it work. You just can’t use a regular mic for picking up speech in a noisy room. It’s what makes Alexa so good at listening. So anyway, that’s how I got to this point: Kinect to Windows to Jarvis, and finally linked with openHAB. [end edit]

My kids call me “Skynet” now, so I think it’s a pretty good sign that things are moving forward. lol.

Looking forward to seeing how the Alexa skill progresses. Great job! Would love to maybe have it as a go-between for F.R.I.D.A.Y. and openHAB. Let me know when I can pepper you with long, dragged-out questions. lol

Kimberly F


Rich,

Thanks. It’s so crystal clear to me after I’ve been shown the answer! lol. Makes sense though. The more I see the answers, the more the programming from long ago starts to come back. It can be frustrating, as it feels like I am starting all over. As usual, thanks for the help. Hope your RFM69HW transceiver project is coming along.
Kimberly F

Kimberly

I’ve had a SmartThings hub for a while and, while it’s amusing and sort of useful (it’s gotten better with the Rule Machine app), I do want local control and (even though the Echo is cool) local STT/TTS processing.

That said, this thread has been a fascinating read! I do want to set up an openHAB server on an RPi, though, so I’m kinda bummed that your whole setup is based on Windows. Have you seen this?
https://openkinect.org/wiki/Main_Page
Does it help?

I’m not much of a programmer. I can cobble stuff together at a very basic level - sometimes…

Do you have a GitHub repo of your code? Or a build log of some type? I would love to see your implementation. And steal some of it!

Also - I love your magic mirror idea. I would be very interested to see your hardware/software tree that makes it work.

I enabled the AVS on a Raspberry Pi (using other people’s instructions, to be honest - look up Sam Manchin on github). It’s handicapped by Amazon because they want it to be used as a platform for developing complementary products, rather than for cloning the Echo in one way or another. Therefore, besides the required physical button, it also doesn’t seem to support any kind of sound or music beyond Alexa’s voice. Music services won’t work. (Even a fart app won’t work :wink: ). And some of the other things like news services seem not to work as well. A few 3rd-party skills come through OK, but it’s definitely a remedial thing. Not sure if you’d ever be able to port AVS to Jarvis and an RPi.

Maybe that’ll change, but from what I’ve seen, your Alexa-HA is one of the better (if not the current best) workarounds, even if not the ideal solution.

Thanks a lot, @kfischer628!

My sentiments exactly, and one of the driving reasons I started Alexa-HA :slight_smile:

I didn’t realize this was a limitation! You are correct, Amazon basically prevents you from cloning the Echo with the Alexa Voice Service. DOH!

It pretty much revolves around the structures of speech defined for each intent. See my sample utterances for reference. These utterances are auto-generated by Alexa-HA based on the intents/rooms/items/states you want to use in your home (once it’s configured). What’s great about this is that it’s custom tailored and more accurate at figuring out what you want to do, based on what you actually have.
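As a rough illustration (the intent and slot names here are made up, not the exact generated output), the generated file follows the standard Alexa sample-utterances format of an intent name followed by a spoken phrase with {slot} placeholders:

```
SwitchItemIntent turn {OnOff} the {Room} {Item}
SwitchItemIntent please turn the {Room} {Item} {OnOff}
GetItemStateIntent what is the {Item} in the {Room}
```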

I hear ya! It would be great to have your help refining the Alexa-HA usage patterns (i.e. other ways to ask/tell the same command). Making it as intuitive as possible is certainly difficult, but it’s off to a fairly good start already :slight_smile:

LOL!

Thanks; anytime, Kimberly!! Please raise any questions regarding Alexa-HA on our thread here.

@jason_fay - thanks a lot! I wouldn’t call it ‘ideal’ just yet, certainly a few things need to be addressed. After extensive research into other Echo-based options, as well as locally hosted Linux-based speech-to-text processing, it’s fairly close in my eyes :slight_smile:

Oh - I forgot to mention: the AVS setup on the RPi does see and connect to SmartThings. All I was able to do was make it control my lights, but it was still cool. Except for having to hit the button to use it…

Good to know!
Yeah, having to press a button kind of defeats the purpose… sigh…

jason_fay

Hi, thanks.

First off, my openHAB is running on a lowly RPi 2. openHAB is great in that it doesn’t need much. It’s just the voice front end where I am using Windows. You could go more expensive and find a beamforming mic that works in a non-Windows environment, but those are way, way out of reach of my budget.

Secondly, my feeble attempts at coding are nowhere near deserving of anything remotely close to GitHub. lol.

As far as OpenKinect is concerned, that is what I was talking about. From what I could find out, they just use the mic as a standard mic. Actually, as three separate mics. No beamforming as far as I can tell. I think they are geared more toward spatial detection of people and things.

As far as code, I’d be happy to share. It’s just a bunch of rules that decide what to say when given a command. I send a string through the REST API to openHAB. Then a rule takes that string and decides what to say back. Lots of string building. Brute force and not elegant. I am hoping in the future to take that part off of openHAB and make it part of the mirror software. As Rich was saying, better to get the status from openHAB and have the mirror figure out what to do/say. I am more than happy to share what I have. It’s ugly but it works. lol.

In the long run I don’t think it’s the way to go. More toward what Mike is doing with the Echo is where we need to be. A free-standing string processor independent of the TTS/STT engine is what we really need. Basically a software version of the human brain, no biggie though. Should be simple, right? lol.
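To make that concrete, here is a minimal sketch of the kind of rule I mean (the item names, URL and phrases are made up for the example, not my actual config):

```
// Hypothetical items (not the real config):
//   String VoiceCommand  "Last recognized phrase"
//   String VoiceReply    "Text for the mirror to speak"
//   Switch Bedroom_Light "Bedroom light"
//
// The mirror posts the recognized text as a command over the REST API, e.g.:
//   curl -X POST -H "Content-Type: text/plain" -d "turn on the bedroom light" \
//        http://openhab-host:8080/rest/items/VoiceCommand

rule "Handle voice command string"
when
    Item VoiceCommand received command
then
    val text = receivedCommand.toString.toLowerCase
    if (text.contains("bedroom light")) {
        if (text.contains("on")) {
            sendCommand(Bedroom_Light, ON)
            postUpdate(VoiceReply, "Turning on the bedroom light")
        } else if (text.contains("off")) {
            sendCommand(Bedroom_Light, OFF)
            postUpdate(VoiceReply, "Turning off the bedroom light")
        }
    } else {
        postUpdate(VoiceReply, "Sorry, I did not catch that")
    }
end
```

The mirror software then reads VoiceReply back and speaks it, so all the “what to say” logic lives in string matching like this.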

I too do not like the cloud basis of the Echo. What happens when they decide to monetize it? Not saying it will happen anytime soon, but you never know. I don’t like that they will have a record of everything I say in my house. The younger generation has grown up with this invasion of privacy and finds it more tolerable than I do. Ain’t nobody got time for that!

As far as the mirror hardware goes, it’s an old monitor and a small PC tucked behind the mirror. I am perfecting the voice part before I complete the build of the rest of the mirrors. I want to make sure it works before I build multiple. But just doing a Google search on magic mirrors will give you lots of examples to build from. You could also use a flat TV behind the mirror. That would take care of the speaker issue as well.

http://www.hiddentelevision.com/

As long as you connect it to the local network, wired or wireless, with your openHAB box, you’re good to go. My first mirror is an Intel Compute Stick (latest version) with an old 22" monitor I had lying around. My options are definitely more limited because of Windows. Add a little amp (cheap on eBay) and a speaker and you’re good. I am planning on putting the speaker in the ceiling. Just make sure you have airflow for the stick, amp and monitor (mine will be flush mounted inside the wall so the mirror is flat on the wall). Running a long USB cable to a USB wall jack will allow me to put the Kinect on an end table or something similar. Make sure you have power. I don’t want to see any cords coming out of the wall, so I will wire up a recessed outlet inside the wall for the mirror to get power. So keep that in mind when deciding where to put them.

[edit: note on the Intel Compute Stick]

Forgot to mention, if you do decide to get an Intel Compute Stick, keep in mind that you can only use the Kinect 360. For some unknown reason, Intel decided to put 32-bit Windows 10 on the stick. From my research it’s not possible to reload it with 64-bit. If you want to use a Kinect 2.0 for Windows, you need a USB 3 port and 64-bit Windows (this is originally why I got a Compute Stick in the first place, as I wanted to steal my son’s Xbox One Kinect… shhhhh!).

[end edit]

So your first step is to get openHAB running. A Pi will do. Then get your hardware done: switches, lights, garage door, thermostat, etc. I usually advise clients to use switches if they own or are staying for a while. Use bulbs if renting or moving soon. After that, it’s just figuring out your voice front end, whether it’s Echo, Jarvis, mirror, etc.

I think the mirrors are pretty futuristic and doable. They can be built easily by someone with basic skills. The total cost of a mirror is more than an Echo, so there is that as well. The biggest cost is the compute stick. The push to talk is a deal killer for a mirror, but it does prevent the TV from opening the garage door at 2 am (I joke about it, but it’s a real problem when a TV show turns on your A/C). Imagine a TV broadcast with someone shouting “Alexa, turn on…”. Haha, it would be funny and terrifying at the same time. I think it happened a few weeks ago, as I remember reading something about it.

As Rich mentioned above, I will run what I have till I get progress on the Jarvis end. I am hoping to learn from Mike and maybe steal some intent-processing ideas or engines or whole programs. lol.

Thanks for the input. Hope this helps.

Kimberly

Mike,

Thanks for the response. Will definitely poke my head in there and see what’s up!

Kimberly

A good place to look is at chatbots and the sorts of programs that compete in Turing Test contests. The entire reason for being of these programs is to construct responses to incoming text. There could be something there that could be lifted for use in HA.

Another place to look is some of the research on Intelligent Agents.

This sort of thing has been under research and development for decades. There are certainly some things that should be usable in an HA context.

This is why I never connect my TVs to my network, be they smart or not. Vizio and Samsung have both been caught recording what people are saying in the room whether the TV is on or not. 1984, anyone?

It probably only has a 32-bit processor.

Rich

Compute stick has :slight_smile:

Quad-Core Intel Atom x5-Z8300 Processor

Weird. There must be some other hardware or licensing cost reason it won’t run 64-bit Windows then.

Something to do with the boot part, if I remember.

BTW, the Turing stuff sounds interesting. Anyone ever win it yet?

Kimberly

Ah, if it is BIOS instead of UEFI, that would explain it. Or if the UEFI implementation does not support secure boot, that would explain it as well. You can’t boot 64-bit Win 10 without UEFI and secure boot.

If I remember right, I think that was it exactly. Some people tried to fake it out and still no joy.

Kim

Hi. I guess this post is mainly aimed at kfischer, as you seem to have a working version of the setup I want to try and create.

Currently I have openHAB controlled by Siri on my phone. This works well, but is frustrating for a number of reasons: Siri does not always hear that well, my wife doesn’t have an iPhone 6s so she has to press the button to activate Siri, I don’t always have my phone next to me, and I cannot use certain phrases as my own commands (such as “play”) as they are already taken by native commands.

I have an Xbox 360 with Kinect, and the voice control on this was always very good - it worked for us both, picked us up easily from across the room, and was usually accurate in interpreting what we’d said.

My current openHAB setup is on a Raspberry Pi, and my desktop and laptop are both Macs. I would invest in an Intel Compute Stick if I knew I could get the system up and working, but I don’t have a Windows 8 machine to test it on.

In essence: has your experience with LINKS controlling openHAB been successful? Would you say it is satisfactory enough to be worth purchasing a Compute Stick (as this is the only reason I would buy one)? And was it easy enough to set up?

Your thoughts would be greatly appreciated.

Hi, sorry for the delay, but I don’t check back in here too often. I do have LINKS running on two machines in my house, one in the bedroom and one in the living room. It works very well except for false positives from the TV. I put in a few rules in openHAB to disable the mic while I’m watching TV. It’s hooked up to Kodi, and it knows if something is playing or not. On pause it turns the mic back on, and then turns it off again during playback. It’s done through LINKS’ web-based HTTP commands.
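For anyone curious, the rule looks roughly like this (the item name, state strings and the LINKS URLs below are placeholders; the real endpoint depends on how your LINKS web API is set up):

```
// Hypothetical item, updated by whatever binding you use to watch Kodi:
//   String KodiPlayerState "Kodi player state"
// The URLs are stand-ins for the LINKS web-based HTTP commands,
// not the actual LINKS endpoints.

rule "Mute the Kinect mic while Kodi is playing"
when
    Item KodiPlayerState changed
then
    // the exact state strings depend on the binding you use
    if (KodiPlayerState.state.toString == "Play") {
        // playback started: tell LINKS to stop listening
        sendHttpGetRequest("http://mirror-pc:1234/mic?state=off")
    } else {
        // paused or stopped: turn the mic back on
        sendHttpGetRequest("http://mirror-pc:1234/mic?state=on")
    }
end
```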

Overall I really like LINKS. It works well if you go through the voice training in Windows for the voice setup. I have written some basic scripts for it and can get weather info, quick Bing searches, etc. And it’s pretty fast, as there is no cloud processing. I am a beta tester for it now, and I get to give feedback to the developers on new items for home automation.
If you do the voice training, the Kinect works pretty well. Sometimes it thinks I’ve said something when I have not, but I think that happens sometimes with the Echo as well.

The downside is that you need a Windows PC running all the time, as it only works with Windows. Let me know what specifics you need and I’ll see if I can help you out.

Kimberly Fischer

Thanks for getting back to me. Very much appreciated.

It’s interesting what you said about the false positives. One of the key things I’m hoping to use it for is voice-activated play/pause whilst watching TV, using the Orvibo IR blaster that I have. On the Xbox it’s great to be able to say “Xbox, pause” to stop the program without trying to find your phone or one of the fifteen remotes. This worked really well on the Xbox and I want to try and extend that functionality to my TiVo and Apple TV.

Is there an option to use a trigger word in LINKS to make it start listening, such as “Hey Siri” or “Xbox” (or “Computer” if you really want that Star Trek: The Next Generation vibe!)? Would this reduce the number of false positives? I’ve been looking at the wiki, but most of it is blank and I couldn’t find anything on this.

And do you ever need to interact with the desktop once setup is complete? My thoughts were to install it on a compute stick so that I wouldn’t need to keep a PC or laptop always on (or have to buy one!)

Thanks again for your help on this.