If you just want a general comparison of the Echo vs Home, there are plenty of articles out there, and they’re a quick Google search away. I want to focus on one narrow aspect for this comparison.
After looking at a bunch of home automation setups, I decided to build my own Z-Wave based automation and write my own automation server. I’ll document the justification and the setup in more detail in another post.
For the sake of this post, all you need to know is that to actually integrate an Echo or Home into my home automation, I have to write code. For the Echo, I have to use the Alexa Skills API, and for Google, the Conversation API or IFTTT. In this post I’ll compare the two systems and how easy or hard it is to get your own custom commands going.
The incumbent: Amazon Echo
For the Echo, to build your own automation, you need to implement an Alexa “Skill”. While this is not too hard, it does take some doing, and for me it took some time to learn the basics of Amazon Web Services. There are also two kinds of skill APIs you can implement: the “custom” skill API and the “smart home” skill API.
The custom skill API lets you basically capture any kind of phrase, but the main limitation is that to invoke your phrase, you have to use certain patterns that identify the skill. In my case, I named my skill Jarvis, so I have to say “Alexa, ask Jarvis to …”. This works, but often feels silly. I really wish I could just say “Alexa, ….” and have the phrase parsed by my skill’s language model and handled appropriately.
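To give a sense of the shape of this, here’s a rough sketch of the Lambda handler behind a custom skill. The intent name, slot name, and the call out to my automation server are all made up for illustration; only the request/response envelope follows the custom skill format.

```python
# Rough sketch of a custom-skill Lambda handler for a "Jarvis"-style skill.
# The intent/slot names and the automation-server URL are illustrative only.
import json
import urllib.request

AUTOMATION_URL = "https://example.com/automation"  # placeholder for my server

def send_to_automation_server(action, device):
    # Forward the parsed command to my own automation server.
    body = json.dumps({"action": action, "device": device}).encode()
    req = urllib.request.Request(AUTOMATION_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

def speak(text):
    # Minimal Alexa custom-skill response envelope.
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": True,
        },
    }

def lambda_handler(event, context):
    request = event["request"]
    if request["type"] == "IntentRequest":
        intent = request["intent"]
        if intent["name"] == "TurnOffIntent":            # hypothetical intent
            device = intent["slots"]["Device"]["value"]  # e.g. "living room"
            send_to_automation_server("turn_off", device)
            return speak("Okay, turning off {}".format(device))
    return speak("Sorry, I didn't get that.")
```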
The smart home skill works differently. You implement an endpoint that can enumerate a list of devices, each with a name and a description of what it can do (for example, whether it handles a “turnOn” command). Once this is implemented, the system defines all the phrases. So if I tell the API there’s a device named “living room” that can respond to “turnOn” and “turnOff”, then I can say “Alexa, turn off living room” and the right thing gets called.
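Concretely, the endpoint is a Lambda that answers two kinds of requests: a discovery request, where it lists devices and their supported actions, and control requests like turnOn/turnOff. The sketch below shows the rough shape; the namespaces and field names are from my memory of the v2 payloads, so double-check them against the current docs, and the device list is obviously made up.

```python
# Very rough sketch of a Smart Home Skill Lambda handler.
# Namespace and field names are from memory of the v2 payloads -- treat them
# as approximate and verify against the current Alexa Smart Home docs.
import uuid

DEVICES = {"living-room": "living room", "shades": "shades"}  # illustrative Z-Wave nodes

def response(namespace, name, payload):
    return {"header": {"namespace": namespace, "name": name,
                       "payloadVersion": "2", "messageId": str(uuid.uuid4())},
            "payload": payload}

def lambda_handler(event, context):
    namespace = event["header"]["namespace"]

    if namespace == "Alexa.ConnectedHome.Discovery":
        appliances = [{
            "applianceId": dev_id,
            "friendlyName": name,
            "friendlyDescription": "Z-Wave device via my automation server",
            "manufacturerName": "DIY",
            "modelName": "zwave",
            "version": "1",
            "isReachable": True,
            "actions": ["turnOn", "turnOff"],
        } for dev_id, name in DEVICES.items()]
        return response("Alexa.ConnectedHome.Discovery",
                        "DiscoverAppliancesResponse",
                        {"discoveredAppliances": appliances})

    if namespace == "Alexa.ConnectedHome.Control":
        dev_id = event["payload"]["appliance"]["applianceId"]
        turning_on = event["header"]["name"] == "TurnOnRequest"
        # ...forward (dev_id, "turn_on" / "turn_off") to my automation server here...
        confirmation = "TurnOnConfirmation" if turning_on else "TurnOffConfirmation"
        return response("Alexa.ConnectedHome.Control", confirmation, {})
```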
The Smart Home API has some other nice properties; for example, it can handle different kinds of sentences and map them all to the same intent. I can say “Alexa, living room off” or “Alexa, turn off the living room”, and it won’t get confused.
The main limitation right now is that the set of verbs you can use is small. Basically it only makes sense for on/off switches, dimmers (where you would say “Alexa, set living room to 50”), and temperature controls. I recently installed some Z-Wave controlled window shades and have resorted to saying “Alexa, turn on shades”, which still feels silly and non-intuitive.
One of the best things about the Alexa system for DIY folks like me is that skills can be coded up and deployed in “developer” mode. This is really intended for people who eventually want to release their skills to the public but need a way to test. In developer mode, the skill is only associated with your account, so only your account can find and use it. It ends up being perfect for one-off custom development.
The Challenger: Google Home
I was pretty excited when Google announced their Home product. It offered the promise of better speech recognition (given Google’s history) and potentially a better programming platform.
At release, there was literally zero API support. You could ask it questions and it could hook up with a small set of supported smart devices, but there was no way to hook in and do custom stuff.
Then, a few months in, Google announced the IFTTT integration. This integration is really interesting in that it can capture any phrase from the user, no prefix necessary. So if I want to capture “Hey Google, eat my shorts”, I can do that. You can also capture phrases with parameters, so I could specify a wildcard like “eat my $” and then hook that up to a “custom web request” action to communicate the parsed result to my automation server. So far so good. In some ways it offers more flexibility than the Alexa skills. I could implement my dream of being able to say “Hey Google, let there be light!” and have it turn on all the lights.
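For reference, the receiving end of that “custom web request” can be tiny. The route and the JSON body are whatever you choose to put in the applet, so the names below are just illustrative:

```python
# Minimal receiver for the IFTTT "custom web request" action.
# The route and JSON shape are whatever you configure in the applet.
from flask import Flask, request

app = Flask(__name__)

@app.route("/ifttt/phrase", methods=["POST"])
def handle_phrase():
    # e.g. applet body configured as: {"phrase": "{{TextField}}"}
    phrase = request.get_json(force=True).get("phrase", "").lower()
    # ...hand the raw phrase off to the automation logic here...
    print("heard:", phrase)
    return "ok"

if __name__ == "__main__":
    app.run(port=8080)
```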
Awesome, right? Well sort of. As I discovered, there are some key limitations to the IFTTT approach:
- While you can use text wildcards to capture parameters, these wildcards can’t come at the beginning of the phrase. So while I can capture “turn off $”, I can’t capture “$ off”.
- The wildcard capture mechanism is very literal: if you specify “turn off $” and say “turn off the shades”, then the parsed parameter is “the shades”. This means that on my automation side, I have to support many variants of “shades” to really make it feel natural.
- Because the IFTTT applet has no prior sense of what words might fill in the parameter, in my experience the speech recognition is actually a little worse than Alexa’s smart home skill integration. “Turn off shades” is often interpreted as “Turn off shave”. I could map “shave” to “shades” in my code (a sketch of that kind of normalization follows this list), but something feels really gross about that.
- You can’t create a dynamic response. While you can include the parsed wildcard part of your phrase in the response, if your automation fails, for example, there’s no way to communicate that back to the user. This is particularly annoying in combination with the previous point: sometimes it mishears what you said, and you can’t really figure out why your command isn’t producing the desired result.
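To cope with the literal captures (and, if you can stomach it, the mishearings), the server side ends up needing something like the alias table below. The specific entries are just illustrative of the kind of cruft that accumulates:

```python
# Map literal captures and common mishearings back to canonical device names.
# These particular aliases are examples only.
ALIASES = {
    "the shades": "shades",
    "shade": "shades",
    "shave": "shades",          # frequent mishearing of "shades"
    "the living room": "living room",
    "living room lights": "living room",
}

def normalize_device(captured: str) -> str:
    """Turn whatever IFTTT captured for the wildcard into a device name."""
    captured = captured.strip().lower()
    return ALIASES.get(captured, captured)

# e.g. normalize_device("the shades") == normalize_device("shave") == "shades"
```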
There’s also just a general clunkiness to the IFTTT UI. It’s great for setting up simple rules, but for complicated setups with lots of variations, the UI becomes cumbersome. I have to copy and paste the same URL, POST params, etc. into each applet for every phrase I want to capture, and there’s no way to do it programmatically, which makes maintenance a pain.
Google Actions
As Google promised when they released the Home, they later came out with an “actions” API. I looked through the docs quite a bit, but on initial analysis it appears to be very similar to the Alexa custom skill API. It inherits the same limitation that your custom ability must be named and invoked by name, in other words “Hey Google, ask Jarvis to blah blah”. When I found this out, it felt like a disappointing step back from the IFTTT model, where I could capture any phrase.
Google actually has a nicer UI for specifying the language model that your integration understands. They use api.ai’s UI, and you can create a pretty cool intent-parsing system and train it with lots of examples.
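For what it’s worth, the fulfillment side of an api.ai agent is just a webhook you point at your own server. The sketch below shows roughly what that looks like; the request/response field names are from my memory of the api.ai v1 webhook format, and the action and parameter names are whatever you define in your own intents:

```python
# Sketch of an api.ai webhook fulfillment endpoint.
# Field names follow my memory of the api.ai v1 webhook format -- verify against the docs.
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/apiai/webhook", methods=["POST"])
def fulfill():
    result = request.get_json(force=True)["result"]
    action = result.get("action")                          # e.g. "device.turn_off" (my own name)
    device = result.get("parameters", {}).get("device")    # my own parameter name
    # ...call into the automation server here...
    text = "Okay, turning off {}".format(device)
    return jsonify({
        "speech": text,
        "displayText": text,
        "source": "jarvis-automation",
    })
```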
But then comes deployment. As of this writing, to test-deploy a custom action on Google, you use a command-line tool, and the action can only be deployed for 24 hours. WTF!? That’s right: if you wanted to build a “private” skill like you can in the Amazon system, you’d have to write some script to continually republish your skill in dev mode multiple times a day. Thanks, but no thanks.
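(For the curious, that workaround would presumably look something like the sketch below, run from cron a few times a day. I haven’t actually done this, and the gactions subcommand and flags shown are assumptions on my part, so check the CLI’s own help before trusting any of it.)

```python
# Hypothetical cron-driven republish loop -- I have NOT run this.
# The gactions subcommand and flags are assumptions; check `gactions --help`.
import subprocess

def republish():
    subprocess.run(
        ["gactions", "preview",                # assumed subcommand
         "--action_package", "action.json",    # assumed flags
         "--invocation_name", "jarvis"],
        check=True,
    )

if __name__ == "__main__":
    republish()  # e.g. schedule via cron: 0 */6 * * * /usr/bin/python3 republish.py
```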
(One interesting difference with Google is that users don’t need to “install” your skill to use it. They can just directly start saying “ask ____” as if it were pre-installed. I don’t know how they’re going to manage that namespace, though.)
Because of these limitations, I haven’t actually brought myself to implement a Google action yet.
Google has also announced “direct actions”, which sound more like the smart home skill API and support all kinds of specific scenarios. These are currently only accessible to approved partners, ruling out the DIY crowd for now. If they work like Alexa’s smart home skill, they could be pretty promising.
There’s one final approach that could potentially work. Google has a protocol called Weave, which is a standard way for smart devices to talk to the Google cloud and register their presence. It was actually announced several years ago at a Google I/O, but not much has happened since. I *think* that if you manage to implement a device that speaks this protocol, it can be controlled by Google Home.
There’s a Weave developer website, but the documentation is unfortunately a bit scant. There’s a device library (libiota) which can supposedly run on Linux, but it’s a C library, and I just don’t have the time to write little C programs to create a bunch of fake devices. If there were a Python wrapper, maybe, but in general it feels like this bit of code has not seen much love.
The Verdict (as of Feb 2017)
For DIY home automation purposes, right now, the Amazon Echo is the better choice, for a few reasons:
- Custom skill development and deployment are easier and better suited to DIY.
- The Smart Home Skill API can get you to a pretty usable end result with pretty good flexibility.
- Devices are cheaper. It’s pretty easy to get a few Echo Dots and spread them around.
That being said, things are still early. Google will likely open up a lot more stuff going forward, but to catch up they need to:
- Fix their deployment situation so that you can permanently deploy “private” integrations
- Enhance their API so that it’s easier to register a set of devices that can handle a common set of requests and program their behavior.
- Give me some cheaper devices!
Also, note that this whole space is really early. There are a few things that I would really like both of these platforms to be able to do, and I’m hoping these capabilities are coming.
- Be able to capture arbitrary phrases w/o having to invoke the name of the integration.
- In Amazon’s world, you could imagine having a “default” skill which would get dibs on everything the user says and get a chance to handle it before the normal system kicks in. Another option would be to use the Echo purely as a microphone and speaker and connect it directly to your own engine built using the new Lex and Polly systems (a rough sketch of what that might look like follows this list).
- In Google’s world, I’m not sure what this means. Maybe they can enhance the IFTTT integration to be more capable. I also saw on a community thread that the devs mentioned support for “implicit” actions; one hopes that this means you can be invoked w/o your skill name.
- Be able to recognize who is speaking, and pass that in as a parameter. I don’t care if there is a training step required. Being able to respond contextually to me or my wife or my kids would be super powerful.
- Getting the ID of the device where the request originates. This seems like a simple thing, but neither the Echo nor the Home APIs give you this when you’re processing requests. Many of the capture devices are in fixed places in the house. It would be really nice to know that the Echo in the bedroom caught the request and to map something like “turn off the lights” to just the lights in the bedroom. (Currently there is a hack where you set up each of your Echos with a different Amazon account, but that is just gross.)
- More control over audio playback from an API. Right now there is no way to get either system to programmatically play a Spotify track, for example. (One really hacky way I thought of is to teach Alexa to ask Google to do something by actually playing back a response of the form “Ok Google, ….”)
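To make that Lex/Polly “bring your own engine” idea from the list above a bit more concrete, its core would presumably be a couple of AWS calls like the ones below. The bot name, alias, and voice are placeholders, and the missing piece today is exactly that neither device will hand your code the user’s utterance to feed in:

```python
# Sketch of the hypothetical "bring your own engine" path:
# utterance text -> Lex for intent parsing, Polly for the spoken reply.
# Bot name, alias, and voice are placeholders.
import boto3

lex = boto3.client("lex-runtime")
polly = boto3.client("polly")

def handle_utterance(text, user_id="bedroom-echo"):
    # Parse the utterance into an intent + slots with Lex.
    parsed = lex.post_text(
        botName="JarvisBot",   # placeholder bot
        botAlias="prod",       # placeholder alias
        userId=user_id,
        inputText=text,
    )
    reply = parsed.get("message", "Done.")

    # Render the reply as speech with Polly (MP3 bytes you would then play back).
    audio = polly.synthesize_speech(Text=reply, OutputFormat="mp3", VoiceId="Joanna")
    return parsed.get("intentName"), parsed.get("slots"), audio["AudioStream"].read()
```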
In terms of general, non-DIY home automation usage, I actually prefer the Google Home. It’s fun to ask it all kinds of random questions and see it answer them usefully in a surprisingly large number of scenarios. The ability to play music to groups of Google Cast devices is also pretty awesome (though, if Sonos gets their shit together, I think both Alexa and Home will be equal in this regard).