Amazon's Alexa Turns 5
Written by Sue Gee   
Wednesday, 06 November 2019

Amazon launched Alexa on November 6, 2014, which makes her 5 years old today. Thanks to developers, who have built more than 100,000 custom skills, Alexa is now capable of much more than playing music and providing timers, and Amazon has established itself as the leader in voice-first technology.

If you ask Alexa, "Who is your creator?" you'll receive the stock response "I was invented by Amazon". True enough, but it's not the entire story. The two people who are credited as being Alexa's co-creators are Toni Reid, who oversees Alexa experiences and devices, and Rohit Prasad, who is in charge of the speech and machine learning aspects.

For Alexa's fifth anniversary, Rohit Prasad, in his role as Alexa vice president and head scientist, posted Alexa at Five: Looking Back, Looking Forward, based on a talk he had given at the Web Summit in Lisbon.


One point that Prasad doesn't include in looking at how far Alexa has come in so short a time is that, at launch, the voice-controlled assistant was limited to 13 built-in skills. Instead he makes a much more positive pitch:

In order to be magical at the launch of Echo, Alexa needed to be great at four fundamental AI tasks:

  1. Wake word detection: On the device, detect the keyword “Alexa” to get the AI’s attention;
  2. Automatic speech recognition (ASR): Upon detecting the wake word, convert audio streamed to the Amazon Web Services (AWS) cloud into words; 
  3. Natural-language understanding (NLU): Extract the meaning of the recognized words so that Alexa can take the appropriate action in response to the customer’s request; and
  4. Text-to-speech synthesis (TTS): Convert Alexa’s textual response to the customer’s request into spoken audio.

Over the past five years, we have continued to advance each of these foundational components. In both wake word and ASR, we’ve seen fourfold reductions in recognition errors. In NLU, the error reduction has been threefold — even though the range of utterances that NLU processes, and the range of actions Alexa can take, have both increased dramatically. And in listener studies ...we’ve seen an 80% reduction in the naturalness gap between Alexa’s speech and human speech.
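The four stages Prasad lists form a simple chain, which can be sketched in a few lines of Python. This is only a toy illustration under loud assumptions: the "audio" is already plain text, the NLU is two hand-written rules, and every name here (`detect_wake_word`, `recognize_speech`, `understand`, `respond`, `handle`) is hypothetical rather than anything from Amazon's code or APIs.

```python
from dataclasses import dataclass, field
from typing import Optional

WAKE_WORD = "alexa"  # assumption: a single fixed wake word


@dataclass
class Intent:
    name: str
    slots: dict = field(default_factory=dict)


def detect_wake_word(audio: str) -> bool:
    """Stage 1 (on device): stand-in check that the input starts with the wake word."""
    return audio.lower().startswith(WAKE_WORD)


def recognize_speech(audio: str) -> str:
    """Stage 2 (ASR): the toy 'audio' is already text, so just drop the wake word."""
    return audio[len(WAKE_WORD):].strip(" ,.").lower()


def understand(text: str) -> Intent:
    """Stage 3 (NLU): hand-written rules mapping recognized words to an intent."""
    if text.startswith("set a timer for "):
        return Intent("SetTimer", {"duration": text[len("set a timer for "):]})
    if text.startswith("play "):
        return Intent("PlayMusic", {"query": text[len("play "):]})
    return Intent("Unknown")


def respond(intent: Intent) -> str:
    """Produce the textual response that a TTS stage would convert to speech."""
    if intent.name == "SetTimer":
        return f"Timer set for {intent.slots['duration']}."
    if intent.name == "PlayMusic":
        return f"Playing {intent.slots['query']}."
    return "Sorry, I don't know that one."


def handle(audio: str) -> Optional[str]:
    """Run the full chain; return None when the wake word is absent."""
    if not detect_wake_word(audio):
        return None
    return respond(understand(recognize_speech(audio)))
```

Calling `handle("Alexa, set a timer for ten minutes")` walks through all four stages, while an utterance without the wake word is ignored at stage one.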

Explaining that the overarching strategy for Alexa's AI has been to combine machine learning (ML) with the data and computational resources afforded by AWS, he goes on to itemize four specific approaches used to extend deep learning:

 

  • semi-supervised learning, or using a combination of unlabeled and labeled data to improve the ML system;
  • active learning, or the learning strategy where the ML system selects more-informative samples to receive manual labels;
  • large-scale distributed training, or parallelizing ML-based model training for efficient learning on a large corpus; and
  • context-aware modeling, or using a wide variety of information — including the type of device where a request originates, skills the customer uses or has enabled, and past requests — to improve accuracy.
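To make the active learning item concrete, here is a minimal sketch of one common selection strategy, entropy-based uncertainty sampling. The blog post does not say which selection criterion Alexa uses, so this is an assumption for illustration, and the function names and sample probabilities are hypothetical.

```python
import math
from typing import Dict, List


def entropy(probs: List[float]) -> float:
    """Shannon entropy of a predicted class distribution (higher = less certain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions: Dict[str, List[float]], budget: int) -> List[str]:
    """Pick the `budget` utterances whose predictions are most uncertain.

    `predictions` maps each unlabeled utterance to the model's class
    probabilities; the chosen utterances would be sent for manual labels.
    """
    ranked = sorted(predictions, key=lambda u: entropy(predictions[u]), reverse=True)
    return ranked[:budget]


# Hypothetical model outputs over three intents.
predictions = {
    "play jazz": [0.98, 0.01, 0.01],   # confident, little to learn from a label
    "turn it up": [0.40, 0.35, 0.25],  # very uncertain
    "lights": [0.70, 0.20, 0.10],      # somewhat uncertain
}
to_label = select_for_labeling(predictions, budget=2)
```

With a budget of two, the confidently classified "play jazz" is skipped and labeling effort goes to the two utterances the model is least sure about.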

 

Looking at current work on Alexa, Prasad gives examples of how improvements are being incorporated into Alexa's ability to correct ASR and NLU errors by self-learning, i.e. without any human intervention. Another goal is to make Alexa more natural, including enabling her to handle compound requests, such as “Alexa, turn down the lights and play music”.

A new approach to making Alexa more knowledgeable is Alexa Answers, an online interface, released last month after a private beta test, that lets customers add to Alexa’s knowledge and has already furnished hundreds of thousands of new answers.

With regard to making Alexa's control of smart home devices more context-aware and proactive, an optional feature called Hunches already detects when lights, locks, switches and plugs are not in the "correct" state and suggests actions. A feature called Routines is in the pipeline to detect and respond to patterns of user behavior. The example given in the blog post is:

If you set your alarm for 6:00 a.m. every day and on waking, you immediately ask for the weather, Alexa will suggest creating a Routine that sets the weekday alarm to 6:00 and plays the weather report as soon as the alarm goes off.

Finally, Alexa's ability to respond is being improved by including the ability to carry context from one request to another. For instance, if an Alexa customer asks, “When is The Addams Family playing at the Bijou?” and then follows up with the question “Is there a good Mexican restaurant near there?”, Alexa needs to know that “there” refers to the Bijou.
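A rough sketch shows why this is harder than it looks. The follow-up question contains "there" twice, and only the second one refers to a place. The class below is a hypothetical, rule-based stand-in, not how Alexa does it (the post describes deep-learning models); it remembers the last place mentioned and substitutes it only after a preposition.

```python
import re
from typing import Optional


class DialogContext:
    """Toy context carry-over: remembers one place between requests."""

    def __init__(self) -> None:
        self.last_place: Optional[str] = None

    def remember_place(self, place: str) -> None:
        """Store the place mentioned in the most recent request."""
        self.last_place = place

    def resolve(self, utterance: str) -> str:
        """Replace 'near/at/to there' with the remembered place, if any.

        A bare 'there' (as in 'Is there a ...') is left alone, because it
        does not refer to a location.
        """
        if self.last_place is None:
            return utterance
        return re.sub(
            r"\b(near|at|to) there\b",
            lambda m: f"{m.group(1)} {self.last_place}",
            utterance,
        )


ctx = DialogContext()
ctx.remember_place("the Bijou")  # from the first question about the movie
resolved = ctx.resolve("Is there a good Mexican restaurant near there?")
```

Here `resolved` becomes "Is there a good Mexican restaurant near the Bijou?", with the first "there" untouched.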

Looking to what we can expect of Alexa in the near future, Prasad notes:

We are currently testing a new deep-learning-based technology, called Alexa Conversations, with a small group of skill developers who are using it to build high-quality multiturn experiences with minimal effort. 

The example he uses of a complex task that requires back-and-forth interaction is a customer using Alexa to plan a night out, which calls on different skills to find a movie, a restaurant near the theater, and a ride-sharing service, while coordinating times and locations.

At the moment this is clearly outside Alexa's competence. But looking at the progress made in Alexa's first 5 years, it does seem entirely possible fairly soon.

 


 

More Information

Alexa at Five: Looking Back, Looking Forward

Alexa, Happy Birthday!

Related Articles

We Need To Talk About Alexa

Alexa For Developers

The State Of Voice As UI

Microsoft Adds Conversational AI Agents

Over 100 Million Alexa Devices Sold


Last Updated ( Wednesday, 06 November 2019 )