Alexa Teacher Models Outperform GPT-3

Written by Sue Gee

Wednesday, 31 August 2022

Researchers at Amazon Alexa AI are making breakthroughs in conversational AI and natural language processing using models that learn new concepts and transfer knowledge from one language or task to another with minimal human input.

Thanks to its encoder-decoder architecture, as opposed to the decoder-only design that characterizes other large language models, the Alexa Teacher Model outperforms GPT-3 on tasks such as summarization and machine translation.

Introducing AlexaTM 20B, a 20-billion-parameter sequence-to-sequence (seq2seq) generative language model, Saleh Soltan, a senior applied scientist with Alexa AI, explains how it aligns with Alexa AI's move to the new paradigm of "generalizable intelligence", in which models can learn new concepts and transfer knowledge from one language or task to another with minimal human input. Such models allow Alexa AI researchers to efficiently develop new features and improve Alexa in multiple languages at the same time.

Soltan and his colleagues are presenting a paper about the Alexa Teacher Model at the forthcoming Knowledge Discovery and Data Mining Conference. It shows how the 10-billion- and two-billion-parameter AlexaTM models can improve on state-of-the-art cross-lingual transfer learning and increase Alexa’s accuracy in different locales. They have followed this up with an arXiv paper titled "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2seq Model". The experiments reported in this paper, which use only publicly available data, show that AlexaTM 20B can not only transfer what it learns across languages but also learn new tasks from just a handful of examples, i.e. few-shot learning.

[Figure: AlexaTM 20B intent example]

In this example, included in the paper, the model is provided with three examples of different intents, or tasks that the customer wants executed: book-restaurant, play-music, and get-weather. The model can generalize from these to the unfamiliar intent get-news-update and generate utterances corresponding to that intent in multiple languages: Spanish, French, German and Hindi.
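To make the few-shot setup more concrete, here is a minimal Python sketch of how a prompt of this kind might be assembled. The bracketed template, the example utterances and the function name `build_few_shot_prompt` are illustrative assumptions, not the format used in the paper.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str, str]],
    target_intent: str,
    target_language: str,
) -> str:
    """Assemble a few-shot prompt: labeled examples followed by an open slot.

    Each example is an (intent, language, utterance) triple. The template
    used here is hypothetical and chosen only for readability.
    """
    lines = []
    for intent, language, utterance in examples:
        lines.append(f"[intent: {intent}] [language: {language}] {utterance}")
    # The last line carries no utterance; a generative model would be
    # expected to complete it for the unseen intent.
    lines.append(f"[intent: {target_intent}] [language: {target_language}]")
    return "\n".join(lines)


# Hypothetical utterances for the three intents named in the article.
EXAMPLES = [
    ("book-restaurant", "English", "Reserve a table for two at seven tonight"),
    ("play-music", "English", "Play some jazz in the kitchen"),
    ("get-weather", "English", "What is the weather like tomorrow"),
]

prompt = build_few_shot_prompt(EXAMPLES, "get-news-update", "Spanish")
print(prompt)
```

Changing the final argument to French, German or Hindi would request the same unseen intent in another language, which is the kind of cross-lingual generalization the example in the paper illustrates.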

Another example in the paper shows news summarization by AlexaTM 20B when given only a single example. The input to the encoder is in the yellow box, the decoder’s output in the pink box:

[Figure: AlexaTM 20B summarization example]

Soltan states that Amazon will be releasing the model publicly for non-commercial use to aid the development and evaluation of multilingual large language models. Amazon has also implemented a function to enable loading the model on up to eight GPUs with limited GPU memory for running inference on instances of Amazon Web Services’ EC2 computation service, which he says provides a more flexible way for researchers to use AlexaTM 20B in their own work.
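As a rough illustration of why splitting the model across several GPUs matters, the following Python sketch estimates the memory needed just to hold the weights. The 16-bit weight precision and the perfectly even split across devices are assumptions made for the sake of the arithmetic, not details given by Amazon, and the estimate ignores activations and other runtime overhead.

```python
def weights_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed to store the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


def per_gpu_memory_gb(num_params: float, bytes_per_param: int, num_gpus: int) -> float:
    """Weights memory per device, assuming an even split across GPUs."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return weights_memory_gb(num_params, bytes_per_param) / num_gpus


NUM_PARAMS = 20e9      # 20 billion parameters, as stated in the article
BYTES_PER_PARAM = 2    # assumption: 16-bit weights

for num_gpus in (1, 2, 4, 8):
    share = per_gpu_memory_gb(NUM_PARAMS, BYTES_PER_PARAM, num_gpus)
    print(f"{num_gpus} GPU(s): about {share:.1f} GB of weights per GPU")
```

Under these assumptions the weights occupy about 40 GB in total, or about 5 GB per device when spread over eight GPUs.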

More Information

20B-parameter Alexa model sets new marks in few-shot learning

Amazon Invests In Conversational AI

The Unreasonable Effectiveness Of GPT-3

Alexa Prize SocialBot Grand Challenge 5


Last Updated ( Wednesday, 31 August 2022 )
