|Alexa Teacher Models Outperform GPT-3|
|Written by Sue Gee|
|Wednesday, 31 August 2022|
Researchers at Amazon Alexa AI are making breakthroughs in conversational AI and natural language processing using models that learn new concepts and transfer knowledge from one language or task to another with minimal human input.
Thanks to its encoder-decoder architecture, as opposed to the decoder-only architecture that characterizes other large language models, the Alexa Teacher Model outperforms GPT-3 in tasks such as summarization and machine translation.
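One way to picture the architectural difference is through attention masks. In an encoder-decoder model the encoder can attend to every position of the input at once, while a decoder-only model lets each position see only itself and earlier positions. The sketch below illustrates these two general attention patterns in plain Python; the function names are hypothetical and this is not AlexaTM's actual implementation.

```python
from typing import List


def bidirectional_mask(n: int) -> List[List[int]]:
    """Encoder-style mask: every position may attend to every other position."""
    return [[1] * n for _ in range(n)]


def causal_mask(n: int) -> List[List[int]]:
    """Decoder-style mask: position i may attend only to positions 0..i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


# Print both masks for a four-token sequence (1 = may attend, 0 = blocked).
for name, mask in (("bidirectional", bidirectional_mask(4)), ("causal", causal_mask(4))):
    print(name)
    for row in mask:
        print(" ".join(str(v) for v in row))
```

In a seq2seq model the decoder still generates causally, but it does so while attending to the fully encoded input.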
Introducing the AlexaTM 20B, a 20-billion-parameter sequence-to-sequence (seq2seq) generative language model, Saleh Soltan, a senior applied scientist with Alexa AI, explains how this aligns with Alexa AI's move to the new paradigm of "generalizable intelligence", in which models can learn new concepts and transfer knowledge from one language or task to another with minimal human input. Such models allow Alexa AI researchers to efficiently develop new features and improve Alexa in multiple languages at the same time.
Soltan and his colleagues are presenting a paper about the Alexa Teacher Model at the forthcoming Knowledge Discovery and Data Mining Conference, which shows how the 10-billion- and two-billion-parameter AlexaTM models can improve on state-of-the-art cross-lingual transfer learning and increase Alexa’s accuracy in different locales. They have followed this up with an arXiv paper titled "AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2seq Model". The experiments reported in the arXiv paper, which use only publicly available data, show that AlexaTM 20B can not only transfer what it learns across languages but also learn new tasks from just a handful of examples, i.e. few-shot learning.
In one example included in the paper, the model is provided with three examples of different intents, or tasks that the customer wants executed: book-restaurant, play-music, and get-weather. The model can generalize from these to the unfamiliar intent get-news-update and generate utterances corresponding to that intent in multiple languages: Spanish, French, German and Hindi.
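To make the few-shot setup concrete, here is a minimal sketch of how such an input might be assembled: a few labeled intent examples followed by an open slot for the new intent. The template, the helper name build_few_shot_prompt and the sample utterances are all assumptions made for illustration; the paper's actual prompt format may differ.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], new_intent: str, language: str
) -> str:
    """Join labeled (intent, utterance) pairs, then leave the last utterance open.

    The line template is hypothetical, not the format used by AlexaTM 20B.
    """
    lines = [f"Intent: {intent}; Utterance: {utterance}" for intent, utterance in examples]
    # The final line names the unseen intent and target language and is left
    # unfinished so that the model completes it.
    lines.append(f"Intent: {new_intent}; Language: {language}; Utterance:")
    return "\n".join(lines)


# Invented sample utterances for the three intents named in the article.
examples = [
    ("book-restaurant", "Reserve a table for two at seven tonight"),
    ("play-music", "Play some jazz"),
    ("get-weather", "What is the weather like in Seattle"),
]

prompt = build_few_shot_prompt(examples, "get-news-update", "Spanish")
print(prompt)
```

The resulting text would be fed to the encoder, with the decoder expected to produce an utterance for the new intent in the requested language.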
Another example in the paper shows news summarization by AlexaTM 20B when given only a single example. In the corresponding figure, the input to the encoder is shown in a yellow box and the decoder’s output in a pink box.
Soltan states that Amazon will be releasing the model publicly for non-commercial use to aid the development and evaluation of multilingual large language models. Amazon has also implemented a function that loads the model across up to eight GPUs with limited GPU memory, so that inference can run on instances of Amazon Web Services’ EC2 computation service. He says this provides a more flexible way for researchers to use AlexaTM 20B in their own work.
|Last Updated ( Wednesday, 31 August 2022 )|