No Language Left Behind - Meta's Progress Toward Universal Translation

Written by Sue Gee

Wednesday, 13 July 2022

Meta has made impressive progress with its No Language Left Behind project. It has already built a single model that can translate between 200 languages and has now open sourced the AI translation tools that made this breakthrough possible.

When Meta announced ambitious plans to create translation software for everyone in the world earlier this year, CEO Mark Zuckerberg said:

“The ability to communicate with anyone in any language — that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes,”

Microsoft, Google, Mozilla and others have been working on real-time translation from one language to another for many years and have achieved a great deal. But the problem that Meta wants to address is the fact that while commonly spoken languages like English, Mandarin, and Spanish are well catered for by current translation tools, roughly 20 percent of the world’s population do not speak languages covered by these systems. To make matters worse, some of these "under-served" languages do not have easily accessible corpuses of written text that are needed to train AI systems and sometimes have no standardized writing system at all.

To overcome these challenges, Meta embarked on two projects. The first, dubbed No Language Left Behind, sets out to build AI models that can learn to translate language using fewer training examples. This will feed into the even more ambitious Universal Speech Translator project, which aims to build systems that directly translate speech in real time from one language to another without the need for a written component to serve as an intermediary.

Now Meta AI has reported that it has built a single AI model, referred to as NLLB-200, that is capable of delivering high-quality translations directly between 200 languages, including low-resource languages like Asturian, Luganda, Urdu and more. In an accompanying video, members of the NLLB team explain why this is important.

Details of the technical breakthrough are given in the 190-page paper "No Language Left Behind: Scaling Human-Centered Machine Translation", which describes how Meta AI, in collaboration with the Wikimedia Foundation:

developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages.
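
To give a feel for what "sparsely gated" means in the quote above, here is a minimal Python sketch of top-k expert routing. It is an illustration under stated assumptions, not Meta's implementation: the names `top_k_gate`, `moe_layer` and `make_scaling_expert` are hypothetical, the experts are trivial scaling functions standing in for feed-forward networks, the gate weights are random rather than learned, and refinements that real systems typically use, such as load balancing across experts, are left out.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(gate_logits, k):
    """Keep the k largest logits and renormalize them; all other experts get no weight."""
    top = sorted(range(len(gate_logits)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])
    return dict(zip(top, weights))


def moe_layer(x, gate_matrix, experts, k=2):
    """Route input vector x to k experts and mix their outputs by gate weight."""
    gate_logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_matrix]
    routing = top_k_gate(gate_logits, k)
    output = [0.0] * len(x)
    for idx, weight in routing.items():
        expert_out = experts[idx](x)  # only the selected experts are evaluated
        output = [o + weight * e for o, e in zip(output, expert_out)]
    return output, routing


def make_scaling_expert(factor):
    """Hypothetical stand-in for an expert feed-forward network."""
    return lambda x: [factor * xi for xi in x]


# Toy setup (assumption): 4 experts, 3-dimensional input, random gate weights.
rng = random.Random(0)
EXPERTS = [make_scaling_expert(f) for f in (0.5, 1.0, 2.0, 3.0)]
GATE = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

if __name__ == "__main__":
    out, routing = moe_layer([1.0, -2.0, 0.5], GATE, EXPERTS, k=2)
    print("experts used:", sorted(routing))
    print("output:", out)
```

The point of the design is that the cost per input depends on k, the number of experts actually run, rather than on the total number of experts, which is what allows the overall model to grow without every input paying for all of it.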

The project's approach is summarized in this diagram from the paper:

Its caption reads:

Our low-resource translation effort focuses on four cornerstones. (1) We strive to understand the low-resource translation problem from the perspective of native speakers. (2) We study how to automatically create training data to move low-resource languages towards high-resource. (3) We utilize this data to create state-of-the-art translation models. (4) We evaluate every language we aim to translate.

According to the Meta AI blog, a new evaluation dataset, FLORES-200, was created for the project. By measuring NLLB-200’s performance in each language, the researchers were able to confirm that the translations are high quality and that NLLB-200 exceeds the previous state of the art by an average of 44 percent.

Meta AI is open-sourcing its NLLB-200 models, the FLORES-200 evaluation dataset, the model training code, and the code for re-creating the training dataset, all of which can be found at: https://github.com/facebookresearch/fairseq/tree/nllb.

The blog post concludes:

A few short years ago, high-quality machine translation worked in only a handful of languages. With NLLB-200, we are closer to one day having systems that enable people to communicate with whomever they choose. We’re excited by what this unlocks in the present and what it could mean for the future as we continue to push the boundaries of machine translations.

More Information

200 languages within a single AI model: A breakthrough in high-quality machine translation

Related Articles

Microsoft Research Achieves Human Parity For Chinese English Translation

Transcription On Par With Human Accuracy

Speech Recognition Milestone

Neural Networks Applied To Machine Translation

Speech Recognition Breakthrough

Skype Translator Cracks Language Barrier

Facebook Open Sources Natural Language Processing Model

Last Updated ( Wednesday, 13 July 2022 )
