No Language Left Behind - Meta's Progress Toward Universal Translation
Written by Sue Gee   
Wednesday, 13 July 2022

Meta has made impressive progress with its No Language Left Behind project. It has already built a single model that can translate between 200 languages and has now open-sourced the AI translation tools that made this breakthrough possible.


When Meta announced ambitious plans to create translation software for everyone in the world earlier this year, CEO Mark Zuckerberg said:

“The ability to communicate with anyone in any language - that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes.”

Microsoft, Google, Mozilla and others have been working on real-time translation from one language to another for many years and have achieved a great deal. The problem that Meta wants to address is that, while commonly spoken languages like English, Mandarin, and Spanish are well catered for by current translation tools, roughly 20 percent of the world’s population speak languages that these systems do not cover. To make matters worse, some of these "under-served" languages lack the easily accessible corpora of written text needed to train AI systems, and some have no standardized writing system at all.

To overcome these challenges, Meta embarked on two projects. The first, dubbed No Language Left Behind, sets out to build AI models that can learn to translate languages using fewer training examples. This will feed into the even more ambitious Universal Speech Translator project, which aims to build systems that translate speech directly from one language to another in real time, without the need for a written component to serve as an intermediary.

Now Meta AI has reported that it has built a single AI model, referred to as NLLB-200, that is capable of delivering high-quality translations directly between 200 languages, including low-resource languages like Asturian, Luganda, Urdu and more. In this video, members of the NLLB team explain why this is important:

Details of the technical breakthrough are given in the 190-page paper "No Language Left Behind: Scaling Human-Centered Machine Translation", which describes how Meta AI, in collaboration with the Wikimedia Foundation:

developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages. 
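To give a feel for what "conditional compute" means here, the following is a minimal sketch of top-k gating in a mixture-of-experts layer: a small gating function scores every expert for each token, and only the k best-scoring experts are actually run. This is an illustration under stated assumptions, not the NLLB-200 implementation. The names `top_k_gate`, `moe_layer` and `make_expert` are hypothetical, the experts are toy scaling functions, and the noise term and load-balancing loss used in real sparsely gated layers are left out.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_k_gate(token, gate_weights, k):
    """Return {expert_index: weight} for the k highest-scoring experts.

    The softmax is taken over the k selected logits only, so the
    returned weights sum to 1 and all other experts get weight 0.
    """
    logits = [dot(token, w) for w in gate_weights]
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in top])
    return dict(zip(top, weights))


def moe_layer(token, gate_weights, experts, k=2):
    """Run only the selected experts and mix their outputs by gate weight."""
    gates = top_k_gate(token, gate_weights, k)
    out = [0.0] * len(token)
    for idx, g in gates.items():
        expert_out = experts[idx](token)  # unselected experts are never called
        out = [o + g * e for o, e in zip(out, expert_out)]
    return out, sorted(gates)


def make_expert(scale):
    """Toy expert (assumption): scales its input by a constant."""
    return lambda token: [scale * x for x in token]


random.seed(0)
dim, num_experts = 4, 8
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]
experts = [make_expert(i + 1) for i in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
output, used = moe_layer(token, gate_weights, experts, k=2)
print("experts used:", used)
print("output:", output)
```

The point of the design is that the number of experts, and hence the model's capacity, can grow while the work done per token stays fixed at k expert calls.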

The project's approach is summarized in this diagram from the paper:

[Diagram from the paper]

Its caption reads:

Our low-resource translation effort focuses on four cornerstones. (1) We strive to understand the low-resource translation problem from the perspective of native speakers. (2) We study how to automatically create training data to move low-resource languages towards high-resource. (3) We utilize this data to create state-of-the-art translation models. (4) We evaluate every language we aim to translate.

According to the Meta AI blog, a new evaluation dataset, FLORES-200, was created for the project. By measuring NLLB-200’s performance in each language, the researchers were able to confirm that the translations are high quality and that NLLB-200 exceeds the previous state of the art by an average of 44 percent.
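As a rough illustration of what an "average improvement" of this kind can mean, the sketch below computes the relative gain over a baseline for each language and then averages those gains. This is an assumption about how such a figure might be computed, not a description of Meta's method. The function names are hypothetical and the scores are made-up toy values, not results from FLORES-200.

```python
def relative_gain(baseline, new):
    """Percentage improvement of `new` over `baseline`."""
    return (new - baseline) / baseline * 100.0


def average_relative_gain(scores):
    """Mean of the per-language relative gains.

    `scores` maps a language name to (baseline_score, new_score).
    """
    gains = [relative_gain(b, n) for b, n in scores.values()]
    return sum(gains) / len(gains)


# Toy values for illustration only (hypothetical, not measured results).
toy_scores = {
    "lang_a": (20.0, 30.0),  # 50 percent better than the baseline
    "lang_b": (10.0, 12.0),  # 20 percent better than the baseline
}

print(average_relative_gain(toy_scores))
```

Averaging per-language gains in this way gives every language equal weight, so a large relative gain on a language with a very low baseline counts as much as a gain on a well-resourced one.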

Meta AI is open-sourcing its NLLB-200 models, the FLORES-200 evaluation dataset, the model training code, and the code for re-creating the training dataset, all of which can be found at: https://github.com/facebookresearch/fairseq/tree/nllb.

The blog post concludes:

A few short years ago, high-quality machine translation worked in only a handful of languages. With NLLB-200, we are closer to one day having systems that enable people to communicate with whomever they choose. We’re excited by what this unlocks in the present and what it could mean for the future as we continue to push the boundaries of machine translations. 

 


More Information

200 languages within a single AI model: A breakthrough in high-quality machine translation

Related Articles

Microsoft Research Achieves Human Parity For Chinese English Translation 

Transcription On Par With Human Accuracy

Speech Recognition Milestone

Neural Networks Applied To Machine Translation

Speech Recognition Breakthrough 

Skype Translator Cracks Language Barrier  

Facebook Open Sources Natural Language Processing Model
