|No Language Left Behind - Meta's Progress Toward Universal Translation|
|Written by Sue Gee|
|Wednesday, 13 July 2022|
Meta has made impressive progress with its No Language Left Behind project. It has already built a single model that can translate between 200 languages and has now open-sourced the AI translation tools that made this breakthrough possible.
When Meta announced ambitious plans to create translation software for everyone in the world earlier this year, CEO Mark Zuckerberg said:
“The ability to communicate with anyone in any language — that’s a superpower people have dreamed of forever, and AI is going to deliver that within our lifetimes.”
Microsoft, Google, Mozilla and others have been working on real-time translation from one language to another for many years and have achieved a great deal. But the problem Meta wants to address is that while commonly spoken languages like English, Mandarin, and Spanish are well catered for by current translation tools, roughly 20 percent of the world’s population does not speak a language covered by these systems. To make matters worse, some of these "under-served" languages lack the easily accessible corpora of written text needed to train AI systems, and some have no standardized writing system at all.
To overcome these challenges Meta embarked on two projects. The first, dubbed No Language Left Behind, sets out to build AI models that can learn to translate between languages using fewer training examples. This will feed into the even more ambitious Universal Speech Translator project, which aims to build systems that translate speech in real time from one language to another without needing a written component to serve as an intermediary.
Now Meta AI has reported that it has built a single AI model, referred to as NLLB-200, that is capable of delivering high-quality translations directly between 200 languages, including low-resource languages such as Asturian, Luganda and Urdu. In this video, members of the NLLB team tell us why this is important:
Details of the technical breakthrough that Meta has made are given in the 190-page paper "No Language Left Behind: Scaling Human-Centered Machine Translation", which describes how Meta AI, in collaboration with the Wikimedia Foundation:
developed a conditional compute model based on Sparsely Gated Mixture of Experts that is trained on data obtained with novel and effective data mining techniques tailored for low-resource languages.
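The "Sparsely Gated Mixture of Experts" mentioned in the quote is a conditional-compute technique: the model contains many expert sub-networks, but a learned gating function routes each token to only a few of them, so capacity grows without every input paying the full compute cost. The following is a minimal illustrative sketch of top-k expert routing, not Meta's implementation; the single-layer tanh "experts", the linear gate, and all names here are invented for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(x, gate_w, expert_ws, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x         : (tokens, d) input vectors
    gate_w    : (d, n_experts) gating weights
    expert_ws : (n_experts, d, d) one weight matrix per toy expert
    """
    logits = x @ gate_w                          # gate score per expert
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        sel = np.argsort(logits[t])[-k:]         # indices of the k best experts
        weights = softmax(logits[t, sel])        # renormalise over selected experts
        for e, w in zip(sel, weights):
            out[t] += w * np.tanh(x[t] @ expert_ws[e])  # only k experts run per token
    return out

d, n_experts, tokens = 8, 4, 5
x = rng.normal(size=(tokens, d))
gate_w = rng.normal(size=(d, n_experts))
expert_ws = rng.normal(size=(n_experts, d, d))
y = moe_layer(x, gate_w, expert_ws, k=2)
print(y.shape)  # (5, 8)
```

With k experts active out of n per token, each token pays roughly k/n of the compute a dense model of the same total size would need, which is what makes scaling to 200 language directions tractable.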
The project's approach is summarized in this diagram from the paper:
Its caption reads:
Our low-resource translation effort focuses on four cornerstones. (1) We strive to understand the low-resource translation problem from the perspective of native speakers. (2) We study how to automatically create training data to move low-resource languages towards high-resource. (3) We utilize this data to create state-of-the-art translation models. (4) We evaluate every language we aim to translate.
According to the Meta AI blog, a new evaluation dataset, FLORES-200, was created for the project. By measuring NLLB-200’s performance in each language, the researchers were able to confirm that the translations are high quality and that NLLB-200 exceeds the previous state of the art by an average of 44 percent.
Meta AI is open-sourcing its NLLB-200 models, the FLORES-200 evaluation dataset, the model training code, and the code for re-creating the training dataset, all of which can be found at: https://github.com/facebookresearch/fairseq/tree/nllb.
The blog post concludes:
A few short years ago, high-quality machine translation worked in only a handful of languages. With NLLB-200, we are closer to one day having systems that enable people to communicate with whomever they choose. We’re excited by what this unlocks in the present and what it could mean for the future as we continue to push the boundaries of machine translations.