Meta's MultiModal, MultiLingual Translator
Written by Sue Gee   
Tuesday, 21 January 2025

Meta has taken us a long way towards creating a Babel Fish, a tool that helps individuals translate speech between any two languages. This is thanks to SEAMLESSM4T, which is open-source for non-commercial use and which Meta hopes will propel further research on inclusive speech translation technologies.

SEAMLESSM4T

SEAMLESSM4T, where M4T stands for Massively Multilingual and Multimodal Machine Translation, is a single model that supports speech-to-speech translation (currently from 101 to 36 languages), speech-to-text translation (from 101 to 96 languages), text-to-speech translation (from 96 to 36 languages), text-to-text translation (in 96 languages) and automatic speech recognition (in 96 languages). The model was originally unveiled in 2023 and, in keeping with Meta's policy of supporting open science, was open-sourced under a Creative Commons licence to allow researchers and developers to build on its work. Meta also released the metadata of SeamlessAlign, its multimodal translation dataset, totaling 470,000 hours of mined speech and text alignments.
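For developers wanting to build on the open model, the sketch below shows what working with a single multimodal model looks like in practice. It uses the Hugging Face transformers port of the model; the checkpoint name facebook/seamless-m4t-v2-large, the SeamlessM4Tv2Model class and the three-letter language codes are assumptions based on the publicly documented release, not details given in this article.

# Minimal sketch: English-to-Russian text translation with the
# Hugging Face port of SeamlessM4T (checkpoint name is an assumption).
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Tokenize the English source sentence.
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")

# Request Russian text only; generate_speech=False skips the speech decoder,
# while omitting it would return a waveform for speech output instead.
output_tokens = model.generate(**text_inputs, tgt_lang="rus", generate_speech=False)
print(processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True))

The same generate call accepts audio prepared by the processor, which is what makes the single-model, multimodal design convenient to experiment with.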

SEAMLESSM4T builds on work Meta and others have done over the years in the quest to create a universal translator. In 2022 I reported on No Language Left Behind (NLLB), a text-to-text machine translation model that supports 200 languages and which has since been integrated into Wikipedia as one of its translation providers. Meta also demoed a Universal Speech Translator, the first direct speech-to-speech translation system for Hokkien, a language without a widely used writing system. Through this, Meta developed SpeechMatrix, the first large-scale multilingual speech-to-speech translation dataset. Meta also shared Massively Multilingual Speech, which provides automatic speech recognition, language identification and speech synthesis technology across more than 1,100 languages. SEAMLESSM4T draws on findings from all of these projects to enable a multilingual and multimodal translation experience stemming from a single model, built across a wide range of spoken data sources and with state-of-the-art results.

In this video, Paco Guzmán introduces the main features of SEAMLESSM4T and Sravya Pouri gives a demonstration of "code switching", which happens when a multilingual speaker switches languages while speaking, using Hindi, Telugu and English. This showcases the model's automatic speech recognition.

This video shows translations between English and Russian in speech-to-speech, speech-to-text and text-to-speech, together with language recognition:

This month a paper, "Joint speech and text machine translation for up to 100 languages", authored by the SEAMLESS Communication Team, a group of 68 multinational researchers, was published in Nature. The paper reveals that SEAMLESSM4T

outperforms the existing state-of-the-art cascaded systems, achieving up to 8% and 23% higher BLEU (Bilingual Evaluation Understudy) scores in speech-to-text and speech-to-speech tasks, respectively. Beyond quality, when tested for robustness, our system is, on average, approximately 50% more resilient against background noise and speaker variations in speech-to-text tasks than the previous state-of-the-art systems.
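To put those numbers in context, BLEU compares a system's translations with human reference translations by counting overlapping n-grams, giving a corpus-level score from 0 to 100 where higher is better. The snippet below is an illustrative sketch only, using the sacrebleu library on made-up sentences; neither the data nor the scores come from the paper.

# Illustrative only: the hypotheses and references below are invented,
# not taken from the SEAMLESSM4T evaluation.
import sacrebleu

hypotheses = ["the cat sat on the mat", "he read the book yesterday"]
references = [["the cat is sitting on the mat", "he read the book yesterday"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # corpus-level score, higher means closer n-gram overlap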

The paper also outlines how the model incorporates strategies to mitigate gender bias and toxicity, ensuring more inclusive and safer translations.

One response to the paper states:

SEAMLESSM4T represents a step forward in building inclusive and accessible systems, offering an effective bridge between cultures and languages for application in both digital and face-to-face contexts.

a conclusion with which I concur.

If you want to try the model for yourself, you can do so here:

https://seamless.metademolab.com/

You'll need a computer equipped with a camera and a microphone.  


More Information

SEAMLESS Communication Team. Joint speech and text machine translation for up to 100 languages. Nature 637, 587–593 (2025). 

Related Articles

Microsoft Research Achieves Human Parity For Chinese English Translation 

Transcription On Par With Human Accuracy

Speech Recognition Milestone

Neural Networks Applied To Machine Translation

Speech Recognition Breakthrough 

Skype Translator Cracks Language Barrier  

Facebook Open Sources Natural Language Processing Model


