Meta's MultiModal, MultiLingual Translator
Written by Sue Gee   
Tuesday, 21 January 2025

Meta has taken us a long way towards creating a Babel Fish, a tool that helps individuals translate speech between any two languages. This is thanks to SEAMLESSM4T, which is open-source for non-commercial use and which Meta hopes will propel further research on inclusive speech translation technologies.

SEAMLESSM4T

SEAMLESSM4T, where M4T stands for Massively Multilingual and Multimodal Machine Translation, is a single model that supports speech-to-speech translation (currently from 101 to 36 languages), speech-to-text translation (from 101 to 96 languages), text-to-speech translation (from 96 to 36 languages), text-to-text translation (in 96 languages) and automatic speech recognition (in 96 languages). The model was originally unveiled in 2023 and, in keeping with Meta's policy of supporting open science, was open-sourced under a Creative Commons licence to allow researchers and developers to build on its work. Meta also released the metadata of SeamlessAlign, its multimodal translation dataset, totaling 470,000 hours of mined speech and text alignments.
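For developers wanting to build on the open model, the sketch below shows what working with a single multimodal model looks like in practice. It uses the Hugging Face transformers port of the model; the checkpoint name facebook/seamless-m4t-v2-large, the SeamlessM4Tv2Model class and the three-letter language codes are assumptions based on the publicly documented release, not details given in this article.

# Minimal sketch: English-to-Russian text translation with the
# Hugging Face port of SeamlessM4T (checkpoint name is an assumption).
from transformers import AutoProcessor, SeamlessM4Tv2Model

processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")

# Tokenize the English source sentence.
text_inputs = processor(text="Hello, how are you?", src_lang="eng", return_tensors="pt")

# Request Russian text only; generate_speech=False skips the speech decoder,
# while omitting it would return a waveform for speech output instead.
output_tokens = model.generate(**text_inputs, tgt_lang="rus", generate_speech=False)
print(processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True))

The same generate call accepts audio prepared by the processor, which is what makes the single-model, multimodal design convenient to experiment with.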

SEAMLESSM4T builds on work Meta and others have done over the years in the quest to create a universal translator. In 2022 I reported on No Language Left Behind (NLLB), a text-to-text machine translation model that supports 200 languages and which has since been integrated into Wikipedia as one of its translation providers. Meta also demoed a Universal Speech Translator, the first direct speech-to-speech translation system for Hokkien, a language without a widely used writing system. Through this, Meta developed SpeechMatrix, the first large-scale multilingual speech-to-speech translation dataset. Meta also shared Massively Multilingual Speech, which provides automatic speech recognition, language identification and speech synthesis technology across more than 1,100 languages. SEAMLESSM4T draws on findings from all of these projects to enable a multilingual and multimodal translation experience stemming from a single model, built across a wide range of spoken data sources and with state-of-the-art results.

In this video, Paco Guzmán introduces the main features of SEAMLESSM4T and Sravya Pouri gives a demonstration of "code switching", which happens when a multilingual speaker switches languages while speaking, using Hindi, Telugu and English. This showcases the model's automatic speech recognition.

This video shows translations between English and Russian in speech-to-speech, speech-to-text and text-to-speech, together with language recognition:

This month a paper, "Joint speech and text machine translation for up to 100 languages", authored by the SEAMLESS Communication Team, a group of 68 multinational researchers, was published in Nature. The paper reveals that SEAMLESSM4T

outperforms the existing state-of-the-art cascaded systems, achieving up to 8% and 23% higher BLEU (Bilingual Evaluation Understudy) scores in speech-to-text and speech-to-speech tasks, respectively. Beyond quality, when tested for robustness, our system is, on average, approximately 50% more resilient against background noise and speaker variations in speech-to-text tasks than the previous state-of-the-art systems.
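To put those numbers in context, BLEU compares a system's translations with human reference translations by counting overlapping n-grams, giving a corpus-level score from 0 to 100 where higher is better. The snippet below is an illustrative sketch only, using the sacrebleu library on made-up sentences; neither the data nor the scores come from the paper.

# Illustrative only: the hypotheses and references below are invented,
# not taken from the SEAMLESSM4T evaluation.
import sacrebleu

hypotheses = ["the cat sat on the mat", "he read the book yesterday"]
references = [["the cat is sitting on the mat", "he read the book yesterday"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")  # corpus-level score, higher means closer n-gram overlap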

The paper also outlines how the model incorporates strategies to mitigate gender bias and toxicity, ensuring more inclusive and safer translations.

One response to the paper states:

SEAMLESSM4T represents a step forward in building inclusive and accessible systems, offering an effective bridge between cultures and languages for application in both digital and face-to-face contexts.

a conclusion with which I concur.

If you want to try the model for yourself, you can do so here:

https://seamless.metademolab.com/

You'll need a computer equipped with a camera and a microphone.  


More Information

SEAMLESS Communication Team. Joint speech and text machine translation for up to 100 languages. Nature 637, 587–593 (2025). 

Related Articles

Microsoft Research Achieves Human Parity For Chinese English Translation 

Transcription On Par With Human Accuracy

Speech Recognition Milestone

Neural Networks Applied To Machine Translation

Speech Recognition Breakthrough 

Skype Translator Cracks Language Barrier  

Facebook Open Sources Natural Language Processing Model


