“Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition,” says Chetan Jaiswal, a professor of computer science at Quinnipiac University, who was not involved in the research. “The mere number of languages they are supporting is a tremendous achievement.”
Human translators are still a vital part of the translation process, the researchers say in the paper, because they can grapple with diverse cultural contexts and make sure the same meaning is conveyed from one language into another. This step is important, says Lynne Bowker of the University of Ottawa’s School of Translation & Interpretation, who didn’t work on Seamless. “Languages are a reflection of cultures, and cultures have their own ways of knowing things,” she says.
When it comes to applications like medicine or law, machine translations need to be thoroughly checked by a human, she says. If not, misunderstandings can result. For example, when Google Translate was used to translate public health information about the covid-19 vaccine from the Virginia Department of Health in January 2021, it translated “not mandatory” in English into “not necessary” in Spanish, changing the whole meaning of the message.
AI models have much more examples to train on in some languages than others. This means current speech-to-speech models may be able to translate a language like Greek into English, where there may be many examples, but cannot translate from Swahili to Greek. The team behind Seamless aimed to solve this problem by pre-training the model on millions of hours of spoken audio in different languages. This pre-training allowed it to recognize general patterns in language, making it easier to process less widely spoken languages because it already had some baseline for what spoken language is supposed to sound like.
The system is open-source, which the researchers hope will encourage others to build upon its current capabilities. But some are skeptical of how useful it may be compared with available alternatives. “Google’s translation model is not as open-source as Seamless, but it’s way more responsive and fast, and it doesn’t cost anything as an academic,” says Jaiswal.
The most exciting thing about Meta’s system is that it points to the possibility of instant interpretation across languages in the not-too-distant future—like the Babel fish in Douglas Adams’ cult novel The Hitchhiker’s Guide to the Galaxy. SeamlessM4T is faster than existing models but still not instant. That said, Meta claims to have a newer version of Seamless that’s as fast as human interpreters.
“While having this kind of delayed translation is okay and useful, I think simultaneous translation will be even more useful,” says Kenny Zhu, director of the Arlington Computational Linguistics Lab at the University of Texas at Arlington, who is not affiliated with the new research.