Meta is working on a super translator: this is the MMS project
Meta has presented an AI speech model called Massively Multilingual Speech (MMS), a name that already signals where its strengths lie. It is not, however, a ChatGPT clone.
In fact, MMS can recognize over 4,000 spoken languages and generate speech from text (text-to-speech) in over 1,100 languages. Meta remains true to its previous line and publishes the language model as open source, "so that other researchers can build on our work," the company said in a blog post.
According to Meta, MMS is primarily intended to "make a small contribution to preserving the incredible diversity of languages in the world". Training speech recognition and text-to-speech models such as MMS is, first of all, simply labor-intensive.
The more hours of audio, together with matching transcripts, that can be fed into training, the better. But for languages that aren't widely spoken in industrialized nations, or are even at risk of dying out in the next few decades, "this data simply does not exist," Meta says.
Meta therefore took an unusual approach and turned to audio recordings of translated religious texts: "We turned to religious texts, such as the Bible, which have been translated into many different languages and whose translations have been extensively studied for text-based language translation research."
By including recordings of the Bible and similar texts, Meta's AI experts managed to increase the number of languages the model covers to over 4,000. Contrary to what one might fear given the source material, the resulting model shows no particular bias toward religious wording or male speakers.
Meta attributes this to the fact that “we use a connectionist temporal classification (CTC) approach, which is much more constrained compared to large language models (LLM) or sequence-to-sequence models for speech recognition.”
Specifically, Meta compared its MMS to OpenAI’s Whisper and “found that models trained on Massively Multilingual Speech data achieve half the word error rate” and also cover “11 times more languages”.
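The word error rate metric behind that comparison is the edit distance between the reference transcript and the model output, counted in words and normalized by the reference length. A minimal sketch, assuming a standard Levenshtein dynamic program (this is the common definition of WER, not Meta's evaluation code):

```python
# Hedged sketch of word error rate (WER): word-level Levenshtein distance
# divided by the reference length. Illustrative only.

def word_error_rate(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

# One substituted word in a four-word reference gives a WER of 0.25
print(word_error_rate("the cat sat down", "the cat stood down"))  # → 0.25
```

Halving the WER, as Meta claims for MMS over Whisper, thus means halving the fraction of words that are substituted, inserted, or deleted relative to the reference.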
By releasing MMS for open-source research, Meta hopes to reverse a trend in which technology narrows the world's languages to those most commonly supported by big-tech applications, at most around 100.
Meta says, “We envision a world where technology has the opposite effect, encouraging people to keep their languages alive because they can access information and use technology in their preferred language.”