
Meta is working on a super translator: This is the MMS project

Meta has presented an AI speech model called Massively Multilingual Speech (MMS), a name that already signals where its strengths lie. In any case, it is not a ChatGPT clone.

MMS released as open source

In fact, MMS can recognize over 4,000 spoken languages and produce spoken output (text-to-speech) in over 1,100 languages. Meta stays true to its established course and is releasing the model as open source, “so that other researchers can build on our work,” the company said in a blog post.

Meta’s MMS AI is said to be able to recognize over 4,000 languages (represented by green dots) and transcribe 1,107 languages (purple triangles). (Image: Meta)
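
Since the model weights are public, transcribing audio with them takes only a few lines. The sketch below uses the Hugging Face transformers wrapper around the release; the checkpoint name facebook/mms-1b-all, the language code "fra", and the silent placeholder audio are assumptions for illustration, not details from Meta's announcement.

```python
import numpy as np
import torch
from transformers import AutoProcessor, Wav2Vec2ForCTC

# Assumed checkpoint name of the released multilingual ASR model on the Hugging Face Hub.
model_id = "facebook/mms-1b-all"

processor = AutoProcessor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)

# MMS selects a small per-language adapter at runtime; "fra" (French) is just an example code.
processor.tokenizer.set_target_lang("fra")
model.load_adapter("fra")

# Placeholder input: five seconds of silence at the expected 16 kHz sampling rate.
audio = np.zeros(16_000 * 5, dtype=np.float32)
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding: most likely token per frame; the tokenizer collapses repeats and blanks.
ids = torch.argmax(logits, dim=-1)[0]
print(processor.decode(ids))
```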

According to Meta, MMS is primarily intended to “make a small contribution to preserving the incredible diversity of languages in the world.” Training speech recognition and text-to-speech models such as MMS is, first and foremost, a matter of effort.

The more hours of training audio with accompanying transcripts that can be fed in, the better. But for languages that aren’t widely spoken in industrialized nations, or that are even at risk of dying out in the next few decades, “this data just doesn’t exist,” Meta says.

Unusual training texts from the Bible and other religious works

Meta therefore took an unusual approach and turned to audio recordings of translated religious texts: “We turned to religious texts, such as the Bible, which have been translated into many different languages and whose translations have been extensively studied for text-based language translation research.”

By including recordings of the Bible and similar texts, Meta’s AI researchers were able to increase the number of languages covered by the model to over 4,000. Contrary to what the source material might suggest, however, the result is biased neither toward ideological phrasing nor toward male speakers.

Special training approach eliminates sources of error

Meta attributes this to the fact that “we use a connectionist temporal classification (CTC) approach, which is much more constrained compared to large language models (LLM) or sequence-to-sequence models for speech recognition.”
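
For readers unfamiliar with CTC: the acoustic model emits one label distribution per audio frame, and the loss marginalizes over every valid alignment of those frames to the target characters, so the output is tied to the audio far more tightly than with a free-running decoder. A minimal PyTorch sketch with made-up shapes (not Meta's training code):

```python
import torch
import torch.nn as nn

# Toy dimensions: 50 audio frames, batch of 1, 32-character vocabulary (index 0 is the CTC blank).
T, N, C = 50, 1, 32
log_probs = torch.randn(T, N, C).log_softmax(dim=-1)   # stand-in for acoustic model output

targets = torch.randint(1, C, (N, 12))                  # a 12-character target transcript
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), 12, dtype=torch.long)

# The CTC loss sums over all frame-to-character alignments that collapse to the target.
ctc = nn.CTCLoss(blank=0)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
print(loss.item())
```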

Specifically, Meta compared its MMS to OpenAI’s Whisper and “found that models trained on Massively Multilingual Speech data achieve half the word error rate” and also cover “11 times more languages”.
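
Word error rate, the metric behind that comparison, is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. A minimal sketch with invented example strings:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i reference words and the first j hypothesis words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # deletion
                           dp[i][j - 1] + 1,        # insertion
                           dp[i - 1][j - 1] + cost) # substitution
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("the cat sat on the mat", "the cat sit on mat"))  # 2 errors / 6 words ≈ 0.33
```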

By releasing MMS for open-source research, Meta hopes to reverse the trend of technology narrowing the world’s languages down to those most commonly supported by big-tech applications, which is at most around 100.

Meta says, “We envision a world where technology has the opposite effect, encouraging people to keep their languages alive because they can access information and use technology in their preferred language.”
