iPhones will soon be able to imitate voices with just 15 minutes of training material
AI text-to-speech is nothing new. Want to build a YouTube career out of videos of Donald Trump, Joe Biden and Barack Obama trading insults while playing Minecraft? It’s possible! For now, though, the voice you want to use pretty much has to belong to someone famous: AI software needs quite a bit of data to imitate a voice with any accuracy. So as long as there weren’t hours upon hours of audio of you floating around online, you fell outside the danger zone for these kinds of jokes. If it’s up to Apple, that’s about to change. You have to wonder why the hell it’s necessary.
Apple announces new features
Last Tuesday, Apple announced a list of new accessibility features, including a tool for both iPhones and iPads that lets an AI convert any text into speech after training on just 15 minutes of audio. This should make it possible, for example, to type out a message during a FaceTime call and have the software read it aloud.
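Apple has not published an API for the new feature, but its long-standing on-device speech synthesis in AVFoundation gives a feel for the mechanism. Here is a minimal Swift sketch, on the unconfirmed assumption that a trained personal voice would eventually be selectable like any other AVSpeechSynthesisVoice:

```swift
import AVFoundation

// Minimal sketch of on-device text-to-speech with Apple's existing
// AVFoundation API. Personal Voice has no public API yet; the idea that
// a cloned voice would show up as just another selectable voice is an
// assumption, not something Apple has confirmed.
let synthesizer = AVSpeechSynthesizer()

func speak(_ text: String) {
    let utterance = AVSpeechUtterance(string: text)
    // For now you can only pick from the system's built-in voices.
    utterance.voice = AVSpeechSynthesisVoice(language: "en-US")
    utterance.rate = AVSpeechUtteranceDefaultSpeechRate
    synthesizer.speak(utterance)  // synthesis runs on the device itself
}

speak("Typed during a FaceTime call, read aloud by the software.")
```

Note that this synthesis happens entirely on the device, which is also the property Apple claims for the new feature.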
The target group is people who are gradually losing their voice, such as people with ALS. Once their voice is gone for good, this detour lets them keep holding conversations. “Accessibility is part of everything we do here at Apple,” said Sarah Herrlinger, senior director of global accessibility policy and initiatives. Quite a title. The message may sound sympathetic, but ultimately a company like Apple has only one reason to give as many people as possible access to its devices. According to Apple, the feature should be available later this year.
Concerns about AI
Let’s face it: there will certainly be people whose quality of life this kind of functionality improves, and for them that is genuinely great. Nevertheless, it is highly questionable whether it is desirable for commercially available software to be able to imitate a voice from so little material. It is no coincidence that more and more people from the AI world are voicing their concerns about the technology.
No one will have missed it: so-called ‘deepfakes’ are a problem. AI-generated voices, sometimes paired with equally artificial images, are becoming ever harder to distinguish from the real thing. Scammers all over the world are watching these developments with watering mouths. Putting words in the mouths of authority figures simply has plenty of malicious applications, and current technology clearly already offers too many possibilities for it. But today’s tools still need a lot of image and/or audio material to ‘train’ an AI. As a result, the victims so far are mainly people who appear online or on TV a lot: a mostly tech-savvy and affluent demographic.
Once 15 minutes of audio is enough, that changes. Older people, who are already being targeted en masse with something as simple as scam e-mails, are even more vulnerable. Just think: if your grandmother got a call from an AI bot that spoke in your voice and asked her to transfer money to a certain account number, would she do it?
According to Apple, no security risks
According to Apple, Personal Voice, as the software will be called, uses “on-device machine learning” to keep user data secure and therefore private. Apple does not go beyond that promise in the announcement. Perhaps it will explain later exactly which systems are supposed to prevent the voice profiles from falling into the hands of fraudsters. The question is whether you want to take a company at the forefront of a sector whose motto is “move fast and break things” at its word.
Apple is not the only company working on this kind of functionality. Amazon, too, announced last year that it is working on software that can generate an AI voice profile from very limited audio. As an example application, Amazon mentioned the possibility of having Alexa speak in the voice of a deceased loved one. Well…