At Amazon’s re:MARS 2022 conference, Rohit Prasad, the company’s senior vice president and head scientist for Alexa, announced that Amazon is working on a feature that would allow the digital assistant to take a short audio clip of an individual’s voice and synthesize longer speech from it, as first reported by TechCrunch.
Prasad also demonstrated the feature, using the voice of a deceased loved one (a grandmother, in the demonstration) to read a bedtime story to her grandson. “Alexa, can Grandma finish reading me the Wizard of Oz?” asked the grandson. Alexa confirmed the request in her regular voice, then switched to a more human-like tone that mimicked the grandmother.
According to Prasad, the company achieved this result using less than a minute of sample speech.
“This required inventions where we had to learn to produce a high-quality voice with less than a minute of recording versus hours of recording in the studio,” said Prasad. “The way we made it happen is by framing the problem as a voice conversion task and not a speech generation path. We are unquestionably living in the golden era of AI, where our dreams and science fiction are becoming a reality.”
In essence, the goal of the technology is to make memories of loved ones last. Amazon didn’t say when the feature would be available to the public, but the demonstration is sure to draw scrutiny and raise concerns over audio deepfakes.