By Bahar Gholipour
Even the most natural-sounding computerized voices—whether it’s Apple’s Siri or Amazon’s Alexa—still sound like, well, computers. Montreal-based start-up Lyrebird is looking to change that with an artificially intelligent system that learns to mimic a person’s voice by analyzing speech recordings and the corresponding text transcripts as well as identifying the relationships between them. Introduced last week, Lyrebird’s speech synthesis can generate thousands of sentences per second—significantly faster than existing methods—and mimic just about any voice, an advancement that raises ethical questions about how the technology might be used and misused.
The ability to generate natural-sounding speech has long been a core challenge for computer programs that transform text into spoken words. Artificial intelligence (AI) personal assistants such as Siri, Alexa, Microsoft’s Cortana and the Google Assistant all use text-to-speech software to create a more convenient interface with their users. Those systems work by cobbling together words and phrases from prerecorded files of one particular voice. Switching to a different voice—such as having Alexa sound like a man—requires a new audio file containing every possible word the device might need to communicate with users.
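To make the "cobbling together" idea concrete, here is a minimal sketch of stitching prerecorded clips into a sentence. It is an illustration only, not how Siri, Alexa or any shipping assistant is implemented. The name `VOICE_A`, the `synthesize` function and the sample values are all hypothetical, and each clip is a short list of made-up numbers standing in for real audio.

```python
# Hypothetical voice bank: every word the system can say maps to one
# prerecorded clip. The numbers are made-up stand-ins for audio samples.
VOICE_A = {
    "hello": [0.1, 0.3, 0.2],
    "world": [0.4, 0.0],
}


def synthesize(text, voice_bank):
    """Stitch prerecorded clips together, one clip per word."""
    samples = []
    for word in text.lower().split():
        if word not in voice_bank:
            # No recording means the word cannot be spoken in this voice.
            raise KeyError(f"no recording for {word!r} in this voice")
        samples.extend(voice_bank[word])
    return samples


audio = synthesize("Hello world", VOICE_A)
```

In this toy model, a second voice would need its own complete bank with the same keys, which is the cost the paragraph above describes. A word with no recording simply cannot be produced.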
Lyrebird’s system can learn the pronunciations of characters, phonemes and words in any voice by listening to hours of spoken audio. From there it can extrapolate to generate completely new sentences and even add different intonations and emotions. Key to Lyrebird’s approach are artificial neural networks—which use algorithms designed to help them function like a human brain—that rely on deep-learning techniques to transform bits of sound into speech. A neural network takes in data and learns patterns by strengthening connections between layered neuronlike units.
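The idea of learning by strengthening connections can be shown with a toy example. The sketch below is not Lyrebird's system, whose details the article does not give. It assumes a single neuronlike unit with one connection, invented training data and an arbitrarily chosen learning rate, and it adjusts the connection strength a little after each example.

```python
# Toy data (invented for this sketch): the pattern to learn is output = 2 * input.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]

weight = 0.0          # connection strength; starts out knowing nothing
learning_rate = 0.05  # assumed step size, chosen only for illustration

for epoch in range(200):
    for x, target in data:
        prediction = weight * x
        error = prediction - target
        # Nudge the connection in the direction that shrinks the error.
        weight -= learning_rate * error * x
```

After training, `weight` settles very close to 2.0, the pattern hidden in the data. Deep-learning systems apply the same kind of adjustment across many connections arranged in layers.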