IBM has created a new AI algorithm that, after listening to a speaker's voice for five minutes, can read any text aloud in that person's voice.
According to the company, the algorithm can carry on dialogues in real time and adapt to different conversation styles and tones of voice. The developers note that by basing neural speech synthesis on a modular architecture, they "managed to create a realistic computer voice."
The system consists of three components: a prosody predictor, an acoustic feature predictor, and a neural vocoder. Together, the three components make it possible to capture a speaker's style accurately and to adjust the pitch and energy of the speech while accounting for acoustic distortion. According to the company, five minutes of listening to a speaker is enough to train the neural network.
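To make the three-stage layout concrete, here is a minimal Python sketch of such a modular pipeline. It is not IBM's implementation: every name in it (`Prosody`, `predict_prosody`, `predict_acoustic_features`, `vocode`, `synthesize`, `base_pitch_hz`) is hypothetical, and each stage is a hand-written stand-in for what would be a trained neural network. It only illustrates how prosody targets could flow into acoustic frames and then into a waveform.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE = 16_000  # Hz; an assumed value for this toy example
FRAME_S = 0.01        # frame length in seconds; also an assumption


@dataclass
class Prosody:
    """Per-letter prosody targets (hypothetical structure)."""
    pitch_hz: float
    energy: float
    duration_s: float


def predict_prosody(text, base_pitch_hz=120.0):
    """Stage 1 stand-in: assign pitch, energy and duration to each letter.

    A real predictor would be a trained network. Here vowels are simply
    made higher, louder and longer so later stages have data to consume.
    """
    targets = []
    for ch in text.lower():
        if not ch.isalpha():
            continue  # skip spaces and punctuation
        vowel = ch in "aeiou"
        targets.append(Prosody(
            pitch_hz=base_pitch_hz * (1.1 if vowel else 1.0),
            energy=1.0 if vowel else 0.5,
            duration_s=0.08 if vowel else 0.05,
        ))
    return targets


def predict_acoustic_features(prosody):
    """Stage 2 stand-in: expand each letter into fixed-length (pitch, energy) frames."""
    frames = []
    for p in prosody:
        n_frames = round(p.duration_s / FRAME_S)
        frames.extend([(p.pitch_hz, p.energy)] * n_frames)
    return frames


def vocode(frames):
    """Stage 3 stand-in: render frames as sine-wave samples in [-1, 1]."""
    samples_per_frame = round(SAMPLE_RATE * FRAME_S)
    samples = []
    phase = 0.0
    for pitch_hz, energy in frames:
        step = 2 * math.pi * pitch_hz / SAMPLE_RATE
        for _ in range(samples_per_frame):
            samples.append(energy * math.sin(phase))
            phase += step  # keep phase continuous across frames
    return samples


def synthesize(text, base_pitch_hz=120.0):
    """Chain the three stages: prosody -> acoustic frames -> waveform."""
    return vocode(predict_acoustic_features(predict_prosody(text, base_pitch_hz)))
```

Calling `synthesize("hello", base_pitch_hz=200.0)` returns a list of float samples. In this sketch, changing `base_pitch_hz` is the only form of speaker adaptation, a crude placeholder for the pitch and energy adjustment described above. The appeal of the modular split is that each stage can be swapped or retrained without touching the other two.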
Audio samples from the new speech synthesizer are available on the IBM Watson service website.