Baidu, sometimes called the "Chinese Google," has announced a neural network that imitates a human voice better and faster than any comparable system. It analyzes the original sound of a voice and "clones" it, adding nuances and accents where needed. The key feature of the new system is the speed at which it analyzes acoustic data.
Its predecessor, the AI-based Baidu Deep Voice project, was presented in 2017 and required a 30-minute study of the source material to generate a new voice. The Adobe VoCo tool does the same in 20 minutes, and the Canadian startup Lyrebird in just one minute of processing. The new Baidu technology, which does not yet have a name of its own, needs only a few seconds.
The commercial potential of such a development is extremely broad, and the first thing that comes to mind, of course, is fraud and falsification of data. Cloning faces and movements and generating video "featuring" a particular person is, de facto, already possible, and the result can even be streamed. Adding a cloned voice is enough to produce a convincing copy of a person, for example to circumvent biometric identification systems.
There are benign uses as well. One is a truly "animate" electronic assistant that speaks in the voice of a favorite character. Another is a digital nanny able to calm a child or a pet with the voice of an older family member. The technology could also restore a familiar way of communicating to a person who has lost the ability to speak, even if only temporarily. Finally, it could be used to record audiobooks or voice over a text in a well-known voice without having to trouble its owner.