Russian researchers at the Samsung AI Center in Moscow, working with engineers from the Skolkovo Institute of Science and Technology, have developed a system that can create realistic animated images of people's faces from just a few still frames. Such systems usually require large image databases, but in the developers' demonstration the system produced an animated face from just eight static frames, and in some cases a single frame was enough. Details of the work are reported in an article posted to the online repository arXiv.org.
Reproducing a photorealistic, personalized model of a human face is generally difficult because the human head is photometrically, geometrically, and kinematically complex. The difficulty lies not only in modeling the face as a whole (many approaches already exist for this) but also in modeling specific features such as the mouth cavity and hair. A second complicating factor is that people readily notice even minor flaws in a modeled human head. This low tolerance for modeling errors explains why non-photorealistic avatars still prevail in teleconferencing systems.
According to the authors, the system, which relies on few-shot learning, can create highly realistic talking-head models of people and even of portrait paintings. The algorithm synthesizes images of a person's head driven by facial landmarks, which can be taken either from another video of the same person or from the face of a different person. The developers trained the system on a large database of celebrity videos. To obtain the most accurate “talking head”, the system needs more than 32 images.
To make the animated faces more realistic, the developers built on earlier work in generative adversarial networks (GANs), in which a generator network produces images while a discriminator network tries to tell them apart from real ones, so the generator is pushed to render ever more convincing detail. They also used meta-learning, in which a system is trained on many related tasks (here, many different people) so that it can adapt quickly to a new one from only a few examples.
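The meta-learning idea can be illustrated with a toy example. The sketch below is not the authors' method: it swaps in Reptile, a simple and well-known meta-learning algorithm, and replaces faces with hypothetical one-dimensional tasks (fitting lines y = a*x + c). The task family, the function names, and all hyperparameters are assumptions chosen for illustration. Meta-training on many related tasks yields a shared starting point from which five gradient steps on eight examples fit a new task far better than the same steps taken from scratch.

```python
import random

# Illustrative only: Reptile on hypothetical line-fitting tasks, not the authors' face model.

def sgd_steps(params, data, lr, steps):
    """Plain gradient descent on mean squared error for the model y = w*x + b."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * ((w * x + b) - y) * x for x, y in data) / n
        grad_b = sum(2 * ((w * x + b) - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def mse(params, data):
    """Mean squared error of the line (w, b) on a list of (x, y) pairs."""
    w, b = params
    return sum(((w * x + b) - y) ** 2 for x, y in data) / len(data)

def sample_task(rng):
    """Hypothetical task family: lines with slope near 3 and intercept near -1."""
    return 3.0 + rng.uniform(-0.5, 0.5), -1.0 + rng.uniform(-0.5, 0.5)

def make_shots(task, k, rng):
    """Draw k noiseless (x, y) examples from the line y = a*x + c."""
    a, c = task
    xs = [rng.uniform(-1, 1) for _ in range(k)]
    return [(x, a * x + c) for x in xs]

def reptile(meta_iters=1000, k=8, inner_steps=5, inner_lr=0.1, meta_lr=0.1, seed=0):
    """Reptile meta-training: learn an initialization that adapts quickly to new tasks."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(meta_iters):
        shots = make_shots(sample_task(rng), k, rng)
        w_task, b_task = sgd_steps((w, b), shots, inner_lr, inner_steps)
        # Nudge the shared initialization toward the task-adapted parameters.
        w += meta_lr * (w_task - w)
        b += meta_lr * (b_task - b)
    return w, b

# Few-shot adaptation to an unseen task: 8 examples, 5 gradient steps.
rng = random.Random(42)
new_task = sample_task(rng)
shots = make_shots(new_task, 8, rng)
held_out = make_shots(new_task, 100, rng)

meta_init = reptile()
loss_meta = mse(sgd_steps(meta_init, shots, 0.1, 5), held_out)
loss_scratch = mse(sgd_steps((0.0, 0.0), shots, 0.1, 5), held_out)
print(f"meta-learned init: {loss_meta:.3f}  from scratch: {loss_scratch:.3f}")
```

Meta-training moves the starting point close to the "average" line of the task family, so a handful of examples only has to correct small deviations, while the same few steps from zero leave the fit far off. In the face-animation analogy, each task would correspond to one person and the examples to that person's frames.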