A brief history of the technology’s emergence and its nuances

Deepfake with Nicolas Cage

We have already written about deepfakes. The technology for generating such content has now reached a very high level, although even a carefully prepared deepfake can (still) be detected with specialized tools. Until recently, deepfakes were made for entertainment or to annoy someone; now companies in television, cinema, and other industries are beginning to use them commercially. Actors no longer even need to voice film or cartoon characters in person: the technology can synthesize any words and phrases as if spoken by the actor, reproducing tone, pitch, and other qualities very accurately.

There is, however, another side to the coin. Beyond the entertainment industry, deepfakes are also used to fabricate videos of famous people: actors, politicians, entrepreneurs, or anyone else. Combining AI with CGI produces striking results and opens up vast opportunities, which can be used both for good and for less legitimate purposes. How does it all work?

A bit of history

Audrey Hepburn in 2014 Dove Chocolate Commercial

Technologies for synthesizing video and audio are not new. They have been under development since the late 1990s. Various attempts were made earlier, of course, but here we are talking about the technologies that went on to be developed further. In 1997, the Video Rewrite project presented a technique for producing video in which the speaker’s facial articulation matched a synthesized audio track. That is, the mouth movements were modeled so that they fully corresponded to the computer-synthesized audio.

But these were only the first attempts, and the field has been developing actively for the two decades since. Today there are technologies for processing voice, combining computer graphics with real video, and much more. AI is not used everywhere, but the most realistic systems are built on machine learning.

For quite a long time, these technologies were known only to a limited circle of specialists. But in 2009 Avatar appeared, a fairly convincing demonstration of what they could do. Then, in 2014, a young Audrey Hepburn “starred” in a chocolate commercial: her face was superimposed on that of a stand-in actress using specialized software. In 2019, the film “The Irishman” was released, in which the actors’ faces were made to look much younger with the help of artificial intelligence. The film was criticized, however, because a 3D mask over a face, however realistic, is not enough on its own.

Be that as it may, the technology for creating deepfakes has gradually evolved and improved. Until a few years ago, though, facial articulation and other elements of synthesized video were specified in software by hand; in the overwhelming majority of cases it was “manual” work rather than a deepfake generated in real time.

The emergence of “real” deepfakes

Deepfake technology uses neural networks to simplify the synthesis of images and the creation of soundtracks with specified parameters. The networks learn from hundreds or even thousands of examples of faces and the voices associated with them. After that, the AI shows very impressive results.

Why “deepfake”? The technology got its name in 2017, when a Reddit user posted some not-so-decent videos with celebrity faces swapped in. The user’s nickname was “deepfakes”, and the word came to be used for technologies of essentially this kind. It must be said that such technologies then began to develop very actively.

In a relatively short period of time, there has been explosive growth in the number of software companies offering solutions for the synthetic generation of video and audio, covering not only people but entire scenes. YouTube channels featuring deepfakes, such as Shamook and Ctrl Shift Face, have gained large followings. Easy-to-use deepfake applications are common. The entertainment industry creates completely artificial characters that become famous. One example is Lil Miquela, also known as Miquela Sousa.

There is no doubt that more and more realistic deepfakes will appear over time. Even now they no longer surprise anyone, and in the near future they will become commonplace. But who is driving the development of the technology, where is it used, how does it work, and what can we expect in the future?

Modern players

Most of the largest tech players and entertainment companies are actively exploring the synthetic media industry. Amazon is striving to make Alexa’s voice more realistic, Disney is exploring how to use face swap technology in films, and hardware makers like Nvidia are pushing the boundaries of synthetic avatars, as well as services for filmmaking and television.

But there are also organizations that create technologies to distinguish fake from reality. These include, for example, Microsoft and DARPA.

By the way, most deepfake software is open source, which makes it possible for even small companies to work with deepfakes. There are many projects in this space, for example Wombo, Avatarify, FaceApp, Reface, MyHeritage, and many others.

On-premise and cloud solutions offer more sophisticated technologies. Synthesia is a leading provider of synthetic avatars used for training, customer service, video production, and more. Users can choose a ready-made avatar (as in the example above) or, after a 40-minute recording session with the service, create a realistic avatar in their own image and likeness.

Realistic voice generation is increasingly used everywhere, from customer service to automatic translation. Players in this area include Respeecher and Resemble AI. The former can change a user’s voice into someone else’s, alter its apparent age, or translate it into another language, while the latter lets users create deepfakes of their own voice.

How deepfakes are created

Classical computer image processing relies on extremely complex algorithms written as traditional software. As mentioned above, until recently a deepfake was essentially a model tightly controlled by its developers: the overwhelming majority of the facial articulation was rigidly prescribed in the algorithms.

Now the situation has changed. None of this needs to be controlled so tightly any more: specialized software automatically matches speech to the lip and face movements of the “talking head”.

However, creating a convincing deepfake requires large volumes of video, still images, voice recordings, and sometimes even a head scan of a real actor, all of which is then analyzed and used as training input. For example, Synthesia clients record about 40 minutes of video of themselves reading a prepared script, so that neural networks can be trained on this footage.

But the Internet is overflowing with videos of celebrities, including politicians, athletes, and actors. Based on this material, very realistic deepfakes can be created.

Despite the very impressive results, AI-generated deepfakes are not perfect. They have a number of features that are clearly visible (to specialized software, if not to humans) and still make it possible to tell reality from fiction. These include, for example, nuances of lighting and shadows, blinking, articulation, facial expression, and tone of voice. All of these must fit together properly for a deepfake to be convincing.
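To make one of these cues concrete, here is a minimal sketch of how software might measure blinking. It uses the “eye aspect ratio”, a simple ratio of eyelid opening to eye width computed from six landmarks around an eye. Everything here is an illustrative assumption rather than a description of any particular detector: the landmark ordering, the function names, and the threshold values are hypothetical, and the landmarks themselves would have to come from a separate face-landmark tool that is not shown.

```python
import math


def eye_aspect_ratio(eye):
    """Ratio of eyelid opening to eye width.

    eye: six (x, y) landmarks in the assumed order p1..p6, where p1 and p4
    are the eye corners, p2 and p3 lie on the upper lid, and p6 and p5 lie
    on the lower lid directly below them.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def count_blinks(ear_per_frame, threshold=0.2, min_frames=2):
    """Count runs of at least min_frames consecutive frames below threshold.

    threshold and min_frames are hypothetical defaults, not tuned values.
    """
    blinks = 0
    run = 0
    for ear in ear_per_frame:
        if ear < threshold:
            run += 1
        else:
            if run >= min_frames:
                blinks += 1
            run = 0
    if run >= min_frames:  # the clip may end while the eye is still closed
        blinks += 1
    return blinks


# A made-up wide-open eye and a made-up sequence of per-frame ratios.
open_eye = [(0, 0), (1, 1), (2, 1), (3, 0), (2, -1), (1, -1)]
ratios = [0.3, 0.3, 0.1, 0.1, 0.3, 0.1, 0.3, 0.1, 0.1, 0.1]
print(eye_aspect_ratio(open_eye), count_blinks(ratios))
```

A checker built along these lines might flag a clip in which a face blinks far less often than a real person would. On its own this is a weak signal, which is why it would be combined with the other cues listed above.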

Images and Videos

The variational autoencoder (VAE) is widely used in this area. It is a generative model with applications in many fields of research, from generating new human faces to creating completely artificial music. For video, a VAE makes it possible to quickly transfer the facial expressions and articulation of a particular person onto a generated three-dimensional model. VAEs have been around for a long time, but a deepfake created with this technology is easy to spot.
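The core idea of a VAE can be sketched in a few lines. The toy example below shows only a single forward pass: an encoder compresses the input into a mean and a log-variance for each latent dimension, a latent vector is sampled from that distribution, a decoder reconstructs the input, and the loss adds the reconstruction error to a KL-divergence term that keeps the latent space well behaved. The linear layers, the weights, and the four “pixel” values are made up for illustration; a real face model would use deep convolutional networks and a training loop, neither of which is shown.

```python
import math
import random


def linear(weights, vec):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def encode(x, w_mu, w_logvar):
    """Toy encoder: a mean and a log-variance per latent dimension."""
    return linear(w_mu, x), linear(w_logvar, x)


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps, with eps drawn from N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, logvar)]


def decode(z, w_dec):
    """Toy decoder: maps the latent vector back to input space."""
    return linear(w_dec, z)


def kl_divergence(mu, logvar):
    """KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv)
                      for m, lv in zip(mu, logvar))


def vae_loss(x, w_mu, w_logvar, w_dec, rng):
    """Squared reconstruction error plus the KL term, for one input."""
    mu, logvar = encode(x, w_mu, w_logvar)
    z = reparameterize(mu, logvar, rng)
    x_hat = decode(z, w_dec)
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return recon + kl_divergence(mu, logvar), x_hat


# Hypothetical input and weights: 4 input values, 2 latent dimensions.
rng = random.Random(0)
x = [0.2, 0.8, 0.5, 0.1]
w_mu = [[0.5, -0.2, 0.1, 0.0], [0.0, 0.3, -0.1, 0.4]]
w_logvar = [[-0.1, 0.0, 0.1, 0.0], [0.0, -0.2, 0.0, 0.1]]
w_dec = [[0.6, 0.1], [-0.3, 0.9], [0.2, 0.4], [0.0, 0.5]]
loss, x_hat = vae_loss(x, w_mu, w_logvar, w_dec, rng)
print(loss, x_hat)
```

Training would adjust the weights to lower this loss over many examples. Because the latent vector is a compact description of the input, changing or swapping it is what would let expressions captured from one face drive another.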

Generative adversarial networks (GANs), first proposed in 2014, have been gaining ground here since 2017. In effect, two neural networks are combined into a single system. One, the “generator”, creates candidate output; the other, the “discriminator”, judges how realistic that output is. What comes out is the result that the discriminator has “approved” as the most realistic.

But this is a simplified explanation. In practice, things are more complicated: other neural networks and components are used as well. The results are really impressive. On the site This Person Does Not Exist you can see very realistic faces of people who never existed. The same technology was used to create a digital twin of Salvador Dalí. The double “works” in Dalí’s own museum, telling visitors about the life and work of the famous artist.
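The adversarial idea can be sketched with a deliberately tiny example. Below, the “real data” are just numbers clustered around 4, the generator is a single linear function of random noise, and the discriminator is a one-input logistic classifier. One training round updates the discriminator to separate real from fake, then updates the generator to fool it. All names, starting values, and the learning rate are illustrative assumptions; real deepfake systems use deep networks over images, not two-parameter models.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def generator(z, g):
    """Toy generator: turns noise z into a fake sample a*z + b."""
    a, b = g
    return a * z + b


def discriminator(x, d):
    """Toy discriminator: estimated probability that x is real."""
    w, c = d
    return sigmoid(w * x + c)


def d_loss(real, fake, d):
    """Cross-entropy with real samples labelled 1 and fakes labelled 0."""
    on_real = -sum(math.log(discriminator(x, d)) for x in real) / len(real)
    on_fake = -sum(math.log(1.0 - discriminator(x, d)) for x in fake) / len(fake)
    return on_real + on_fake


def g_loss(noise, g, d):
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -sum(math.log(discriminator(generator(z, g), d))
                for z in noise) / len(noise)


def d_step(real, fake, d, lr):
    """One gradient-descent step on d_loss with respect to (w, c)."""
    w, c = d
    gw = gc = 0.0
    for x in real:   # derivative of -log(sigmoid(t)) is sigmoid(t) - 1
        err = discriminator(x, d) - 1.0
        gw += err * x / len(real)
        gc += err / len(real)
    for x in fake:   # derivative of -log(1 - sigmoid(t)) is sigmoid(t)
        err = discriminator(x, d)
        gw += err * x / len(fake)
        gc += err / len(fake)
    return (w - lr * gw, c - lr * gc)


def g_step(noise, g, d, lr):
    """One gradient-descent step on g_loss with respect to (a, b)."""
    a, b = g
    w, _ = d
    ga = gb = 0.0
    for z in noise:  # chain rule through t = w * (a*z + b) + c
        err = discriminator(generator(z, g), d) - 1.0
        ga += err * w * z / len(noise)
        gb += err * w / len(noise)
    return (a - lr * ga, b - lr * gb)


# One adversarial round on made-up data.
rng = random.Random(0)
real = [rng.gauss(4.0, 0.5) for _ in range(64)]
noise = [rng.gauss(0.0, 1.0) for _ in range(64)]
g0, d0 = (1.0, 0.0), (0.1, 0.0)
fake = [generator(z, g0) for z in noise]
d1 = d_step(real, fake, d0, lr=0.05)   # discriminator learns to tell them apart
g1 = g_step(noise, g0, d1, lr=0.05)    # generator learns to fool it
print(d_loss(real, fake, d0), d_loss(real, fake, d1))
print(g_loss(noise, g0, d1), g_loss(noise, g1, d1))
```

Repeating the two steps in a loop is the training procedure: each network improves against the other, and the generator’s output is what is eventually kept.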

What’s next?

There are many positive aspects. Movies and cartoons can be translated into other languages easily, quickly, and almost without human intervention. The translated voice will sound like the original, as if the actor were speaking the other language.

There are also plenty of problems. Even low-quality deepfakes of politicians and other celebrities can be used for malicious purposes. And people will believe them, because not everyone can tell whether what is in front of them is fake or real. Over time, it will become more difficult to distinguish truth from fiction.

Deepfakes can generally undermine viewers’ trust in the media – after all, if you can’t tell the truth from fiction, who can you trust? But now some companies are working on technologies for detecting deepfakes. IBM has even tried to use blockchain technology for this purpose.

Deepfakes also raise many questions about who owns what content, what to do with licenses, and how to punish violators. Actors are already signing contracts with companies that allow their image and voice to be used in commercials or films. But it is likely that some companies will use celebrity deepfakes for their own purposes without any permission.

Be that as it may, the market for technologies of this kind is still developing, and there are many unoccupied niches. All of this will therefore continue to evolve despite the problems mentioned above. Time will tell whether those problems can be solved.
