NVIDIA EditGAN – Sketch-Based Image Editing

Today, with the help of sketches only slightly more complex than children’s drawings, EditGAN lets you change specific facial features (eyes, eyebrows) or even the wheels of a car in a photograph. Until recently, this task was extremely difficult.

Here is a quick overview of this work by the University of Toronto, MIT, and NVIDIA. In addition, this article touches on ethical issues in the field of AI.

Demonstration




Examples of EditGAN results. Image from Ling et al., 2021, EditGAN

Typically, controlling specific attributes of an image requires huge amounts of annotated data, along with experts who know which characteristics in the model must be changed to obtain the desired result.

But EditGAN learns to map segmentations to images from just a few labeled examples, which lets you edit images via their segmentation, in other words, via sketches. Image quality is preserved, allowing an unprecedented level of detail. This is a big leap forward, but even more impressive is how the method works under the hood. Let’s take a look at it!

The solution builds on StyleGAN2, one of the strongest image-generation models available today. I will not dive into its details and will assume some basic knowledge: an image is encoded into a compressed latent space, and a model called a generator transforms that encoded representation into another image. This also works when starting from a latent code directly; the key component here is the generator.
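To make the generator’s interface concrete, here is a minimal numpy sketch. The toy random projection is an invented stand-in for StyleGAN2’s deep convolutional network; only the contract matches the real model: latent vector in, image out.

```python
import numpy as np

# Toy stand-in for a GAN generator: a fixed random projection from a
# low-dimensional latent code to a small "image". A real StyleGAN2
# generator is a deep convolutional network, but the interface is the
# same: latent vector in, image out.
rng = np.random.default_rng(0)

LATENT_DIM = 8   # StyleGAN2 uses 512; kept tiny for illustration
IMAGE_SIZE = 4   # 4x4 grayscale "image"

W = rng.normal(size=(IMAGE_SIZE * IMAGE_SIZE, LATENT_DIM))

def generator(z: np.ndarray) -> np.ndarray:
    """Map a latent code z to an image with pixel values in (0, 1)."""
    x = W @ z
    return (1.0 / (1.0 + np.exp(-x))).reshape(IMAGE_SIZE, IMAGE_SIZE)

z = rng.normal(size=LATENT_DIM)
image = generator(z)
print(image.shape)  # (4, 4)
```

Every point in the latent space maps to some image; editing, as we will see, comes down to moving that point in the right direction.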

How the GAN generator works


So, the generator takes a point from the latent space, which encodes a great deal of information about the image and its features. But this space is high-dimensional and hard to visualize.


Our task is to determine which part of this space is responsible for reconstructing a particular feature of the image. This is where EditGAN comes in: it not only identifies these areas of responsibility, but also lets you edit them with a sketch that is easy to draw.

EditGAN encodes your image (or simply takes some latent code) and generates a segmentation map along with the image itself. Because segmentations and images live in the same latent space, the model learns to change only the characteristics you target and nothing else. You just need to modify the segmentation image, and the rest happens by itself.

Only the new segmentation branch is trained, while the StyleGAN generator for the original image remains fixed. This lets the model associate segmentations with the same latent space the generator uses to reconstruct the image.
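The shared-latent idea can be sketched in a few lines of numpy. Everything here (the linear branches, the least-squares fit, the tiny dimensions) is an invented stand-in for the real networks; the point is that the image branch stays frozen while a segmentation head is fit on a handful of labeled latent codes.

```python
import numpy as np

# Sketch of the shared-latent setup (hypothetical toy model, not the real
# EditGAN code): one latent code z feeds two branches. The image branch
# (the pretrained "generator") stays frozen; only the segmentation branch
# is fit, from a few labeled examples.
rng = np.random.default_rng(1)
LATENT_DIM, N_PIXELS, N_CLASSES = 8, 16, 3

W_img = rng.normal(size=(N_PIXELS, LATENT_DIM))       # frozen generator weights
W_seg = np.zeros((N_PIXELS * N_CLASSES, LATENT_DIM))  # trainable segmentation head

def generate(z):
    image = W_img @ z
    logits = (W_seg @ z).reshape(N_PIXELS, N_CLASSES)
    seg = logits.argmax(axis=1)   # per-pixel class label
    return image, seg

# "Training" the segmentation head on a few labeled latents by least
# squares against one-hot targets (a stand-in for the real loss):
zs = rng.normal(size=(5, LATENT_DIM))                  # 5 labeled examples
labels = rng.integers(0, N_CLASSES, size=(5, N_PIXELS))
targets = np.eye(N_CLASSES)[labels].reshape(5, -1)     # one-hot, flattened
X, *_ = np.linalg.lstsq(zs, targets, rcond=None)
W_seg = X.T   # note: W_img was never touched

image, seg = generate(zs[0])
print(seg.shape)  # (16,)
```

Because both branches read the same latent code, any change to the code moves the image and its segmentation together, which is exactly what makes sketch-based editing possible.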

Then, if training went well, you can simply edit this segmentation, and the image will change accordingly!
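A toy version of that editing step, with a hypothetical linear segmentation branch standing in for the real network: edit the target map, then run gradient descent on the latent code until the generated segmentation matches it. Because only the latent code moves, the image follows along consistently.

```python
import numpy as np

# Hypothetical sketch of the editing step (toy linear model, not the real
# EditGAN optimization): nudge the latent code until the generated
# segmentation matches an edited target map.
rng = np.random.default_rng(2)
LATENT_DIM, N_PIXELS = 8, 16

W_seg = rng.normal(size=(N_PIXELS, LATENT_DIM))  # toy "segmentation branch"

def seg_scores(z):
    return W_seg @ z   # per-pixel score; real EditGAN outputs class logits

z0 = rng.normal(size=LATENT_DIM)
# Build a reachable "edited" map by perturbing the latent code a little:
target = seg_scores(z0 + 0.5 * rng.normal(size=LATENT_DIM))

z = z0.copy()
initial_error = np.linalg.norm(seg_scores(z) - target)

# Gradient descent on ||seg(z) - target||^2 over the latent code.
# Step size chosen from the spectral norm so the iteration converges.
lr = 1.0 / np.linalg.norm(W_seg, 2) ** 2
for _ in range(2000):
    residual = seg_scores(z) - target
    z -= lr * (W_seg.T @ residual)

final_error = np.linalg.norm(seg_scores(z) - target)
print(final_error < 0.1 * initial_error)
```

The real method adds losses that keep the image close to the original outside the edited region, but the core idea is the same: the edit is an optimization over the latent code, not over pixels.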

EditGAN overview (1) and editing process (2–4). Image from Ling et al., 2021, EditGAN

EditGAN essentially assigns a specific class to each pixel, such as head, ear, or eye, and manipulates these classes independently, using masks so that pixels of other classes are left alone.

Bird segmentation map. Image from Ling et al., 2021, EditGAN

This way, each pixel has its own label, and EditGAN decides which label to edit and regenerates the image, changing only the edited region. Voila! By tying the generated image to a segmentation map, EditGAN lets you edit that map however you like and modify the image accordingly!
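Per-class masking is easy to illustrate. The arrays below are invented stand-ins for a real image and label map; the point is that blending through a class mask leaves every pixel outside the edited class untouched.

```python
import numpy as np

# Illustrative sketch (not the real EditGAN code): once each pixel has a
# class label, an edit can be confined to one class by blending the
# edited image back into the original only where the mask is active.
original = np.linspace(0.0, 1.0, 16).reshape(4, 4)   # stand-in image
edited = original * 0.5                               # globally edited version
labels = np.zeros((4, 4), dtype=int)
labels[:2, :] = 1                                     # top half = class 1 ("eye", say)

mask = labels == 1
result = np.where(mask, edited, original)   # class-1 pixels change, rest untouched

print(np.allclose(result[2:], original[2:]))  # True: unedited region preserved
```

In the real system the mask lives alongside the latent optimization, but the guarantee it provides is the same: only the region you sketched over can change.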

Of course, once trained on these examples, the model works on images it has never seen before. The results, as with all GANs, are limited by the type of images the generator was trained on: you cannot use the network on images of cats if you trained it on cars. Nonetheless, this is quite impressive, and I love how the researchers keep finding ways to work with GANs intuitively.

“Ethics of Artificial Intelligence” (Martina Todaro)


Recently, Norway passed a new law that prohibits advertisers and social media influencers from posting retouched promotional photos without a disclosure label. The amendment requires disclosing edits made both before and after the shoot, including the use of Snapchat and Instagram filters that alter a person’s appearance. According to Vice, the edits that must be reported include “enlarged lips, narrowed waists, and exaggerated muscles.” Will this become the norm in other countries as well?

I remember that back in the early 2000s, photo retouching was the domain of graphic designers, and only professionals could afford it. Now the task is easy for everyone. Thanks to NVIDIA and a number of other big tech companies, photo retouching has become so easy, pervasive, and socially acceptable that people don’t even question whether it is actually good for them. It is a free service, like many others, so why not use it?

In a networked society where attention is a valuable commodity, people (not just influencers) have strong incentives to edit images, and the market for editing tools is thriving. Of course, if the practice is this widespread and accepted, it would be reasonable for viewers to simply expect images to be retouched. But that does not seem to be the case.

It has been shown that the unrealistic beauty standards imposed by social media negatively affect users’ self-esteem; eating disorders, mental health problems, and suicidal ideation among young people are among the consequences. [2]

This trade-off is clearly part of a broader “social dilemma”: the interests of the group and the incentives of individuals are misaligned. So, in my opinion, Norway is only leading the way, and many other institutions should (and probably will) act on this issue. The UK has already done so, and the rest of Europe is likely to follow.

Links

  1. Ling, H., Kreis, K., Li, D., Kim, S.W., Torralba, A. and Fidler, S., 2021. EditGAN: High-Precision Semantic Image Editing. In Thirty-Fifth Conference on Neural Information Processing Systems.
  2. Code and interactive tools (coming soon).

