A couple of years ago, I showed how we can use AI style-transfer models such as CycleGAN to convert the graphics of one game to look like another, for example turning Fortnite into PUBG. That project is still one of my most viewed, and two years later it continues to attract new viewers. This suggests there is a lot of interest in this line of AI research, but unfortunately we have not seen much progress in turning such prototypes into something practical. Versions of this AI running at higher resolutions have been presented, but they required multiple GPUs for training, which is impractical for real-world use.
Fortunately, we finally have a paper showing significant progress in reducing the processing power required to train this kind of AI. The paper, from the University of California, Berkeley and Adobe, is titled “Contrastive Learning for Unpaired Image-to-Image Translation” (CUT).
Using the same dataset and the same GPU that I used last time, this new model allowed me to go from 256p to 400p synthesized images. Moreover, training took a little under 2 hours, compared to 8+ hours last time.
CycleGAN and the Patchwise Contrastive Framework
So how does this approach differ from CycleGAN, and why does it need so much less computing power? CUT uses the Patchwise Contrastive Learning framework, which requires significantly less GPU memory and computation than CycleGAN.
The generator network learns to convert Fortnite images to PUBG. If you remember, in CycleGAN we would train a second network that tries to convert PUBG back to Fortnite in order to calculate a reconstruction error, and this creates a huge overhead in terms of GPU compute and memory requirements.
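For context, here is a minimal sketch of the cycle-consistency objective that this second network exists to compute. The generators `G` and `F` below are toy stand-in functions over flat lists of pixel values, purely to show the structure of the loss; in the real method both are deep networks, which is exactly the overhead CUT removes.

```python
# Sketch of CycleGAN's cycle-consistency objective (illustrative only).
# G: Fortnite -> PUBG generator; F: PUBG -> Fortnite generator.
# Here they are toy stand-in functions over lists of pixel values.

def l1_loss(a, b):
    """Mean absolute error between two flat 'images'."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def cycle_consistency_loss(x, G, F):
    """Reconstruction error ||F(G(x)) - x||_1.
    Training a whole second network F just to compute this term
    is the cost that CUT's one-sided approach avoids."""
    return l1_loss(F(G(x)), x)

# Toy stand-in generators: G brightens, F darkens (roughly inverse).
G = lambda img: [p + 0.1 for p in img]
F = lambda img: [p - 0.1 for p in img]

x = [0.2, 0.5, 0.8]
print(cycle_consistency_loss(x, G, F))  # near 0, since F inverts G
```

When `F` undoes `G` well, the loss is close to zero; CycleGAN trains both networks jointly to make that happen.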
Here, instead, we use a contrastive loss. Rather than working with whole images at once, this method extracts patches from the input and output images. The model's task is to determine which of several input patches (the keys) is the positive match for a query patch taken from the synthesized image. This is called contrastive learning, and it allows the model to learn better feature representations through self-supervision.
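A minimal sketch of that contrastive objective (an InfoNCE-style loss) in plain Python: given a query embedding from a synthesized patch, one positive key (the patch at the same location in the input) and several negative keys (other input patches), the loss is a cross-entropy that pushes the query toward its positive. The embedding size and temperature value below are illustrative choices, not the paper's settings.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(query, positive, negatives, tau=0.07):
    """Contrastive loss: cross-entropy of picking the positive key
    among all keys, with similarities scaled by temperature tau."""
    logits = [dot(query, positive) / tau]
    logits += [dot(query, n) / tau for n in negatives]
    m = max(logits)  # subtract the max to stabilize the softmax
    log_sum = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum - logits[0]  # -log softmax(positive)

# Toy unit-length embeddings: the positive is aligned with the
# query, the negatives are not, so the loss is close to zero.
q    = [1.0, 0.0]
pos  = [1.0, 0.0]
negs = [[0.0, 1.0], [-1.0, 0.0]]
print(info_nce(q, pos, negs))  # near 0
```

Minimizing this loss over many patch locations is what teaches the generator to keep each output patch recognizably tied to the corresponding input patch, without ever needing a second network.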
Comparison with CycleGAN
This new approach is why images synthesized with this method have sharper boundaries between objects and retain more information from the original image after the transformation.
And remember, all of this comes with lower GPU requirements, which is fantastic! To learn more about the paper's results on other datasets, visit the project page.
This translation was prepared ahead of the start of the “Computer Vision” course from OTUS. If you are interested in studying this area, we recommend watching the open house recording, in which we talk in detail about the learning process, and we also invite everyone to sign up for a free demo lesson on the topic “Computer vision in sports analytics”.