Changing objects in video by prompt and extracting moving objects with one click in Wunjo AI

Greetings to everyone interested in generative neural networks and in generating images and video from text prompts! In this article, I want to share news about the latest update to my open-source project, Wunjo AI, in which I am rethinking how deepfakes can be created using Stable Diffusion. Let's take a look at what version 1.6 brings and how Wunjo AI now lets you edit video with text prompts and create masks for moving objects with one click. In addition, I will introduce a new tool that extracts objects from video with a transparent background, making them more versatile for later use, for example in design.

Getting object masks with one click

Let's start with the smaller new tools. Version 1.6 introduces the ability to extract objects from video at different points in time with a transparent background. Here's how it's done:

  1. Open the Removal and Retouching panel and upload your media content.

  2. Click on the object you want to extract. Once the mask appears, if it meets your expectations, add it for further processing.

  3. Use the right mouse click to include an area and the left click to exclude it. For example, you might create two different masks for two small, fast-moving objects with different timing. You can select a frame in advance or specify the desired start and end time for object extraction (a code sketch of this point-based selection follows the list below).

    Panel for removing objects and retouching

  4. In the options, you can choose whether to keep the mask, set its color, or make it transparent. There is also an option to adjust the similarity threshold between masks, since a moving object may look slightly different from frame to frame. Note that the first time you click on an object, the segmentation model is downloaded automatically if you have not loaded it in advance. The model is 1 GB to 2 GB in size, depending on whether you choose CPU or GPU, which is worth considering when planning resource usage.

    Options panel for obtaining a mask

  5. Note that complex objects may show artifacts in some frames, but you can correct them in the next processing iteration, since mask extraction also works on individual images. And here is our result:

    We get the result
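
For those curious how this kind of click-based selection can work under the hood, here is a minimal sketch using a point-promptable segmentation model (Segment Anything) with include/exclude points. The checkpoint path and click coordinates are placeholders, and Wunjo AI's actual implementation may differ:

import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load a SAM checkpoint (path and model type are placeholders).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")
sam.to("cuda")  # or "cpu"
predictor = SamPredictor(sam)

# Read one video frame and hand it to the predictor.
frame = cv2.cvtColor(cv2.imread("frame_0001.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(frame)

# Clicks: label 1 includes a point in the object, label 0 excludes it.
points = np.array([[320, 240], [400, 260]])
labels = np.array([1, 0])

masks, scores, _ = predictor.predict(
    point_coords=points,
    point_labels=labels,
    multimask_output=False,
)
mask = masks[0]  # boolean HxW mask of the clicked object

# Save the object on a transparent background.
rgba = np.dstack([frame, (mask * 255).astype(np.uint8)])
cv2.imwrite("object_rgba.png", cv2.cvtColor(rgba, cv2.COLOR_RGBA2BGRA))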

Improved object retouching

Version 1.6 adds a new method for removing objects from video, which works only on the GPU. Previously, the neural network removed objects from each frame based only on its weights and its own notion of what should appear in place of the removed object, which often produced unnatural, noisy results. The new method takes a different approach based on analyzing a burst of 50 frames: the neural network looks at this group of frames and uses that information, together with its notion of what should appear in place of the removed object, to fill the area. The new method is available in the Object Removal and Retouching panel as "Improved object removal".

Options:

  • Mask Thickness: This option adds an outline to the mask, which is useful if the segmentation did not highlight the outline of the object, but it still needs to be included in the mask for removal.

  • Resizing: Since this method is resource intensive, large frames may have to be reduced in size before processing. The Resize option then merges the original video with the processed area after objects are removed. Because this is done by scaling, the difference is more noticeable the larger the gap between the original resolution and the processing resolution (a small sketch of this scale-and-merge idea follows the screenshot below).

    Options
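
To illustrate the idea behind the Resize option, here is a minimal sketch, assuming OpenCV and a simple paste-back of the masked region; the removal network is replaced by a stand-in inpainting function, and the real pipeline in Wunjo AI may blend the regions differently:

import cv2
import numpy as np

def remove_with_downscale(frame, mask, work_size, inpaint_fn):
    """Downscale the frame and mask, run removal at the smaller size,
    then paste the result back into the full-resolution frame only
    where the mask is, keeping the original pixels everywhere else."""
    h, w = frame.shape[:2]
    small_frame = cv2.resize(frame, work_size)
    small_mask = cv2.resize(mask, work_size, interpolation=cv2.INTER_NEAREST)

    small_result = inpaint_fn(small_frame, small_mask)

    result_up = cv2.resize(small_result, (w, h))
    mask_up = cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
    out = frame.copy()
    out[mask_up > 0] = result_up[mask_up > 0]
    return out

# Example with OpenCV's classical inpainting as the stand-in removal step.
frame = cv2.imread("frame_0001.png")
mask = cv2.imread("mask_0001.png", cv2.IMREAD_GRAYSCALE)
cleaned = remove_with_downscale(
    frame, mask, work_size=(640, 640),
    inpaint_fn=lambda f, m: cv2.inpaint(f, m, 3, cv2.INPAINT_TELEA),
)
cv2.imwrite("cleaned_0001.png", cleaned)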

VRAM resources and video resolution limits:

  • 19 GB VRAM: 1280×1280

  • 7 GB VRAM: 720×720

  • 6 GB VRAM: 640×640

  • 2 GB VRAM: 320×320

Deleting objects

These restrictions are important to consider when working with video on your device.

In update 1.6.1, I plan to further improve the results by sending batches of 50 frames that overlap with already processed frames, which will make the transitions between batches smoother and less noticeable.
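
To make the batching idea concrete, here is a minimal sketch of splitting a video into overlapping bursts of 50 frames, where each burst starts with a few already processed frames from the previous one as context. The overlap size is an assumption chosen for illustration:

def make_bursts(num_frames, burst_size=50, overlap=5):
    """Yield (start, end) index ranges of overlapping bursts.

    Each burst after the first begins `overlap` frames before the end of
    the previous one, so the removal network sees already processed
    context at every seam."""
    step = burst_size - overlap
    start = 0
    while start < num_frames:
        end = min(start + burst_size, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += step

# Example: 170 frames -> bursts (0, 50), (45, 95), (90, 140), (135, 170).
for start, end in make_bursts(170):
    print(start, end)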

Converting a deepfake based on text prompts

What’s the point? The idea is to use the previous approach to object segmentation to automatically create masks and then pass each object to Stable Diffusion with corresponding text prompts. In this process, objects or even the entire background can be completely redrawn based on text directions.
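
To give an idea of what happens to each masked object, here is a minimal sketch using the diffusers inpainting pipeline: the mask marks the object, and the prompt describes what should be drawn in its place. The checkpoint, prompt and file names are placeholders, and Wunjo AI's own pipeline adds preprocessing and frame blending on top of a step like this:

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

# Load an inpainting model (placeholder checkpoint name).
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    torch_dtype=torch.float16,
).to("cuda")

frame = Image.open("frame_0001.png").convert("RGB").resize((512, 512))
mask = Image.open("mask_object_1.png").convert("L").resize((512, 512))  # white = redraw

# Redraw only the masked object according to the text prompt.
result = pipe(
    prompt="blonde hair",
    negative_prompt="blurry, deformed",
    image=frame,
    mask_image=mask,
    num_inference_steps=30,
).images[0]
result.save("frame_0001_edited.png")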

Video conversion panel by text request

Let's look at this with a specific example. Say we have two objects moving at the same time, and we want to apply the text prompts "Blonde" and "Brown Jacket" to them respectively. This example uses the default model, but you can easily replace it with your own. In addition, two different preprocessors are available: one smooths out differences between images more, the other brightens them. You can also adjust the number of frames between generations and the mask-merging options, similar to object removal and retouching.

Options

There is also a ControlNet option. This is important because changing objects does not mean they will match the surrounding context. For example, when generating a person, we want their head not to turn in random directions but to stay fixed as in the original frames. To achieve this, ControlNet is used with one of two annotators: Canny or HED.
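
Here is a minimal sketch of how a ControlNet conditioned on Canny edges can keep the generated object aligned with the original frame; the checkpoints are placeholders, and the HED variant works the same way with a different annotator:

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# A Canny-conditioned ControlNet on top of a base SD checkpoint (placeholders).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# The annotator: Canny edges extracted from the original frame keep the
# pose and outlines fixed while the appearance is redrawn.
frame = cv2.imread("frame_0001.png")
edges = cv2.Canny(frame, 100, 200)
control_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

result = pipe(
    prompt="a person with blonde hair and a brown jacket",
    image=control_image,
    num_inference_steps=30,
).images[0]
result.save("frame_0001_controlled.png")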

Redrawing every frame takes a long time, and there may be differences between frames, much like when you generate a video in Deforum. Therefore, frames are generated at a certain interval, and the changes are then transferred onto the rest of the video using style transfer with Ebsynth. This makes the changes smoother and less noticeable (a small sketch of the keyframe scheduling follows the illustrations below). Let's look at the generation process:

The first step is to create masks for each object

The second step is generating images with interval and ControlNet

The final result is in resolution 512×512
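
As promised, a minimal sketch of the keyframe scheduling behind the interval idea: only every Nth frame is redrawn with Stable Diffusion, and the remaining frames borrow their appearance from the nearest keyframe (in Wunjo AI that propagation step is done with Ebsynth; here it is only planned, not performed):

def plan_keyframes(num_frames, interval=10):
    """Return the frame indices that will be redrawn with Stable Diffusion;
    all other frames get the style propagated from the nearest keyframe."""
    keyframes = list(range(0, num_frames, interval))
    if keyframes[-1] != num_frames - 1:
        keyframes.append(num_frames - 1)  # always anchor the last frame
    return keyframes

def nearest_keyframe(index, keyframes):
    """Pick the keyframe whose redrawn appearance will be transferred here."""
    return min(keyframes, key=lambda k: abs(k - index))

keyframes = plan_keyframes(95, interval=10)
# e.g. frame 37 borrows its redrawn appearance from keyframe 40
print(nearest_keyframe(37, keyframes))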

It is important to note that this process is resource intensive, runs only on the GPU, and consumes a lot of VRAM. The maximum resolution depends on the available VRAM on your device, and is limited as follows:

VRAM resources and video resolution limits:

  • 24 GB VRAM: 1280×1280

  • 18 GB VRAM: 1024×1024

  • 14 GB VRAM: 768×768

  • 10 GB VRAM: 640×640

  • 8 GB VRAM: 576×576

  • 7 GB VRAM: 512×512

If you have used Stable Diffusion, you will have noticed that the quality of the image depends not only on the model but also on the size of the generated image. For example, I only have 8 GB of VRAM on my home machine, so the video I generate cannot be larger than 512×512 (the aspect ratio can be anything), and this leaves its mark on the quality of the result.

You can integrate Stable Diffusion models from Hugging Face or CivitAI by adding them to your Wunjo AI framework. This allows you to more flexibly configure the generation process.

To add a model to the application, place the model file in the .wunjo/deepfake/diffusion directory, then open .wunjo/deepfake/custom_diffusion.json and add the name of your model as you want it to appear in Wunjo AI, for example:

{
  "revAnimated_v11.safetensors": "revAnimated_v11.safetensors"
}

Let's try this model in action. Say you want to change the entire frame except one object. In this case, select the appropriate model, set a mask, and enable the "pass" option in the prompt field to exclude the object from generation. Then select the "Change Background" option and enter a new text prompt and a negative prompt. Note that the "seed" parameter lets you reproduce results: if you need to adjust only part of the image, the same seed gives the same output for the parts of the video you are already happy with, so you can change only what you think is not good enough (a short sketch of this is shown below).
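
As an illustration of why a fixed seed makes a run reproducible, here is a minimal sketch with the diffusers inpainting pipeline and a seeded generator; the checkpoint, prompts and file names are placeholders:

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

frame = Image.open("frame_0001.png").convert("RGB").resize((512, 512))
mask = Image.open("background_mask.png").convert("L").resize((512, 512))

# The same seed with the same inputs reproduces the same output, so you
# can rerun the generation and only change what you are unhappy with.
generator = torch.Generator("cuda").manual_seed(1234)
result = pipe(
    prompt="a snowy mountain landscape",
    negative_prompt="blurry, low quality",
    image=frame,
    mask_image=mask,
    generator=generator,
).images[0]
result.save("frame_0001_background.png")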

Panel for converting video by text queries

Applying your Stable Diffusion model

It is important to note that the files for video generation can be quite large, and if there are problems with your internet connection, downloading models from Hugging Face may be unstable. For example, I cannot download large models from Hugging Face without a VPN. To avoid this, you can download the models manually; a table with the models is provided in the text. After downloading, the models must be placed in .wunjo/deepfake/diffusion; otherwise, they will be downloaded automatically on first launch.
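
If the built-in download is unstable for you, a checkpoint can also be fetched with the huggingface_hub client and copied into that directory. This is a minimal sketch assuming the .wunjo folder lives in your home directory; the repository and file names are placeholders:

import shutil
from pathlib import Path
from huggingface_hub import hf_hub_download

# Download a single checkpoint file (placeholder repo and filename).
local_path = hf_hub_download(
    repo_id="runwayml/stable-diffusion-v1-5",
    filename="v1-5-pruned-emaonly.safetensors",
)

# Copy it into the directory Wunjo AI reads custom diffusion models from.
target_dir = Path.home() / ".wunjo" / "deepfake" / "diffusion"
target_dir.mkdir(parents=True, exist_ok=True)
shutil.copy(local_path, target_dir / "v1-5-pruned-emaonly.safetensors")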

Initially, the application is distributed for Windows as a CPU-only build, for several reasons. One of them is the author's lack of access to the Windows platform; there are also restrictions such as the 2 GB limit on the size of the installer and other technical constraints. However, the documentation provides instructions on how to run the application on Windows on a graphics processing unit (GPU).

In addition, for the convenience of users, it is possible to create a portable version of the application using briefcase build. The portable version already contains all the necessary libraries and a Python interpreter, allowing users to share the application without the need to install additional components.

This update also optimizes replacing a face in a video from a photo and removing objects from a video, reducing memory requirements and improving performance.

These are all the innovations in Wunjo AI that I wanted to share with you. Tell me whether you think video generation in Wunjo AI is worth developing further despite its demands on resources, and whether you would like to see the ability to generate music and sounds from text prompts in the application. Leave your comments!

And I almost forgot: here is the link to the open source project, the site from which you can download the installers for one-click installation, and a video about it.

If you have never heard of this project before, here is a video about all the functionality of Wunjo AI.

Enjoy and see you again!
