Creating animated videos using the Stable Diffusion neural network, step-by-step guide

Friends, hello everyone! You have surely heard, including from me, more than once that neural networks can now do literally everything: write texts, generate music, make deepfakes, create images and even videos. Until recently, video generation was only available in large paid projects like Gen-2 from Runway or in the still-free Discord bot Pika Labs, while local solutions were poor or required top-end graphics cards.

But times are changing: the community has created an open and accessible solution that lets you create beautiful animated videos right in our favorite Stable Diffusion, in the Automatic 1111 interface. That solution is what we will discuss today.

What we need:

  1. The Stable Diffusion neural network in the Automatic 1111 interface, version 1.6 or newer;

  2. The AnimateDiff extension;

  3. My favorite model from the 1.5 family, epiCRealism, and a suitable prompt.

In this guide I will not focus on installing Automatic 1111; it is described in detail on the project's GitHub page, or you can watch the first 5 minutes of my video "Clean installation of the Stable Diffusion neural network in the Automatic 1111 interface". For now you don't need to download the SDXL model: today we will talk only about 1.5 models and the motion LoRAs for them. I'll cover motion models for SDXL next time, so don't forget to rate this post and leave a comment so I can see that the topic is really interesting.

Installing the extension

Automatic 1111 is installed and ready to use. All that remains is to install the extension that will let us create animated generations. The extension is called AnimateDiff; it is a port of the research project of the same name, where you can also download a standalone version that works without Automatic, but it has far fewer features and is less convenient.

To install the extension, go to the Extensions tab, then Install from URL, paste the link https://github.com/continue-revolution/sd-webui-animatediff into the first field and click Install. Wait for the installation to finish and restart the web UI.
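If Install from URL is unavailable for some reason, the same thing can be done by hand, since an extension is just a git repository cloned into the web UI's extensions folder. A minimal sketch, assuming a default folder layout (adjust webui_root to your own installation):

```python
# Manual alternative to "Install from URL": clone the extension repo
# directly into the web UI's extensions folder, then restart the UI.
# The webui_root path below is an assumption -- point it at your install.
import subprocess
from pathlib import Path

webui_root = Path("stable-diffusion-webui")
extensions = webui_root / "extensions"

subprocess.run(
    ["git", "clone",
     "https://github.com/continue-revolution/sd-webui-animatediff",
     str(extensions / "sd-webui-animatediff")],
    check=True,
)
```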


But that's not all: now you need to download the motion models, and the easiest way is from Google Drive. Download mm_sd_v15_v2.ckpt and the whole MotionLoRA folder. Put the mm_sd_v15_v2.ckpt model here: stable-diffusion-webui\extensions\sd-webui-animatediff\model, and put the LoRAs, as usual, in stable-diffusion-webui\models\Lora; you can keep them right in the MotionLoRA folder, which is even more convenient. We'll come back to the LoRAs later.
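To catch typos in the paths right away, here is a small sketch that checks the layout described above; only the folders named in this step are assumed:

```python
# Quick sanity check that the motion model and the MotionLoRA files
# ended up where the extension expects them. Paths follow the guide;
# adjust webui_root to your own installation.
from pathlib import Path

webui_root = Path("stable-diffusion-webui")

motion_model = webui_root / "extensions" / "sd-webui-animatediff" / "model" / "mm_sd_v15_v2.ckpt"
lora_dir = webui_root / "models" / "Lora" / "MotionLoRA"

print("motion model found:", motion_model.exists())
if lora_dir.exists():
    print("MotionLoRA files:", sorted(p.name for p in lora_dir.iterdir()))
else:
    print("MotionLoRA folder missing:", lora_dir)
```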

Almost done: all that remains is to go to Settings/Optimization, enable the checkbox Pad prompt/negative prompt to be the same length and click Apply Settings. Most likely, though, you will only do this when you re-read the guide a second time, wondering why your animation consists of two different pieces in one GIF.


Hurray, everything is ready: an accordion titled AnimateDiff should now appear. If it doesn't, try restarting the web UI again. If it does, copy any prompt from the page of the model you are using, set the settings as in my screenshot below and start the generation.

[Screenshot: AnimateDiff generation settings]

I will create an animation from my own prompt: it is the mascot of one of my projects, so the same girl named Jenna will accompany us throughout the guide. Here is what I got.

How it works

A model specially trained on video is plugged into an existing Stable Diffusion 1.5 model during generation and steers the motion according to the data it was trained on; ControlNet works in a similar way. This lets you use any community-made Stable Diffusion 1.5 model as the basis for your video. The technique works with both anime and photorealistic models; below you will see the same "girl on the beach" prompt on a different model.
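For those who prefer scripts to the web UI, the same idea is exposed in the diffusers library: you take an ordinary SD 1.5 checkpoint and attach a motion adapter to it. A rough sketch, where the model ids are illustrative and worth double-checking on Hugging Face before running:

```python
# Sketch of the same idea outside the web UI: a motion module ("adapter")
# is attached to an ordinary SD 1.5 checkpoint via the diffusers library.
# Model ids are assumptions for illustration -- substitute your own.
import torch
from diffusers import AnimateDiffPipeline, MotionAdapter, DDIMScheduler
from diffusers.utils import export_to_gif

adapter = MotionAdapter.from_pretrained(
    "guoyww/animatediff-motion-adapter-v1-5-2", torch_dtype=torch.float16
)
pipe = AnimateDiffPipeline.from_pretrained(
    "emilianJR/epiCRealism",          # any SD 1.5 checkpoint works here
    motion_adapter=adapter,
    torch_dtype=torch.float16,
)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

result = pipe(
    prompt="a girl on the beach watching fireworks, photo, bokeh",
    negative_prompt="low quality, watermark",
    num_frames=16,                    # same 16-frame sweet spot as in the web UI
    num_inference_steps=25,
    guidance_scale=7.5,
)
export_to_gif(result.frames[0], "jenna.gif")
```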


The secret of this technology is very simple. Its main achievement, which is also its main curse, is that all frames are generated on "one sheet". This is what makes the video consistent, because all frames are created from a single latent space, but it also imposes performance limits, because that single canvas eats a lot of video memory. With less than 6 GB of VRAM it's not even worth trying; 12 GB is optimal.
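A quick back-of-the-envelope calculation shows what "one sheet" means for memory: the whole clip is denoised as a single batch of latents, so every intermediate activation in the UNet grows with the number of frames. A minimal illustration of the scaling, using only the standard SD 1.5 latent shapes:

```python
# Why VRAM matters: with AnimateDiff the whole clip is denoised as one
# batch of latents, so every intermediate UNet activation is roughly
# `frames` times larger than for a single image (temporal attention
# adds more on top). Shapes below are the standard SD 1.5 latent layout:
# 4 channels at 1/8 of the image resolution.
frames, channels = 16, 4
height = width = 512

single_image_latent = (1, channels, height // 8, width // 8)
animatediff_canvas = (frames, channels, height // 8, width // 8)

def numel(shape):
    n = 1
    for dim in shape:
        n *= dim
    return n

print("single image latent:", single_image_latent, "->", numel(single_image_latent), "values")
print("16-frame canvas:    ", animatediff_canvas, "->", numel(animatediff_canvas), "values")
print("activation memory scales by roughly x", numel(animatediff_canvas) // numel(single_image_latent))
```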


This also affects how the seed works. If you thought you would first generate a picture, find a great seed, then enable AnimateDiff and the picture would simply come to life, I have to disappoint you: with the extension enabled the result will be completely different. Your seed is used to build the whole large canvas with all the frames, which gives a completely different image. So the seed has to be picked at random, simply generating until you find the right one.

Extension settings

I won't cover all the settings and functions now, such as video stylization or automatically changing prompts to make loopback-style animations; those are topics for separate large guides. For now I'll cover only the most basic and useful ones.


I think the model selector needs no explanation, but let's look at the formats in more detail. MP4 is the most convenient format for further use and is saved in the best quality, but it isn't shown in the browser for preview; GIF is, so I usually generate both.

Also note that the generation information (png info) is only written to the PNG format, which means that if you don't tick PNG or TXT, you simply won't be able to restore the generation parameters later. I recommend additionally saving the final generation as a PNG each time.
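If you did save a PNG, the generation parameters live in a text chunk inside the file and can be read back without the web UI at all; a minimal sketch with Pillow (the file name is just an example):

```python
# Read the generation parameters that Automatic1111 writes into the
# "parameters" text chunk of a saved PNG. The file name is hypothetical.
from PIL import Image

image = Image.open("jenna_00001.png")
print(image.info.get("parameters", "no generation info stored in this file"))
```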

Number of frames – how many frames to generate: the more frames, the longer the video. Longer videos also come out "choppy". 16 is optimal.

Context batch size – the size of the generation context window; it depends on the model. In our example, the model was trained on 16-frame videos and handles them best: fewer is fine, more is awful.

FPS – number of frames per second, optimally 8-16, can be interpolated in other programs later.

Closed loop – video looping mode. N – no looping; R-P – reduce mode, the extension will try to remove frames to loop the video; R+P – add mode, the extension will try to add frames to loop the video; A – aggressive mode, the extension will try to force the loop, which sometimes produces artifacts and sometimes good results like the one below, where the seam is almost invisible.
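For those who want to batch these settings instead of clicking through the UI, the extension also hooks into the web UI's txt2img API via alwayson_scripts. A rough sketch of such a request; the AnimateDiff field names follow the extension's README at the time of writing, so treat them as assumptions and check the repository if the server rejects the payload:

```python
# Sketch of driving the same generation through the web UI API
# (e.g. to try several seeds in a row). The AnimateDiff keys are taken
# from the extension's README and may change -- verify against
# https://github.com/continue-revolution/sd-webui-animatediff
import requests

payload = {
    "prompt": "a girl on the beach watching fireworks, photo, bokeh",
    "negative_prompt": "low quality, watermark",
    "steps": 25,
    "width": 512,
    "height": 512,
    "seed": -1,                        # random seed; see the note about seeds above
    "alwayson_scripts": {
        "AnimateDiff": {
            "args": [{
                "enable": True,
                "model": "mm_sd_v15_v2.ckpt",
                "video_length": 16,    # Number of frames
                "batch_size": 16,      # Context batch size
                "fps": 8,
                "closed_loop": "N",
                "format": ["GIF", "MP4", "PNG"],
            }]
        }
    },
}

response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
response.raise_for_status()
```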

All other settings are of little interest to us or are used in other modes. Now on to the tastiest part!

Motion LoRA

Probably one of the coolest features is the special motion LoRAs, which are only compatible with that very model, mm_sd_v15_v2.ckpt; with other motion models they simply don't work.

Each of the 8 LoRAs is trained for its own type of camera movement: Zoom In – moving closer; Zoom Out – moving away; Pan Left – shift to the left; Pan Right – shift to the right; Pan Up – shift up; Pan Down – shift down; Rolling Anti-Clockwise – counterclockwise rotation; Rolling Clockwise – clockwise rotation. With these LoRAs you can realize almost any director's idea.

Connecting the LoRAs

Motion LoRAs are connected in the prompt just as easily as regular ones: open the Lora tab and simply click the LoRA with the desired effect. The recommended weight is 0.7; at higher weights there will be more artifacts, but feel free to experiment. Let's see how the LoRAs work on the example with the girl and the fireworks from the beginning of the post: I won't change the prompt, I'll just plug in the different motion LoRAs one by one (see the small sketch below). Don't forget to switch Closed loop to mode N to get more movement when using the LoRAs.
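For reference, clicking a LoRA in the Lora tab simply appends the standard <lora:name:weight> tag to the prompt, so the same thing can be typed by hand; the file name below is how the Zoom In LoRA is usually named in the official MotionLoRA set, so check it against what you actually downloaded:

```python
# What clicking a LoRA in the Lora tab actually does: it appends the
# standard <lora:name:weight> tag to the prompt. The file name is the
# usual one from the official MotionLoRA set -- verify against your files.
base_prompt = "a girl on the beach watching fireworks, photo, bokeh"
motion_lora = "v2_lora_ZoomIn"        # swap for the LoRA you want to test
weight = 0.7                          # recommended starting weight

prompt = f"{base_prompt}, <lora:{motion_lora}:{weight}>"
print(prompt)
```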

Zoom In – zoom in

Zoom Out – moving away

Pan Down – shift down

Pan Up – shift up

Pan Left – shift to the left

Pan Right – shift to the right

Rolling Anti-Clockwise – counterclockwise rotation

Rolling Clockwise – clockwise rotation

The coolest ones, of course, are zooming in and out. Some LoRAs spoil the composition and you have to play with the weight; for example, the shift to the left is noticeably smaller than the shift to the right at the same weight. But overall this is a fantastic creative tool, now available locally.

Secrets and tips

Use simple prompts. The motion model was trained on videos from the WebVid-10M dataset; I'll give a couple of examples of video-caption pairs so you get the idea.

Lonely beautiful woman sitting on the tent looking outside. wind on the hair and camping on the beach near the colors of water and shore. freedom and alternative tiny house for traveler lady drinking.

Billiards, concentrated young woman playing in club

The whole dataset is like that: some videos have rapid cuts and a one-sentence caption, others are almost static with lots of artistic description. So if you thought you could specify, say, the exact movement of a car in your video, or other very specific details, most likely it won't work. Use more "artistic" descriptions.

Do not exceed 16 frames. The model was trained on 16-frame videos, so despite all the workarounds, longer videos will still twitch.

Do not use multiple LoRAs at the same time. Each LoRA adds more noise and ends up interfering with the motion model. The same goes for ControlNet: you can use it, but each extra layer reduces the movement.

If a Shutterstock logo appears in your video, work on the negative prompt or change the negative embedding. The logo is especially noticeable when using motion LoRAs, and it can't always be fully removed at the generation stage.

Use Topaz Video AI. All videos in the post were run with the following settings:

[Screenshot: Topaz Video AI settings]

Everything you love works! You can use ControlNet and Hires. fix with this extension, but I don't recommend an upscale factor above 1.3-1.5 even on powerful graphics cards. You can use ADetailer, but you shouldn't, because it introduces unwanted movement into the video even at low denoising. You can use any faceswap and do something like this with your own face:

And of course, experiment with models: not all of them work equally well, and some produce a strange gray haze, but most of the top models from Civitai performed well.

Friends, that's all from me. From this guide you learned how to create short animated videos from your generations, understood how the technology works, got acquainted with motion LoRAs, and now you can make a small short film yourself, or even something cooler. Share your results in the comments and in our neuro-operators chat.

I talk more about neural networks on my YouTube channel, in Telegram and on Boosty; I will be glad for your subscription and support. See you on streams, the next one is on Monday, so subscribe so you don't miss it.

Hugs to everyone, Ilya – Neural Dreaming.
