fast-DreamBooth: an overpowered tool for fine-tuning Stable Diffusion

Hi all! For anyone looking for a starting point into the world of generative neural networks, in this article I will show you how to teach Stable Diffusion to generate images in your style, or even of your face.

Image preparation

Stable Diffusion can learn from your images, so the first thing you need to do is prepare high-quality images. I collected images of the Belgian opera singer Werner Van Mechelen.

Werner Van Mechelen

After that, you need to crop the images to a 512×512 square. This can be done with the service https://www.birme.net/?target_width=512&target_height=512: upload your images there, select the crop area, and save the result as a zip file to your computer.

Birme
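If you prefer to do the cropping locally, here is a minimal sketch using Pillow instead of the Birme service. The folder names are placeholders I chose for the example:

```python
# Center-crop and resize every image in a folder to 512x512 with Pillow.
# "raw_photos" and "cropped" are placeholder folder names.
from pathlib import Path
from PIL import Image, ImageOps

src, dst = Path("raw_photos"), Path("cropped")
dst.mkdir(exist_ok=True)

for path in src.iterdir():
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png"}:
        continue
    img = Image.open(path).convert("RGB")
    # ImageOps.fit crops to the target aspect ratio, then resizes
    img = ImageOps.fit(img, (512, 512), Image.LANCZOS)
    img.save(dst / f"{path.stem}.jpg", quality=95)
```

Note that this always crops around the center, while Birme lets you pick the crop area per image, which matters for off-center faces.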

Then you need to rename the images as follows: the same name plus (n). This name is the token you will use in prompts to make the neural network generate your subject, so choose something that is not already a common word, to avoid confusing Stable Diffusion. I use the sks token.

Image naming
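Renaming a few dozen files by hand is tedious, so here is a small sketch that applies the "token (n)" pattern automatically. The token and folder name are the ones from my example; substitute your own:

```python
# Rename the cropped images to the "token (n)" pattern the notebook expects,
# e.g. "sks (1).jpg", "sks (2).jpg", ...
from pathlib import Path

token = "sks"            # the rare identifier you will use in prompts
folder = Path("cropped") # placeholder folder from the previous step

for i, path in enumerate(sorted(folder.glob("*.jpg")), start=1):
    path.rename(folder / f"{token} ({i}).jpg")
```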

Colab initialization

Copy this version of DreamBooth: https://colab.research.google.com/github/TheLastBen/fast-stable-diffusion/blob/main/fast-DreamBooth.ipynb#scrollTo=O3KHGKqyeJp9

Then run the first two cells.

Next, you need to select a model and provide a token for downloading it. Here we select model version 1.5. You can try your luck with version 2, but I could not get training to start through this Colab with the version 2 model; it probably comes down to the fact that it appeared just a few days ago and support is not yet stable. Version 2 is also harder to train, so I recommend using v1.5.

If you do not have an account on Hugging Face, create one and issue yourself a token so that the Colab can download the Stable Diffusion model: https://huggingface.co/settings/tokens
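If you want to sanity-check the token locally before pasting it into the Colab, the official huggingface_hub package can do that; the token value here is just a placeholder:

```python
# Verify a Hugging Face access token locally with the official client library.
from huggingface_hub import login

login(token="hf_...")  # paste the token created at huggingface.co/settings/tokens
```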

Next, we create a session; a Fast-Dreambooth/Sessions/werner-van-mechelen folder will be created on Google Drive. We also set the Contains_faces option: since my photos are of a man, I select Male.

Now we run the image-upload cell and drag the pictures into it. Once the pictures are uploaded, the cell finishes executing.

Now you need to start training and set a few options (a rough script-level sketch of these options follows the list):

Training_Steps: multiply the number of pictures by 200 and enter that number; this is the total number of training steps. For example, 20 images × 200 = 4,000 steps.
Resolution: 512.
fp16: half precision; with this option enabled, training uses less GPU memory.
enable_text_encoder_training: leave this enabled, as training the text encoder improves the quality of the results.
Train_text_encoder_for: the percentage of steps for which the text encoder trains. The higher the percentage, the more closely the network reproduces the input images, but at the cost of its ability to “stylize” them; the lower the percentage, the easier it will be to restyle your subject.
Save_Checkpoint_Every_n_Steps: I do not recommend enabling this checkbox, as the intermediate checkpoints can take up a lot of space on Google Drive.
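For reference, here is a rough sketch of how these options map onto the stock diffusers DreamBooth example script (train_dreambooth.py). This is not what the Colab runs internally: the percentage-based Train_text_encoder_for schedule is specific to TheLastBen's notebook, while the stock script only has an on/off --train_text_encoder flag. The paths and the instance prompt are placeholders:

```python
# Launch the stock diffusers DreamBooth example script with options roughly
# equivalent to the notebook settings above. Assumes diffusers, accelerate,
# and train_dreambooth.py are available locally.
import subprocess

num_images = 20
args = [
    "accelerate", "launch", "train_dreambooth.py",
    "--pretrained_model_name_or_path", "runwayml/stable-diffusion-v1-5",
    "--instance_data_dir", "./cropped",           # the renamed photos
    "--output_dir", "./dreambooth-output",
    "--instance_prompt", "photo of sks man",      # the rare "sks" token
    "--resolution", "512",
    "--mixed_precision", "fp16",                  # half precision, less VRAM
    "--train_text_encoder",                       # stock script: on/off only
    "--max_train_steps", str(num_images * 200),   # 20 images -> 4000 steps
]
subprocess.run(args, check=True)
```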

We wait until training completes and then run the resulting model.

As a result, the notebook will print a link to the following interface.
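If you prefer code over the web UI, a minimal sketch with the diffusers library looks like this. The session path is an assumption based on the folder created earlier, and from_pretrained expects diffusers-format weights; for the raw .ckpt file the notebook saves, you would first convert it (or use StableDiffusionPipeline.from_single_file in recent diffusers versions):

```python
# Minimal inference sketch with diffusers, assuming the trained weights are
# available in diffusers format at the session folder on Google Drive.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/werner-van-mechelen",
    torch_dtype=torch.float16,
).to("cuda")

# "sks" is the token the training images were named with
image = pipe("portrait photo of sks man, studio lighting, high detail").images[0]
image.save("result.png")
```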

I look for sample prompts on the OpenArt site; there you can see what images the network generates for various prompts and find something you like. Not every image comes out well, so play around with the prompts, and try running the same prompt several times, since each run produces different images.

Here are some results:

Conclusion

This notebook provides a very simple interface for training the network, and the 1.5 model is also very easy to train. Training on 20 images takes approximately 2-3 hours. In my experience, it is easiest to train it on faces and then generate various character art.
