!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py
%pip install -qq git+https://github.com/ShivamShrirao/diffusers
%pip install -q -U --pre triton
%pip install -q accelerate transformers ftfy bitsandbytes==0.35.0 gradio natsort safetensors xformers
Stable Diffusion Fine-tuning with Dreambooth
Imagine a scenario where we need to generate images that resemble a specific person’s face, but that face is not included in the model’s training data. In that case we rely on the model’s ability to generalize from its learned representations, and fine-tuning is a great approach for this scenario.
The model, after being trained on a diverse set of faces, should have grasped the underlying patterns and features common to human faces. A representation of a specific person’s face, such as a sketch or a description, would be inputted, and the model would generate a new face that aligns with the input as closely as possible.
Fine-tuning is a technique used to train a custom model based on existing models, enabling the generation of custom images.
For example, personal photos can be added to the model, allowing it to generate unique images in various scenarios such as mountains, forests, streets, and so on.
Methods of Fine-Tuning
There are several methods to apply fine-tuning:
- Additional Training
This involves training a base model with an additional dataset. For instance, you can train Stable Diffusion with an additional dataset of old cars to steer the aesthetics of the generated cars toward that specific type.
- Dreambooth Algorithm
Initially developed by Google, this technique allows for injecting custom subjects into the models. Due to its architecture, it is possible to achieve great results using only 3 to 5 custom images.
In this course, the focus is on using the Dreambooth algorithm for fine-tuning the Stable Diffusion model.
Implementation
In the implementation, we’ll use an image of a man in his 30s whom we’ll call John. This man was generated using Stable Diffusion to avoid copyright infringement. You can simply use a picture of yourself if you want to try things out.
Installing the libraries
Loading the model
= "runwayml/stable-diffusion-v1-5"
model_sd = "/content/stable_diffusion_weights/john" output_dir
!mkdir -p $output_dir
Training
Dreambooth training requires a unique identifier, the class name, and images of the subject to be inserted. The images form the dataset. The unique identifier needs to be a term associated with no concept or feature recognized by the model. The class is the type of object you want to generate.
Three components are needed:
- Unique Identifier
This is a unique name that does not exist in the model. In our case, we will use john.
- Class Name
This is the type of object that will be generated. In our case, we will generate faces of people.
- Images
These images form the training dataset. In our case, we have uploaded ten images.
Instance prompt > a photo of [unique identifier] [class name]
Class prompt > a photo of [class name]
The instance prompt will be as follows: > a photo of john person
As the subject is a person, the class prompt will be as follows: > a photo of a person
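To make the templates concrete, here is a minimal sketch of how the two prompts are assembled from the unique identifier and the class name (the variable names are illustrative only; the actual training script reads its prompts from the concepts_list.json created below):

# Illustrative only: shows how the prompt templates expand.
unique_identifier = "john"   # a token the model has no prior concept of
class_name = "person"        # the broad category the subject belongs to

instance_prompt = f"a photo of {unique_identifier} {class_name}"  # -> "a photo of john person"
class_prompt = f"a photo of a {class_name}"                       # -> "a photo of a person"

print(instance_prompt)
print(class_prompt)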
In this demonstration, we’ll be utilizing photos of John as a basis to train the Stable Diffusion model, aiming to generate images similar to John.
Creating the Instance and Class Prompts
We need to create a new variable concepts_list. It will be a list in Python. Then we need to specify the parameters.
concepts_list = [
    {
        "instance_prompt": "john",
        "class_prompt": "photo of a person",
        "instance_data_dir": "/content/data/john",
        "class_data_dir": "/content/data/person"
    }
]
Creating Directories and JSON File
We need to create the directories and convert this variable into a JSON file.
import json
import os
import random
for c in concepts_list:
    os.makedirs(c["instance_data_dir"], exist_ok=True)

with open("concepts_list.json", "w") as f:
    json.dump(concepts_list, f, indent=4)
Upload Training Images
- Ensure that images are of size 512 x 512. If they are not, you need to resize them (a minimal resizing sketch follows this list).
- Use the file upload feature in the left sidebar.
- Move the images to the folder /content/data/john.
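If your photos are not already 512 x 512, a minimal resizing sketch with Pillow might look like the following (the input folder name is an assumption; point it at wherever your original photos live):

import os
from PIL import Image

raw_dir = "/content/raw_photos"     # assumed folder containing the original photos
resized_dir = "/content/data/john"  # folder the training step expects

os.makedirs(resized_dir, exist_ok=True)

for name in os.listdir(raw_dir):
    img = Image.open(os.path.join(raw_dir, name)).convert("RGB")
    img = img.resize((512, 512), Image.LANCZOS)  # simple resize; cropping to a square first avoids distortion
    img.save(os.path.join(resized_dir, name))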
Specifying Parameters
We need to specify some parameters before running the training.
num_imgs = 15
num_class_images = num_imgs * 12
max_num_steps = num_imgs * 100
learning_rate = 1e-6
lr_warmup_steps = int(max_num_steps / num_imgs)

print(num_imgs, num_class_images, max_num_steps, learning_rate, lr_warmup_steps)
15 180 1500 1e-06 100
The learning_rate is a hyperparameter that determines the step size at which an optimization algorithm (like gradient descent) proceeds while learning from the data. It controls how much to change the model in response to the estimated error each time the model weights are updated.
If the learning rate is too small, the model will need many updates to converge to the best values, which can take a long time. On the other hand, if the learning rate is too large, the updates may be too significant and the model may pass over the optimal solution, or even diverge.
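As a toy illustration of what the learning rate controls, a single gradient-descent update on one weight looks like this (the numbers are made up and are not part of the training script):

# Toy example: one gradient-descent step on a single weight.
weight = 0.80          # current value of a model weight
gradient = 2.5         # estimated error gradient for that weight
learning_rate = 1e-6   # the value we use for Dreambooth above

weight = weight - learning_rate * gradient  # tiny step: 0.80 -> 0.7999975
print(weight)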
lr_warmup_steps is a hyperparameter used in the learning rate scheduling strategy, specifically in the warmup phase of training.
Learning rate warmup is a strategy where the learning rate is initially set to a small value and gradually increased to the maximum or initial learning rate. This is done over a certain number of steps or epochs, which is what lr_warmup_steps refers to.
The purpose of this strategy is to prevent the model from overfitting early in the training process. By starting with a smaller learning rate, the model makes smaller adjustments and doesn’t converge too quickly to a suboptimal solution. After the warmup steps, the learning rate is increased to allow the model to learn more quickly and converge to the optimal solution.
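As a rough sketch, the function below ramps the learning rate linearly from zero to its base value over lr_warmup_steps and then keeps it constant; the exact behavior during training depends on the scheduler implementation inside the diffusers training script, so treat this only as an illustration:

def warmup_lr(step, base_lr=1e-6, warmup_steps=100):
    # Linearly ramp the learning rate up during warmup, then hold it constant.
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

for step in [0, 49, 99, 100, 1499]:
    print(step, warmup_lr(step))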
Execute Training
Finally, we can start the training.
!python3 train_dreambooth.py \
--pretrained_model_name_or_path=$model_sd \
--pretrained_vae_name_or_path="stabilityai/sd-vae-ft-mse" \
--instance_data_dir=$output_dir \
--output_dir=$output_dir \
--revision="fp16" \
--with_prior_preservation --prior_loss_weight=1.0 \
--seed=777 \
--resolution=512 \
--train_batch_size=1 \
--train_text_encoder \
--mixed_precision="fp16" \
--use_8bit_adam \
--gradient_accumulation_steps=1 \
--learning_rate=$learning_rate \
--lr_scheduler="constant" \
--lr_warmup_steps=$lr_warmup_steps \
--num_class_images=$num_class_images \
--sample_batch_size=4 \
--max_train_steps=$max_num_steps \
--save_interval=10000 \
--save_sample_prompt="john" \
--concepts_list="concepts_list.json"
This process will take about 20 minutes to finish. If an error occurs during training, ensure that the images or datasets are in the correct folder. Once the training is complete, we can proceed to perform the first tests.
Where images and model weights are stored
The weights directory is a specific location in the file system where the weights of a trained machine learning model are stored.
These weights are the learned parameters that the model uses to make predictions or decisions.
They are typically saved so that the model can be reused later, either for further training, for fine-tuning on a different task, or for direct inference.
from natsort import natsorted
from glob import glob
import os
weights_dir = natsorted(glob(output_dir + os.sep + '*'))[-1]
print('Weights directory: ', weights_dir)
Weights directory: /content/stable_diffusion_weights/john/1500
import os
import matplotlib.pyplot as plt
from PIL import Image
# function to display images in a grid
def grid_img(imgs, rows=1, cols=3, scale=1):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    w, h = int(w * scale), int(h * scale)

    grid = Image.new('RGB', size=(cols * w, rows * h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        img = img.resize((w, h), Image.LANCZOS)  # LANCZOS replaces the deprecated ANTIALIAS filter
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
weights_folder = output_dir
folders = sorted([f for f in os.listdir(weights_folder) if f != "0"], key=lambda x: int(x))

imgs_test = []

for folder in folders:
    folder_path = os.path.join(weights_folder, folder)
    image_folder = os.path.join(folder_path, "samples")
    images = [f for f in os.listdir(image_folder)]

    for i in images:
        img_path = os.path.join(image_folder, i)
        r = Image.open(img_path)
        imgs_test.append(r)

# show the sample images generated during training
grid_img(imgs_test, rows=1, cols=4, scale=1)
Convert the weights into a checkpoint (ckpt)
Checkpoints are used to save and load the progress of training, allowing you to resume training from the exact point it was stopped.
Converting the weights into a checkpoint involves saving the current state of the model, including its learned weights, into a format that can be easily loaded later.
This process allows for the model’s state to be preserved, so that the training process can be resumed later if needed, or the trained model can be used for generating images.
ckpt_path = weights_dir + "/model.ckpt"

half_arg = "--half"
!python convert_diffusers_to_original_stable_diffusion.py --model_path $weights_dir --checkpoint_path $ckpt_path $half_arg
print(f"Converted to ckpt and saved in {ckpt_path}")
Inference / Generating images (testing the fine-tuned model)
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DDIMScheduler
from IPython.display import display
model_path = weights_dir
print(model_path)
/content/stable_diffusion_weights/john/1500
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16).to('cuda')
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe.safety_checker = lambda images, clip_input: (images, False)
seed = 555

prompt = "face portrait of john playing guitar in the restaurant, realistic, hd, vivid, sunset"
negative_prompt = "bad anatomy, ugly, deformed, disfigured, distorted face, poorly drawn hands, poorly drawn face, poorly drawn feet, blurry, low quality, low definition, lowres, out of frame, out of image, cropped, cut off, signature, watermark"
num_samples = 5
guidance_scale = 7.5
num_inference_steps = 30
height = 512
width = 512

seed = 123
print("Seed: {}".format(str(seed)))
generator = torch.Generator(device='cuda').manual_seed(seed)

with autocast("cuda"), torch.inference_mode():
    imgs = pipe(
        prompt,
        negative_prompt=negative_prompt,
        height=height, width=width,
        num_images_per_prompt=num_samples,
        num_inference_steps=num_inference_steps,
        guidance_scale=guidance_scale,
        generator=generator
    ).images

for img in imgs:
    display(img)
Seed: 123
Testing multiple prompts
= ["photo of john person, closeup, mountain fuji in the background, natural lighting",
prompt "photo of john person in the desert, closeup, pyramids in the background, natural lighting, frontal face",
"photo of john person in the forest, natural lighting, frontal face",
"photo of john person as an astronaut, natural lighting, frontal face, closeup, starry sky in the background",
"face portrait of john in the snow, realistic, hd, vivid, sunset"]
= ["bad anatomy, ugly, deformed, desfigured, distorted face, poorly drawn hands, poorly drawn face, poorly drawn feet, blurry, low quality, low definition, lowres, out of frame, out of image, cropped, cut off, signature, watermark" ] * len(prompt)
negative_prompt = 1
num_samples = 8
guidance_scale = 75
num_inference_steps = 512
height = 512
width
= 88
seed print("Seed: {}".format(str(seed)))
= torch.Generator(device='cuda').manual_seed(seed)
generator
with autocast("cuda"), torch.inference_mode():
= pipe(
imgs
prompt,=negative_prompt,
negative_prompt=height, width=width,
height=num_samples,
num_images_per_prompt=num_inference_steps,
num_inference_steps=guidance_scale,
guidance_scale=generator
generator
).images
for img in imgs:
display(img)
Seed: 88
More prompt examples
- in the forest, in cairo, in cairo desert, in a western scene, in star wars, in mountain fuji, in the snow, etc.
Other combinations:
photo of john person, closeup, mountain fuji in the background, natural lighting
digital painting of john in the snow, realistic, hd, vivid, sunset
watercolor painting of john person, realistic, blue and orange tones
digital painting of john person, hyperrealistic, fantasy, Surrealist, painted by Alphonse Mucha
painting of john person in star wars, realistic, 4k ultra hd, blue and red tones
photo of john person, in an armor, realistic, visible face, colored, detailed face, ultra detailed, natural lighting
photo of john person, cyberpunk, vivid, realistic, 4k ultra hd
anime painting of john person, chill day, by tim okamura, noah bradley, trending on artstation
Saving the results
!mkdir results
mkdir: cannot create directory ‘results’: File exists
for i, img in enumerate(imgs):
    img.save('results/result_{}.png'.format(i+1))
Exercise Fine Tuning
# Installing Libs
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/examples/dreambooth/train_dreambooth.py
!wget -q https://github.com/ShivamShrirao/diffusers/raw/main/scripts/convert_diffusers_to_original_stable_diffusion.py
%pip install -qq git+https://github.com/ShivamShrirao/diffusers
%pip install -q -U --pre triton
%pip install -q accelerate transformers ftfy bitsandbytes==0.35.0 gradio natsort safetensors xformers
%pip install rggrader
# @title #### Student Identity
= "your student id" # @param {type:"string"}
student_id = "your name" # @param {type:"string"} name
# @title #### 00. Fine-tuning with Dreambooth
from rggrader import submit_image
import torch
from torch import autocast
from diffusers import StableDiffusionPipeline, DDIMScheduler
from IPython.display import display
import os
import matplotlib.pyplot as plt
from PIL import Image
from natsort import natsorted
from glob import glob
import json
import random
# TODO:
# 1. Load Model: Load 'runwayml/stable-diffusion-v1-5'.
# 2. Prepare Images: use your photos to train the provided model.
# 3. Fine-Tune: Train the model on your dataset.
# 4. Generate Faces: Use the fine-tuned model to create new faces.
# 5. Save Results: Store the generated images in the 'results' folder.
# 6. Upload Image: Choose one image from 'results' and upload it for review.
# Note: Create folder '/content/data/input_image' to upload Training Image
# Loading model and create output dir
= "runwayml/stable-diffusion-v1-5"
model_sd = "/content/stable_diffusion_weights/student_data"
output_dir !mkdir -p $output_dir
# Put your code here:
imgs = None
# ---- End of your code ----
# Saving the results
!mkdir results
for i, img in enumerate(imgs):
    img.save('results/result_{}.png'.format(i+1))
# Submit Method
= "00_fine_tuning"
assignment_id = "00_fine_tuning_with_dreambooth"
question_id 'your_image.png') # change 'your_image.png' to the name of the image you want to upload (eg. results/result_3.png) submit_image(student_id, question_id,