!pip install diffusers==0.14
!pip install -q accelerate transformers xformers
ControlNet with Stable Diffusion
ControlNet is another way to guide the results in terms of composition and the general look of the image, building on what we have learned so far.
As we know, it is possible to generate images from text, from images, and even by training custom models. However, ControlNet introduces a new way to guide image generation: for example, with depth-to-image conditioning, both a text prompt and a depth image are used to condition the model. This allows you to get even more accurate results than the common image-to-image technique.
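As a quick illustration of that depth-conditioned setup, the sketch below shows how such a pipeline could be assembled (a minimal sketch, assuming a depth map saved locally as depth.png and the lllyasviel/sd-controlnet-depth checkpoint; the prompt and paths are only examples, and we will build the Canny and pose variants step by step later in this section).

# Minimal sketch: depth map + text prompt conditioning (assumes a file 'depth.png' exists)
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

depth_map = Image.open('depth.png')  # grayscale depth image used as the condition

controlnet = ControlNetModel.from_pretrained('lllyasviel/sd-controlnet-depth',
                                             torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained('runwayml/stable-diffusion-v1-5',
                                                         controlnet=controlnet,
                                                         torch_dtype=torch.float16)
pipe.enable_model_cpu_offload()

result = pipe("a cozy living room, warm light, photorealistic", depth_map, num_inference_steps=20)
result.images[0]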
About the technique
- Paper: Adding Conditional Control to Text-to-Image Diffusion Models, published in February 2023
- ControlNet was developed from the idea that text alone is not enough to solve every problem in image generation.
- First version: https://github.com/lllyasviel/ControlNet#below-is-controlnet-10
- Diagram and additional explanation: https://github.com/lllyasviel/ControlNet#stable-diffusion--controlnet
- Paper PDF: https://arxiv.org/pdf/2302.05543.pdf
ControlNet is a method used to manage the behavior of a neural network. It does this by adjusting the input conditions of the building blocks of the neural network, which are called network blocks. For example, in a pretrained ResNet CNN, a residual block is a network block.
ControlNet 1.0
The image illustrates how to apply a ControlNet to any neural network block.

The x and y represent deep features in neural networks. These are the complex representations that the network learns from the input data. The + symbol refers to feature addition, which is a way of combining the information from different features. The c represents an extra condition that is added to the neural network. This could be any additional information that you want the network to consider when making its predictions.
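To make this concrete, below is a rough PyTorch sketch of the idea (a simplified illustration using our own naming, not the actual ControlNet implementation): the original block is locked, a trainable copy of it receives the feature plus the condition injected through a zero-initialized convolution, and the copy's output is added back to the original output through another zero convolution, so at the start of training the network behaves exactly like the original model.

import copy
import torch
import torch.nn as nn

def zero_conv(channels):
    # 1x1 convolution initialized to zero: at the start of training the
    # ControlNet branch contributes nothing, so y equals the original output
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv

class ControlledBlock(nn.Module):
    """Simplified illustration of wrapping one network block with a ControlNet."""
    def __init__(self, block, channels):
        super().__init__()
        self.trainable_copy = copy.deepcopy(block)  # trainable copy of the block
        self.block = block                          # original block, kept locked
        for p in self.block.parameters():
            p.requires_grad_(False)
        self.zero_in = zero_conv(channels)          # zero conv applied to the condition c
        self.zero_out = zero_conv(channels)         # zero conv applied to the copy's output

    def forward(self, x, c):
        y = self.block(x)                               # original path: y = F(x)
        y_c = self.trainable_copy(x + self.zero_in(c))  # copy sees x plus the injected condition
        return y + self.zero_out(y_c)                   # feature addition (the "+" in the diagram)

# Tiny usage example with a dummy block
block = nn.Conv2d(8, 8, kernel_size=3, padding=1)
controlled = ControlledBlock(block, channels=8)
x = torch.randn(1, 8, 16, 16)  # deep feature x
c = torch.randn(1, 8, 16, 16)  # extra condition c (already mapped to feature space)
print(controlled(x, c).shape)  # torch.Size([1, 8, 16, 16])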
In implementing ControlNet, there are various techniques that can be used to condition the model. However, for this discussion, the focus will be on two specific methods:
- Edge Detection using Canny Edge
This technique involves identifying the boundaries of objects within an image. The Canny Edge Detection method is a popular algorithm that’s used to detect a wide range of edges in images. It’s used to help the model understand the shapes present in the input.
- Pose Estimation using OpenPose
This technique is about understanding the pose of a person in an image or video. OpenPose is a library that allows for real-time multi-person keypoint detection. It can identify where people are and how they are posed in an image or video. This information can be used to condition the model to understand and learn from the poses present in the input.
For more detailed information about implementing ControlNet and the various techniques used to condition the model, you can refer to the ControlNet GitHub repository. This resource provides comprehensive documentation, code examples, and further reading to help you understand and implement ControlNet effectively.
Installing the libraries
opencv-contrib-python is a library for computer vision tasks, including edge detection using the Canny edge algorithm. controlnet-aux is a library that contains auxiliary functions for ControlNet models.
!pip install -q opencv-contrib-python
!pip install -q controlnet_aux
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
import cv2
from PIL import Image
import numpy as np
# function to show images as a grid
def grid_img(imgs, rows=1, cols=3, scale=1):
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    w, h = int(w * scale), int(h * scale)

    grid = Image.new('RGB', size=(cols * w, rows * h))
    grid_w, grid_h = grid.size

    for i, img in enumerate(imgs):
        img = img.resize((w, h), Image.LANCZOS)  # LANCZOS replaces the deprecated ANTIALIAS
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid
Generating Images Using Edges
ControlNet Model + Canny Edge
This is the algorithm used to extract the edges of images. It will be easier to understand during the implementations.
- More information about the model: https://huggingface.co/lllyasviel/sd-controlnet-canny
We create the variable controlnet_canny_model with the corresponding repository ID used to download the model.
controlnet_canny_model = 'lllyasviel/sd-controlnet-canny'
control_net_canny = ControlNetModel.from_pretrained(controlnet_canny_model, torch_dtype=torch.float16)

pipe = StableDiffusionControlNetPipeline.from_pretrained('runwayml/stable-diffusion-v1-5',
                                                         controlnet=control_net_canny,
                                                         torch_dtype=torch.float16)
from diffusers import UniPCMultistepScheduler
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

pipe.enable_attention_slicing()
pipe.enable_xformers_memory_efficient_attention()
pipe.enable_model_cpu_offload()
Loading the image
Now we can load the image.
- Image source: https://unsplash.com/pt-br/fotografias/OjhSUsHUIYM
img = Image.open('/content/bird2.jpg')
img
Detecting edges using Canny Edge
In this technique, we are going to use the Canny edge algorithm to extract only the borders of the image. So instead of sending the whole image to the algorithm, we are going to send only the borders.
We are going to create a function that will receive as parameter this image and will return the edges. We don’t need to worry about it because OpenCV has a pre-built function, so we just need to call it to extract the edges.
- More about the algorithm: http://justin-liang.com/tutorials/canny/
- More about the implementation in OpenCV: https://docs.opencv.org/3.4/da/d22/tutorial_py_canny.html
def canny_edge(img, low_threshold=100, high_threshold=200):
    img = np.array(img)

    # extract the edges with the Canny algorithm
    img = cv2.Canny(img, low_threshold, high_threshold)

    # replicate the single edge channel into 3 channels (RGB)
    img = img[:, :, None]
    img = np.concatenate([img, img, img], axis=2)

    canny_img = Image.fromarray(img)
    return canny_img
Now we can visualize the edges.
canny_img = canny_edge(img)
canny_img
Here we can see only the edges that have been extracted. As a reminder, instead of sending the whole image to the algorithm, we send only these edges; the model will then generate new birds that follow them.
We create a prompt, a seed for reproducibility, and a generator. Then we call the pipeline, sending the prompt and the edges of the image as parameters.
= "realistic photo of a blue bird with purple details, high quality, natural light"
prompt = ""
neg_prompt
= 777
seed = torch.Generator(device="cuda").manual_seed(seed)
generator
= pipe(
imgs
prompt,
canny_img,=neg_prompt,
negative_prompt=generator,
generator=20,
num_inference_steps )
0] imgs.images[
We can see a high-quality image that is related to the edges and is also in accordance with the prompt.
We can perform tests using different prompts and negative prompts.
= ["realistic photo of a blue bird with purple details, high quality, natural light",
prompt "realistic photo of a bird in new york during autumn, city in the background",
"oil painting of a black bird in the desert, realistic, vivid, fantasy, surrealist, best quality, extremely detailed",
"digital painting of a blue bird in space, stars and galaxy in the background, trending on artstation"]
= ["blurred, lowres, bad anatomy, ugly, worst quality, low quality, monochrome, signature"] * len(prompt)
neg_prompt
= 777
seed = torch.Generator(device="cuda").manual_seed(seed)
generator
= pipe(
imgs
prompt,
canny_img,=neg_prompt,
negative_prompt=generator,
generator=20,
num_inference_steps )
1, len(prompt), scale=0.75) grid_img(imgs.images,
Let’s try with another image
= Image.open("/content/wolf.jpg") img
= canny_edge(img, 200, 255)
canny_img
1, 2) grid_img([img, canny_img],
= ["realistic photo of a wolf, high quality, natural light, full moon",
prompt "realistic photo of a wolf in the snow, best quality, extremely detailed",
"oil painting of wolf the desert, canyons in the background, realistic, vivid, fantasy, surrealist, best quality, extremely detailed",
"watercolor painting of a wolf in space, blue and purple tones, stars and earth in the background"]
= ["blurred, lowres, bad anatomy, ugly, worst quality, low quality, monochrome, signature"] * len(prompt)
neg_prompt
= 777
seed = torch.Generator(device="cuda").manual_seed(seed)
generator
= pipe(
imgs
prompt,
canny_img,=neg_prompt,
negative_prompt=generator,
generator=20,
num_inference_steps )
1, len(prompt), scale=0.75) grid_img(imgs.images,
Generating Images Using Poses
We will learn how to generate images using poses.
If the desired images cannot be found, there are several online 3D software options available for creating posed images:
- 3D software to create posed images:
- Magicposer: https://magicposer.com/
- Posemyart: https://posemy.art/
Loading the model to extract poses
The first step is to download the model from controlnet_aux
, a library we will import.
We will also import the OpenposeDetector
. We will send an image to this detector and it will return the pose of that image.
from controlnet_aux import OpenposeDetector

pose_model = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')
Extract The Pose
First, the image is loaded. Then, the pose is extracted using the pose_model function.
We will see the pose that has been extracted from the image.
The extracted keypoints represent specific points related to various body parts such as the head, shoulders, arms, hands, legs, feet, and so on.
img_pose = Image.open('/content/pose01.jpg')

pose = pose_model(img_pose)
grid_img([img_pose, pose], rows=1, cols=2, scale=0.75)
Loading the ControlNet model
The next step is to load the ControlNet model.
controlnet_pose_model = ControlNetModel.from_pretrained('thibaud/controlnet-sd21-openpose-diffusers', torch_dtype=torch.float16)

sd_controlpose = StableDiffusionControlNetPipeline.from_pretrained('stabilityai/stable-diffusion-2-1-base',
                                                                   controlnet=controlnet_pose_model,
                                                                   torch_dtype=torch.float16)
sd_controlpose.enable_model_cpu_offload()
sd_controlpose.enable_attention_slicing()
sd_controlpose.enable_xformers_memory_efficient_attention()

from diffusers import DEISMultistepScheduler

sd_controlpose.scheduler = DEISMultistepScheduler.from_config(sd_controlpose.scheduler.config)
seed = 555
generator = torch.Generator(device="cuda").manual_seed(seed)
prompt = "professional photo of a young woman in the street, casual fashion, sharp focus, insanely detailed, photorealistic, sunset, side light"
neg_prompt = "ugly, tiling, closed eyes, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face"

imgs = sd_controlpose(
    prompt,
    pose,
    negative_prompt=neg_prompt,
    num_images_per_prompt=4,
    generator=generator,
    num_inference_steps=20,
)
grid_img(imgs.images, 1, 4, 0.75)
Trying Different Images and Prompts
Let’s switch things up and use a different pose image.
= Image.open("man-pose.jpg")
img_pose
= pose_model(img_pose)
pose
1, 2, scale=0.5) grid_img([img_pose, pose],
seed = 999
generator = torch.Generator(device="cuda").manual_seed(seed)
prompt = "professional photo of a young asian man in the office, formal fashion, smile, waring hat, sharp focus, insanely detailed, photorealistic, side light"
neg_prompt = "ugly, tiling, closed eyes, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face"

imgs = sd_controlpose(
    prompt,
    pose,
    negative_prompt=neg_prompt,
    num_images_per_prompt=4,
    generator=generator,
    num_inference_steps=20,
)
grid_img(imgs.images, 1, 4, 0.75)
seed = 123
prompt = ["oil painting walter white wearing a suit and black hat and sunglasses, face portrait, in the desert, realistic, vivid",
          "oil painting walter white wearing a jedi brown coat, face portrait, wearing a hood, holding a cup of coffee, in another planet, realistic, vivid",
          "professional photo of walter white wearing a space suit, face portrait, in mars, realistic, vivid",
          "professional photo of walter white in the kitchen, face portrait, realistic, vivid"]
neg_prompt = ["helmet, ugly, tiling, closed eyes, poorly drawn hands, poorly drawn feet, poorly drawn face, out of frame, extra limbs, disfigured, deformed, body out of frame, bad anatomy, watermark, signature, cut off, low contrast, underexposed, overexposed, bad art, beginner, amateur, distorted face"] * len(prompt)
num_imgs = 1

generator = torch.Generator(device="cuda").manual_seed(seed)
imgs = sd_controlpose(
    prompt,
    pose,
    negative_prompt=neg_prompt,
    generator=generator,
    num_inference_steps=20,
)
grid_img(imgs.images, 1, len(prompt), 0.75)
Improve The Result
For enhanced results:
- Experiment with various schedulers. Euler A is also suggested for use with ControlNet (see the sketch after this list).
- Modify the parameters (CFG, steps, etc.).
- Employ effective negative prompts.
- Tailor the prompt to closely match the initial pose.
- Providing more context about the action is advisable. For instance, “walking in the street” typically yields better outcomes than simply “in the street”.
- Inpainting can be utilized to correct faces that haven’t been generated accurately.
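As an example of the first two suggestions, here is a small sketch that swaps the scheduler of the sd_controlpose pipeline defined above to Euler Ancestral and re-runs generation with explicit step and guidance-scale (CFG) values; the prompt and numbers are only examples to experiment with.

from diffusers import EulerAncestralDiscreteScheduler

# Swap in the Euler Ancestral ("Euler A") scheduler
sd_controlpose.scheduler = EulerAncestralDiscreteScheduler.from_config(sd_controlpose.scheduler.config)

# Re-run with more steps and an explicit guidance scale (CFG); values are starting points, not recommendations
generator = torch.Generator(device="cuda").manual_seed(777)
imgs = sd_controlpose(
    "professional photo of a young man walking in the street, casual fashion, sharp focus, photorealistic",
    pose,
    negative_prompt="ugly, poorly drawn hands, poorly drawn face, bad anatomy, watermark, low contrast, distorted face",
    num_inference_steps=30,
    guidance_scale=9.0,
    generator=generator,
)
imgs.images[0]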
Exercise ControlNet
# @title #### Student Identity
= "your student id" # @param {type:"string"}
student_id = "your name" # @param {type:"string"} name
# Installing Libs
%pip install diffusers==0.14
%pip install -q accelerate transformers xformers
%pip install -q controlnet_aux
%pip install rggrader
# @title #### 00. Generating Images Using Poses
from rggrader import submit_image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
import torch
import cv2
from PIL import Image
from controlnet_aux import OpenposeDetector
from diffusers import DEISMultistepScheduler
# TODO:
# 1. Use the 'lllyasviel/ControlNet' model to extract the pose from the reference image. This model will allow us to understand the pose that is present in the image.
# 2. Use the ControlNet models 'thibaud/controlnet-sd21-openpose-diffusers' and 'stabilityai/stable-diffusion-2-1-base' to generate the desired image. These models will take the pose extracted from the previous step and use it to generate a new image.
# 3. The image generation will be based on the prompt that you input. Make sure your prompt is clear and describes the image you want to generate accurately.
# 4. Once the image is generated, save it in the 'results' folder. This will ensure that you can easily locate and review the image later.
# 5. Finally, select one of the generated images to upload. This image will be the final output of your exercise.
# NOTE: Remember, the quality of the generated image will greatly depend on the accuracy of the pose extracted from the reference image and the clarity of your prompt.
# Loading model and create output dir
pose_model = OpenposeDetector.from_pretrained('lllyasviel/ControlNet')

# Put your code here:
imgs = None
# ---- End of your code ----
# Saving the results
!mkdir results
for i, img in enumerate(imgs.images):
    img.save('results/result_{}.png'.format(i+1))
# Submit Method
= "00_controlnet"
assignment_id = "00_generating_images_using_poses"
question_id 'your_image.png') # change 'your_image.png' to the name of the image you want to upload (eg. results/result_3.png) submit_image(student_id, question_id,