From 71ae84c53b2160d0267193470121e60eae4f6326 Mon Sep 17 00:00:00 2001 From: Sayak Paul Date: Wed, 15 Mar 2023 10:09:26 +0530 Subject: [PATCH 1/4] add: controlnet entry to training section in the docs. --- docs/source/en/_toctree.yml | 2 + docs/source/en/training/controlnet.mdx | 281 +++++++++++++++++++++++++ docs/source/en/training/overview.mdx | 3 + 3 files changed, 286 insertions(+) create mode 100644 docs/source/en/training/controlnet.mdx diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 3364e2b1e21e..93024af32c62 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -91,6 +91,8 @@ title: Text-to-image - local: training/lora title: Low-Rank Adaptation of Large Language Models (LoRA) + - local: training/controlnet + title: Adding Conditional Control to Text-to-Image Diffusion Models title: Training - sections: - local: conceptual/philosophy diff --git a/docs/source/en/training/controlnet.mdx b/docs/source/en/training/controlnet.mdx new file mode 100644 index 000000000000..8ee46bc018fe --- /dev/null +++ b/docs/source/en/training/controlnet.mdx @@ -0,0 +1,281 @@ + + +# ControlNet + +[Adding Conditional Control to Text-to-Image Diffusion Models](https://arxiv.org/abs/2302.05543) (ControlNet) by Lvmin Zhang and Maneesh Agrawala. + +This example is based on the [training example in the original ControlNet repository](https://github.com/lllyasviel/ControlNet/blob/main/docs/train.md). It trains a ControlNet to fill circles using a [small synthetic dataset](https://huggingface.co/datasets/fusing/fill50k). + +## Installing the dependencies + +Before running the scripts, make sure to install the library's training dependencies: + +**Important** + +To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. To do this, execute the following steps in a new virtual environment: +```bash +git clone https://github.com/huggingface/diffusers +cd diffusers +pip install -e . +``` + +Then cd in the example folder and run +```bash +pip install -r requirements.txt +``` + +And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with: + +```bash +accelerate config +``` + +Or for a default accelerate configuration without answering questions about your environment + +```bash +accelerate config default +``` + +Or if your environment doesn't support an interactive shell e.g. a notebook + +```python +from accelerate.utils import write_basic_config +write_basic_config() +``` + +## Circle filling dataset + +The original dataset is hosted in the [ControlNet repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip). We re-uploaded it to be compatible with `datasets` [here](https://huggingface.co/datasets/fusing/fill50k). Note that `datasets` handles dataloading within the training script. + +Our training examples use [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) as the original set of ControlNet models were trained from it. However, ControlNet can be trained to augment any Stable Diffusion compatible model (such as [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)) or [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1). + +## Training + +Our training examples use two test conditioning images. 
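It can also help to peek at a few samples from the training dataset itself to see what the conditioning images and captions look like. Below is a minimal sketch using 🤗 Datasets; the column names are an assumption here (they correspond to the training script's defaults) and may differ:

```python
from datasets import load_dataset

# Download the circle-filling dataset from the Hub (~50k small synthetic examples).
dataset = load_dataset("fusing/fill50k", split="train")

sample = dataset[0]
print(sample.keys())   # assumed columns: "image", "conditioning_image", "text"
print(sample["text"])  # a caption describing the circle and background colors

# Save one target/conditioning pair to disk for a quick visual check.
sample["image"].save("sample_target.png")
sample["conditioning_image"].save("sample_conditioning.png")
```

The two test conditioning images themselves are hosted separately from the dataset.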
They can be downloaded by running + +```sh +wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png + +wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png +``` + + +```bash +export MODEL_DIR="runwayml/stable-diffusion-v1-5" +export OUTPUT_DIR="path to save model" + +accelerate launch train_controlnet.py \ + --pretrained_model_name_or_path=$MODEL_DIR \ + --output_dir=$OUTPUT_DIR \ + --dataset_name=fusing/fill50k \ + --resolution=512 \ + --learning_rate=1e-5 \ + --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ + --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ + --train_batch_size=4 +``` + +This default configuration requires ~38GB VRAM. + +By default, the training script logs outputs to tensorboard. Pass `--report_to wandb` to use weights and +biases. + +Gradient accumulation with a smaller batch size can be used to reduce training requirements to ~20 GB VRAM. + +```bash +export MODEL_DIR="runwayml/stable-diffusion-v1-5" +export OUTPUT_DIR="path to save model" + +accelerate launch train_controlnet.py \ + --pretrained_model_name_or_path=$MODEL_DIR \ + --output_dir=$OUTPUT_DIR \ + --dataset_name=fusing/fill50k \ + --resolution=512 \ + --learning_rate=1e-5 \ + --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ + --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ + --train_batch_size=1 \ + --gradient_accumulation_steps=4 +``` + +## Example results + +#### After 300 steps with batch size 8 + +| | | +|-------------------|:-------------------------:| +| | red circle with blue background | +![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_300_steps.png) | +| | cyan circle with brown floral background | +![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_300_steps.png) | + + +#### After 6000 steps with batch size 8: + +| | | +|-------------------|:-------------------------:| +| | red circle with blue background | +![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png) | ![red circle with blue background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/red_circle_with_blue_background_6000_steps.png) | +| | cyan circle with brown floral background | +![conditioning image](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_2.png) | ![cyan circle with brown floral background](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/cyan_circle_with_brown_floral_background_6000_steps.png) | + +## Training on a 16 GB GPU + 
+Optimizations: +- Gradient checkpointing +- bitsandbyte's 8-bit optimizer + +[bitandbytes install instructions](https://github.com/TimDettmers/bitsandbytes#requirements--installation). + +```bash +export MODEL_DIR="runwayml/stable-diffusion-v1-5" +export OUTPUT_DIR="path to save model" + +accelerate launch train_controlnet.py \ + --pretrained_model_name_or_path=$MODEL_DIR \ + --output_dir=$OUTPUT_DIR \ + --dataset_name=fusing/fill50k \ + --resolution=512 \ + --learning_rate=1e-5 \ + --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ + --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ + --train_batch_size=1 \ + --gradient_accumulation_steps=4 \ + --gradient_checkpointing \ + --use_8bit_adam +``` + +## Training on a 12 GB GPU + +Optimizations: +- Gradient checkpointing +- bitsandbyte's 8-bit optimizer +- xformers +- set grads to none + +```bash +export MODEL_DIR="runwayml/stable-diffusion-v1-5" +export OUTPUT_DIR="path to save model" + +accelerate launch train_controlnet.py \ + --pretrained_model_name_or_path=$MODEL_DIR \ + --output_dir=$OUTPUT_DIR \ + --dataset_name=fusing/fill50k \ + --resolution=512 \ + --learning_rate=1e-5 \ + --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ + --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ + --train_batch_size=1 \ + --gradient_accumulation_steps=4 \ + --gradient_checkpointing \ + --use_8bit_adam \ + --enable_xformers_memory_efficient_attention \ + --set_grads_to_none +``` + +When using `enable_xformers_memory_efficient_attention`, please make sure to install `xformers` by `pip install xformers`. + +## Training on an 8 GB GPU + +We have not exhaustively tested DeepSpeed support for ControlNet. While the configuration does +save memory, we have not confirmed the configuration to train successfully. You will very likely +have to make changes to the config to have a successful training run. + +Optimizations: +- Gradient checkpointing +- xformers +- set grads to none +- DeepSpeed stage 2 with parameter and optimizer offloading +- fp16 mixed precision + +[DeepSpeed](https://www.deepspeed.ai/) can offload tensors from VRAM to either +CPU or NVME. This requires significantly more RAM (about 25 GB). + +Use `accelerate config` to enable DeepSpeed stage 2. + +The relevant parts of the resulting accelerate config file are + +```yaml +compute_environment: LOCAL_MACHINE +deepspeed_config: + gradient_accumulation_steps: 4 + offload_optimizer_device: cpu + offload_param_device: cpu + zero3_init_flag: false + zero_stage: 2 +distributed_type: DEEPSPEED +``` + +See [documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more DeepSpeed configuration options. + +Changing the default Adam optimizer to DeepSpeed's Adam +`deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup but +it requires CUDA toolchain with the same version as pytorch. 8-bit optimizer +does not seem to be compatible with DeepSpeed at the moment. 
+ +```bash +export MODEL_DIR="runwayml/stable-diffusion-v1-5" +export OUTPUT_DIR="path to save model" + +accelerate launch train_controlnet.py \ + --pretrained_model_name_or_path=$MODEL_DIR \ + --output_dir=$OUTPUT_DIR \ + --dataset_name=fusing/fill50k \ + --resolution=512 \ + --validation_image "./conditioning_image_1.png" "./conditioning_image_2.png" \ + --validation_prompt "red circle with blue background" "cyan circle with brown floral background" \ + --train_batch_size=1 \ + --gradient_accumulation_steps=4 \ + --gradient_checkpointing \ + --enable_xformers_memory_efficient_attention \ + --set_grads_to_none \ + --mixed_precision fp16 +``` + +## Performing inference with the trained ControlNet + +The trained model can be run the same as the original ControlNet pipeline with the newly trained ControlNet. +Set `base_model_path` and `controlnet_path` to the values `--pretrained_model_name_or_path` and +`--output_dir` were respectively set to in the training script. + +```py +from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler +from diffusers.utils import load_image +import torch + +base_model_path = "path to model" +controlnet_path = "path to controlnet" + +controlnet = ControlNetModel.from_pretrained(controlnet_path, torch_dtype=torch.float16) +pipe = StableDiffusionControlNetPipeline.from_pretrained( + base_model_path, controlnet=controlnet, torch_dtype=torch.float16 +) + +# speed up diffusion process with faster scheduler and memory optimization +pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config) +# remove following line if xformers is not installed +pipe.enable_xformers_memory_efficient_attention() + +pipe.enable_model_cpu_offload() + +control_image = load_image("./conditioning_image_1.png") +prompt = "pale golden rod circle with old lace background" + +# generate image +generator = torch.manual_seed(0) +image = pipe( + prompt, num_inference_steps=20, generator=generator, image=control_image +).images[0] + +image.save("./output.png") +``` diff --git a/docs/source/en/training/overview.mdx b/docs/source/en/training/overview.mdx index 3fbb1fd20846..5ad3a1f06cc1 100644 --- a/docs/source/en/training/overview.mdx +++ b/docs/source/en/training/overview.mdx @@ -38,6 +38,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie - [Text Inversion](./text_inversion) - [Dreambooth](./dreambooth) - [LoRA Support](./lora) +- [ControlNet](./controlnet) If possible, please [install xFormers](../optimization/xformers) for memory efficient attention. This could help make your training faster and less memory intensive. 
@@ -47,6 +48,8 @@ If possible, please [install xFormers](../optimization/xformers) for memory effi | [**Text-to-Image fine-tuning**](./text2image) | ✅ | ✅ | | [**Textual Inversion**](./text_inversion) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_textual_inversion_training.ipynb) | [**Dreambooth**](./dreambooth) | ✅ | - | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/sd_dreambooth_training.ipynb) +| [**Training with LoRA**](./lora) | ✅ | - | - | +| [**ControlNet**](./controlnet) | ✅ | ✅ | - | ## Community From 49b85ebf0e40c6b84b5b002562fd21b22e7f3ed1 Mon Sep 17 00:00:00 2001 From: Sayak Paul Date: Wed, 15 Mar 2023 10:12:03 +0530 Subject: [PATCH 2/4] formatting. --- docs/source/en/training/controlnet.mdx | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/docs/source/en/training/controlnet.mdx b/docs/source/en/training/controlnet.mdx index 8ee46bc018fe..fc1530efa4bf 100644 --- a/docs/source/en/training/controlnet.mdx +++ b/docs/source/en/training/controlnet.mdx @@ -50,6 +50,7 @@ Or if your environment doesn't support an interactive shell e.g. a notebook ```python from accelerate.utils import write_basic_config + write_basic_config() ``` @@ -273,9 +274,7 @@ prompt = "pale golden rod circle with old lace background" # generate image generator = torch.manual_seed(0) -image = pipe( - prompt, num_inference_steps=20, generator=generator, image=control_image -).images[0] +image = pipe(prompt, num_inference_steps=20, generator=generator, image=control_image).images[0] image.save("./output.png") ``` From 8c99e580ef1f2d21e26ddea9eaa0f9aa7d47f0b0 Mon Sep 17 00:00:00 2001 From: Sayak Paul Date: Thu, 16 Mar 2023 08:42:03 +0530 Subject: [PATCH 3/4] Apply suggestions from code review Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com> --- docs/source/en/_toctree.yml | 2 +- docs/source/en/training/controlnet.mdx | 60 ++++++++++++++------------ 2 files changed, 34 insertions(+), 28 deletions(-) diff --git a/docs/source/en/_toctree.yml b/docs/source/en/_toctree.yml index 93024af32c62..5aba62b1613c 100644 --- a/docs/source/en/_toctree.yml +++ b/docs/source/en/_toctree.yml @@ -92,7 +92,7 @@ - local: training/lora title: Low-Rank Adaptation of Large Language Models (LoRA) - local: training/controlnet - title: Adding Conditional Control to Text-to-Image Diffusion Models + title: ControlNet title: Training - sections: - local: conceptual/philosophy diff --git a/docs/source/en/training/controlnet.mdx b/docs/source/en/training/controlnet.mdx index fc1530efa4bf..82cf885b88e8 100644 --- a/docs/source/en/training/controlnet.mdx +++ b/docs/source/en/training/controlnet.mdx @@ -18,18 +18,22 @@ This example is based on the [training example in the original ControlNet reposi ## Installing the dependencies -Before running the scripts, make sure to install the library's training dependencies: +Before running the scripts, make sure to install the library's training dependencies. -**Important** + -To make sure you can successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the install up to date as we update the example scripts frequently and install some example-specific requirements. 
To do this, execute the following steps in a new virtual environment: +To successfully run the latest versions of the example scripts, we highly recommend **installing from source** and keeping the installation up to date. We update the example scripts frequently and install example-specific requirements. + + + +To do this, execute the following steps in a new virtual environment: ```bash git clone https://github.com/huggingface/diffusers cd diffusers pip install -e . ``` -Then cd in the example folder and run +Then navigate into the example folder and run: ```bash pip install -r requirements.txt ``` @@ -40,13 +44,13 @@ And initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) e accelerate config ``` -Or for a default accelerate configuration without answering questions about your environment +Or for a default 🤗Accelerate configuration without answering questions about your environment: ```bash accelerate config default ``` -Or if your environment doesn't support an interactive shell e.g. a notebook +Or if your environment doesn't support an interactive shell like a notebook: ```python from accelerate.utils import write_basic_config @@ -56,13 +60,13 @@ write_basic_config() ## Circle filling dataset -The original dataset is hosted in the [ControlNet repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip). We re-uploaded it to be compatible with `datasets` [here](https://huggingface.co/datasets/fusing/fill50k). Note that `datasets` handles dataloading within the training script. +The original dataset is hosted in the ControlNet [repo](https://huggingface.co/lllyasviel/ControlNet/blob/main/training/fill50k.zip), but we re-uploaded it [here](https://huggingface.co/datasets/fusing/fill50k) to be compatible with 🤗 Datasets so that it can handle the data loading within the training script. -Our training examples use [Stable Diffusion 1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) as the original set of ControlNet models were trained from it. However, ControlNet can be trained to augment any Stable Diffusion compatible model (such as [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)) or [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1). +Our training examples use [`runwayml/stable-diffusion-v1-5`](https://huggingface.co/runwayml/stable-diffusion-v1-5) because that is what the original set of ControlNet models was trained on. However, ControlNet can be trained to augment any compatible Stable Diffusion model (such as [`CompVis/stable-diffusion-v1-4`](https://huggingface.co/CompVis/stable-diffusion-v1-4)) or [`stabilityai/stable-diffusion-2-1`](https://huggingface.co/stabilityai/stable-diffusion-2-1). ## Training -Our training examples use two test conditioning images. They can be downloaded by running +Download the following images to condition our training with: ```sh wget https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/controlnet_training/conditioning_image_1.png @@ -88,8 +92,8 @@ accelerate launch train_controlnet.py \ This default configuration requires ~38GB VRAM. -By default, the training script logs outputs to tensorboard. Pass `--report_to wandb` to use weights and -biases. +By default, the training script logs outputs to tensorboard. Pass `--report_to wandb` to use Weights & +Biases. Gradient accumulation with a smaller batch size can be used to reduce training requirements to ~20 GB VRAM. 
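Conceptually, gradient accumulation just spreads one optimizer update over several small batches so that the accumulated gradients match those of a single larger batch. The toy PyTorch sketch below is only an illustration of that idea; inside the training script it is handled by 🤗 Accelerate:

```python
import torch

# Toy stand-ins so the sketch runs on its own; the real script trains the ControlNet instead.
model = torch.nn.Linear(8, 1)
batches = [(torch.randn(4, 8), torch.randn(4, 1)) for _ in range(8)]

accumulation_steps = 4
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

for step, (inputs, targets) in enumerate(batches):
    loss = torch.nn.functional.mse_loss(model(inputs), targets)
    # Scale the loss so the summed gradients match one large batch.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()       # one optimizer update every `accumulation_steps` micro-batches
        optimizer.zero_grad()
```

Passing `--gradient_accumulation_steps=4` together with `--train_batch_size=1` asks 🤗 Accelerate to do the same thing inside the training script: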
@@ -132,11 +136,12 @@ accelerate launch train_controlnet.py \ ## Training on a 16 GB GPU -Optimizations: +Enable the following optimizations to train on a 16GB GPU: + - Gradient checkpointing -- bitsandbyte's 8-bit optimizer +- bitsandbyte's 8-bit optimizer (take a look at the [installation]((https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed) -[bitandbytes install instructions](https://github.com/TimDettmers/bitsandbytes#requirements--installation). +Now you can launch the training script: ```bash export MODEL_DIR="runwayml/stable-diffusion-v1-5" @@ -158,11 +163,11 @@ accelerate launch train_controlnet.py \ ## Training on a 12 GB GPU -Optimizations: +Enable the following optimizations to train on a 12GB GPU: - Gradient checkpointing -- bitsandbyte's 8-bit optimizer -- xformers -- set grads to none +- bitsandbyte's 8-bit optimizer (take a look at the [installation]((https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed) +- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed) +- set gradients to `None` ```bash export MODEL_DIR="runwayml/stable-diffusion-v1-5" @@ -189,22 +194,23 @@ When using `enable_xformers_memory_efficient_attention`, please make sure to ins ## Training on an 8 GB GPU We have not exhaustively tested DeepSpeed support for ControlNet. While the configuration does -save memory, we have not confirmed the configuration to train successfully. You will very likely +save memory, we have not confirmed whether the configuration trains successfully. You will very likely have to make changes to the config to have a successful training run. -Optimizations: +Enable the following optimizations to train on a 8GB GPU: - Gradient checkpointing -- xformers -- set grads to none +- bitsandbyte's 8-bit optimizer (take a look at the [installation]((https://github.com/TimDettmers/bitsandbytes#requirements--installation) instructions if you don't already have it installed) +- xFormers (take a look at the [installation](https://huggingface.co/docs/diffusers/training/optimization/xformers) instructions if you don't already have it installed) +- set gradients to `None` - DeepSpeed stage 2 with parameter and optimizer offloading - fp16 mixed precision [DeepSpeed](https://www.deepspeed.ai/) can offload tensors from VRAM to either CPU or NVME. This requires significantly more RAM (about 25 GB). -Use `accelerate config` to enable DeepSpeed stage 2. +You'll have to configure your environment with `accelerate config` to enable DeepSpeed stage 2. -The relevant parts of the resulting accelerate config file are +The configuration file should look like this: ```yaml compute_environment: LOCAL_MACHINE @@ -221,7 +227,7 @@ See [documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspee Changing the default Adam optimizer to DeepSpeed's Adam `deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup but -it requires CUDA toolchain with the same version as pytorch. 8-bit optimizer +it requires a CUDA toolchain with the same version as PyTorch. 8-bit optimizer does not seem to be compatible with DeepSpeed at the moment. 
```bash @@ -243,9 +249,9 @@ accelerate launch train_controlnet.py \ --mixed_precision fp16 ``` -## Performing inference with the trained ControlNet +## Inference -The trained model can be run the same as the original ControlNet pipeline with the newly trained ControlNet. +The trained model can be run with the [`StableDiffusionControlNetPipeline`]. Set `base_model_path` and `controlnet_path` to the values `--pretrained_model_name_or_path` and `--output_dir` were respectively set to in the training script. From 5d1e552bd8a457a58562db0dc643c972bf0e230b Mon Sep 17 00:00:00 2001 From: Sayak Paul Date: Thu, 16 Mar 2023 08:43:21 +0530 Subject: [PATCH 4/4] wrap in a tip block. --- docs/source/en/training/controlnet.mdx | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/docs/source/en/training/controlnet.mdx b/docs/source/en/training/controlnet.mdx index 82cf885b88e8..6b7539b89b07 100644 --- a/docs/source/en/training/controlnet.mdx +++ b/docs/source/en/training/controlnet.mdx @@ -223,8 +223,12 @@ deepspeed_config: distributed_type: DEEPSPEED ``` + + See [documentation](https://huggingface.co/docs/accelerate/usage_guides/deepspeed) for more DeepSpeed configuration options. + + Changing the default Adam optimizer to DeepSpeed's Adam `deepspeed.ops.adam.DeepSpeedCPUAdam` gives a substantial speedup but it requires a CUDA toolchain with the same version as PyTorch. 8-bit optimizer