From 98e57e6323ac4e7991e4fe503fe22ae83669a8e8 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 17:07:04 +0200 Subject: [PATCH 01/32] [Examples readme] --- examples/C | 63 ++++++++ examples/README.md | 69 +++++++++ examples/community-examples/README.md | 207 ++++++++++++++++++++++++++ examples/training/requirements.txt | 3 + 4 files changed, 342 insertions(+) create mode 100644 examples/C create mode 100644 examples/README.md create mode 100644 examples/community-examples/README.md create mode 100644 examples/training/requirements.txt diff --git a/examples/C b/examples/C new file mode 100644 index 000000000000..46d38cadd73d --- /dev/null +++ b/examples/C @@ -0,0 +1,63 @@ + + +# 🧨 Diffusers Examples + +Diffusers examples are a collection of best-practices on how to use the `diffusers` library and +aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. + +More specifically, this means: + +- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/training/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/training/requirements) and execute the example script. +- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. +- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand Diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. +- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling +point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. + +We provide **official** examples for both [training](https://github.com/huggingface/diffusers/tree/main/examples/training) and [inference](https://github.com/huggingface/diffusers/tree/main/examples/inference) +that cover the most popular training and inference use cases of diffusion models. +*Official* examples are **actively** maintained by the `diffusers` maintainers and +for which we try to rigorously follow our example philosophy as defined above. +If you feel like an important examples (for either inference or training) is missing, as always, we +are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you πŸ€—. 
+ +In additon, we provide **community** examples, which are examples added and maintained by our community. +For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. +Example that we deem not (yet) popular/important enough to go into the *official* examples or that don't +fully follow the philosophy defined above should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder including both examples for either inference or training. +Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to the `diffusers` library to show to the community how you like to use `diffusers`. + +## Training +Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: + +| Task | πŸ€— Accelerate | πŸ€— Datasets | Colab +|---|---|:---:|:---:|:---:|:---:| +| [**Unconditional Image Generation**](https://github.com/huggingface/transformers/tree/main/examples/training/train_unconditional.py) | βœ… | βœ… | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) + + +## Important note + +**Important** + +To make sure you can successfully run the latest versions of the example scripts, you have to **install the library from source** and install some example-specific requirements. To do this, execute the following steps in a new virtual environment: +```bash +git clone https://github.com/huggingface/transformers +cd transformers +pip install . +``` +Then cd in the example folder of your choice and run +```bash +pip install -r requirements.txt +``` diff --git a/examples/README.md b/examples/README.md new file mode 100644 index 000000000000..8ec5dcf37de8 --- /dev/null +++ b/examples/README.md @@ -0,0 +1,69 @@ + + +# 🧨 Diffusers Examples + +Diffusers examples are a collection of best-practices on how to use the `diffusers` library and +aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. + +More specifically, this means: + +- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/training/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/training/requirements) and execute the example script. +- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. +- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand Diffusion models and how to use them with the `diffusers` library. 
We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. +- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling +point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. + +We provide **official** examples for both [training](https://github.com/huggingface/diffusers/tree/main/examples/training) and [inference](https://github.com/huggingface/diffusers/tree/main/examples/inference) +that cover the most popular training and inference use cases of diffusion models. +*Official* examples are **actively** maintained by the `diffusers` maintainers and +for which we try to rigorously follow our example philosophy as defined above. +If you feel like an important examples (for either inference or training) is missing, as always, we +are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you πŸ€—. + +In additon, we provide **community** examples, which are examples added and maintained by our community. +For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. +Example that we deem not (yet) popular/important enough to go into the *official* examples or that don't +fully follow the philosophy defined above should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder including both examples for either inference or training. +Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to the `diffusers` library to show to the community how you like to use `diffusers`. + +## Training + +Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: + +| Task | πŸ€— Accelerate | πŸ€— Datasets | Colab +|---|---|:---:|:---:|:---:|:---:| +| [**Unconditional Image Generation**](https://github.com/huggingface/transformers/tree/main/examples/training/train_unconditional.py) | βœ… | βœ… | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) + +## Inference + +Inference examples show how to build task-specific [Pipelines]( ) + + + +## Important note + +**Important** + +To make sure you can successfully run the latest versions of the example scripts, you have to **install the library from source** and install some example-specific requirements. To do this, execute the following steps in a new virtual environment: +```bash +git clone https://github.com/huggingface/transformers +cd transformers +pip install . +``` +Then cd in the example folder of your choice and run +```bash +pip install -r requirements.txt +``` diff --git a/examples/community-examples/README.md b/examples/community-examples/README.md new file mode 100644 index 000000000000..e03d5e569f46 --- /dev/null +++ b/examples/community-examples/README.md @@ -0,0 +1,207 @@ +

+[Header image and GitHub / GitHub release / Contributor Covenant badges; HTML markup stripped in this extract.]

+ +πŸ€— Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves +as a modular toolbox for inference and training of diffusion models. + +More precisely, πŸ€— Diffusers offers: + +- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)). +- Various noise schedulers that can be used interchangeably for the prefered speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)). +- Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)). +- Training examples to show how to train the most popular diffusion models (see [examples/training](https://github.com/huggingface/diffusers/tree/main/examples/training)). +- Inference examples to show how to create custom pipelines for advanced tasks such as image2image, in-painting (see [examples/inference](https://github.com/huggingface/diffusers/tree/main/examples/inference)) + +## Quickstart + +In order to get started, we recommend taking a look at two notebooks: + +- The [Getting started with Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines. + Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library. +- The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffuser model training methods. This notebook takes a step-by-step approach to training your + diffuser model on an image dataset, with explanatory graphics. + +## **New 🎨🎨🎨** Stable Diffusion is now fully compatible with `diffusers`! + +Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. +See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information. + +You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-3), read the license and tick the checkbox if you agree. 
You have to be a registered user in πŸ€— Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation. + +```py +# make sure you're logged in with `huggingface-cli login` +from torch import autocast +from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler + +lms = LMSDiscreteScheduler( + beta_start=0.00085, + beta_end=0.012, + beta_schedule="scaled_linear" +) + +pipe = StableDiffusionPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-3", + scheduler=lms, + use_auth_token=True +).to("cuda") + +prompt = "a photo of an astronaut riding a horse on mars" +with autocast("cuda"): + image = pipe(prompt)["sample"][0] + +image.save("astronaut_rides_horse.png") +``` + +For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) +and have a look into the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0). + +## Examples + +There are many ways to try running Diffusers! Here we outline code-focused tools (primarily using `DiffusionPipeline`s and Google Colab) and interactive web-tools. + +### Running Code + +If you want to run the code yourself πŸ’», you can try out: +- [Text-to-Image Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256) +```python +# !pip install diffusers transformers +from diffusers import DiffusionPipeline + +model_id = "CompVis/ldm-text2im-large-256" + +# load model and scheduler +ldm = DiffusionPipeline.from_pretrained(model_id) + +# run pipeline in inference (sample random noise and denoise) +prompt = "A painting of a squirrel eating a burger" +images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6)["sample"] + +# save images +for idx, image in enumerate(images): + image.save(f"squirrel-{idx}.png") +``` +- [Unconditional Diffusion with discrete scheduler](https://huggingface.co/google/ddpm-celebahq-256) +```python +# !pip install diffusers +from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline + +model_id = "google/ddpm-celebahq-256" + +# load model and scheduler +ddpm = DDPMPipeline.from_pretrained(model_id) # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference + +# run pipeline in inference (sample random noise and denoise) +image = ddpm()["sample"] + +# save image +image[0].save("ddpm_generated_image.png") +``` +- [Unconditional Latent Diffusion](https://huggingface.co/CompVis/ldm-celebahq-256) +- [Unconditional Diffusion with continous scheduler](https://huggingface.co/google/ncsnpp-ffhq-1024) + +**Other Notebooks**: +* [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg), +* [tweak images via repeated Stable Diffusion seeds](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg), + +### Web Demos +If you just want to play around with some web demos, you can try out the following πŸš€ Spaces: +| 
Model | Hugging Face Spaces | +|-------------------------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------- | +| Text-to-Image Latent Diffusion | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/text2img-latent-diffusion) | +| Faces generator | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/celeba-latent-diffusion) | +| DDPM with different schedulers | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/fusing/celeba-diffusion) | +| Conditional generation from sketch (*SOON*) | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/huggingface/diffuse-the-rest) | +| Composable diffusion | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Shuang59/Composable-Diffusion) | + +## Definitions + +**Models**: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to *denoise* a noisy input to an image. +*Examples*: UNet, Conditioned UNet, 3D UNet, Transformer UNet + +

+[Figure from DDPM paper (https://arxiv.org/abs/2006.11239); image markup stripped in this extract.]
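+
+To make the definition concrete, the snippet below loads a pretrained UNet and runs a single noise-prediction forward pass. It is a minimal sketch: it reuses the `google/ddpm-celebahq-256` checkpoint shown further up in this README, and output containers have changed across `diffusers` releases, so the dict-style access may need small adjustments for your installed version.
+
+```python
+# !pip install diffusers
+import torch
+
+from diffusers import UNet2DModel
+
+# load a pretrained denoising model; it predicts the noise residual for a noisy image x_t
+model = UNet2DModel.from_pretrained("google/ddpm-celebahq-256")
+
+# a dummy "noisy image" at the resolution this checkpoint was trained on (1 x 3 x 256 x 256)
+noisy_sample = torch.randn(1, 3, 256, 256)
+
+# one forward pass at an arbitrary timestep; no gradients are needed for inference
+with torch.no_grad():
+    noise_residual = model(noisy_sample, 50)["sample"]
+
+print(noise_residual.shape)  # same shape as the input sample
+```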

+ +**Schedulers**: Algorithm class for both **inference** and **training**. +The class provides functionality to compute previous image according to alpha, beta schedule as well as predict noise for training. +*Examples*: [DDPM](https://arxiv.org/abs/2006.11239), [DDIM](https://arxiv.org/abs/2010.02502), [PNDM](https://arxiv.org/abs/2202.09778), [DEIS](https://arxiv.org/abs/2204.13902) + +

+[Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239); image markup stripped in this extract.]
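+
+Continuing the sketch above, the scheduler is what turns the model's noise prediction into the previous, slightly less noisy sample x_{t-1}. The loop below pairs the same UNet with a `DDIMScheduler` built with its default settings; argument names and return containers have varied across `diffusers` releases, so treat it as an illustrative sketch rather than a drop-in recipe.
+
+```python
+import torch
+
+from diffusers import DDIMScheduler, UNet2DModel
+
+model = UNet2DModel.from_pretrained("google/ddpm-celebahq-256")
+scheduler = DDIMScheduler()  # default beta schedule; other schedulers can be swapped in the same way
+scheduler.set_timesteps(50)  # only 50 of the 1000 training steps are used at inference time
+
+sample = torch.randn(1, 3, 256, 256)  # start from pure Gaussian noise
+for t in scheduler.timesteps:
+    with torch.no_grad():
+        noise_pred = model(sample, t)["sample"]                    # the model predicts the noise
+    sample = scheduler.step(noise_pred, t, sample)["prev_sample"]  # the scheduler computes x_{t-1}
+
+# `sample` now approximates an image from the training distribution (here: CelebA-HQ faces)
+```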

+ + +**Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models, possible text encoders, ... +*Examples*: Glide, Latent-Diffusion, Imagen, DALL-E 2 + +

+[Figure from ImageGen (https://imagen.research.google/); image markup stripped in this extract.]
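+
+As a small illustration of the definition above, the components of such an end-to-end system are all exposed on a loaded pipeline object. The attribute names below (`unet`, `scheduler`) are the ones registered by `DDPMPipeline`; other pipelines register additional components such as tokenizers or text encoders under their own names.
+
+```python
+from diffusers import DDPMPipeline
+
+# a pipeline bundles the trained model(s) and a compatible scheduler behind a single object
+pipeline = DDPMPipeline.from_pretrained("google/ddpm-celebahq-256")
+
+print(pipeline.unet.__class__.__name__)       # the denoising model, e.g. UNet2DModel
+print(pipeline.scheduler.__class__.__name__)  # the noise scheduler, e.g. DDPMScheduler
+
+# see the "Unconditional Diffusion with discrete scheduler" example above for running it end-to-end
+```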

+ +## Philosophy + +- Readability and clarity is prefered over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper. +- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continous outputs**, *e.g.* vision and audio. +- Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation and can include components of another library, such as text-encoders. Examples for diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion). + +## Installation + +**With `pip`** + +```bash +pip install --upgrade diffusers # should install diffusers 0.2.4 +``` + +**With `conda`** + +```sh +conda install -c conda-forge diffusers +``` + +## In the works + +For the first release, πŸ€— Diffusers focuses on text-to-image diffusion techniques. However, diffusers can be used for much more than that! Over the upcoming releases, we'll be focusing on: + +- Diffusers for audio +- Diffusers for reinforcement learning (initial work happening in https://github.com/huggingface/diffusers/pull/105). +- Diffusers for video generation +- Diffusers for molecule generation (initial work happening in https://github.com/huggingface/diffusers/pull/54) + +A few pipeline components are already being worked on, namely: + +- BDDMPipeline for spectrogram-to-sound vocoding +- GLIDEPipeline to support OpenAI's GLIDE model +- Grad-TTS for text to audio generation / conditional audio generation + +We want diffusers to be a toolbox useful for diffusers models in general; if you find yourself limited in any way by the current API, or would like to see additional models, schedulers, or techniques, please open a [GitHub issue](https://github.com/huggingface/diffusers/issues) mentioning what you would like to see. + +## Credits + +This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today: + +- @CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion) +- @hojonathanho original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion) as well as the extremely useful translation into PyTorch by @pesser, available [here](https://github.com/pesser/pytorch_diffusion) +- @ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim). +- @yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch) + +We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models) as well as @crowsonkb and @rromb for useful discussions and insights. 
diff --git a/examples/training/requirements.txt b/examples/training/requirements.txt new file mode 100644 index 000000000000..bbc690556020 --- /dev/null +++ b/examples/training/requirements.txt @@ -0,0 +1,3 @@ +accelerate +torchvision +datasets From 6a902a90411127ed10241531af9bcf1db0e697bd Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 17:38:49 +0200 Subject: [PATCH 02/32] Improve --- examples/C | 63 ------------------- examples/README.md | 38 +++++------ .../README.md | 0 .../README.md | 0 .../requirements.txt | 0 .../train_unconditional.py | 0 src/diffusers/pipelines/README.md | 26 ++++++-- .../stable_diffusion}/image_to_image.py | 0 .../pipelines/stable_diffusion}/inpainting.py | 0 .../pipelines/stable_diffusion}/readme.md | 0 10 files changed, 37 insertions(+), 90 deletions(-) delete mode 100644 examples/C rename examples/{community-examples => community}/README.md (100%) rename examples/{training => unconditional_image_generation}/README.md (100%) rename examples/{training => unconditional_image_generation}/requirements.txt (100%) rename examples/{training => unconditional_image_generation}/train_unconditional.py (100%) rename {examples/inference => src/diffusers/pipelines/stable_diffusion}/image_to_image.py (100%) rename {examples/inference => src/diffusers/pipelines/stable_diffusion}/inpainting.py (100%) rename {examples/inference => src/diffusers/pipelines/stable_diffusion}/readme.md (100%) diff --git a/examples/C b/examples/C deleted file mode 100644 index 46d38cadd73d..000000000000 --- a/examples/C +++ /dev/null @@ -1,63 +0,0 @@ - - -# 🧨 Diffusers Examples - -Diffusers examples are a collection of best-practices on how to use the `diffusers` library and -aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. - -More specifically, this means: - -- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/training/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/training/requirements) and execute the example script. -- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. -- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand Diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. -- **One-purpose-only**: Examples should show one task and one task only. 
Even if a task is from a modeling -point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. - -We provide **official** examples for both [training](https://github.com/huggingface/diffusers/tree/main/examples/training) and [inference](https://github.com/huggingface/diffusers/tree/main/examples/inference) -that cover the most popular training and inference use cases of diffusion models. -*Official* examples are **actively** maintained by the `diffusers` maintainers and -for which we try to rigorously follow our example philosophy as defined above. -If you feel like an important examples (for either inference or training) is missing, as always, we -are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you πŸ€—. - -In additon, we provide **community** examples, which are examples added and maintained by our community. -For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. -Example that we deem not (yet) popular/important enough to go into the *official* examples or that don't -fully follow the philosophy defined above should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder including both examples for either inference or training. -Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to the `diffusers` library to show to the community how you like to use `diffusers`. - -## Training -Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: - -| Task | πŸ€— Accelerate | πŸ€— Datasets | Colab -|---|---|:---:|:---:|:---:|:---:| -| [**Unconditional Image Generation**](https://github.com/huggingface/transformers/tree/main/examples/training/train_unconditional.py) | βœ… | βœ… | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) - - -## Important note - -**Important** - -To make sure you can successfully run the latest versions of the example scripts, you have to **install the library from source** and install some example-specific requirements. To do this, execute the following steps in a new virtual environment: -```bash -git clone https://github.com/huggingface/transformers -cd transformers -pip install . -``` -Then cd in the example folder of your choice and run -```bash -pip install -r requirements.txt -``` diff --git a/examples/README.md b/examples/README.md index 8ec5dcf37de8..ef84f596f2a5 100644 --- a/examples/README.md +++ b/examples/README.md @@ -15,31 +15,23 @@ limitations under the License. # 🧨 Diffusers Examples -Diffusers examples are a collection of best-practices on how to use the `diffusers` library and -aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. +Diffusers examples are a collection of best-practices on how to use the `diffusers` library. 
+**Note**: If you are looking for **official** examples on how to use `diffusers` for inference, +please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines) + +Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. More specifically, this means: -- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/training/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/training/requirements) and execute the example script. +- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements) and execute the example script. - **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. - **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand Diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. - **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. -We provide **official** examples for both [training](https://github.com/huggingface/diffusers/tree/main/examples/training) and [inference](https://github.com/huggingface/diffusers/tree/main/examples/inference) -that cover the most popular training and inference use cases of diffusion models. -*Official* examples are **actively** maintained by the `diffusers` maintainers and -for which we try to rigorously follow our example philosophy as defined above. -If you feel like an important examples (for either inference or training) is missing, as always, we -are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you πŸ€—. 
- -In additon, we provide **community** examples, which are examples added and maintained by our community. -For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. -Example that we deem not (yet) popular/important enough to go into the *official* examples or that don't -fully follow the philosophy defined above should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder including both examples for either inference or training. -Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to the `diffusers` library to show to the community how you like to use `diffusers`. - -## Training +We provide **official** examples that cover the most popular tasks of diffusion models. +*Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above. +If you feel like an important examples, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you πŸ€—. Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: @@ -47,16 +39,16 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie |---|---|:---:|:---:|:---:|:---:| | [**Unconditional Image Generation**](https://github.com/huggingface/transformers/tree/main/examples/training/train_unconditional.py) | βœ… | βœ… | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -## Inference - -Inference examples show how to build task-specific [Pipelines]( ) - +## Community +In additon, we provide **community** examples, which are examples added and maintained by our community. +Community examples can consists of both *training* examples and *inference* pipelines. +For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. +Examples that we deem not (yet) popular/important enough to go into the [official training examples]( ) or [official pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), or that don't fully following the philosophy defined above, should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines. +**Note**: Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to show to the community how you like to use `diffusers` πŸͺ„. ## Important note -**Important** - To make sure you can successfully run the latest versions of the example scripts, you have to **install the library from source** and install some example-specific requirements. 
To do this, execute the following steps in a new virtual environment: ```bash git clone https://github.com/huggingface/transformers diff --git a/examples/community-examples/README.md b/examples/community/README.md similarity index 100% rename from examples/community-examples/README.md rename to examples/community/README.md diff --git a/examples/training/README.md b/examples/unconditional_image_generation/README.md similarity index 100% rename from examples/training/README.md rename to examples/unconditional_image_generation/README.md diff --git a/examples/training/requirements.txt b/examples/unconditional_image_generation/requirements.txt similarity index 100% rename from examples/training/requirements.txt rename to examples/unconditional_image_generation/requirements.txt diff --git a/examples/training/train_unconditional.py b/examples/unconditional_image_generation/train_unconditional.py similarity index 100% rename from examples/training/train_unconditional.py rename to examples/unconditional_image_generation/train_unconditional.py diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 19d59bf93db4..9222193cc148 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -1,14 +1,32 @@ -# Pipelines +# 🧨 Diffusers Pipelines +Pipelines provide a simple API to run diffusion systems that possibly include multiple different modeling +compenents in inference. + +**Note** Pipelines do and should not offer an any functionality to train independent diffusion models or other components of a diffusion system, such as text encoders, image generation models, or super-resolution models. If you are looking for *offical* training examples, please have a look at [exampels](https://github.com/huggingface/diffusers/tree/main/examples). + +## How to diffusers pipelines work + + +## How to write your own diffusion pipeline + +Diffusers pipelines consists of - Pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box - Pipelines should stay as close as possible to their original implementation - Pipelines can include components of other library, such as text-encoders. -## API +### Philosophy + +Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. +More specifically, this means: -TODO(Patrick, Anton, Suraj) +- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements) and execute the example script. +- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. 
+- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand Diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. +- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling +point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. -## Examples +### Examples - DDPM for unconditional image generation in [pipeline_ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm/pipeline_ddpm.py). - DDIM for unconditional image generation in [pipeline_ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim/pipeline_ddim.py). diff --git a/examples/inference/image_to_image.py b/src/diffusers/pipelines/stable_diffusion/image_to_image.py similarity index 100% rename from examples/inference/image_to_image.py rename to src/diffusers/pipelines/stable_diffusion/image_to_image.py diff --git a/examples/inference/inpainting.py b/src/diffusers/pipelines/stable_diffusion/inpainting.py similarity index 100% rename from examples/inference/inpainting.py rename to src/diffusers/pipelines/stable_diffusion/inpainting.py diff --git a/examples/inference/readme.md b/src/diffusers/pipelines/stable_diffusion/readme.md similarity index 100% rename from examples/inference/readme.md rename to src/diffusers/pipelines/stable_diffusion/readme.md From 5df441e1877a6e28e886b9d68f93659127dc3680 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 20:29:26 +0200 Subject: [PATCH 03/32] more --- src/diffusers/pipelines/README.md | 58 +++++++++++++++++++++++++------ 1 file changed, 48 insertions(+), 10 deletions(-) diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 9222193cc148..358069ed3a94 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -1,22 +1,60 @@ # 🧨 Diffusers Pipelines -Pipelines provide a simple API to run diffusion systems that possibly include multiple different modeling -compenents in inference. +Pipelines provide a simply way to run state-of-the-art difffusion models in inference. +Most diffusion models consits of multiple independently-trained models and highly adaptable scheduler +components - all of which are needed to have a functioning end-to-end diffusion model. -**Note** Pipelines do and should not offer an any functionality to train independent diffusion models or other components of a diffusion system, such as text encoders, image generation models, or super-resolution models. If you are looking for *offical* training examples, please have a look at [exampels](https://github.com/huggingface/diffusers/tree/main/examples). 
+As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) consists of three independently trained models and several additional components:
+- [Autoencoder](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/models/vae.py#L392)
+- [Conditional Unet](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/models/unet_2d_condition.py#L12)
+- [CLIP text encoder](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPTextModel)
+- a [scheduler](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py) component,
+- a [CLIPFeatureExtractor](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPFeatureExtractor),
+- as well as a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py).
+All of these components are necessary to run Stable Diffusion in inference, even though they were trained
+or created independently from each other.
+
+To that end, we strive to offer all open-sourced, state-of-the-art diffusion models under a unified API.
+More specifically, we strive to provide pipelines that
+1. can load the officially published weights and yield 1-to-1 the same outputs as the original implementation according to the corresponding paper (*e.g.* [LatentDiffusionPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion) uses the officially released weights of [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)),
+2. have a simple user interface to run the model in inference (see the [Pipelines API](#pipelines-api) section),
+3. are easy to understand with code that is self-explanatory and can be read alongside the official paper (see [Pipelines summary](#pipelines-summary)),
+4. can easily be contributed by the community (see the [Contribution](#contribution) section).
+
+**Note** that pipelines do not and should not offer any training functionality.
+If you are looking for *official* training examples, please have a look at [examples](https://github.com/huggingface/diffusers/tree/main/examples).
+
+
+## Pipelines Summary
+
+The following ta
+
+
+## Pipelines API
+
+Diffusion models often consist of multiple independently trained models or other separately created components.
+
+
+Each model has been trained independently on a different task and the scheduler can easily be swapped out for another scheduler.
+During inference, however, we want to be able to easily load all components and use them in inference - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers).
+To that end, all pipelines provide the following functionality:
+
+- a [`from_pretrained` method](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L139) that accepts a Hugging Face Hub repository id, *e.g.* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4), or a path to a local directory, *e.g.* "./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be loaded into the pipeline. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class>"]`. `<name>` is the attribute name given to the loaded instance of `<class>`, which can be found in the library or pipeline folder called `"<library>"`.
+- a [`save_pretrained` method](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L90) that accepts a local path, *e.g.* `./stable-diffusion`, under which all models/components of the pipeline will be saved. For each component/model, a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json`, so that the complete pipeline can be instantiated again from the local path.
+- a [`to` method](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L118) which accepts a `string` or `torch.device` to move all models that are of type `torch.nn.Module` to the passed device. The behavior is fully analogous to [PyTorch's `to` method](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to).
+- a `__call__` method to use the pipeline in inference. `__call__` defines the inference logic of the pipeline and should ideally encompass all parts from pre-processing to forwarding tensors to the different models and scheduler components, as well as post-processing. The API of the `__call__` method can vary strongly from pipeline to pipeline. *E.g.* a text-to-image pipeline, such as [`StableDiffusionPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py), should accept, among other things, the text prompt to generate the image, while a pure image generation pipeline, such as [DDPMPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm), can be run without providing any inputs. To better understand which inputs can be adapted for each pipeline, one should look directly into the respective pipeline.
+
+**Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should not be used for training. If you want to store the gradients during the forward pass, we recommend writing your own pipeline, see also our [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community).
+
+## Contribution
+
+As always, we are more than happy about any contribution. Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**.
More specifically, this means: From 04e2a9dfb63736d58e7107a895074919d8866029 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 21:30:21 +0200 Subject: [PATCH 04/32] save --- tests/test_pipelines.py | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index e6375ce06f30..61f0b1c78548 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -35,6 +35,8 @@ ScoreSdeVePipeline, ScoreSdeVeScheduler, StableDiffusionPipeline, + StableDiffusionInpaintingPipeline, + StableDiffusionInPaintPipeline, UNet2DModel, ) from diffusers.pipeline_utils import DiffusionPipeline @@ -390,3 +392,24 @@ def test_lms_stable_diffusion_pipeline(self): assert image.shape == (1, 512, 512, 3) expected_slice = np.array([0.9077, 0.9254, 0.9181, 0.9227, 0.9213, 0.9367, 0.9399, 0.9406, 0.9024]) assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2 + + @slow + @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU") + def test_stable_diffusion_pipeline(self): + model_id = "CompVis/stable-diffusion-v1-1" + model_id = "/home/patrick/stable-diffusion-v1-4" + + init_image = torch.ones( + mask_image = torch.ones( + pipe = StableDiffusionInPaintPipeline.from_pretrained(model_id, use_auth_token=True).to(torch_device) + + prompt = "a photograph of an astronaut riding a horse" + generator = torch.Generator(device=torch_device).manual_seed(0) + image = pipe([prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="numpy")[ + "sample" + ] + + image_slice = image[0, -3:, -3:, -1] + assert image.shape == (1, 512, 512, 3) + expected_slice = np.array([0.9077, 0.9254, 0.9181, 0.9227, 0.9213, 0.9367, 0.9399, 0.9406, 0.9024]) + assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2 From f2ee1fac003fe029733c48612b7dcba5ef777a30 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 21:30:25 +0200 Subject: [PATCH 05/32] save --- ...mage.py => pipeline_stable_diffusion_img2img.py} | 9 ++++++--- ...ting.py => pipeline_stable_diffusion_inpaint.py} | 13 ++++++++----- 2 files changed, 14 insertions(+), 8 deletions(-) rename src/diffusers/pipelines/stable_diffusion/{image_to_image.py => pipeline_stable_diffusion_img2img.py} (96%) rename src/diffusers/pipelines/stable_diffusion/{inpainting.py => pipeline_stable_diffusion_inpaint.py} (95%) diff --git a/src/diffusers/pipelines/stable_diffusion/image_to_image.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py similarity index 96% rename from src/diffusers/pipelines/stable_diffusion/image_to_image.py rename to src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py index e5f34ad3df36..1be71813b656 100644 --- a/src/diffusers/pipelines/stable_diffusion/image_to_image.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py @@ -3,13 +3,16 @@ import numpy as np import torch - import PIL -from diffusers import AutoencoderKL, DDIMScheduler, DiffusionPipeline, PNDMScheduler, UNet2DConditionModel -from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker + from tqdm.auto import tqdm from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer +from ...models import AutoencoderKL, UNet2DConditionModel +from ...pipeline_utils import DiffusionPipeline +from ...schedulers import DDIMScheduler, PNDMScheduler +from .safety_checker import StableDiffusionSafetyChecker + def preprocess(image): w, h = 
image.size diff --git a/src/diffusers/pipelines/stable_diffusion/inpainting.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py similarity index 95% rename from src/diffusers/pipelines/stable_diffusion/inpainting.py rename to src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index d8082c6209b5..eb90530798a4 100644 --- a/src/diffusers/pipelines/stable_diffusion/inpainting.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -3,11 +3,14 @@ import numpy as np import torch - import PIL -from diffusers import AutoencoderKL, DDIMScheduler, DiffusionPipeline, PNDMScheduler, UNet2DConditionModel -from diffusers.pipelines.stable_diffusion import StableDiffusionSafetyChecker + from tqdm.auto import tqdm +from ...models import AutoencoderKL, UNet2DConditionModel +from ...pipeline_utils import DiffusionPipeline +from ...schedulers import DDIMScheduler, PNDMScheduler +from .safety_checker import StableDiffusionSafetyChecker + from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer @@ -34,7 +37,7 @@ def preprocess_mask(mask): return mask -class StableDiffusionInpaintingPipeline(DiffusionPipeline): +class StableDiffusionInPaintPipeline(DiffusionPipeline): def __init__( self, vae: AutoencoderKL, @@ -108,7 +111,7 @@ def __call__( # check sizes if not mask.shape == init_latents.shape: - raise ValueError(f"The mask and init_image should be the same size!") + raise ValueError("The mask and init_image should be the same size!") # get the original timestep using init_timestep init_timestep = int(num_inference_steps * strength) + offset From 5a1c49c9f39a553fbdb8844536501e1cbf01842b Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 21:47:15 +0200 Subject: [PATCH 06/32] save more --- examples/inference/README.md | 106 ++++++++++++++++++ src/diffusers/__init__.py | 7 +- src/diffusers/pipelines/__init__.py | 6 +- .../pipelines/stable_diffusion/__init__.py | 7 +- .../pipeline_stable_diffusion_img2img.py | 4 +- .../pipeline_stable_diffusion_inpaint.py | 6 +- .../utils/dummy_transformers_objects.py | 13 --- tests/test_pipelines.py | 54 ++++++--- 8 files changed, 170 insertions(+), 33 deletions(-) create mode 100644 examples/inference/README.md diff --git a/examples/inference/README.md b/examples/inference/README.md new file mode 100644 index 000000000000..55e17663b5e1 --- /dev/null +++ b/examples/inference/README.md @@ -0,0 +1,106 @@ +# Inference Examples + +**The inference examples folder is deprecated and will be removed in a future version**. + +- For `Image-to-Image text-guided generation with Stable Diffusion`, please have a look at +- For `In-painting using Stable Diffusion`, please have a look at +- For `Tweak prompts reusing seeds and latents`, please have a look at + + +DELETE THE FOLLOWING + +## Installing the dependencies + +Before running the scipts, make sure to install the library's dependencies: + +```bash +pip install diffusers transformers ftfy +``` + +## + +The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! 
+ +### How to use it + + +```python +from torch import autocast +import requests +from PIL import Image +from io import BytesIO + +from image_to_image import StableDiffusionImg2ImgPipeline, preprocess + +# load the pipeline +device = "cuda" +pipe = StableDiffusionImg2ImgPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + revision="fp16", + torch_dtype=torch.float16, + use_auth_token=True +).to(device) + +# let's download an initial image +url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" + +response = requests.get(url) +init_image = Image.open(BytesIO(response.content)).convert("RGB") +init_image = init_image.resize((768, 512)) +init_image = preprocess(init_image) + +prompt = "A fantasy landscape, trending on artstation" + +with autocast("cuda"): + images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5)["sample"] + +images[0].save("fantasy_landscape.png") +``` +You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) + +## Tweak prompts reusing seeds and latents + +You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb). + + +## In-painting using Stable Diffusion + +The `inpainting.py` script implements `StableDiffusionInpaintingPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. 
+ +### How to use it + +```python +from io import BytesIO + +from torch import autocast +import requests +import PIL + +from inpainting import StableDiffusionInpaintingPipeline + +def download_image(url): + response = requests.get(url) + return PIL.Image.open(BytesIO(response.content)).convert("RGB") + +img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" +mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" + +init_image = download_image(img_url).resize((512, 512)) +mask_image = download_image(mask_url).resize((512, 512)) + +device = "cuda" +pipe = StableDiffusionInpaintingPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + revision="fp16", + torch_dtype=torch.float16, + use_auth_token=True +).to(device) + +prompt = "a cat sitting on a bench" +with autocast("cuda"): + images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75)["sample"] + +images[0].save("cat_on_bench.png") +``` + +You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/in_painting_with_stable_diffusion_using_diffusers.ipynb) diff --git a/src/diffusers/__init__.py b/src/diffusers/__init__.py index 26922447e4ea..232727b637cb 100644 --- a/src/diffusers/__init__.py +++ b/src/diffusers/__init__.py @@ -38,6 +38,11 @@ if is_transformers_available(): - from .pipelines import LDMTextToImagePipeline, StableDiffusionPipeline + from .pipelines import ( + LDMTextToImagePipeline, + StableDiffusionImg2ImgPipeline, + StableDiffusionInPaintPipeline, + StableDiffusionPipeline, + ) else: from .utils.dummy_transformers_objects import * diff --git a/src/diffusers/pipelines/__init__.py b/src/diffusers/pipelines/__init__.py index e4e1fffa2eb7..709c5451eca6 100644 --- a/src/diffusers/pipelines/__init__.py +++ b/src/diffusers/pipelines/__init__.py @@ -13,4 +13,8 @@ if is_transformers_available(): from .latent_diffusion import LDMTextToImagePipeline - from .stable_diffusion import StableDiffusionPipeline + from .stable_diffusion import ( + StableDiffusionImg2ImgPipeline, + StableDiffusionInPaintPipeline, + StableDiffusionPipeline, + ) diff --git a/src/diffusers/pipelines/stable_diffusion/__init__.py b/src/diffusers/pipelines/stable_diffusion/__init__.py index 5306ba821a1e..49bb7978c5d0 100644 --- a/src/diffusers/pipelines/stable_diffusion/__init__.py +++ b/src/diffusers/pipelines/stable_diffusion/__init__.py @@ -3,4 +3,9 @@ if is_transformers_available(): - from .pipeline_stable_diffusion import StableDiffusionPipeline, StableDiffusionSafetyChecker + from .pipeline_stable_diffusion import ( + StableDiffusionImg2ImgPipeline, + StableDiffusionInPaintPipeline, + StableDiffusionPipeline, + StableDiffusionSafetyChecker, + ) diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py index 1be71813b656..e094c6a535a6 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py @@ -3,8 +3,8 @@ import numpy as np import torch -import PIL +import PIL from tqdm.auto import tqdm from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer @@ -80,6 +80,8 @@ def __call__( 
self.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs) + init_image = preprocess(init_image) + # encode the init image into latents and scale the latents init_latents = self.vae.encode(init_image.to(self.device)).sample() init_latents = 0.18215 * init_latents diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index eb90530798a4..4f3ad3b2467f 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -3,16 +3,16 @@ import numpy as np import torch -import PIL +import PIL from tqdm.auto import tqdm +from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer + from ...models import AutoencoderKL, UNet2DConditionModel from ...pipeline_utils import DiffusionPipeline from ...schedulers import DDIMScheduler, PNDMScheduler from .safety_checker import StableDiffusionSafetyChecker -from transformers import CLIPFeatureExtractor, CLIPTextModel, CLIPTokenizer - def preprocess_image(image): w, h = image.size diff --git a/src/diffusers/utils/dummy_transformers_objects.py b/src/diffusers/utils/dummy_transformers_objects.py index 34e0c8bec150..dc929427221a 100644 --- a/src/diffusers/utils/dummy_transformers_objects.py +++ b/src/diffusers/utils/dummy_transformers_objects.py @@ -2,16 +2,3 @@ # flake8: noqa from ..utils import DummyObject, requires_backends - -class LDMTextToImagePipeline(metaclass=DummyObject): - _backends = ["transformers"] - - def __init__(self, *args, **kwargs): - requires_backends(self, ["transformers"]) - - -class StableDiffusionPipeline(metaclass=DummyObject): - _backends = ["transformers"] - - def __init__(self, *args, **kwargs): - requires_backends(self, ["transformers"]) diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index 61f0b1c78548..7716d70d6605 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -15,11 +15,13 @@ import tempfile import unittest +from io import BytesIO import numpy as np import torch import PIL +import requests from diffusers import ( DDIMPipeline, DDIMScheduler, @@ -34,9 +36,9 @@ PNDMScheduler, ScoreSdeVePipeline, ScoreSdeVeScheduler, - StableDiffusionPipeline, - StableDiffusionInpaintingPipeline, + StableDiffusionImg2ImgPipeline, StableDiffusionInPaintPipeline, + StableDiffusionPipeline, UNet2DModel, ) from diffusers.pipeline_utils import DiffusionPipeline @@ -395,19 +397,45 @@ def test_lms_stable_diffusion_pipeline(self): @slow @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU") - def test_stable_diffusion_pipeline(self): - model_id = "CompVis/stable-diffusion-v1-1" - model_id = "/home/patrick/stable-diffusion-v1-4" + def test_stable_diffusion_img2img_pipeline(self): + model_id = "CompVis/stable-diffusion-v1-4" + pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, use_auth_token=True) - init_image = torch.ones( - mask_image = torch.ones( - pipe = StableDiffusionInPaintPipeline.from_pretrained(model_id, use_auth_token=True).to(torch_device) + # TODO(PVP) - move to hf-internal-testing + url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" + response = requests.get(url) + init_image = PIL.Image.open(BytesIO(response.content)).convert("RGB") + init_image = init_image.resize((768, 512)) - prompt = "a photograph of an astronaut riding a horse" - 
generator = torch.Generator(device=torch_device).manual_seed(0) - image = pipe([prompt], generator=generator, guidance_scale=7.5, num_inference_steps=10, output_type="numpy")[ - "sample" - ] + prompt = "A fantasy landscape, trending on artstation" + + with torch.autocast("cuda"): + image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5)["sample"][0] + + image_slice = image[0, -3:, -3:, -1] + assert image.shape == (1, 512, 512, 3) + expected_slice = np.array([0.9077, 0.9254, 0.9181, 0.9227, 0.9213, 0.9367, 0.9399, 0.9406, 0.9024]) + assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2 + + @slow + @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU") + def test_stable_diffusion_in_paint_pipeline(self): + model_id = "CompVis/stable-diffusion-v1-4" + pipe = StableDiffusionInPaintPipeline.from_pretrained(model_id, use_auth_token=True) + + # TODO(PVP) - move to hf-internal-testing + url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" + response = requests.get(url) + init_image = PIL.Image.open(BytesIO(response.content)).convert("RGB") + init_image = init_image.resize((768, 512)) + mask_image = init_image + + prompt = "A fantasy landscape, trending on artstation" + + with torch.autocast("cuda"): + image = pipe( + prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5 + )["sample"][0] image_slice = image[0, -3:, -3:, -1] assert image.shape == (1, 512, 512, 3) From c90c8d0d2fcee0958c047efcaf8f47afa8ab6bb4 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 23:09:19 +0200 Subject: [PATCH 07/32] up --- examples/README.md | 2 +- examples/inference/README.md | 106 +------------ examples/inference/image_to_image.py | 1 + examples/inference/inpainting.py | 1 + src/diffusers/pipelines/README.md | 143 +++++++++++++++--- .../pipeline_stable_diffusion_img2img.py | 5 +- 6 files changed, 128 insertions(+), 130 deletions(-) create mode 100644 examples/inference/image_to_image.py create mode 100644 examples/inference/inpainting.py diff --git a/examples/README.md b/examples/README.md index ef84f596f2a5..57f54d5a9797 100644 --- a/examples/README.md +++ b/examples/README.md @@ -36,7 +36,7 @@ If you feel like an important examples, we are more than happy to welcome a [Fea Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: | Task | πŸ€— Accelerate | πŸ€— Datasets | Colab -|---|---|:---:|:---:|:---:|:---:| +|---|---|:---:|:---:| | [**Unconditional Image Generation**](https://github.com/huggingface/transformers/tree/main/examples/training/train_unconditional.py) | βœ… | βœ… | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) ## Community diff --git a/examples/inference/README.md b/examples/inference/README.md index 55e17663b5e1..52d66be8e228 100644 --- a/examples/inference/README.md +++ b/examples/inference/README.md @@ -1,106 +1,8 @@ # Inference Examples **The inference examples folder is deprecated and will be removed in a future version**. +**Officially supported inference examples can be found in the [Pipelines folder](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines)**. 
-- For `Image-to-Image text-guided generation with Stable Diffusion`, please have a look at -- For `In-painting using Stable Diffusion`, please have a look at -- For `Tweak prompts reusing seeds and latents`, please have a look at - - -DELETE THE FOLLOWING - -## Installing the dependencies - -Before running the scipts, make sure to install the library's dependencies: - -```bash -pip install diffusers transformers ftfy -``` - -## - -The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! - -### How to use it - - -```python -from torch import autocast -import requests -from PIL import Image -from io import BytesIO - -from image_to_image import StableDiffusionImg2ImgPipeline, preprocess - -# load the pipeline -device = "cuda" -pipe = StableDiffusionImg2ImgPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - revision="fp16", - torch_dtype=torch.float16, - use_auth_token=True -).to(device) - -# let's download an initial image -url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - -response = requests.get(url) -init_image = Image.open(BytesIO(response.content)).convert("RGB") -init_image = init_image.resize((768, 512)) -init_image = preprocess(init_image) - -prompt = "A fantasy landscape, trending on artstation" - -with autocast("cuda"): - images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5)["sample"] - -images[0].save("fantasy_landscape.png") -``` -You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) - -## Tweak prompts reusing seeds and latents - -You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb). - - -## In-painting using Stable Diffusion - -The `inpainting.py` script implements `StableDiffusionInpaintingPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. 
- -### How to use it - -```python -from io import BytesIO - -from torch import autocast -import requests -import PIL - -from inpainting import StableDiffusionInpaintingPipeline - -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - -img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" -mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" - -init_image = download_image(img_url).resize((512, 512)) -mask_image = download_image(mask_url).resize((512, 512)) - -device = "cuda" -pipe = StableDiffusionInpaintingPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - revision="fp16", - torch_dtype=torch.float16, - use_auth_token=True -).to(device) - -prompt = "a cat sitting on a bench" -with autocast("cuda"): - images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75)["sample"] - -images[0].save("cat_on_bench.png") -``` - -You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/in_painting_with_stable_diffusion_using_diffusers.ipynb) +- For `Image-to-Image text-guided generation with Stable Diffusion`, please have a look at the official [Pipeline examples](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines#examples) +- For `In-painting using Stable Diffusion`, please have a look at the official [Pipeline examples](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines#examples) +- For `Tweak prompts reusing seeds and latents`, please have a look at the official [Pipeline examples](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines#examples) diff --git a/examples/inference/image_to_image.py b/examples/inference/image_to_image.py new file mode 100644 index 000000000000..a636f6608e7c --- /dev/null +++ b/examples/inference/image_to_image.py @@ -0,0 +1 @@ +from diffusers import StableDiffusionImg2ImgPipeline # noqa F401 diff --git a/examples/inference/inpainting.py b/examples/inference/inpainting.py new file mode 100644 index 000000000000..c27a5235d026 --- /dev/null +++ b/examples/inference/inpainting.py @@ -0,0 +1 @@ +from diffusers import StableDiffusionInPaintPipeline as StableDiffusionInpaintingPipeline # noqa F401 diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 358069ed3a94..c6dddbaeb328 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -1,8 +1,8 @@ # 🧨 Diffusers Pipelines Pipelines provide a simply way to run state-of-the-art difffusion models in inference. -Most diffusion models consits of multiple independently-trained models and highly adaptable scheduler -components - all of which are needed to have a functioning end-to-end diffusion model. +Most diffusion systems consits of multiple independently-trained models and highly adaptable scheduler +components - all of which are needed to have a functioning end-to-end diffusion system. 
As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three indepently trained models: - [Autoencoder](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/models/vae.py#L392) @@ -14,7 +14,7 @@ As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) All of these components are necessary to run stable diffusion in inference even thought they were trained or created independently from each other. -To that end, we strive to offer all open-sourced, state-of-the-art diffusion models under a unified API. +To that end, we strive to offer all open-sourced, state-of-the-art diffusion system under a unified API. More specifically, we strive to provide pipelines that - 1. can load the officially published weights and yield 1-to-1 the same outputs as the original implemetation according to the corresponding paper (*e.g.* [LatentDiffusionPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion), uses the officially released weights of [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)), - 2. have a simple user interface to run the model in inference (see the [Pipelines API](#pipelines-api) section), @@ -27,9 +27,25 @@ If you are looking for *official* training examples, please have a look at [exam ## Pipelines Summary -The following ta - - +The following table summarizes all officially supported pipelines, their corresponding paper, and if +available a colab notebook to directly try them out. + +| Pipeline | Paper | Tasks | Colab +|---|---|:---:|:---:| +| [ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim) | []() | Unconditional Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| [ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm) | []() | Unconditional Image Generation | +| [latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion) | []() | Text-to-Image Generation | +| [latent_diffusion_uncond](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond) | []() | Unconditional Image Generation | +| [pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm) | []() | Unconditional Image Generation | +| [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | []() | Unconditional Image Generation | +| [score_sde_vp](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | []() | Unconditional Image Generation | +| [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | []() | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| | []() | Image-to-Image Translation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| | []() | Image In-Painting | [![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| [stochatic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochatic_karras_ve) | []() | Unconditional Image Generation | + +**Note**: Many pipelines give a very simple examples of how to play around with the diffusion systems as present in the corresponding paper. +However, this is just an example and almost all pipelines can be adapted to use different scheduler components or even different model componets. Some examples are shown in the [Examples](#examples) below. ## Pipelines API @@ -54,22 +70,99 @@ not be used for training. If you want to store the gradients during the forward ## Contribution -As always, we are more than happy about any contribution -Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. -More specifically, this means: - -- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements) and execute the example script. -- **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. -- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand Diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. -- **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling -point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. - -### Examples - -- DDPM for unconditional image generation in [pipeline_ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm/pipeline_ddpm.py). -- DDIM for unconditional image generation in [pipeline_ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim/pipeline_ddim.py). -- PNDM for unconditional image generation in [pipeline_pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm/pipeline_pndm.py). -- Latent diffusion for text to image generation / conditional image generation in [pipeline_latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion/pipeline_latent_diffusion.py). 
-- Glide for text to image generation / conditional image generation in [pipeline_glide](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/glide/pipeline_glide.py). -- BDDMPipeline for spectrogram-to-sound vocoding in [pipeline_bddm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/bddm/pipeline_bddm.py). -- Grad-TTS for text to audio generation / conditional audio generation in [pipeline_grad_tts](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/grad_tts/pipeline_grad_tts.py). +We are more than happy about any contribution to the offically supported pipelines πŸ€—. We aspire +all of our pipelines to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. + +- **Self-contained**: A pipeline shall be as self-contained as possible. More specifically, this means that all functionality should be either directly defined in the pipeline file iteslf, should be inherited from (and only from) the [`DiffusionPipeline` class](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L56) or be directly attached to the model and scheduler components of the pipeline. +- **Easy-to-use**: Pipelines should be extremely easy to use - one should be able to load the pipeline and +use it for its designated task, *e.g.* text-to-image generation, in just a couple of lines of code. Most +logic including pre-processing, an unrolled diffusion loop, and post-processing should all happen inside the `__call__` method. +- **Easy-to-tweak**: Certain pipelines will not be able to handle all use cases and tasks that you might like them to. If you want to use a certain pipeline for a specific use case that is not yet supported, you might have to copy the pipeline file and tweak the code to your needs. + +We try to make the pipeline code as readable as possible so that each part from pre-processing to diffusing to post-processing can easily be adapted. If you would like the community to benefit from your customized pipeline, we would ❀️ to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/commmunity). If however you feel like an important pipeline is missing that deserves to be among the official pipelines in your opinion, a contribution to the [official pipeplines](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines) would be even better πŸ€—. +- **One-purpose-only**: Pipelines should be used for one task and one task only. Even two tasks are from a modeling point of view very similar, *e.g.* image2image translation and in painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*. + +## Examples + +### Image-to-Image text-guided generation with Stable Diffusion + +The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! 
+ +```python +from torch import autocast +import requests +from PIL import Image +from io import BytesIO + +from image_to_image import StableDiffusionImg2ImgPipeline, preprocess + +# load the pipeline +device = "cuda" +pipe = StableDiffusionImg2ImgPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + revision="fp16", + torch_dtype=torch.float16, + use_auth_token=True +).to(device) + +# let's download an initial image +url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" + +response = requests.get(url) +init_image = Image.open(BytesIO(response.content)).convert("RGB") +init_image = init_image.resize((768, 512)) +init_image = preprocess(init_image) + +prompt = "A fantasy landscape, trending on artstation" + +with autocast("cuda"): + images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5)["sample"] + +images[0].save("fantasy_landscape.png") +``` +You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) + +### Tweak prompts reusing seeds and latents + +You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb). + + +### In-painting using Stable Diffusion + +The `inpainting.py` script implements `StableDiffusionInpaintingPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. 
+ +```python +from io import BytesIO + +from torch import autocast +import requests +import PIL + +from inpainting import StableDiffusionInpaintingPipeline + +def download_image(url): + response = requests.get(url) + return PIL.Image.open(BytesIO(response.content)).convert("RGB") + +img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" +mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" + +init_image = download_image(img_url).resize((512, 512)) +mask_image = download_image(mask_url).resize((512, 512)) + +device = "cuda" +pipe = StableDiffusionInpaintingPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + revision="fp16", + torch_dtype=torch.float16, + use_auth_token=True +).to(device) + +prompt = "a cat sitting on a bench" +with autocast("cuda"): + images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75)["sample"] + +images[0].save("cat_on_bench.png") +``` + +You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/in_painting_with_stable_diffusion_using_diffusers.ipynb) diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py index e094c6a535a6..3eae5e7ada03 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py @@ -51,7 +51,7 @@ def __init__( def __call__( self, prompt: Union[str, List[str]], - init_image: torch.FloatTensor, + init_image: Union[torch.FloatTensor, PIL.Image], strength: float = 0.8, num_inference_steps: Optional[int] = 50, guidance_scale: Optional[float] = 7.5, @@ -80,7 +80,8 @@ def __call__( self.scheduler.set_timesteps(num_inference_steps, **extra_set_kwargs) - init_image = preprocess(init_image) + if not isinstance(init_image, torch.FloatTensor): + init_image = preprocess(init_image) # encode the init image into latents and scale the latents init_latents = self.vae.encode(init_image.to(self.device)).sample() From 036d00bd8ba4d03cb1efedd9a194a7efdac90207 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Fri, 26 Aug 2022 23:15:24 +0200 Subject: [PATCH 08/32] up --- examples/README.md | 2 +- examples/community/README.md | 209 +----------------- .../utils/dummy_transformers_objects.py | 1 - 3 files changed, 5 insertions(+), 207 deletions(-) diff --git a/examples/README.md b/examples/README.md index 57f54d5a9797..62dfcaaa2721 100644 --- a/examples/README.md +++ b/examples/README.md @@ -44,7 +44,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie In additon, we provide **community** examples, which are examples added and maintained by our community. Community examples can consists of both *training* examples and *inference* pipelines. For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. 
-Examples that we deem not (yet) popular/important enough to go into the [official training examples]( ) or [official pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines), or that don't fully following the philosophy defined above, should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines. +For examples that are not yet deemed popular/important enough, but might be very valuable to the community, or that don't fully following the philosophy defined above, should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines. **Note**: Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to show to the community how you like to use `diffusers` πŸͺ„. ## Important note diff --git a/examples/community/README.md b/examples/community/README.md index e03d5e569f46..1ff604f708e0 100644 --- a/examples/community/README.md +++ b/examples/community/README.md @@ -1,207 +1,6 @@ -

+# Community Examples -πŸ€— Diffusers provides pretrained diffusion models across multiple modalities, such as vision and audio, and serves -as a modular toolbox for inference and training of diffusion models. +**Community** examples consits of both inference and training examples that have been added by the community. -More precisely, πŸ€— Diffusers offers: - -- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)). -- Various noise schedulers that can be used interchangeably for the prefered speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)). -- Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)). -- Training examples to show how to train the most popular diffusion models (see [examples/training](https://github.com/huggingface/diffusers/tree/main/examples/training)). -- Inference examples to show how to create custom pipelines for advanced tasks such as image2image, in-painting (see [examples/inference](https://github.com/huggingface/diffusers/tree/main/examples/inference)) - -## Quickstart - -In order to get started, we recommend taking a look at two notebooks: - -- The [Getting started with Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines. - Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library. -- The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffuser model training methods. This notebook takes a step-by-step approach to training your - diffuser model on an image dataset, with explanatory graphics. - -## **New 🎨🎨🎨** Stable Diffusion is now fully compatible with `diffusers`! - -Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. -See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information. - -You need to accept the model license before downloading or using the Stable Diffusion weights. 
Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-3), read the license and tick the checkbox if you agree. You have to be a registered user in πŸ€— Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation. - -```py -# make sure you're logged in with `huggingface-cli login` -from torch import autocast -from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler - -lms = LMSDiscreteScheduler( - beta_start=0.00085, - beta_end=0.012, - beta_schedule="scaled_linear" -) - -pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-3", - scheduler=lms, - use_auth_token=True -).to("cuda") - -prompt = "a photo of an astronaut riding a horse on mars" -with autocast("cuda"): - image = pipe(prompt)["sample"][0] - -image.save("astronaut_rides_horse.png") -``` - -For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) -and have a look into the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0). - -## Examples - -There are many ways to try running Diffusers! Here we outline code-focused tools (primarily using `DiffusionPipeline`s and Google Colab) and interactive web-tools. - -### Running Code - -If you want to run the code yourself πŸ’», you can try out: -- [Text-to-Image Latent Diffusion](https://huggingface.co/CompVis/ldm-text2im-large-256) -```python -# !pip install diffusers transformers -from diffusers import DiffusionPipeline - -model_id = "CompVis/ldm-text2im-large-256" - -# load model and scheduler -ldm = DiffusionPipeline.from_pretrained(model_id) - -# run pipeline in inference (sample random noise and denoise) -prompt = "A painting of a squirrel eating a burger" -images = ldm([prompt], num_inference_steps=50, eta=0.3, guidance_scale=6)["sample"] - -# save images -for idx, image in enumerate(images): - image.save(f"squirrel-{idx}.png") -``` -- [Unconditional Diffusion with discrete scheduler](https://huggingface.co/google/ddpm-celebahq-256) -```python -# !pip install diffusers -from diffusers import DDPMPipeline, DDIMPipeline, PNDMPipeline - -model_id = "google/ddpm-celebahq-256" - -# load model and scheduler -ddpm = DDPMPipeline.from_pretrained(model_id) # you can replace DDPMPipeline with DDIMPipeline or PNDMPipeline for faster inference - -# run pipeline in inference (sample random noise and denoise) -image = ddpm()["sample"] - -# save image -image[0].save("ddpm_generated_image.png") -``` -- [Unconditional Latent Diffusion](https://huggingface.co/CompVis/ldm-celebahq-256) -- [Unconditional Diffusion with continous scheduler](https://huggingface.co/google/ncsnpp-ffhq-1024) - -**Other Notebooks**: -* [image-to-image generation with Stable Diffusion](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) ![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg), -* [tweak images via repeated Stable Diffusion seeds](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) ![Open In 
Colab](https://colab.research.google.com/assets/colab-badge.svg), - -### Web Demos -If you just want to play around with some web demos, you can try out the following πŸš€ Spaces: -| Model | Hugging Face Spaces | -|-------------------------------- |------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| Text-to-Image Latent Diffusion | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/text2img-latent-diffusion) | -| Faces generator | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/CompVis/celeba-latent-diffusion) | -| DDPM with different schedulers | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/fusing/celeba-diffusion) | -| Conditional generation from sketch (*SOON*) | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/huggingface/diffuse-the-rest) | -| Composable diffusion | [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Shuang59/Composable-Diffusion) | - -## Definitions - -**Models**: Neural network that models $p_\theta(\mathbf{x}_{t-1}|\mathbf{x}_t)$ (see image below) and is trained end-to-end to *denoise* a noisy input to an image. -*Examples*: UNet, Conditioned UNet, 3D UNet, Transformer UNet - -

- Figure from DDPM paper (https://arxiv.org/abs/2006.11239).
- -**Schedulers**: Algorithm class for both **inference** and **training**. -The class provides functionality to compute previous image according to alpha, beta schedule as well as predict noise for training. -*Examples*: [DDPM](https://arxiv.org/abs/2006.11239), [DDIM](https://arxiv.org/abs/2010.02502), [PNDM](https://arxiv.org/abs/2202.09778), [DEIS](https://arxiv.org/abs/2204.13902) - -

- Sampling and training algorithms. Figure from DDPM paper (https://arxiv.org/abs/2006.11239).
- - -**Diffusion Pipeline**: End-to-end pipeline that includes multiple diffusion models, possible text encoders, ... -*Examples*: Glide, Latent-Diffusion, Imagen, DALL-E 2 - -

- Figure from ImageGen (https://imagen.research.google/).
- -## Philosophy - -- Readability and clarity is prefered over highly optimized code. A strong importance is put on providing readable, intuitive and elementary code design. *E.g.*, the provided [schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers) are separated from the provided [models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models) and provide well-commented code that can be read alongside the original paper. -- Diffusers is **modality independent** and focuses on providing pretrained models and tools to build systems that generate **continous outputs**, *e.g.* vision and audio. -- Diffusion models and schedulers are provided as concise, elementary building blocks. In contrast, diffusion pipelines are a collection of end-to-end diffusion systems that can be used out-of-the-box, should stay as close as possible to their original implementation and can include components of another library, such as text-encoders. Examples for diffusion pipelines are [Glide](https://github.com/openai/glide-text2im) and [Latent Diffusion](https://github.com/CompVis/latent-diffusion). - -## Installation - -**With `pip`** - -```bash -pip install --upgrade diffusers # should install diffusers 0.2.4 -``` - -**With `conda`** - -```sh -conda install -c conda-forge diffusers -``` - -## In the works - -For the first release, πŸ€— Diffusers focuses on text-to-image diffusion techniques. However, diffusers can be used for much more than that! Over the upcoming releases, we'll be focusing on: - -- Diffusers for audio -- Diffusers for reinforcement learning (initial work happening in https://github.com/huggingface/diffusers/pull/105). -- Diffusers for video generation -- Diffusers for molecule generation (initial work happening in https://github.com/huggingface/diffusers/pull/54) - -A few pipeline components are already being worked on, namely: - -- BDDMPipeline for spectrogram-to-sound vocoding -- GLIDEPipeline to support OpenAI's GLIDE model -- Grad-TTS for text to audio generation / conditional audio generation - -We want diffusers to be a toolbox useful for diffusers models in general; if you find yourself limited in any way by the current API, or would like to see additional models, schedulers, or techniques, please open a [GitHub issue](https://github.com/huggingface/diffusers/issues) mentioning what you would like to see. - -## Credits - -This library concretizes previous work by many different authors and would not have been possible without their great research and implementations. We'd like to thank, in particular, the following implementations which have helped us in our development and without which the API could not have been as polished today: - -- @CompVis' latent diffusion models library, available [here](https://github.com/CompVis/latent-diffusion) -- @hojonathanho original DDPM implementation, available [here](https://github.com/hojonathanho/diffusion) as well as the extremely useful translation into PyTorch by @pesser, available [here](https://github.com/pesser/pytorch_diffusion) -- @ermongroup's DDIM implementation, available [here](https://github.com/ermongroup/ddim). -- @yang-song's Score-VE and Score-VP implementations, available [here](https://github.com/yang-song/score_sde_pytorch) - -We also want to thank @heejkoo for the very helpful overview of papers, code and resources on diffusion models, available [here](https://github.com/heejkoo/Awesome-Diffusion-Models) as well as @crowsonkb and @rromb for useful discussions and insights. 
+| Example | Description | Author | | +|:----------|:-------------|:-------------|------:| diff --git a/src/diffusers/utils/dummy_transformers_objects.py b/src/diffusers/utils/dummy_transformers_objects.py index dc929427221a..753e3fdbe291 100644 --- a/src/diffusers/utils/dummy_transformers_objects.py +++ b/src/diffusers/utils/dummy_transformers_objects.py @@ -1,4 +1,3 @@ # This file is autogenerated by the command `make fix-copies`, do not edit. # flake8: noqa from ..utils import DummyObject, requires_backends - From 62adb64d1bc87608c66d2b438f9b331aaa886b9c Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 11:02:45 +0200 Subject: [PATCH 09/32] Apply suggestions from code review Co-authored-by: Nathan Lambert Co-authored-by: Pedro Cuenca --- examples/README.md | 10 +++++----- src/diffusers/pipelines/README.md | 22 +++++++++++----------- 2 files changed, 16 insertions(+), 16 deletions(-) diff --git a/examples/README.md b/examples/README.md index 62dfcaaa2721..0c93bc7fc432 100644 --- a/examples/README.md +++ b/examples/README.md @@ -23,15 +23,15 @@ please have a look at [src/diffusers/pipelines](https://github.com/huggingface/d Our examples aspire to be **self-contained**, **easy-to-tweak**, **beginner-friendly** and for **one-purpose-only**. More specifically, this means: -- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements) and execute the example script. +- **Self-contained**: An example script shall only depend on "pip-install-able" Python packages that can be found in a `requirements.txt` file. Example scripts shall **not** depend on any local files. This means that one can simply download an example script, *e.g.* [train_unconditional.py](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/train_unconditional.py), install the required dependencies, *e.g.* [requirements.txt](https://github.com/huggingface/diffusers/blob/main/examples/unconditional_image_generation/requirements.txt) and execute the example script. - **Easy-to-tweak**: While we strive to present as many use cases as possible, the example scripts are just that - examples. It is expected that they won't work out-of-the box on your specific problem and that you will be required to change a few lines of code to adapt them to your needs. To help you with that, most of the examples fully expose the preprocessing of the data and the training loop to allow you to tweak and edit them as required. -- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand Diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. 
+- **Beginner-friendly**: We do not aim for providing state-of-the-art training scripts for the newest models, but rather examples that can be used as a way to better understand diffusion models and how to use them with the `diffusers` library. We often purposefully leave out certain state-of-the-art methods if we consider them too complex for beginners. - **One-purpose-only**: Examples should show one task and one task only. Even if a task is from a modeling point of view very similar, *e.g.* image super-resolution and image modification tend to use the same model and training method, we want examples to showcase only one task to keep them as readable and easy-to-understand as possible. We provide **official** examples that cover the most popular tasks of diffusion models. *Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above. -If you feel like an important examples, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you πŸ€—. +If you feel like another important should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you . Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: @@ -42,9 +42,9 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie ## Community In additon, we provide **community** examples, which are examples added and maintained by our community. -Community examples can consists of both *training* examples and *inference* pipelines. +Community examples can consist of both *training* examples or *inference* pipelines. For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. -For examples that are not yet deemed popular/important enough, but might be very valuable to the community, or that don't fully following the philosophy defined above, should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines. +Examples that are useful for the community, but are either not yet deemed popular or not yet following our above philosophy should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines. **Note**: Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to show to the community how you like to use `diffusers` πŸͺ„. ## Important note diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index c6dddbaeb328..1c2ef9a9b64e 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -1,7 +1,7 @@ # 🧨 Diffusers Pipelines -Pipelines provide a simply way to run state-of-the-art difffusion models in inference. 
-Most diffusion systems consits of multiple independently-trained models and highly adaptable scheduler +Pipelines provide a simple way to run state-of-the-art diffusion models in inference. +Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler components - all of which are needed to have a functioning end-to-end diffusion system. As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three indepently trained models: @@ -11,7 +11,7 @@ As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) - a scheduler component, [scheduler](https://github.com/huggingface/diffusers/blob/main/src/diffusers/schedulers/scheduling_pndm.py), - a [CLIPFeatureExtractor](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPFeatureExtractor), - as well as a [safety checker](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/safety_checker.py). -All of these components are necessary to run stable diffusion in inference even thought they were trained +All of these components are necessary to run stable diffusion in inference even though they were trained or created independently from each other. To that end, we strive to offer all open-sourced, state-of-the-art diffusion system under a unified API. @@ -21,7 +21,7 @@ More specifically, we strive to provide pipelines that - 3. are easy to understand with code that is self-explanatory and be can read along-side the official paper (see [Pipelines summary](#pipelines-summary)), - 4. can easily be contributed by the community (see the [Contribution](#contribution) section). -**Note** that pipelines do and should not offer any training functionality. +**Note** that pipelines do not (and should not) offer any training functionality. If you are looking for *official* training examples, please have a look at [examples](https://github.com/huggingface/diffusers/tree/main/examples). @@ -44,12 +44,12 @@ available a colab notebook to directly try them out. | | []() | Image In-Painting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) | [stochatic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochatic_karras_ve) | []() | Unconditional Image Generation | -**Note**: Many pipelines give a very simple examples of how to play around with the diffusion systems as present in the corresponding paper. -However, this is just an example and almost all pipelines can be adapted to use different scheduler components or even different model componets. Some examples are shown in the [Examples](#examples) below. +**Note**: Many pipelines provide very simple examples of how to play around with the diffusion systems as described in the corresponding papers. +However, most of them can be adapted to use different scheduler components or even different model components. Some pipeline examples are shown in the [Examples](#examples) below. ## Pipelines API -Diffusion models often consists of multiple independently trained models or created componets. +Diffusion models often consist of multiple independently-trained models or other previously existing components. Each model has been trained indepently on a different task and the scheduler can easily be swapped out against another scheduler. 
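For instance, swapping the scheduler only requires handing a different scheduler instance to `from_pretrained` - a minimal sketch, assuming the `CompVis/stable-diffusion-v1-4` checkpoint, the `use_auth_token=True` access-token requirement and the `["sample"]` dict output used by the current Stable Diffusion examples (all of which may change in later `diffusers` versions):

```python
from diffusers import LMSDiscreteScheduler, StableDiffusionPipeline

# instantiate an alternative scheduler ...
lms = LMSDiscreteScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
)

# ... and pass it to `from_pretrained`; every other component of the pipeline stays unchanged
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    scheduler=lms,
    use_auth_token=True,
).to("cuda")

image = pipe("a photograph of an astronaut riding a horse")["sample"][0]
image.save("astronaut_rides_horse.png")
```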
@@ -59,10 +59,10 @@ During inference, we however want to be able to easily load all components and u "./stable-diffusion". To correctly retrieve which models and components should be loaded one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be loaded into the pipelines. More specifically, for each model/component one needs to define the format `: ["", ""]`. `` is the attribute name given to the loaded instance of `` which can be found in the library or pipeline folder called `""`. - [`save_pretrained`](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L90) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. -In additon, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated +In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated from the local path. - [`to`](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L118) which accepts a `string` or `torch.device` to move all models that are of type `torch.nn.Module` to the passed device. The behavior is fully analogous to [PyTorch's `to` method](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.to). -- [`__call__`] method to use the pipeline in inference. The `__call__` defines infenence logic of the pipeline and should ideally encompass all parts from pre-processing to fowarding tensors to the different models and scheduler components as well as post-processing. The API of the `__call__` method can strongly vary from pipeline to pipeline. *E.g.* a text-to-image pipeline, such as [`StableDiffusionPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py) should accept among other things the text prompt to generate the image. A pure image generation pipeline, such as [DDPMPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm) on the other hand can be run without providing any inputs. To better understand what inputs can be adapted for +- [`__call__`] method to use the pipeline in inference. `__call__` defines inference logic of the pipeline and should ideally encompass all aspects of it, from pre-processing to forwarding tensors to the different models and schedulers, as well as post-processing. The API of the `__call__` method can strongly vary from pipeline to pipeline. *E.g.* a text-to-image pipeline, such as [`StableDiffusionPipeline`](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py) should accept among other things the text prompt to generate the image. A pure image generation pipeline, such as [DDPMPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/ddpm) on the other hand can be run without providing any inputs. 
To better understand what inputs can be adapted for each pipeline, one should look directly into the respective pipeline. **Note**: All pipelines have PyTorch's autograd disabled by decorating the `__call__` method with a [`torch.no_grad`](https://pytorch.org/docs/stable/generated/torch.no_grad.html) decorator because pipelines should @@ -79,8 +79,8 @@ use it for its designated task, *e.g.* text-to-image generation, in just a coupl logic including pre-processing, an unrolled diffusion loop, and post-processing should all happen inside the `__call__` method. - **Easy-to-tweak**: Certain pipelines will not be able to handle all use cases and tasks that you might like them to. If you want to use a certain pipeline for a specific use case that is not yet supported, you might have to copy the pipeline file and tweak the code to your needs. -We try to make the pipeline code as readable as possible so that each part from pre-processing to diffusing to post-processing can easily be adapted. If you would like the community to benefit from your customized pipeline, we would ❀️ to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/commmunity). If however you feel like an important pipeline is missing that deserves to be among the official pipelines in your opinion, a contribution to the [official pipeplines](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines) would be even better πŸ€—. -- **One-purpose-only**: Pipelines should be used for one task and one task only. Even two tasks are from a modeling point of view very similar, *e.g.* image2image translation and in painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*. +We try to make the pipeline code as readable as possible so that each part from pre-processing to diffusing to post-processing can easily be adapted. If you would like the community to benefit from your customized pipeline, we would ❀️ to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/commmunity). If you feel that an important pipeline should be part of the official pipelines but isn't, a contribution to the [official pipelines](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines) would be even better πŸ€—. +- **One-purpose-only**: Pipelines should be used for one task and one task only. Even if two tasks are very similar from a modeling point of view, *e.g.* image2image translation and in-painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*.
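Putting the methods above together, a typical round trip looks roughly like the sketch below; the checkpoint id, prompt, and local path are placeholders taken from the surrounding examples, and the `["sample"][0]` output format follows the convention used throughout this document.

```python
# Sketch of the pipeline API described above:
# from_pretrained -> to -> __call__ -> save_pretrained.
from diffusers import StableDiffusionPipeline

# `from_pretrained` reads model_index.json and loads every component listed in it
pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", use_auth_token=True
)

# `to` moves all components of type torch.nn.Module to the given device
pipe = pipe.to("cuda")

# `__call__` runs pre-processing, the diffusion loop, and post-processing
image = pipe("a photo of an astronaut riding a horse on mars")["sample"][0]
image.save("astronaut_rides_horse.png")

# `save_pretrained` writes one folder per component (e.g. ./stable-diffusion/unet)
# plus a model_index.json mapping each attribute name to ["<library>", "<class name>"]
pipe.save_pretrained("./stable-diffusion")
```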
## Examples From fa0a5dca15b4ef7b8eebd9e66a795d090edd74e7 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 19:26:27 +0000 Subject: [PATCH 10/32] up --- .../pipelines/stable_diffusion/__init__.py | 10 +++--- .../pipeline_stable_diffusion_img2img.py | 2 +- .../pipeline_stable_diffusion_inpaint.py | 4 +-- tests/test_pipelines.py | 33 +++++++++++-------- 4 files changed, 26 insertions(+), 23 deletions(-) diff --git a/src/diffusers/pipelines/stable_diffusion/__init__.py b/src/diffusers/pipelines/stable_diffusion/__init__.py index 49bb7978c5d0..1d2d269f49cb 100644 --- a/src/diffusers/pipelines/stable_diffusion/__init__.py +++ b/src/diffusers/pipelines/stable_diffusion/__init__.py @@ -3,9 +3,7 @@ if is_transformers_available(): - from .pipeline_stable_diffusion import ( - StableDiffusionImg2ImgPipeline, - StableDiffusionInPaintPipeline, - StableDiffusionPipeline, - StableDiffusionSafetyChecker, - ) + from .safety_checker import StableDiffusionSafetyChecker + from .pipeline_stable_diffusion import StableDiffusionPipeline + from .pipeline_stable_diffusion_img2img import import StableDiffusionImg2ImgPipeline + from .pipeline_stable_diffusion_inpaint import StableDiffusionInPaintPipeline diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py index 3eae5e7ada03..25ab75e45c58 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py @@ -51,7 +51,7 @@ def __init__( def __call__( self, prompt: Union[str, List[str]], - init_image: Union[torch.FloatTensor, PIL.Image], + init_image: Union[torch.FloatTensor, PIL.Image.Image], strength: float = 0.8, num_inference_steps: Optional[int] = 50, guidance_scale: Optional[float] = 7.5, diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index 4f3ad3b2467f..0b826dfe58f3 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -64,8 +64,8 @@ def __init__( def __call__( self, prompt: Union[str, List[str]], - init_image: torch.FloatTensor, - mask_image: torch.FloatTensor, + init_image: Union[torch.FloatTensor, PIL.Image.Image], + mask_image: Union[torch.FloatTensor, PIL.Image.Image], strength: float = 0.8, num_inference_steps: Optional[int] = 50, guidance_scale: Optional[float] = 7.5, diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index 0f78f14c138b..af928f66eb19 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -20,8 +20,9 @@ import numpy as np import torch +from datasets import load_dataset + import PIL -import requests from diffusers import ( DDIMPipeline, DDIMScheduler, @@ -425,11 +426,12 @@ def test_stable_diffusion_img2img_pipeline(self): response = requests.get(url) init_image = PIL.Image.open(BytesIO(response.content)).convert("RGB") init_image = init_image.resize((768, 512)) + init_ prompt = "A fantasy landscape, trending on artstation" - with torch.autocast("cuda"): - image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5)["sample"][0] + generator = torch.Generator(device=torch_device).manual_seed(0) + image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5, 
generator=generator)["sample"][0] image_slice = image[0, -3:, -3:, -1] assert image.shape == (1, 512, 512, 3) @@ -439,22 +441,25 @@ def test_stable_diffusion_img2img_pipeline(self): @slow @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU") def test_stable_diffusion_in_paint_pipeline(self): + ds = load_dataset("hf-internal-testing/diffusers-images", split="train") + + init_image = ds[1]["image"].resize((768, 512)) + mask_image = ds[2]["image"].resize((768, 512)) + output_image = ds[3]["image"].resize((768, 512)) + model_id = "CompVis/stable-diffusion-v1-4" pipe = StableDiffusionInPaintPipeline.from_pretrained(model_id, use_auth_token=True) + pipe.to(torch_device) - # TODO(PVP) - move to hf-internal-testing - url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - response = requests.get(url) - init_image = PIL.Image.open(BytesIO(response.content)).convert("RGB") - init_image = init_image.resize((768, 512)) - mask_image = init_image + prompt = "A red cat sitting on a parking bench" - prompt = "A fantasy landscape, trending on artstation" + generator = torch.Generator(device=torch_device).manual_seed(0) + image = pipe( + prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5, generator=generator + )["sample"][0] - with torch.autocast("cuda"): - image = pipe( - prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5 - )["sample"][0] + image.save("/home/patrick/diffusers-images/in_paint/red_cat_sitting_on_a_parking_bench.png") + import ipdb; ipdb.set_trace() image_slice = image[0, -3:, -3:, -1] assert image.shape == (1, 512, 512, 3) From d7c2ca8e526663913e5bdcb4b3ae926c62897423 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 21:29:07 +0200 Subject: [PATCH 11/32] make deterministic --- src/diffusers/models/vae.py | 4 ++-- .../stable_diffusion/pipeline_stable_diffusion_inpaint.py | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/src/diffusers/models/vae.py b/src/diffusers/models/vae.py index adbadbeac99a..009db1561d9e 100644 --- a/src/diffusers/models/vae.py +++ b/src/diffusers/models/vae.py @@ -293,8 +293,8 @@ def __init__(self, parameters, deterministic=False): if self.deterministic: self.var = self.std = torch.zeros_like(self.mean).to(device=self.parameters.device) - def sample(self): - x = self.mean + self.std * torch.randn(self.mean.shape).to(device=self.parameters.device) + def sample(self, generator=None): + x = self.mean + self.std * torch.randn(self.mean.shape, generator=generator, device=self.parameters.device) return x def kl(self, other=None): diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index 4f3ad3b2467f..d09518c1b673 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -98,7 +98,7 @@ def __call__( init_image = preprocess_image(init_image).to(self.device) # encode the init image into latents and scale the latents - init_latents = self.vae.encode(init_image).sample() + init_latents = self.vae.encode(init_image, generator=generator).sample() init_latents = 0.18215 * init_latents # prepare init_latents noise to latents From 5eafc4f72fa36f9c6028e4912178870b700b6849 Mon Sep 17 00:00:00 2001 From: Patrick von 
Platen Date: Mon, 29 Aug 2022 19:35:44 +0000 Subject: [PATCH 12/32] up --- src/diffusers/pipelines/stable_diffusion/__init__.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/diffusers/pipelines/stable_diffusion/__init__.py b/src/diffusers/pipelines/stable_diffusion/__init__.py index 1d2d269f49cb..631f6de217e1 100644 --- a/src/diffusers/pipelines/stable_diffusion/__init__.py +++ b/src/diffusers/pipelines/stable_diffusion/__init__.py @@ -5,5 +5,5 @@ if is_transformers_available(): from .safety_checker import StableDiffusionSafetyChecker from .pipeline_stable_diffusion import StableDiffusionPipeline - from .pipeline_stable_diffusion_img2img import import StableDiffusionImg2ImgPipeline + from .pipeline_stable_diffusion_img2img import StableDiffusionImg2ImgPipeline from .pipeline_stable_diffusion_inpaint import StableDiffusionInPaintPipeline From bfebb41106e0e75bcfd11158b35a1ec1c29b1a01 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 21:36:28 +0200 Subject: [PATCH 13/32] better --- tests/test_pipelines.py | 32 ++++++++++++++++---------------- 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index af928f66eb19..ad2fcf7a11cb 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -15,7 +15,6 @@ import tempfile import unittest -from io import BytesIO import numpy as np import torch @@ -418,25 +417,26 @@ def test_lms_stable_diffusion_pipeline(self): @slow @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU") def test_stable_diffusion_img2img_pipeline(self): + ds = load_dataset("hf-internal-testing/diffusers-images", split="train") + + init_image = ds[0]["image"].resize((768, 512)) + output_image = ds[4]["image"].resize((768, 512)) + model_id = "CompVis/stable-diffusion-v1-4" pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, use_auth_token=True) - # TODO(PVP) - move to hf-internal-testing - url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - response = requests.get(url) - init_image = PIL.Image.open(BytesIO(response.content)).convert("RGB") - init_image = init_image.resize((768, 512)) - init_ - prompt = "A fantasy landscape, trending on artstation" generator = torch.Generator(device=torch_device).manual_seed(0) image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5, generator=generator)["sample"][0] - image_slice = image[0, -3:, -3:, -1] + image.save("/home/patrick/diffusers-images/in_paint/red_cat_sitting_on_a_parking_bench.png") assert image.shape == (1, 512, 512, 3) - expected_slice = np.array([0.9077, 0.9254, 0.9181, 0.9227, 0.9213, 0.9367, 0.9399, 0.9406, 0.9024]) - assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2 + + expected_array = np.array(output_image) + sampled_array = np.array(image) + + assert np.max(np.abs(sampled_array - expected_array)) < 1e-3 @slow @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU") @@ -459,9 +459,9 @@ def test_stable_diffusion_in_paint_pipeline(self): )["sample"][0] image.save("/home/patrick/diffusers-images/in_paint/red_cat_sitting_on_a_parking_bench.png") - import ipdb; ipdb.set_trace() - - image_slice = image[0, -3:, -3:, -1] assert image.shape == (1, 512, 512, 3) - expected_slice = np.array([0.9077, 0.9254, 0.9181, 0.9227, 0.9213, 0.9367, 0.9399, 0.9406, 0.9024]) - assert np.abs(image_slice.flatten() - expected_slice).max() < 1e-2 + 
+ expected_array = np.array(output_image) + sampled_array = np.array(image) + + assert np.max(np.abs(sampled_array - expected_array)) < 1e-3 From f662b5148724e745a8c3c603f777c1a7a5919f8d Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 21:38:42 +0200 Subject: [PATCH 14/32] up --- examples/README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/examples/README.md b/examples/README.md index 0c93bc7fc432..36846c89bdc7 100644 --- a/examples/README.md +++ b/examples/README.md @@ -15,7 +15,8 @@ limitations under the License. # 🧨 Diffusers Examples -Diffusers examples are a collection of best-practices on how to use the `diffusers` library. +Diffusers examples are a collection of scripts to demonstrate how to effectively use the `diffusers` library +for a variety of use cases. **Note**: If you are looking for **official** examples on how to use `diffusers` for inference, please have a look at [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines) From 8310d6939443316da4b61d4809d79b908e3c0549 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 21:50:37 +0200 Subject: [PATCH 15/32] add generator to img2img pipe --- .../stable_diffusion/pipeline_stable_diffusion_img2img.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py index 25ab75e45c58..3dac53bd9bd1 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py @@ -84,7 +84,7 @@ def __call__( init_image = preprocess(init_image) # encode the init image into latents and scale the latents - init_latents = self.vae.encode(init_image.to(self.device)).sample() + init_latents = self.vae.encode(init_image.to(self.device)).sample(generator=generator) init_latents = 0.18215 * init_latents # prepare init_latents noise to latents From 4588f72e89be24f088661057e095c3338a3439d4 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 19:51:10 +0000 Subject: [PATCH 16/32] save --- .../pipeline_stable_diffusion_inpaint.py | 2 +- tests/test_pipelines.py | 14 +++++++------- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index b0bbfe347995..9267fa1bb3b9 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -98,7 +98,7 @@ def __call__( init_image = preprocess_image(init_image).to(self.device) # encode the init image into latents and scale the latents - init_latents = self.vae.encode(init_image, generator=generator).sample() + init_latents = self.vae.encode(init_image).sample(generator=generator) init_latents = 0.18215 * init_latents # prepare init_latents noise to latents diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index ad2fcf7a11cb..7da5290630d6 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -420,7 +420,7 @@ def test_stable_diffusion_img2img_pipeline(self): ds = load_dataset("hf-internal-testing/diffusers-images", split="train") init_image = ds[0]["image"].resize((768, 512)) - output_image = ds[4]["image"].resize((768, 
512)) + output_image = ds[3]["image"].resize((768, 512)) model_id = "CompVis/stable-diffusion-v1-4" pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, use_auth_token=True) @@ -430,13 +430,15 @@ def test_stable_diffusion_img2img_pipeline(self): generator = torch.Generator(device=torch_device).manual_seed(0) image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5, generator=generator)["sample"][0] - image.save("/home/patrick/diffusers-images/in_paint/red_cat_sitting_on_a_parking_bench.png") - assert image.shape == (1, 512, 512, 3) + import ipdb; ipdb.set_trace() + + image.save("/home/patrick/diffusers-images/img2img/fanasty_landscape.png") expected_array = np.array(output_image) sampled_array = np.array(image) - assert np.max(np.abs(sampled_array - expected_array)) < 1e-3 + assert sampled_array.shape == (512, 768, 3) + assert np.max(np.abs(sampled_array - expected_array)) < 1e-4 @slow @unittest.skipIf(torch_device == "cpu", "Stable diffusion is supposed to run on GPU") @@ -458,10 +460,8 @@ def test_stable_diffusion_in_paint_pipeline(self): prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5, generator=generator )["sample"][0] - image.save("/home/patrick/diffusers-images/in_paint/red_cat_sitting_on_a_parking_bench.png") - assert image.shape == (1, 512, 512, 3) - expected_array = np.array(output_image) sampled_array = np.array(image) + assert sampled_array.shape == (512, 768, 3) assert np.max(np.abs(sampled_array - expected_array)) < 1e-3 From 4d7d714d55f619038b3ad36ea4cd84d07e89abc6 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Mon, 29 Aug 2022 19:57:08 +0000 Subject: [PATCH 17/32] make pipelines deterministic --- tests/test_pipelines.py | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index 7da5290630d6..8f1cd8d49d6f 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -419,21 +419,18 @@ def test_lms_stable_diffusion_pipeline(self): def test_stable_diffusion_img2img_pipeline(self): ds = load_dataset("hf-internal-testing/diffusers-images", split="train") - init_image = ds[0]["image"].resize((768, 512)) - output_image = ds[3]["image"].resize((768, 512)) + init_image = ds[1]["image"].resize((768, 512)) + output_image = ds[0]["image"].resize((768, 512)) model_id = "CompVis/stable-diffusion-v1-4" pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, use_auth_token=True) + pipe.to(torch_device) prompt = "A fantasy landscape, trending on artstation" generator = torch.Generator(device=torch_device).manual_seed(0) image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5, generator=generator)["sample"][0] - import ipdb; ipdb.set_trace() - - image.save("/home/patrick/diffusers-images/img2img/fanasty_landscape.png") - expected_array = np.array(output_image) sampled_array = np.array(image) @@ -445,9 +442,9 @@ def test_stable_diffusion_img2img_pipeline(self): def test_stable_diffusion_in_paint_pipeline(self): ds = load_dataset("hf-internal-testing/diffusers-images", split="train") - init_image = ds[1]["image"].resize((768, 512)) - mask_image = ds[2]["image"].resize((768, 512)) - output_image = ds[3]["image"].resize((768, 512)) + init_image = ds[2]["image"].resize((768, 512)) + mask_image = ds[3]["image"].resize((768, 512)) + output_image = ds[4]["image"].resize((768, 512)) model_id = "CompVis/stable-diffusion-v1-4" pipe = 
StableDiffusionInPaintPipeline.from_pretrained(model_id, use_auth_token=True) From 6338a9198404b66a633fd45977be3fac2803a147 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 13:53:57 +0200 Subject: [PATCH 18/32] Update src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py Co-authored-by: Anton Lozhkov --- .../stable_diffusion/pipeline_stable_diffusion_inpaint.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index 9267fa1bb3b9..bf879cf205e8 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -37,7 +37,7 @@ def preprocess_mask(mask): return mask -class StableDiffusionInPaintPipeline(DiffusionPipeline): +class StableDiffusionInpaintPipeline(DiffusionPipeline): def __init__( self, vae: AutoencoderKL, From eff421bd93acb150aecb6e8eda0c22ebd4f7ec4a Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 11:54:28 +0000 Subject: [PATCH 19/32] apply all changes --- examples/inference/inpainting.py | 2 +- src/diffusers/__init__.py | 2 +- src/diffusers/pipelines/__init__.py | 2 +- src/diffusers/pipelines/stable_diffusion/__init__.py | 2 +- tests/test_pipelines.py | 4 ++-- 5 files changed, 6 insertions(+), 6 deletions(-) diff --git a/examples/inference/inpainting.py b/examples/inference/inpainting.py index c27a5235d026..c1777dc1a9b4 100644 --- a/examples/inference/inpainting.py +++ b/examples/inference/inpainting.py @@ -1 +1 @@ -from diffusers import StableDiffusionInPaintPipeline as StableDiffusionInpaintingPipeline # noqa F401 +from diffusers import StableDiffusionInpaintPipeline as StableDiffusionInpaintingPipeline # noqa F401 diff --git a/src/diffusers/__init__.py b/src/diffusers/__init__.py index 232727b637cb..d7292e7426f6 100644 --- a/src/diffusers/__init__.py +++ b/src/diffusers/__init__.py @@ -41,7 +41,7 @@ from .pipelines import ( LDMTextToImagePipeline, StableDiffusionImg2ImgPipeline, - StableDiffusionInPaintPipeline, + StableDiffusionInpaintPipeline, StableDiffusionPipeline, ) else: diff --git a/src/diffusers/pipelines/__init__.py b/src/diffusers/pipelines/__init__.py index 709c5451eca6..25f6c928b7f4 100644 --- a/src/diffusers/pipelines/__init__.py +++ b/src/diffusers/pipelines/__init__.py @@ -15,6 +15,6 @@ from .latent_diffusion import LDMTextToImagePipeline from .stable_diffusion import ( StableDiffusionImg2ImgPipeline, - StableDiffusionInPaintPipeline, + StableDiffusionInpaintPipeline, StableDiffusionPipeline, ) diff --git a/src/diffusers/pipelines/stable_diffusion/__init__.py b/src/diffusers/pipelines/stable_diffusion/__init__.py index 631f6de217e1..64fa2ab69ec0 100644 --- a/src/diffusers/pipelines/stable_diffusion/__init__.py +++ b/src/diffusers/pipelines/stable_diffusion/__init__.py @@ -6,4 +6,4 @@ from .safety_checker import StableDiffusionSafetyChecker from .pipeline_stable_diffusion import StableDiffusionPipeline from .pipeline_stable_diffusion_img2img import StableDiffusionImg2ImgPipeline - from .pipeline_stable_diffusion_inpaint import StableDiffusionInPaintPipeline + from .pipeline_stable_diffusion_inpaint import StableDiffusionInpaintPipeline diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index dbb20a9b4cae..e5aece440a5d 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -37,7 +37,7 
@@ ScoreSdeVePipeline, ScoreSdeVeScheduler, StableDiffusionImg2ImgPipeline, - StableDiffusionInPaintPipeline, + StableDiffusionInpaintPipeline, StableDiffusionPipeline, UNet2DModel, ) @@ -470,7 +470,7 @@ def test_stable_diffusion_in_paint_pipeline(self): output_image = ds[4]["image"].resize((768, 512)) model_id = "CompVis/stable-diffusion-v1-4" - pipe = StableDiffusionInPaintPipeline.from_pretrained(model_id, use_auth_token=True) + pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, use_auth_token=True) pipe.to(torch_device) prompt = "A red cat sitting on a parking bench" From b73e0a1e3c7aebf28a8c1761ce59d2d4baccc42f Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 11:57:05 +0000 Subject: [PATCH 20/32] more correctnios --- src/diffusers/pipelines/README.md | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 1c2ef9a9b64e..cb8ac4c9f91c 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -94,7 +94,7 @@ import requests from PIL import Image from io import BytesIO -from image_to_image import StableDiffusionImg2ImgPipeline, preprocess +from diffusers import StableDiffusionImg2ImgPipeline # load the pipeline device = "cuda" @@ -111,7 +111,6 @@ url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/st response = requests.get(url) init_image = Image.open(BytesIO(response.content)).convert("RGB") init_image = init_image.resize((768, 512)) -init_image = preprocess(init_image) prompt = "A fantasy landscape, trending on artstation" @@ -138,7 +137,7 @@ from torch import autocast import requests import PIL -from inpainting import StableDiffusionInpaintingPipeline +from diffusers import StableDiffusionInpaintPipeline def download_image(url): response = requests.get(url) From d8bcdd6f78924fe4a1edb80067e653bd24da75e5 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 12:16:50 +0000 Subject: [PATCH 21/32] finish --- README.md | 93 ++++++++++++++++++- src/diffusers/pipelines/README.md | 53 ++++++++--- .../pipelines/stable_diffusion/__init__.py | 2 +- tests/test_pipelines.py | 14 ++- 4 files changed, 141 insertions(+), 21 deletions(-) diff --git a/README.md b/README.md index e03d5e569f46..49f9d5fa4d45 100644 --- a/README.md +++ b/README.md @@ -42,7 +42,9 @@ See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more i You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-3), read the license and tick the checkbox if you agree. You have to be a registered user in πŸ€— Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation. 
-```py +### Text-to-Image generation with Stable Diffusion + +```python # make sure you're logged in with `huggingface-cli login` from torch import autocast from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler @@ -54,10 +56,13 @@ lms = LMSDiscreteScheduler( ) pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-3", + "CompVis/stable-diffusion-v1-4", + revision="fp16", + torch_dtype=torch.float16, scheduler=lms, use_auth_token=True -).to("cuda") +) +pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" with autocast("cuda"): @@ -66,6 +71,88 @@ with autocast("cuda"): image.save("astronaut_rides_horse.png") ``` +### Image-to-Image text-guided generation with Stable Diffusion + +The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! + +```python +from torch import autocast +import requests +from PIL import Image +from io import BytesIO + +from diffusers import StableDiffusionImg2ImgPipeline + +# load the pipeline +device = "cuda" +pipe = StableDiffusionImg2ImgPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + revision="fp16", + torch_dtype=torch.float16, + use_auth_token=True +) +pipe = pipe.to(device) + +# let's download an initial image +url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" + +response = requests.get(url) +init_image = Image.open(BytesIO(response.content)).convert("RGB") +init_image = init_image.resize((768, 512)) + +prompt = "A fantasy landscape, trending on artstation" + +with autocast("cuda"): + images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5)["sample"] + +images[0].save("fantasy_landscape.png") +``` +You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) + +### In-painting using Stable Diffusion + +The `inpainting.py` script implements `StableDiffusionInpaintingPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. 
+ +```python +from io import BytesIO + +from torch import autocast +import requests +import PIL + +from diffusers import StableDiffusionInpaintPipeline + +def download_image(url): + response = requests.get(url) + return PIL.Image.open(BytesIO(response.content)).convert("RGB") + +img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" +mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" + +init_image = download_image(img_url).resize((512, 512)) +mask_image = download_image(mask_url).resize((512, 512)) + +device = "cuda" +pipe = StableDiffusionInpaintingPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + revision="fp16", + torch_dtype=torch.float16, + use_auth_token=True +) +pipe = pipe.to(device) + +prompt = "a cat sitting on a bench" +with autocast("cuda"): + images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75)["sample"] + +images[0].save("cat_on_bench.png") +``` + +### Tweak prompts reusing seeds and latents + +You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb). + + For more details, check out [the Stable Diffusion notebook](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb) and have a look into the [release notes](https://github.com/huggingface/diffusers/releases/tag/v0.2.0). diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index cb8ac4c9f91c..2203b72e05d1 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -32,19 +32,19 @@ available a colab notebook to directly try them out. 
| Pipeline | Paper | Tasks | Colab |---|---|:---:|:---:| -| [ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim) | []() | Unconditional Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm) | []() | Unconditional Image Generation | -| [latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion) | []() | Text-to-Image Generation | -| [latent_diffusion_uncond](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond) | []() | Unconditional Image Generation | -| [pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm) | []() | Unconditional Image Generation | -| [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | []() | Unconditional Image Generation | -| [score_sde_vp](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | []() | Unconditional Image Generation | -| [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | []() | Text-to-Image Generation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| | []() | Image-to-Image Translation | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| | []() | Image In-Painting | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [stochatic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochatic_karras_ve) | []() | Unconditional Image Generation | - -**Note**: Many pipelines provide very simple examples of how to play around with the diffusion systems as described in the corresponding papers. 
+| [ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm) | [*Denoising Diffusion Probabilistic Models*](https://arxiv.org/abs/2006.11239) | **Unconditional Image Generation** | +| [ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim) | [*Denoising Diffusion Implicit Models*](https://arxiv.org/abs/2010.02502) | **Unconditional Image Generation** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| [latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion) | []() | **Text-to-Image Generation** | +| [latent_diffusion_uncond](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond) | [*High-Resolution Image Synthesis with Latent Diffusion Models*](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | +| [pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm) | [*Pseudo Numerical Methods for Diffusion Models on Manifolds*](https://arxiv.org/abs/2202.09778) | **Unconditional Image Generation** | +| [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | [*Score-Based Generative Modeling through Stochastic Differential Equations*](https://openreview.net/forum?id=PxTIG12RRHS) | **Unconditional Image Generation** | +| [score_sde_vp](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_vp) | [*Score-Based Generative Modeling through Stochastic Differential Equations*](https://openreview.net/forum?id=PxTIG12RRHS) | **Unconditional Image Generation** | +| [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [*Stable Diffusion*](https://stability.ai/blog/stable-diffusion-public-release) | **Text-to-Image Generation** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| | [*Stable Diffusion*](https://stability.ai/blog/stable-diffusion-public-release) | **Image-to-Image Translation** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| | [*Stable Diffusion*](https://stability.ai/blog/stable-diffusion-public-release) | **Image In-Painting** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| [stochatic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochatic_karras_ve) | [*Elucidating the Design Space of Diffusion-Based Generative Models*](https://arxiv.org/abs/2206.00364) | **Unconditional Image Generation** | + +**Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers. However, most of them can be adapted to use different scheduler components or even different model components. Some pipeline examples are shown in the [Examples](#examples) below. 
## Pipelines API @@ -84,6 +84,33 @@ We try to make the pipeline code as readable as possible so that each part from ## Examples +### Text-to-Image generation with Stable Diffusion + +```python +# make sure you're logged in with `huggingface-cli login` +from torch import autocast +from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler + +lms = LMSDiscreteScheduler( + beta_start=0.00085, + beta_end=0.012, + beta_schedule="scaled_linear" +) + +pipe = StableDiffusionPipeline.from_pretrained( + "CompVis/stable-diffusion-v1-4", + scheduler=lms, + use_auth_token=True +) +pipe = pipe.to("cuda") + +prompt = "a photo of an astronaut riding a horse on mars" +with autocast("cuda"): + image = pipe(prompt)["sample"][0] + +image.save("astronaut_rides_horse.png") +``` + ### Image-to-Image text-guided generation with Stable Diffusion The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! diff --git a/src/diffusers/pipelines/stable_diffusion/__init__.py b/src/diffusers/pipelines/stable_diffusion/__init__.py index 64fa2ab69ec0..1721caf03b85 100644 --- a/src/diffusers/pipelines/stable_diffusion/__init__.py +++ b/src/diffusers/pipelines/stable_diffusion/__init__.py @@ -3,7 +3,7 @@ if is_transformers_available(): - from .safety_checker import StableDiffusionSafetyChecker from .pipeline_stable_diffusion import StableDiffusionPipeline from .pipeline_stable_diffusion_img2img import StableDiffusionImg2ImgPipeline from .pipeline_stable_diffusion_inpaint import StableDiffusionInpaintPipeline + from .safety_checker import StableDiffusionSafetyChecker diff --git a/tests/test_pipelines.py b/tests/test_pipelines.py index e5aece440a5d..be3b201338e6 100644 --- a/tests/test_pipelines.py +++ b/tests/test_pipelines.py @@ -19,9 +19,8 @@ import numpy as np import torch -from datasets import load_dataset - import PIL +from datasets import load_dataset from diffusers import ( DDIMPipeline, DDIMScheduler, @@ -452,7 +451,9 @@ def test_stable_diffusion_img2img_pipeline(self): prompt = "A fantasy landscape, trending on artstation" generator = torch.Generator(device=torch_device).manual_seed(0) - image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5, generator=generator)["sample"][0] + image = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5, generator=generator)[ + "sample" + ][0] expected_array = np.array(output_image) sampled_array = np.array(image) @@ -477,7 +478,12 @@ def test_stable_diffusion_in_paint_pipeline(self): generator = torch.Generator(device=torch_device).manual_seed(0) image = pipe( - prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75, guidance_scale=7.5, generator=generator + prompt=prompt, + init_image=init_image, + mask_image=mask_image, + strength=0.75, + guidance_scale=7.5, + generator=generator, )["sample"][0] expected_array = np.array(output_image) From 102eb97b2de3d8b823e1af446678ddb7bb3b0d06 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 12:26:40 +0000 Subject: [PATCH 22/32] improve table --- src/diffusers/pipelines/README.md | 22 +++++++++++----------- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 2203b72e05d1..a31cea26c225 100644 --- a/src/diffusers/pipelines/README.md +++ 
b/src/diffusers/pipelines/README.md @@ -32,17 +32,17 @@ available a colab notebook to directly try them out. | Pipeline | Paper | Tasks | Colab |---|---|:---:|:---:| -| [ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm) | [*Denoising Diffusion Probabilistic Models*](https://arxiv.org/abs/2006.11239) | **Unconditional Image Generation** | -| [ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim) | [*Denoising Diffusion Implicit Models*](https://arxiv.org/abs/2010.02502) | **Unconditional Image Generation** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion) | []() | **Text-to-Image Generation** | -| [latent_diffusion_uncond](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond) | [*High-Resolution Image Synthesis with Latent Diffusion Models*](https://arxiv.org/abs/2112.10752) | Unconditional Image Generation | -| [pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm) | [*Pseudo Numerical Methods for Diffusion Models on Manifolds*](https://arxiv.org/abs/2202.09778) | **Unconditional Image Generation** | -| [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | [*Score-Based Generative Modeling through Stochastic Differential Equations*](https://openreview.net/forum?id=PxTIG12RRHS) | **Unconditional Image Generation** | -| [score_sde_vp](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_vp) | [*Score-Based Generative Modeling through Stochastic Differential Equations*](https://openreview.net/forum?id=PxTIG12RRHS) | **Unconditional Image Generation** | -| [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [*Stable Diffusion*](https://stability.ai/blog/stable-diffusion-public-release) | **Text-to-Image Generation** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| | [*Stable Diffusion*](https://stability.ai/blog/stable-diffusion-public-release) | **Image-to-Image Translation** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| | [*Stable Diffusion*](https://stability.ai/blog/stable-diffusion-public-release) | **Image In-Painting** | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) -| [stochatic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochatic_karras_ve) | [*Elucidating the Design Space of Diffusion-Based Generative Models*](https://arxiv.org/abs/2206.00364) | **Unconditional Image Generation** | +| [ddpm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddpm) | [**Denoising Diffusion Probabilistic Models**](https://arxiv.org/abs/2006.11239) | *Unconditional Image Generation* | +| [ddim](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/ddim) | [**Denoising 
Diffusion Implicit Models**](https://arxiv.org/abs/2010.02502) | *Unconditional Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| [latent_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752)| *Text-to-Image Generation* | +| [latent_diffusion_uncond](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/latent_diffusion_uncond) | [**High-Resolution Image Synthesis with Latent Diffusion Models**](https://arxiv.org/abs/2112.10752) | *Unconditional Image Generation* | +| [pndm](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/pndm) | [**Pseudo Numerical Methods for Diffusion Models on Manifolds**](https://arxiv.org/abs/2202.09778) | *Unconditional Image Generation* | +| [score_sde_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_ve) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | *Unconditional Image Generation* | +| [score_sde_vp](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/score_sde_vp) | [**Score-Based Generative Modeling through Stochastic Differential Equations**](https://openreview.net/forum?id=PxTIG12RRHS) | *Unconditional Image Generation* | +| [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | *Text-to-Image Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) +| [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | *Image-to-Image Text-Guided Generation* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) +| [stable_diffusion](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion) | [**Stable Diffusion**](https://stability.ai/blog/stable-diffusion-public-release) | *Text-Guided Image Inpainting* | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/in_painting_with_stable_diffusion_using_diffusers.ipynb) +| [stochatic_karras_ve](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stochatic_karras_ve) | [**Elucidating the Design Space of Diffusion-Based Generative Models**](https://arxiv.org/abs/2206.00364) | *Unconditional Image Generation* | **Note**: Pipelines are simple examples of how to play around with the diffusion systems as described in the corresponding papers. However, most of them can be adapted to use different scheduler components or even different model components. Some pipeline examples are shown in the [Examples](#examples) below. 
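Many of the pipelines above also accept a seeded `torch.Generator`, which is how the deterministic tests earlier in this document fix their outputs; a minimal sketch, assuming the Stable Diffusion checkpoint and prompt used elsewhere in this document:

```python
# Minimal sketch: pass a seeded torch.Generator to make sampling reproducible.
# Checkpoint id, prompt, and guidance scale mirror examples in this document
# and are assumptions for other pipelines.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4", use_auth_token=True
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(0)
image = pipe(
    "a photo of an astronaut riding a horse on mars",
    guidance_scale=7.5,
    generator=generator,
)["sample"][0]
image.save("astronaut_rides_horse_seed_0.png")
```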
From e3238c0e4bd8f8ae23e8ac225b46af148ae11e40 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 12:27:56 +0000 Subject: [PATCH 23/32] more fixes --- README.md | 4 ++-- examples/inference/inpainting.py | 2 +- src/diffusers/pipelines/README.md | 4 ++-- src/diffusers/pipelines/stable_diffusion/readme.md | 6 +++--- 4 files changed, 8 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 49f9d5fa4d45..876547f6da6c 100644 --- a/README.md +++ b/README.md @@ -111,7 +111,7 @@ You can also run this example on colab [![Open In Colab](https://colab.research. ### In-painting using Stable Diffusion -The `inpainting.py` script implements `StableDiffusionInpaintingPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. +The `inpainting.py` script implements `StableDiffusionInpainPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. ```python from io import BytesIO @@ -133,7 +133,7 @@ init_image = download_image(img_url).resize((512, 512)) mask_image = download_image(mask_url).resize((512, 512)) device = "cuda" -pipe = StableDiffusionInpaintingPipeline.from_pretrained( +pipe = StableDiffusionInpainPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, diff --git a/examples/inference/inpainting.py b/examples/inference/inpainting.py index c1777dc1a9b4..05de8a1a6b82 100644 --- a/examples/inference/inpainting.py +++ b/examples/inference/inpainting.py @@ -1 +1 @@ -from diffusers import StableDiffusionInpaintPipeline as StableDiffusionInpaintingPipeline # noqa F401 +from diffusers import StableDiffusionInpaintPipeline as StableDiffusionInpainPipeline # noqa F401 diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index a31cea26c225..4a119f7b61e5 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -155,7 +155,7 @@ You can generate your own latents to reproduce results, or tweak your prompt on ### In-painting using Stable Diffusion -The `inpainting.py` script implements `StableDiffusionInpaintingPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. +The `inpainting.py` script implements `StableDiffusionInpainPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. ```python from io import BytesIO @@ -177,7 +177,7 @@ init_image = download_image(img_url).resize((512, 512)) mask_image = download_image(mask_url).resize((512, 512)) device = "cuda" -pipe = StableDiffusionInpaintingPipeline.from_pretrained( +pipe = StableDiffusionInpainPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, diff --git a/src/diffusers/pipelines/stable_diffusion/readme.md b/src/diffusers/pipelines/stable_diffusion/readme.md index 37e9d859a3cb..3cb1de9aa6da 100644 --- a/src/diffusers/pipelines/stable_diffusion/readme.md +++ b/src/diffusers/pipelines/stable_diffusion/readme.md @@ -57,7 +57,7 @@ You can generate your own latents to reproduce results, or tweak your prompt on ## In-painting using Stable Diffusion -The `inpainting.py` script implements `StableDiffusionInpaintingPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. +The `inpainting.py` script implements `StableDiffusionInpainPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. 
### How to use it @@ -69,7 +69,7 @@ from torch import autocast import requests import PIL -from inpainting import StableDiffusionInpaintingPipeline +from inpainting import StableDiffusionInpainPipeline def download_image(url): response = requests.get(url) @@ -82,7 +82,7 @@ init_image = download_image(img_url).resize((512, 512)) mask_image = download_image(mask_url).resize((512, 512)) device = "cuda" -pipe = StableDiffusionInpaintingPipeline.from_pretrained( +pipe = StableDiffusionInpainPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, From dbd9489d1f991ae5f04d439c330d0a0b5625370c Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 12:33:13 +0000 Subject: [PATCH 24/32] up --- README.md | 10 ++++------ 1 file changed, 4 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index 876547f6da6c..6191b4d2015b 100644 --- a/README.md +++ b/README.md @@ -20,11 +20,13 @@ as a modular toolbox for inference and training of diffusion models. More precisely, πŸ€— Diffusers offers: -- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)). +- State-of-the-art diffusion pipelines that can be run in inference with just a couple of lines of code (see [src/diffusers/pipelines](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines)). Check [this overview](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/README.md#pipelines-summary) to see all supported pipelines and their corresponding official papers. - Various noise schedulers that can be used interchangeably for the prefered speed vs. quality trade-off in inference (see [src/diffusers/schedulers](https://github.com/huggingface/diffusers/tree/main/src/diffusers/schedulers)). - Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)). - Training examples to show how to train the most popular diffusion models (see [examples/training](https://github.com/huggingface/diffusers/tree/main/examples/training)). - Inference examples to show how to create custom pipelines for advanced tasks such as image2image, in-painting (see [examples/inference](https://github.com/huggingface/diffusers/tree/main/examples/inference)) +- Most popular diffusion papers are natively supported in + ## Quickstart @@ -33,11 +35,7 @@ In order to get started, we recommend taking a look at two notebooks: - The [Getting started with Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines. Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library. 
- The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffuser model training methods. This notebook takes a step-by-step approach to training your - diffuser model on an image dataset, with explanatory graphics. - -## **New 🎨🎨🎨** Stable Diffusion is now fully compatible with `diffusers`! - -Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. + diffuser model on an image dataset, with explanatory graphics. ## **New 🎨🎨🎨** Stable Diffusion is now fully compatible with `diffusers`! Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information. You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-3), read the license and tick the checkbox if you agree. You have to be a registered user in πŸ€— Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation. From b78467a24f5fa107dba5f72ff8a10de4c3efff70 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 18:22:40 +0200 Subject: [PATCH 25/32] Apply suggestions from code review Co-authored-by: Suraj Patil Co-authored-by: Pedro Cuenca --- README.md | 8 +++++--- examples/README.md | 8 ++++---- examples/community/README.md | 2 +- 3 files changed, 10 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 6191b4d2015b..13d526dc8108 100644 --- a/README.md +++ b/README.md @@ -35,7 +35,9 @@ In order to get started, we recommend taking a look at two notebooks: - The [Getting started with Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines. 
Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library. - The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffuser model training methods. This notebook takes a step-by-step approach to training your - diffuser model on an image dataset, with explanatory graphics. ## **New 🎨🎨🎨** Stable Diffusion is now fully compatible with `diffusers`! Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. + diffuser model on an image dataset, with explanatory graphics. + + ## **New ** Stable Diffusion is now fully compatible with `diffusers`! Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information. You need to accept the model license before downloading or using the Stable Diffusion weights. Please, visit the [model card](https://huggingface.co/CompVis/stable-diffusion-v1-3), read the license and tick the checkbox if you agree. You have to be a registered user in πŸ€— Hugging Face Hub, and you'll also need to use an access token for the code to work. For more information on access tokens, please refer to [this section](https://huggingface.co/docs/hub/security-tokens) of the documentation. @@ -71,7 +73,7 @@ image.save("astronaut_rides_horse.png") ### Image-to-Image text-guided generation with Stable Diffusion -The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! +The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images. ```python from torch import autocast @@ -109,7 +111,7 @@ You can also run this example on colab [![Open In Colab](https://colab.research. ### In-painting using Stable Diffusion -The `inpainting.py` script implements `StableDiffusionInpainPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. 
+The `StableDiffusionInpainPipeline` lets you edit specific parts of an image by providing a mask and text prompt. ```python from io import BytesIO diff --git a/examples/README.md b/examples/README.md index 36846c89bdc7..e39ca5c3a96f 100644 --- a/examples/README.md +++ b/examples/README.md @@ -32,7 +32,7 @@ point of view very similar, *e.g.* image super-resolution and image modification We provide **official** examples that cover the most popular tasks of diffusion models. *Official* examples are **actively** maintained by the `diffusers` maintainers and we try to rigorously follow our example philosophy as defined above. -If you feel like another important should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you . +If you feel like another important example should exist, we are more than happy to welcome a [Feature Request](https://github.com/huggingface/diffusers/issues/new?assignees=&labels=&template=feature_request.md&title=) or directly a [Pull Request](https://github.com/huggingface/diffusers/compare) from you! Training examples show how to pretrain or fine-tune diffusion models for a variety of tasks. Currently we support: @@ -42,7 +42,7 @@ Training examples show how to pretrain or fine-tune diffusion models for a varie ## Community -In additon, we provide **community** examples, which are examples added and maintained by our community. +In addition, we provide **community** examples, which are examples added and maintained by our community. Community examples can consist of both *training* examples or *inference* pipelines. For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. Examples that are useful for the community, but are either not yet deemed popular or not yet following our above philosophy should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines. @@ -52,8 +52,8 @@ Examples that are useful for the community, but are either not yet deemed popula To make sure you can successfully run the latest versions of the example scripts, you have to **install the library from source** and install some example-specific requirements. To do this, execute the following steps in a new virtual environment: ```bash -git clone https://github.com/huggingface/transformers -cd transformers +git clone https://github.com/huggingface/diffusers +cd diffusers pip install . ``` Then cd in the example folder of your choice and run diff --git a/examples/community/README.md b/examples/community/README.md index 1ff604f708e0..ce6f57b018cf 100644 --- a/examples/community/README.md +++ b/examples/community/README.md @@ -1,6 +1,6 @@ # Community Examples -**Community** examples consits of both inference and training examples that have been added by the community. +**Community** examples consist of both inference and training examples that have been added by the community. 
| Example | Description | Author | | |:----------|:-------------|:-------------|------:| From 914c7c0c88768b8a9a898acb22cb609e494785ff Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 18:26:23 +0200 Subject: [PATCH 26/32] Apply suggestions from code review Co-authored-by: Suraj Patil --- .../stable_diffusion/pipeline_stable_diffusion_img2img.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py index 3dac53bd9bd1..2c3d5c8e15e8 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_img2img.py @@ -87,7 +87,7 @@ def __call__( init_latents = self.vae.encode(init_image.to(self.device)).sample(generator=generator) init_latents = 0.18215 * init_latents - # prepare init_latents noise to latents + # expand init_latents for batch_size init_latents = torch.cat([init_latents] * batch_size) # get the original timestep using init_timestep From f9f39f4fb8847e8262c1734d31fda759bc969290 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 18:28:48 +0200 Subject: [PATCH 27/32] Apply suggestions from code review Co-authored-by: Suraj Patil --- .../stable_diffusion/pipeline_stable_diffusion_inpaint.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py index bf879cf205e8..6827846722d7 100644 --- a/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py +++ b/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_inpaint.py @@ -101,7 +101,7 @@ def __call__( init_latents = self.vae.encode(init_image).sample(generator=generator) init_latents = 0.18215 * init_latents - # prepare init_latents noise to latents + # Expand init_latents for batch_size init_latents = torch.cat([init_latents] * batch_size) init_latents_orig = init_latents From 36dd1a837c768010d27e789cc367804098f24546 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 18:31:15 +0200 Subject: [PATCH 28/32] Apply suggestions from code review Co-authored-by: Pedro Cuenca Co-authored-by: Suraj Patil Co-authored-by: Anton Lozhkov --- src/diffusers/pipelines/README.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index 4a119f7b61e5..c9e613b0610c 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -4,7 +4,7 @@ Pipelines provide a simple way to run state-of-the-art diffusion models in infer Most diffusion systems consist of multiple independently-trained models and highly adaptable scheduler components - all of which are needed to have a functioning end-to-end diffusion system. 
-As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three indepently trained models: +As an example, [Stable Diffusion](https://huggingface.co/blog/stable_diffusion) has three independently trained models: - [Autoencoder](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/models/vae.py#L392) - [Conditional Unet](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/models/unet_2d_condition.py#L12) - [CLIP text encoder](https://huggingface.co/docs/transformers/v4.21.2/en/model_doc/clip#transformers.CLIPTextModel) @@ -16,9 +16,9 @@ or created independently from each other. To that end, we strive to offer all open-sourced, state-of-the-art diffusion system under a unified API. More specifically, we strive to provide pipelines that -- 1. can load the officially published weights and yield 1-to-1 the same outputs as the original implemetation according to the corresponding paper (*e.g.* [LatentDiffusionPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion), uses the officially released weights of [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)), +- 1. can load the officially published weights and yield 1-to-1 the same outputs as the original implementation according to the corresponding paper (*e.g.* [LatentDiffusionPipeline](https://github.com/huggingface/diffusers/tree/main/src/diffusers/pipelines/latent_diffusion), uses the officially released weights of [High-Resolution Image Synthesis with Latent Diffusion Models](https://arxiv.org/abs/2112.10752)), - 2. have a simple user interface to run the model in inference (see the [Pipelines API](#pipelines-api) section), -- 3. are easy to understand with code that is self-explanatory and be can read along-side the official paper (see [Pipelines summary](#pipelines-summary)), +- 3. are easy to understand with code that is self-explanatory and can be read along-side the official paper (see [Pipelines summary](#pipelines-summary)), - 4. can easily be contributed by the community (see the [Contribution](#contribution) section). **Note** that pipelines do not (and should not) offer any training functionality. @@ -52,11 +52,11 @@ However, most of them can be adapted to use different scheduler components or ev Diffusion models often consist of multiple independently-trained models or other previously existing components. -Each model has been trained indepently on a different task and the scheduler can easily be swapped out against another scheduler. +Each model has been trained independently on a different task and the scheduler can easily be swapped out and replaced with a different one. During inference, we however want to be able to easily load all components and use them in inference - even if one component, *e.g.* CLIP's text encoder, originates from a different library, such as [Transformers](https://github.com/huggingface/transformers). To that end, all pipelines provide the following functionality: - [`from_pretrained` method](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L139) that accepts a Hugging Face Hub repository id, *e.g.* [CompVis/stable-diffusion-v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4) or a path to a local directory, *e.g.* -"./stable-diffusion". 
 To correctly retrieve which models and components should be loaded one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be +"./stable-diffusion". To correctly retrieve which models and components should be loaded, one has to provide a `model_index.json` file, *e.g.* [CompVis/stable-diffusion-v1-4/model_index.json](https://huggingface.co/CompVis/stable-diffusion-v1-4/blob/main/model_index.json), which defines all components that should be loaded into the pipelines. More specifically, for each model/component one needs to define the format `<name>: ["<library>", "<class name>"]`. `<name>` is the attribute name given to the loaded instance of `<class name>` which can be found in the library or pipeline folder called `"<library>"`. - [`save_pretrained`](https://github.com/huggingface/diffusers/blob/5cbed8e0d157f65d3ddc2420dfd09f2df630e978/src/diffusers/pipeline_utils.py#L90) that accepts a local path, *e.g.* `./stable-diffusion` under which all models/components of the pipeline will be saved. For each component/model a folder is created inside the local path that is named after the given attribute name, *e.g.* `./stable_diffusion/unet`. In addition, a `model_index.json` file is created at the root of the local path, *e.g.* `./stable_diffusion/model_index.json` so that the complete pipeline can again be instantiated @@ -79,7 +79,7 @@ use it for its designated task, *e.g.* text-to-image generation, in just a coupl logic including pre-processing, an unrolled diffusion loop, and post-processing should all happen inside the `__call__` method. - **Easy-to-tweak**: Certain pipelines will not be able to handle all use cases and tasks that you might like them to. If you want to use a certain pipeline for a specific use case that is not yet supported, you might have to copy the pipeline file and tweak the code to your needs. -We try to make the pipeline code as readable as possible so that each part from pre-processing to diffusing to post-processing can easily be adapted. If you would like the community to benefit from your customized pipeline, we would to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/commmunity). If you feel that an important pipeline should be part of the official pipelines but isn't, a contribution to the [official pipelines](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines) would be even better . +We try to make the pipeline code as readable as possible so that each part –from pre-processing to diffusing to post-processing– can easily be adapted. If you would like the community to benefit from your customized pipeline, we would love to see a contribution to our [community-examples](https://github.com/huggingface/diffusers/tree/main/examples/commmunity). If you feel that an important pipeline should be part of the official pipelines but isn't, a contribution to the [official pipelines](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines) would be even better. - **One-purpose-only**: Pipelines should be used for one task and one task only. Even if two tasks are very similar from a modeling point of view, *e.g.* image2image translation and in-painting, pipelines shall be used for one task only to keep them *easy-to-tweak* and *readable*.
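A minimal sketch of the `from_pretrained` / `save_pretrained` round trip described above. It assumes you have accepted the `CompVis/stable-diffusion-v1-4` license and are logged in to the Hub (hence `use_auth_token=True`); the local directory name `./stable-diffusion-local` is just an illustrative choice.

```python
from diffusers import StableDiffusionPipeline

# `from_pretrained` reads the repository's `model_index.json` and loads every component
# listed in it (unet, vae, text encoder, tokenizer, scheduler, ...) from the Hub.
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True)

# `save_pretrained` writes one sub-folder per component, named after its attribute
# (e.g. ./stable-diffusion-local/unet), plus a `model_index.json` at the root.
pipe.save_pretrained("./stable-diffusion-local")

# The pipeline can then be re-instantiated from the local path alone.
pipe = StableDiffusionPipeline.from_pretrained("./stable-diffusion-local")
```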
## Examples @@ -113,7 +113,7 @@ image.save("astronaut_rides_horse.png") ### Image-to-Image text-guided generation with Stable Diffusion -The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! +The `StableDiffusionImg2ImgPipeline` lets you pass a text prompt and an initial image to condition the generation of new images. ```python from torch import autocast @@ -150,7 +150,7 @@ You can also run this example on colab [![Open In Colab](https://colab.research. ### Tweak prompts reusing seeds and latents -You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb). +You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](https://github.com/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb). ### In-painting using Stable Diffusion From e076f9976408e87a52ed317c1ebdbfce48388903 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 18:32:12 +0200 Subject: [PATCH 29/32] Update src/diffusers/pipelines/README.md Co-authored-by: Suraj Patil --- src/diffusers/pipelines/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index c9e613b0610c..bb7e5fdf703f 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -155,7 +155,7 @@ You can generate your own latents to reproduce results, or tweak your prompt on ### In-painting using Stable Diffusion -The `inpainting.py` script implements `StableDiffusionInpainPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. +The `StableDiffusionInpainPipeline` lets you edit specific parts of an image by providing a mask and text prompt. ```python from io import BytesIO From 8c5d0719b90b221212c672fd189978cd455dba87 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 16:32:46 +0000 Subject: [PATCH 30/32] add better links --- README.md | 1 - examples/README.md | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/README.md b/README.md index 6191b4d2015b..5b47062d4bb1 100644 --- a/README.md +++ b/README.md @@ -25,7 +25,6 @@ More precisely, πŸ€— Diffusers offers: - Multiple types of models, such as UNet, can be used as building blocks in an end-to-end diffusion system (see [src/diffusers/models](https://github.com/huggingface/diffusers/tree/main/src/diffusers/models)). - Training examples to show how to train the most popular diffusion models (see [examples/training](https://github.com/huggingface/diffusers/tree/main/examples/training)). 
- Inference examples to show how to create custom pipelines for advanced tasks such as image2image, in-painting (see [examples/inference](https://github.com/huggingface/diffusers/tree/main/examples/inference)) -- Most popular diffusion papers are natively supported in ## Quickstart diff --git a/examples/README.md b/examples/README.md index 36846c89bdc7..83a66e3785ac 100644 --- a/examples/README.md +++ b/examples/README.md @@ -46,7 +46,7 @@ In additon, we provide **community** examples, which are examples added and main Community examples can consist of both *training* examples or *inference* pipelines. For such examples, we are more lenient regarding the philosophy defined above and also cannot guarantee to provide maintenance for every issue. Examples that are useful for the community, but are either not yet deemed popular or not yet following our above philosophy should go into the [community examples](https://github.com/huggingface/diffusers/tree/main/examples/community) folder. The community folder therefore includes training examples and inference pipelines. -**Note**: Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/compare) to show to the community how you like to use `diffusers` πŸͺ„. +**Note**: Community examples can be a [great first contribution](https://github.com/huggingface/diffusers/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) to show to the community how you like to use `diffusers` πŸͺ„. ## Important note From 2f5b6b3a61718785e0d1e7447dd58aca28d589f7 Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 16:36:28 +0000 Subject: [PATCH 31/32] fix more --- README.md | 8 ++++---- examples/inference/image_to_image.py | 3 +++ examples/inference/inpainting.py | 5 ++++- src/diffusers/pipelines/README.md | 4 ++-- src/diffusers/pipelines/stable_diffusion/readme.md | 6 +++--- src/diffusers/utils/dummy_transformers_objects.py | 1 + 6 files changed, 17 insertions(+), 10 deletions(-) diff --git a/README.md b/README.md index d55fa210373c..c88f8de426aa 100644 --- a/README.md +++ b/README.md @@ -33,8 +33,8 @@ In order to get started, we recommend taking a look at two notebooks: - The [Getting started with Diffusers](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/diffusers_intro.ipynb) notebook, which showcases an end-to-end example of usage for diffusion models, schedulers and pipelines. Take a look at this notebook to learn how to use the pipeline abstraction, which takes care of everything (model, scheduler, noise handling) for you, and also to understand each independent building block in the library. -- The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffuser model training methods. This notebook takes a step-by-step approach to training your - diffuser model on an image dataset, with explanatory graphics. 
+- The [Training a diffusers model](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/training_example.ipynb) notebook summarizes diffusion models training methods. This notebook takes a step-by-step approach to training your + diffusion models on an image dataset, with explanatory graphics. ## **New ** Stable Diffusion is now fully compatible with `diffusers`! Stable Diffusion is a text-to-image latent diffusion model created by the researchers and engineers from [CompVis](https://github.com/CompVis), [Stability AI](https://stability.ai/) and [LAION](https://laion.ai/). It's trained on 512x512 images from a subset of the [LAION-5B](https://laion.ai/blog/laion-5b/) database. This model uses a frozen CLIP ViT-L/14 text encoder to condition the model on text prompts. With its 860M UNet and 123M text encoder, the model is relatively lightweight and runs on a GPU with at least 10GB VRAM. See the [model card](https://huggingface.co/CompVis/stable-diffusion) for more information. @@ -110,7 +110,7 @@ You can also run this example on colab [![Open In Colab](https://colab.research. ### In-painting using Stable Diffusion -The `StableDiffusionInpainPipeline` lets you edit specific parts of an image by providing a mask and text prompt. +The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and text prompt. ```python from io import BytesIO @@ -132,7 +132,7 @@ init_image = download_image(img_url).resize((512, 512)) mask_image = download_image(mask_url).resize((512, 512)) device = "cuda" -pipe = StableDiffusionInpainPipeline.from_pretrained( +pipe = StableDiffusionInpaintPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, diff --git a/examples/inference/image_to_image.py b/examples/inference/image_to_image.py index a636f6608e7c..625bc729f20d 100644 --- a/examples/inference/image_to_image.py +++ b/examples/inference/image_to_image.py @@ -1 +1,4 @@ +import warnings from diffusers import StableDiffusionImg2ImgPipeline # noqa F401 + +warnings.warn("The `image_to_image.py` script is outdated. Please use directly `from diffusers import StableDiffusionImg2ImgPipeline` instead.") diff --git a/examples/inference/inpainting.py b/examples/inference/inpainting.py index 05de8a1a6b82..ca6a753a6595 100644 --- a/examples/inference/inpainting.py +++ b/examples/inference/inpainting.py @@ -1 +1,4 @@ -from diffusers import StableDiffusionInpaintPipeline as StableDiffusionInpainPipeline # noqa F401 +import warnings +from diffusers import StableDiffusionInpaintPipeline as StableDiffusionInpaintPipeline # noqa F401 + +warnings.warn("The `inpainting.py` script is outdated. Please use directly `from diffusers import StableDiffusionInpaintPipeline` instead.") diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index bb7e5fdf703f..f4862b64e04e 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -155,7 +155,7 @@ You can generate your own latents to reproduce results, or tweak your prompt on ### In-painting using Stable Diffusion -The `StableDiffusionInpainPipeline` lets you edit specific parts of an image by providing a mask and text prompt. +The `StableDiffusionInpaintPipeline` lets you edit specific parts of an image by providing a mask and text prompt. 
```python from io import BytesIO @@ -177,7 +177,7 @@ init_image = download_image(img_url).resize((512, 512)) mask_image = download_image(mask_url).resize((512, 512)) device = "cuda" -pipe = StableDiffusionInpainPipeline.from_pretrained( +pipe = StableDiffusionInpaintPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, diff --git a/src/diffusers/pipelines/stable_diffusion/readme.md b/src/diffusers/pipelines/stable_diffusion/readme.md index 3cb1de9aa6da..ed1c17a8a30a 100644 --- a/src/diffusers/pipelines/stable_diffusion/readme.md +++ b/src/diffusers/pipelines/stable_diffusion/readme.md @@ -57,7 +57,7 @@ You can generate your own latents to reproduce results, or tweak your prompt on ## In-painting using Stable Diffusion -The `inpainting.py` script implements `StableDiffusionInpainPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. +The `inpainting.py` script implements `StableDiffusionInpaintPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. ### How to use it @@ -69,7 +69,7 @@ from torch import autocast import requests import PIL -from inpainting import StableDiffusionInpainPipeline +from inpainting import StableDiffusionInpaintPipeline def download_image(url): response = requests.get(url) @@ -82,7 +82,7 @@ init_image = download_image(img_url).resize((512, 512)) mask_image = download_image(mask_url).resize((512, 512)) device = "cuda" -pipe = StableDiffusionInpainPipeline.from_pretrained( +pipe = StableDiffusionInpaintPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, diff --git a/src/diffusers/utils/dummy_transformers_objects.py b/src/diffusers/utils/dummy_transformers_objects.py index 753e3fdbe291..dc929427221a 100644 --- a/src/diffusers/utils/dummy_transformers_objects.py +++ b/src/diffusers/utils/dummy_transformers_objects.py @@ -1,3 +1,4 @@ # This file is autogenerated by the command `make fix-copies`, do not edit. # flake8: noqa from ..utils import DummyObject, requires_backends + From 02c85e033cf5d78553d5975e3b9575c86bac206c Mon Sep 17 00:00:00 2001 From: Patrick von Platen Date: Tue, 30 Aug 2022 16:37:20 +0000 Subject: [PATCH 32/32] finish --- examples/inference/image_to_image.py | 7 +- examples/inference/inpainting.py | 7 +- src/diffusers/pipelines/README.md | 12 +-- .../pipelines/stable_diffusion/readme.md | 99 ------------------- .../utils/dummy_transformers_objects.py | 1 - 5 files changed, 13 insertions(+), 113 deletions(-) delete mode 100644 src/diffusers/pipelines/stable_diffusion/readme.md diff --git a/examples/inference/image_to_image.py b/examples/inference/image_to_image.py index 625bc729f20d..86b46c4e606e 100644 --- a/examples/inference/image_to_image.py +++ b/examples/inference/image_to_image.py @@ -1,4 +1,9 @@ import warnings + from diffusers import StableDiffusionImg2ImgPipeline # noqa F401 -warnings.warn("The `image_to_image.py` script is outdated. Please use directly `from diffusers import StableDiffusionImg2ImgPipeline` instead.") + +warnings.warn( + "The `image_to_image.py` script is outdated. Please use directly `from diffusers import" + " StableDiffusionImg2ImgPipeline` instead." 
+) diff --git a/examples/inference/inpainting.py b/examples/inference/inpainting.py index ca6a753a6595..8aad208ff34e 100644 --- a/examples/inference/inpainting.py +++ b/examples/inference/inpainting.py @@ -1,4 +1,9 @@ import warnings + from diffusers import StableDiffusionInpaintPipeline as StableDiffusionInpaintPipeline # noqa F401 -warnings.warn("The `inpainting.py` script is outdated. Please use directly `from diffusers import StableDiffusionInpaintPipeline` instead.") + +warnings.warn( + "The `inpainting.py` script is outdated. Please use directly `from diffusers import" + " StableDiffusionInpaintPipeline` instead." +) diff --git a/src/diffusers/pipelines/README.md b/src/diffusers/pipelines/README.md index f4862b64e04e..f79d96fb8026 100644 --- a/src/diffusers/pipelines/README.md +++ b/src/diffusers/pipelines/README.md @@ -91,17 +91,7 @@ We try to make the pipeline code as readable as possible so that each part –fr from torch import autocast from diffusers import StableDiffusionPipeline, LMSDiscreteScheduler -lms = LMSDiscreteScheduler( - beta_start=0.00085, - beta_end=0.012, - beta_schedule="scaled_linear" -) - -pipe = StableDiffusionPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - scheduler=lms, - use_auth_token=True -) +pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4", use_auth_token=True) pipe = pipe.to("cuda") prompt = "a photo of an astronaut riding a horse on mars" diff --git a/src/diffusers/pipelines/stable_diffusion/readme.md b/src/diffusers/pipelines/stable_diffusion/readme.md deleted file mode 100644 index ed1c17a8a30a..000000000000 --- a/src/diffusers/pipelines/stable_diffusion/readme.md +++ /dev/null @@ -1,99 +0,0 @@ -# Inference Examples - -## Installing the dependencies - -Before running the scripts, make sure to install the library's dependencies: - -```bash -pip install diffusers transformers ftfy -``` - -## Image-to-Image text-guided generation with Stable Diffusion - -The `image_to_image.py` script implements `StableDiffusionImg2ImgPipeline`. It lets you pass a text prompt and an initial image to condition the generation of new images. This example also showcases how you can write custom diffusion pipelines using `diffusers`! 
- -### How to use it - - -```python -import torch -from torch import autocast -import requests -from PIL import Image -from io import BytesIO - -from image_to_image import StableDiffusionImg2ImgPipeline, preprocess - -# load the pipeline -device = "cuda" -pipe = StableDiffusionImg2ImgPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - revision="fp16", - torch_dtype=torch.float16, - use_auth_token=True -).to(device) - -# let's download an initial image -url = "https://raw.githubusercontent.com/CompVis/stable-diffusion/main/assets/stable-samples/img2img/sketch-mountains-input.jpg" - -response = requests.get(url) -init_image = Image.open(BytesIO(response.content)).convert("RGB") -init_image = init_image.resize((768, 512)) -init_image = preprocess(init_image) - -prompt = "A fantasy landscape, trending on artstation" - -with autocast("cuda"): - images = pipe(prompt=prompt, init_image=init_image, strength=0.75, guidance_scale=7.5)["sample"] - -images[0].save("fantasy_landscape.png") -``` -You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/image_2_image_using_diffusers.ipynb) - -## Tweak prompts reusing seeds and latents - -You can generate your own latents to reproduce results, or tweak your prompt on a specific result you liked. [This notebook](stable-diffusion-seeds.ipynb) shows how to do it step by step. You can also run it in Google Colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/pcuenca/diffusers-examples/blob/main/notebooks/stable-diffusion-seeds.ipynb). - - -## In-painting using Stable Diffusion - -The `inpainting.py` script implements `StableDiffusionInpaintPipeline`. This script lets you edit specific parts of an image by providing a mask and text prompt. 
- -### How to use it - -```python -import torch -from io import BytesIO - -from torch import autocast -import requests -import PIL - -from inpainting import StableDiffusionInpaintPipeline - -def download_image(url): - response = requests.get(url) - return PIL.Image.open(BytesIO(response.content)).convert("RGB") - -img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png" -mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png" - -init_image = download_image(img_url).resize((512, 512)) -mask_image = download_image(mask_url).resize((512, 512)) - -device = "cuda" -pipe = StableDiffusionInpaintPipeline.from_pretrained( - "CompVis/stable-diffusion-v1-4", - revision="fp16", - torch_dtype=torch.float16, - use_auth_token=True -).to(device) - -prompt = "a cat sitting on a bench" -with autocast("cuda"): - images = pipe(prompt=prompt, init_image=init_image, mask_image=mask_image, strength=0.75)["sample"] - -images[0].save("cat_on_bench.png") -``` - -You can also run this example on colab [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/patil-suraj/Notebooks/blob/master/in_painting_with_stable_diffusion_using_diffusers.ipynb) diff --git a/src/diffusers/utils/dummy_transformers_objects.py b/src/diffusers/utils/dummy_transformers_objects.py index dc929427221a..753e3fdbe291 100644 --- a/src/diffusers/utils/dummy_transformers_objects.py +++ b/src/diffusers/utils/dummy_transformers_objects.py @@ -1,4 +1,3 @@ # This file is autogenerated by the command `make fix-copies`, do not edit. # flake8: noqa from ..utils import DummyObject, requires_backends -