Merged

Changes from all commits (28 commits)
da02cee Add combined pipeline (patrickvonplaten, Jul 22, 2023)
a89153d Download readme (patrickvonplaten, Jul 22, 2023)
1cb7cd5 Upload (patrickvonplaten, Jul 22, 2023)
13ec934 up (patrickvonplaten, Jul 24, 2023)
581c1aa Merge branch 'main' of https://github.com/huggingface/diffusers into … (patrickvonplaten, Jul 25, 2023)
5e1bd45 fix (patrickvonplaten, Jul 25, 2023)
e6d5bc6 up (patrickvonplaten, Jul 25, 2023)
e06c1ca fix final (patrickvonplaten, Jul 25, 2023)
8714f03 fix more (patrickvonplaten, Jul 25, 2023)
af58ae5 Add enable model cpu offload kandinsky (patrickvonplaten, Jul 25, 2023)
5e465bf finish (patrickvonplaten, Jul 25, 2023)
3196d1a finish (patrickvonplaten, Jul 25, 2023)
931e76d Fix (patrickvonplaten, Jul 25, 2023)
ca077c5 fix more (patrickvonplaten, Jul 25, 2023)
9ced89b make style (patrickvonplaten, Jul 25, 2023)
a9630bf fix kandinsky mask (patrickvonplaten, Jul 25, 2023)
1a5023b fix inpainting test (patrickvonplaten, Jul 25, 2023)
09372a4 add callbacks (patrickvonplaten, Jul 25, 2023)
8f4d316 add tests (patrickvonplaten, Jul 26, 2023)
d70d0e1 fix tests (patrickvonplaten, Jul 26, 2023)
9ecf71a Merge branch 'main' into add_combined_pipeline_kandinsky (patrickvonplaten, Jul 26, 2023)
a649f60 Apply suggestions from code review (patrickvonplaten, Jul 26, 2023)
435657d docs (patrickvonplaten, Jul 26, 2023)
c822e07 docs (patrickvonplaten, Jul 26, 2023)
3300b7e correct docs (patrickvonplaten, Jul 26, 2023)
4f9e3b2 fix tests (patrickvonplaten, Jul 26, 2023)
7f038f5 add warning (patrickvonplaten, Jul 26, 2023)
1acbcbe correct docs (patrickvonplaten, Jul 26, 2023)
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
```diff
@@ -206,6 +206,8 @@
       title: InstructPix2Pix
     - local: api/pipelines/kandinsky
       title: Kandinsky
+    - local: api/pipelines/kandinsky_v22
+      title: Kandinsky 2.2
     - local: api/pipelines/latent_diffusion
       title: Latent Diffusion
     - local: api/pipelines/panorama
```
278 changes: 27 additions & 251 deletions docs/source/en/api/pipelines/kandinsky.mdx
```diff
@@ -212,9 +212,9 @@
 init_image = load_image(
     "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" "/kandinsky/cat.png"
 )
 
-mask = np.ones((768, 768), dtype=np.float32)
+mask = np.zeros((768, 768), dtype=np.float32)
 # Let's mask out an area above the cat's head
-mask[:250, 250:-250] = 0
+mask[:250, 250:-250] = 1
```
Comment on lines +215 to +217 (Member):

Let's maybe add a "Tip warning" here to make sure users are clearly informed about this change?

Maybe something like:

<Tip warning={true}>

Note that the above change was introduced in the following pull request: https://github.com/huggingface/diffusers/pull/4207. So, if you're using a source installation of `diffusers` or the latest release, you should upgrade your inpainting code to follow the above.

</Tip>

WDYT?

Comment (Member): Also, the comment in the PR description is wrong (just in case we want to copy/paste it somewhere).
out = pipe(
prompt,
@@ -276,208 +276,6 @@
```python
image.save("starry_cat.png")
```
![img](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/kandinsky-docs/starry_cat.png)
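For anyone migrating existing inpainting code across the mask-convention change shown above (masks now use 1 rather than 0 to mark the area to repaint), inverting an old-style mask is a one-liner. Below is a minimal sketch, not part of this diff; `old_mask` is a hypothetical mask built with the previous convention:

```python
import numpy as np

# Hypothetical mask built with the old convention: 0 marks the area to repaint.
old_mask = np.ones((768, 768), dtype=np.float32)
old_mask[:250, 250:-250] = 0

# New convention: 1 marks the area to repaint, so simply invert the mask.
new_mask = 1 - old_mask
assert new_mask[0, 300] == 1.0  # the region above the cat's head is now flagged for repainting
```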


### Text-to-Image Generation with ControlNet Conditioning

The following is a simple example of how to use [`KandinskyV22ControlnetPipeline`] to condition text-to-image generation on a depth image.

First, let's take an image and extract its depth map.

```python
from diffusers.utils import load_image

img = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png"
).resize((768, 768))
```
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png)

We can use the `depth-estimation` pipeline from transformers to process the image and retrieve its depth map.

```python
import torch
import numpy as np

from transformers import pipeline
from diffusers.utils import load_image


def make_hint(image, depth_estimator):
image = depth_estimator(image)["depth"]
image = np.array(image)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
detected_map = torch.from_numpy(image).float() / 255.0
hint = detected_map.permute(2, 0, 1)
return hint


depth_estimator = pipeline("depth-estimation")
hint = make_hint(img, depth_estimator).unsqueeze(0).half().to("cuda")
```
Now, we load the prior pipeline and the text-to-image controlnet pipeline.

```python
from diffusers import KandinskyV22PriorPipeline, KandinskyV22ControlnetPipeline

pipe_prior = KandinskyV22PriorPipeline.from_pretrained(
"kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
)
pipe_prior = pipe_prior.to("cuda")

pipe = KandinskyV22ControlnetPipeline.from_pretrained(
"kandinsky-community/kandinsky-2-2-controlnet-depth", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")
```

We pass the prompt and negative prompt through the prior to generate image embeddings.

```python
prompt = "A robot, 4k photo"

negative_prior_prompt = "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature"

generator = torch.Generator(device="cuda").manual_seed(43)
image_emb, zero_image_emb = pipe_prior(
prompt=prompt, negative_prompt=negative_prior_prompt, generator=generator
).to_tuple()
```

Now we can pass the image embeddings and the depth image we extracted to the controlnet pipeline. With Kandinsky 2.2, only prior pipelines accept `prompt` input. You do not need to pass the prompt to the controlnet pipeline.

```python
images = pipe(
image_embeds=image_emb,
negative_image_embeds=zero_image_emb,
hint=hint,
num_inference_steps=50,
generator=generator,
height=768,
width=768,
).images

images[0].save("robot_cat.png")
```

The output image looks as follows:
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/robot_cat_text2img.png)

### Image-to-Image Generation with ControlNet Conditioning

Kandinsky 2.2 also includes a [`KandinskyV22ControlnetImg2ImgPipeline`] that will allow you to add control to the image generation process with both the image and its depth map. This pipeline works really well with [`KandinskyV22PriorEmb2EmbPipeline`], which generates image embeddings based on both a text prompt and an image.

For our robot cat example, we will pass the prompt and cat image together to the prior pipeline to generate an image embedding. We will then use that image embedding and the depth map of the cat to further control the image generation process.

We can use the same cat image and its depth map from the last example.

```python
import torch
import numpy as np

from diffusers import KandinskyV22PriorEmb2EmbPipeline, KandinskyV22ControlnetImg2ImgPipeline
from diffusers.utils import load_image
from transformers import pipeline

img = load_image(
"https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main" "/kandinskyv22/cat.png"
).resize((768, 768))


def make_hint(image, depth_estimator):
image = depth_estimator(image)["depth"]
image = np.array(image)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
detected_map = torch.from_numpy(image).float() / 255.0
hint = detected_map.permute(2, 0, 1)
return hint


depth_estimator = pipeline("depth-estimation")
hint = make_hint(img, depth_estimator).unsqueeze(0).half().to("cuda")

pipe_prior = KandinskyV22PriorEmb2EmbPipeline.from_pretrained(
"kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16
)
pipe_prior = pipe_prior.to("cuda")

pipe = KandinskyV22ControlnetImg2ImgPipeline.from_pretrained(
"kandinsky-community/kandinsky-2-2-controlnet-depth", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "A robot, 4k photo"
negative_prior_prompt = "lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck, username, watermark, signature"

generator = torch.Generator(device="cuda").manual_seed(43)

# run prior pipeline

img_emb = pipe_prior(prompt=prompt, image=img, strength=0.85, generator=generator)
negative_emb = pipe_prior(prompt=negative_prior_prompt, image=img, strength=1, generator=generator)

# run controlnet img2img pipeline
images = pipe(
image=img,
strength=0.5,
image_embeds=img_emb.image_embeds,
negative_image_embeds=negative_emb.image_embeds,
hint=hint,
num_inference_steps=50,
generator=generator,
height=768,
width=768,
).images

images[0].save("robot_cat.png")
```

Here is the output. Compared with the output from our text-to-image ControlNet example, it keeps many more of the cat's facial details from the original image while working them into the robot style we asked for.

![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/robot_cat.png)

## Kandinsky 2.2

The Kandinsky 2.2 release includes robust new text-to-image models that support text-to-image generation, image-to-image generation, image interpolation, and text-guided image inpainting. The general workflow to perform these tasks using Kandinsky 2.2 is the same as in Kandinsky 2.1. First, you will need to use a prior pipeline to generate image embeddings based on your text prompt, and then use one of the image decoding pipelines to generate the output image. The only difference is that in Kandinsky 2.2, the decoding pipelines no longer accept a `prompt` input; the image generation process is conditioned only on `image_embeds` and `negative_image_embeds`.

Let's look at an example of how to perform text-to-image generation using Kandinsky 2.2.

First, let's create the prior pipeline and text-to-image pipeline with Kandinsky 2.2 checkpoints.

```python
from diffusers import DiffusionPipeline
import torch

pipe_prior = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-prior", torch_dtype=torch.float16)
pipe_prior.to("cuda")

t2i_pipe = DiffusionPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16)
t2i_pipe.to("cuda")
```

You can then use `pipe_prior` to generate image embeddings.

```python
prompt = "portrait of a women, blue eyes, cinematic"
negative_prompt = "low quality, bad quality"

image_embeds, negative_image_embeds = pipe_prior(prompt, guidance_scale=1.0).to_tuple()
```

Now you can pass these embeddings to the text-to-image pipeline. When using Kandinsky 2.2, you don't need to pass a `prompt` (but you do with the previous version, Kandinsky 2.1).

```python
image = t2i_pipe(
    image_embeds=image_embeds, negative_image_embeds=negative_image_embeds, height=768, width=768
).images[0]
image.save("portrait.png")
```
![img](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/%20blue%20eyes.png)

We used the text-to-image pipeline as an example, but the same process applies to all decoding pipelines in Kandinsky 2.2. For more information, please refer to our API section for each pipeline.
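For example, the embeddings generated above could be reused with the image-to-image decoder. The snippet below is a sketch rather than part of this diff; it assumes that [`KandinskyV22Img2ImgPipeline`] can be loaded from the same `kandinsky-2-2-decoder` checkpoint:

```python
from diffusers import KandinskyV22Img2ImgPipeline
from diffusers.utils import load_image
import torch

i2i_pipe = KandinskyV22Img2ImgPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
i2i_pipe.to("cuda")

init_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/kandinskyv22/cat.png"
).resize((768, 768))

# As with text-to-image, no `prompt` is passed, only the embeddings from the prior.
image = i2i_pipe(
    image=init_image,
    image_embeds=image_embeds,
    negative_image_embeds=negative_image_embeds,
    strength=0.3,
    height=768,
    width=768,
).images[0]
image.save("portrait_img2img.png")
```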


## Optimization

Running Kandinsky in inference requires running both a first prior pipeline: [`KandinskyPriorPipeline`] and a second image decoding pipeline: [`KandinskyPipeline`], [`KandinskyImg2ImgPipeline`], or [`KandinskyInpaintPipeline`].
@@ -530,85 +328,63 @@
```python
t2i_pipe.unet = torch.compile(t2i_pipe.unet, mode="reduce-overhead", fullgraph=True)
```
After compilation you should see very fast inference times. For more information,
feel free to have a look at [our PyTorch 2.0 benchmark](https://huggingface.co/docs/diffusers/main/en/optimization/torch2.0).
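Commit af58ae5 in this PR also wires up model CPU offloading for the Kandinsky pipelines. A minimal sketch of how that is typically used in `diffusers` (requires `accelerate`; the 2.2 decoder checkpoint is used here purely as an example):

```python
from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    "kandinsky-community/kandinsky-2-2-decoder", torch_dtype=torch.float16
)
# Instead of pipe.to("cuda"): keep submodels on the CPU and move each one to the
# GPU only while it runs, trading some speed for a much lower peak VRAM footprint.
pipe.enable_model_cpu_offload()
```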

<Tip>

To generate images directly from a single pipeline, you can use [`KandinskyCombinedPipeline`], [`KandinskyImg2ImgCombinedPipeline`], or [`KandinskyInpaintCombinedPipeline`].
These combined pipelines wrap [`KandinskyPriorPipeline`] together with [`KandinskyPipeline`], [`KandinskyImg2ImgPipeline`], and [`KandinskyInpaintPipeline`], respectively, into a single
pipeline for a simpler user experience.

</Tip>
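As a sketch of what the Tip describes, here is how a combined pipeline might be used end to end. It assumes that `AutoPipelineForText2Image` (available in the same `diffusers` release) resolves the `kandinsky-community/kandinsky-2-1` checkpoint to [`KandinskyCombinedPipeline`]:

```python
from diffusers import AutoPipelineForText2Image
import torch

# A single from_pretrained call replaces the separate prior and decoder pipelines.
pipe = AutoPipelineForText2Image.from_pretrained(
    "kandinsky-community/kandinsky-2-1", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()

# Unlike the two-stage workflow, the combined pipeline accepts `prompt` directly
# and runs the prior and the decoder internally.
image = pipe(prompt="A starry night over a quiet harbor, 4k photo", num_inference_steps=25).images[0]
image.save("starry_harbor.png")
```

The same pattern applies to the image-to-image and inpainting combined variants.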

## Available Pipelines:

| Pipeline | Tasks |
|---|---|
| [pipeline_kandinsky2_2.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky2_2/pipeline_kandinsky2_2.py) | *Text-to-Image Generation* |
| [pipeline_kandinsky.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky/pipeline_kandinsky.py) | *Text-to-Image Generation* |
| [pipeline_kandinsky2_2_inpaint.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky2_2/pipeline_kandinsky2_2_inpaint.py) | *Image-Guided Image Generation* |
| [pipeline_kandinsky_combined.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky2_2/pipeline_kandinsky_combined.py) | *End-to-End Text-to-Image, Image-to-Image, and Inpainting Generation* |
| [pipeline_kandinsky_inpaint.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky/pipeline_kandinsky_inpaint.py) | *Image-Guided Image Generation* |
| [pipeline_kandinsky2_2_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky2_2/pipeline_kandinsky2_2_img2img.py) | *Image-Guided Image Generation* |
| [pipeline_kandinsky_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky/pipeline_kandinsky_img2img.py) | *Image-Guided Image Generation* |
| [pipeline_kandinsky2_2_controlnet.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky2_2/pipeline_kandinsky2_2_controlnet.py) | *Image-Guided Image Generation* |
| [pipeline_kandinsky2_2_controlnet_img2img.py](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/kandinsky2_2/pipeline_kandinsky2_2_controlnet_img2img.py) | *Image-Guided Image Generation* |


```diff
-### KandinskyV22Pipeline
-
-[[autodoc]] KandinskyV22Pipeline
-- all
-- __call__
-
-### KandinskyV22ControlnetPipeline
-
-[[autodoc]] KandinskyV22ControlnetPipeline
-- all
-- __call__
-
-### KandinskyV22ControlnetImg2ImgPipeline
+### KandinskyPriorPipeline
 
-[[autodoc]] KandinskyV22ControlnetImg2ImgPipeline
+[[autodoc]] KandinskyPriorPipeline
 - all
 - __call__
+- interpolate
 
-### KandinskyV22Img2ImgPipeline
+### KandinskyPipeline
 
-[[autodoc]] KandinskyV22Img2ImgPipeline
+[[autodoc]] KandinskyPipeline
 - all
 - __call__
 
-### KandinskyV22InpaintPipeline
+### KandinskyImg2ImgPipeline
 
-[[autodoc]] KandinskyV22InpaintPipeline
+[[autodoc]] KandinskyImg2ImgPipeline
 - all
 - __call__
 
-### KandinskyV22PriorPipeline
-
-[[autodoc]] KandinskyV22PriorPipeline
-- all
-- __call__
-- interpolate
-
-### KandinskyV22PriorEmb2EmbPipeline
+### KandinskyInpaintPipeline
 
-[[autodoc]] KandinskyV22PriorEmb2EmbPipeline
+[[autodoc]] KandinskyInpaintPipeline
 - all
 - __call__
-- interpolate
 
-### KandinskyPriorPipeline
+### KandinskyCombinedPipeline
 
-[[autodoc]] KandinskyPriorPipeline
+[[autodoc]] KandinskyCombinedPipeline
 - all
 - __call__
-- interpolate
-
-### KandinskyPipeline
-
-[[autodoc]] KandinskyPipeline
-- all
-- __call__
 
-### KandinskyImg2ImgPipeline
+### KandinskyImg2ImgCombinedPipeline
 
-[[autodoc]] KandinskyImg2ImgPipeline
+[[autodoc]] KandinskyImg2ImgCombinedPipeline
 - all
 - __call__
 
-### KandinskyInpaintPipeline
+### KandinskyInpaintCombinedPipeline
 
-[[autodoc]] KandinskyInpaintPipeline
+[[autodoc]] KandinskyInpaintCombinedPipeline
 - all
 - __call__
```