Merged
65 commits
71618a4
add: support for BLIP generation.
sayakpaul Feb 13, 2023
f206b1d
add: support for editing synthetic images.
sayakpaul Feb 13, 2023
6ed7c20
remove unnecessary comments.
sayakpaul Feb 13, 2023
f95a3fc
add inits and run make fix-copies.
sayakpaul Feb 13, 2023
3881408
version change of diffusers.
sayakpaul Feb 13, 2023
9fe140b
fix: condition for loading the captioner.
sayakpaul Feb 13, 2023
d8de3e0
default conditions_input_image to False.
sayakpaul Feb 13, 2023
e3f51c5
guidance_amount -> cross_attention_guidance_amount
sayakpaul Feb 13, 2023
e08b254
fix inputs to check_inputs()
sayakpaul Feb 13, 2023
45b1739
fix: attribute.
sayakpaul Feb 13, 2023
d44d699
fix: prepare_attention_mask() call.
sayakpaul Feb 13, 2023
2f71a8c
debugging.
sayakpaul Feb 13, 2023
f757663
better placement of references.
sayakpaul Feb 13, 2023
5d44832
remove torch.no_grad() decorations.
sayakpaul Feb 13, 2023
c4fb4ad
put torch.no_grad() context before the first denoising loop.
sayakpaul Feb 13, 2023
ade51d7
detach() latents before decoding them.
sayakpaul Feb 13, 2023
e23fe7b
put deocding in a torch.no_grad() context.
sayakpaul Feb 13, 2023
c76c162
add reconstructed image for debugging.
sayakpaul Feb 13, 2023
b02016b
no_grad(0
sayakpaul Feb 13, 2023
55414e0
apply formatting.
sayakpaul Feb 13, 2023
b07d2cd
address one-off suggestions from the draft PR.
sayakpaul Feb 14, 2023
9bdade1
back to torch.no_grad() and add more elaborate comments.
sayakpaul Feb 14, 2023
8e97366
refactor prepare_unet() per Patrick's suggestions.
sayakpaul Feb 14, 2023
7f61cdb
more elaborate description for .
sayakpaul Feb 14, 2023
b99e508
formatting.
sayakpaul Feb 14, 2023
db2136a
add docstrings to the methods specific to pix2pix zero.
sayakpaul Feb 14, 2023
a64d646
suspecting a redundant noise prediction.
sayakpaul Feb 14, 2023
1afd2b8
needed for gradient computation chain.
sayakpaul Feb 14, 2023
b6655f9
less hacks.
sayakpaul Feb 14, 2023
30cc5ef
fix: attention mask handling within the processor.
sayakpaul Feb 14, 2023
ee59e57
remove attention reference map computation.
sayakpaul Feb 14, 2023
d9cc312
fix: cross attn args.
sayakpaul Feb 14, 2023
e54ff0f
fix: prcoessor.
sayakpaul Feb 14, 2023
a5d95d9
store attention maps.
sayakpaul Feb 14, 2023
02facee
fix: attention processor.
sayakpaul Feb 14, 2023
26be9a1
update docs and better treatment to xa args.
sayakpaul Feb 15, 2023
57e1709
update the final noise computation call.
sayakpaul Feb 15, 2023
f37e25a
change xa args call.
sayakpaul Feb 15, 2023
2d55162
remove xa args option from the pipeline.
sayakpaul Feb 15, 2023
8a17a17
add: docs.
sayakpaul Feb 15, 2023
7bd9a6d
first test.
sayakpaul Feb 15, 2023
164aef4
fix: url call.
sayakpaul Feb 15, 2023
cc1efb2
fix: argument call.
sayakpaul Feb 15, 2023
17dda7c
remove image conditioning for now.
sayakpaul Feb 15, 2023
6a3091e
🚨 add: fast tests.
sayakpaul Feb 15, 2023
df99c53
explicit placement of the xa attn weights.
sayakpaul Feb 15, 2023
9a58071
add: slow tests 🐢
sayakpaul Feb 15, 2023
9df8999
fix: tests.
sayakpaul Feb 15, 2023
c20870f
edited direction embedding should be on the same device as prompt_emb…
sayakpaul Feb 15, 2023
731267b
debugging message.
sayakpaul Feb 15, 2023
87d5c15
debugging.
sayakpaul Feb 15, 2023
d81e110
add pix2pix zero pipeline for a non-deterministic test.
sayakpaul Feb 15, 2023
48e8f8c
debugging/
sayakpaul Feb 15, 2023
522e882
remove debugging message.
sayakpaul Feb 15, 2023
8884e10
make caption generation _
sayakpaul Feb 15, 2023
872cf9d
Merge branch 'main' into pix2pix-zero
sayakpaul Feb 15, 2023
d23357f
address comments (part I).
sayakpaul Feb 15, 2023
ca8855e
address PR comments (part II)
sayakpaul Feb 15, 2023
46cdfc8
fix: DDPM test assertion.
sayakpaul Feb 15, 2023
75f918e
refactor doc.
sayakpaul Feb 15, 2023
a86277e
address PR comments (part III).
sayakpaul Feb 15, 2023
696d802
fix: type annotation for the scheduler.
sayakpaul Feb 15, 2023
d166a47
apply styling.
sayakpaul Feb 15, 2023
35e11a9
Merge branch 'main' into pix2pix-zero
sayakpaul Feb 16, 2023
4ff6f12
skip_mps and add note on embeddings in the docs.
sayakpaul Feb 16, 2023
2 changes: 2 additions & 0 deletions docs/source/en/_toctree.yml
@@ -151,6 +151,8 @@
title: Stable-Diffusion-Latent-Upscaler
- local: api/pipelines/stable_diffusion/pix2pix
title: InstructPix2Pix
- local: api/pipelines/stable_diffusion/pix2pix_zero
title: Pix2Pix Zero
title: Stable Diffusion
- local: api/pipelines/stable_diffusion_2
title: Stable Diffusion 2
1 change: 1 addition & 0 deletions docs/source/en/api/pipelines/stable_diffusion/overview.mdx
@@ -33,6 +33,7 @@ For more details about how Stable Diffusion works and how it differs from the ba
| [StableDiffusionUpscalePipeline](./upscale) | **Experimental** – *Text-Guided Image Super-Resolution * | | Coming soon
| [StableDiffusionLatentUpscalePipeline](./latent_upscale) | **Experimental** – *Text-Guided Image Super-Resolution * | | Coming soon
| [StableDiffusionInstructPix2PixPipeline](./pix2pix) | **Experimental** – *Text-Based Image Editing * | | [InstructPix2Pix: Learning to Follow Image Editing Instructions](https://huggingface.co/spaces/timbrooks/instruct-pix2pix)
| [StableDiffusionPix2PixZeroPipeline](./pix2pix_zero) | **Experimental** – *Text-Based Image Editing * | | [Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027)



103 changes: 103 additions & 0 deletions docs/source/en/api/pipelines/stable_diffusion/pix2pix_zero.mdx
@@ -0,0 +1,103 @@
<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->

# Zero-shot Image-to-Image Translation

## Overview

[Zero-shot Image-to-Image Translation](https://arxiv.org/abs/2302.03027) by Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, and Jun-Yan Zhu.

The abstract of the paper is the following:

*Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.*

Resources:

* [Project Page](https://pix2pixzero.github.io/).
* [Paper](https://arxiv.org/abs/2302.03027).
* [Original Code](https://github.com/pix2pixzero/pix2pix-zero).

## Tips

* The pipeline exposes two arguments, `source_embeds` and `target_embeds`,
that let you control the direction of the semantic edit in the generated image. Say
you want to translate from "cat" to "dog". The edit direction is then "cat -> dog". To reflect
this in the pipeline, pass the embeddings of phrases that mention "cat" as
`source_embeds` and the embeddings of phrases that mention "dog" as `target_embeds`. Refer to the code example below for more details.
* When you use this pipeline with a prompt, specify the _source_ concept in the prompt. For
the example above, a valid input prompt would be: "a high resolution painting of a **cat** in the style of van Gogh".
* To reverse the direction in the example above, i.e., "dog -> cat", it's recommended to:
    * Swap the `source_embeds` and `target_embeds`.
    * Change the input prompt to include "dog".
* To learn more about how the source and target embeddings are generated, refer to the [original
paper](https://arxiv.org/abs/2302.03027).

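To make the idea of an edit direction concrete, here is a minimal sketch. It assumes, following our reading of the paper rather than anything stated on this page, that the direction is the difference between the mean target embedding and the mean source embedding. Plain Python lists stand in for tensors, and the helper names `mean_embedding` and `edit_direction` as well as the toy numbers are hypothetical.

```python
from statistics import fmean


def mean_embedding(embeds):
    """Average a list of equal-length embedding vectors, dimension by dimension."""
    return [fmean(dim) for dim in zip(*embeds)]


def edit_direction(source_embeds, target_embeds):
    """Return mean(target) - mean(source), a vector pointing from source to target.

    Assumption: this mirrors the mean-difference construction described in the
    paper; it is not the pipeline's actual code.
    """
    src = mean_embedding(source_embeds)
    tgt = mean_embedding(target_embeds)
    return [t - s for s, t in zip(src, tgt)]


# Toy 2-D "embeddings" for sentences mentioning each concept (made-up numbers).
cat_embeds = [[1.0, 0.0], [3.0, 2.0]]
dog_embeds = [[4.0, 4.0], [6.0, 2.0]]

cat_to_dog = edit_direction(cat_embeds, dog_embeds)
dog_to_cat = edit_direction(dog_embeds, cat_embeds)
print(cat_to_dog)  # [3.0, 2.0]
print(dog_to_cat)  # [-3.0, -2.0]
```

Swapping the two arguments negates the direction, which is why reversing an edit only requires exchanging `source_embeds` and `target_embeds` and updating the prompt.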
## Available Pipelines:

| Pipeline | Tasks | Demo |
|---|---|:---:|
| [StableDiffusionPix2PixZeroPipeline](https://github.com/huggingface/diffusers/blob/main/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion_pix2pix_zero.py) | *Text-Based Image Editing* | [🤗 Space] (soon) |

<!-- TODO: add Colab -->

## Usage example

**Based on an image generated with the input prompt**

```python
import requests
import torch

from diffusers import DDIMScheduler, StableDiffusionPix2PixZeroPipeline


def download(embedding_url, local_filepath):
    # Fetch a precomputed embedding file and write it to disk.
    r = requests.get(embedding_url)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(local_filepath, "wb") as f:
        f.write(r.content)


model_ckpt = "CompVis/stable-diffusion-v1-4"
pipeline = StableDiffusionPix2PixZeroPipeline.from_pretrained(
    model_ckpt, conditions_input_image=False, torch_dtype=torch.float16
)
pipeline.scheduler = DDIMScheduler.from_config(pipeline.scheduler.config)
pipeline.to("cuda")

# The prompt names the source concept ("cat"); the embeddings define the edit "cat -> dog".
prompt = "a high resolution painting of a cat in the style of van Gogh"
src_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/cat.pt"
target_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/dog.pt"

# Save each file under its own name (cat.pt, dog.pt) in the working directory.
for url in [src_embs_url, target_embs_url]:
    download(url, url.split("/")[-1])

src_embeds = torch.load(src_embs_url.split("/")[-1])
target_embeds = torch.load(target_embs_url.split("/")[-1])

images = pipeline(
    prompt,
    source_embeds=src_embeds,
    target_embeds=target_embeds,
    num_inference_steps=50,
    cross_attention_guidance_amount=0.15,
).images
images[0].save("edited_image_dog.png")
```
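
The example above derives each local filename with `url.split("/")[-1]` and downloads the files on every run. As an optional variation, the sketch below parses the URL first, so a query string would not end up in the filename, and checks whether the file is already on disk. The helpers `local_name` and `needs_download` are hypothetical and not part of `diffusers`.

```python
from pathlib import Path
from urllib.parse import urlparse


def local_name(url):
    """Return the last path component of a URL, ignoring any query string."""
    return Path(urlparse(url).path).name


def needs_download(url, directory="."):
    """Return True if the file named by the URL is not yet present in `directory`."""
    return not (Path(directory) / local_name(url)).exists()


src_embs_url = "https://github.com/pix2pixzero/pix2pix-zero/raw/main/assets/embeddings_sd_1.4/cat.pt"
print(local_name(src_embs_url))  # cat.pt
```

With these helpers, the download loop could call `download(url, local_name(url))` only when `needs_download(url)` is true, which avoids refetching the embeddings each time the script runs.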

**Based on an input image**

_Coming soon_

## StableDiffusionPix2PixZeroPipeline
[[autodoc]] StableDiffusionPix2PixZeroPipeline
	- __call__
	- all
1 change: 1 addition & 0 deletions src/diffusers/__init__.py
@@ -118,6 +118,7 @@
StableDiffusionLatentUpscalePipeline,
StableDiffusionPipeline,
StableDiffusionPipelineSafe,
StableDiffusionPix2PixZeroPipeline,
StableDiffusionUpscalePipeline,
StableUnCLIPImg2ImgPipeline,
StableUnCLIPPipeline,
1 change: 1 addition & 0 deletions src/diffusers/pipelines/__init__.py
@@ -54,6 +54,7 @@
StableDiffusionInstructPix2PixPipeline,
StableDiffusionLatentUpscalePipeline,
StableDiffusionPipeline,
StableDiffusionPix2PixZeroPipeline,
StableDiffusionUpscalePipeline,
StableUnCLIPImg2ImgPipeline,
StableUnCLIPPipeline,
1 change: 1 addition & 0 deletions src/diffusers/pipelines/stable_diffusion/__init__.py
@@ -66,6 +66,7 @@ class StableDiffusionPipelineOutput(BaseOutput):
    from ...utils.dummy_torch_and_transformers_objects import StableDiffusionDepth2ImgPipeline
else:
    from .pipeline_stable_diffusion_depth2img import StableDiffusionDepth2ImgPipeline
    from .pipeline_stable_diffusion_pix2pix_zero import StableDiffusionPix2PixZeroPipeline


try: