diff --git a/docs/source/en/conceptual/evaluation.mdx b/docs/source/en/conceptual/evaluation.mdx
index 98821010e203..2721adea0c16 100644
--- a/docs/source/en/conceptual/evaluation.mdx
+++ b/docs/source/en/conceptual/evaluation.mdx
@@ -310,7 +310,7 @@ for idx in range(len(dataset)):
     edited_images.append(edited_image)
 ```
 
-To measure the directional similarity, we first load CLIP's image and text encoders.
+To measure the directional similarity, we first load CLIP's image and text encoders:
 
 ```python
 from transformers import (
@@ -329,7 +329,7 @@ image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to(device
 
 Notice that we are using a particular CLIP checkpoint, i.e., `openai/clip-vit-large-patch14`. This is because the Stable Diffusion pre-training was performed with this CLIP variant. For more details, refer to the [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix#diffusers.StableDiffusionInstructPix2PixPipeline.text_encoder).
 
-Next, we prepare a PyTorch `nn.module` to compute directional similarity:
+Next, we prepare a PyTorch `nn.Module` to compute directional similarity:
 
 ```python
 import torch.nn as nn
@@ -410,7 +410,7 @@ It should be noted that the `StableDiffusionInstructPix2PixPipeline` exposes t
 
 We can extend the idea of this metric to measure how similar the original image and edited version are. To do that, we can just do `F.cosine_similarity(img_feat_two, img_feat_one)`. For these kinds of edits, we would still want the primary semantics of the images to be preserved as much as possible, i.e., a high similarity score.
 
-We can use these metrics for similar pipelines such as the[`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline)`.
+We can use these metrics for similar pipelines such as the [`StableDiffusionPix2PixZeroPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/pix2pix_zero#diffusers.StableDiffusionPix2PixZeroPipeline).
 
 
 
@@ -550,7 +550,7 @@ FID results tend to be fragile as they depend on a lot of factors:
 * The image format (not the same if we start from PNGs vs JPGs).
 
 Keeping that in mind, FID is often most useful when comparing similar runs, but it is
-hard to to reproduce paper results unless the authors carefully disclose the FID
+hard to reproduce paper results unless the authors carefully disclose the FID
 measurement code.
 
 These points apply to other related metrics too, such as KID and IS.
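
For context on the first two hunks: the code block they touch loads CLIP's encoders, but both hunk headers truncate it. A minimal self-contained sketch of that loading step, assuming the usual `transformers` CLIP classes (the `device` setup is an assumption here, since the hunk context only shows `.to(device`):

```python
import torch
from transformers import (
    CLIPProcessor,
    CLIPTextModelWithProjection,
    CLIPTokenizer,
    CLIPVisionModelWithProjection,
)

# Assumed device setup; not visible in the truncated hunks.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Stable Diffusion was pre-trained with this CLIP variant, hence this checkpoint.
clip_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(clip_id)
text_encoder = CLIPTextModelWithProjection.from_pretrained(clip_id).to(device)
image_processor = CLIPProcessor.from_pretrained(clip_id)
image_encoder = CLIPVisionModelWithProjection.from_pretrained(clip_id).to(device)
```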
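The text around the third hunk suggests comparing the original and edited images directly with `F.cosine_similarity(img_feat_two, img_feat_one)`. A minimal sketch of that image-image check, assuming `image_processor` and `image_encoder` are loaded as above (the `image_similarity` helper is hypothetical, not part of the docs):

```python
import torch
import torch.nn.functional as F


@torch.no_grad()
def image_similarity(image_one, image_two, image_processor, image_encoder):
    # Hypothetical helper: embed two PIL images with CLIP's vision tower
    # and compare them in the joint embedding space.
    inputs = image_processor(images=[image_one, image_two], return_tensors="pt")
    embeds = image_encoder(**inputs.to(image_encoder.device)).image_embeds
    embeds = embeds / embeds.norm(dim=-1, keepdim=True)
    # A high score means the edit preserved the original image's semantics.
    return F.cosine_similarity(embeds[0:1], embeds[1:2]).item()
```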
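The last hunk sits in the doc's list of factors that make FID fragile. One way to see the image-format point concretely is to score the same samples twice, once as-is and once after a lossy JPEG round-trip. A toy sketch, assuming `torchmetrics` and `torchvision` are installed (random tensors stand in for real generated samples):

```python
import io

import torch
from PIL import Image
from torchmetrics.image.fid import FrechetInceptionDistance
from torchvision.transforms.functional import pil_to_tensor, to_pil_image


def jpeg_roundtrip(img: torch.Tensor, quality: int = 75) -> torch.Tensor:
    # Re-encode a uint8 CHW image as JPEG, as if it had been saved to disk.
    buf = io.BytesIO()
    to_pil_image(img).save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return pil_to_tensor(Image.open(buf))


torch.manual_seed(0)
real = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
fake = torch.randint(0, 256, (32, 3, 299, 299), dtype=torch.uint8)
fake_jpeg = torch.stack([jpeg_roundtrip(x) for x in fake])

for name, samples in [("lossless", fake), ("jpeg", fake_jpeg)]:
    fid = FrechetInceptionDistance(feature=2048)
    fid.update(real, real=True)
    fid.update(samples, real=False)
    # The two scores differ even though the samples are nominally "the same".
    print(name, float(fid.compute()))
```

This is why comparisons are most trustworthy when the exact preprocessing (resize operation, interpolation, file format) is pinned across runs.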