
StableDiffusionControlNetInpaintPipeline strength=1 doesn't work #3582

@undernf

Description


Describe the bug

cc @williamberman

Let's say you have an image with a subject and a background. You put a mask over the background to repaint it and set strength=1, so in theory the original background should be completely ignored. But that isn't what happens: the background still carries some weight in the regeneration.

The effect is subtle, but it becomes obvious if you use, for example, an image with a pink background and then try to generate a new background that would normally not be pink (e.g. a forest).

I don't know enough about the internals, but I can see from the example below that some information from the original background is being passed into the inpainted area when it shouldn't be.
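My guess at the cause (an assumption on my part, based on how the diffusers img2img-style pipelines initialize latents): at strength=1 the initial latents are scheduler.add_noise(image_latents, noise, t_max), i.e. sqrt(alpha_cumprod[t_max]) * image + sqrt(1 - alpha_cumprod[t_max]) * noise. The image coefficient at the last trained timestep is small but not zero, so a trace of the original background survives. A minimal sketch using Stable Diffusion's scaled-linear beta schedule:

```python
import torch

# Stable Diffusion's default "scaled_linear" schedule:
# betas are the squares of a linspace between sqrt(beta_start) and sqrt(beta_end).
num_train_timesteps = 1000
betas = torch.linspace(0.00085**0.5, 0.012**0.5, num_train_timesteps) ** 2
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

# At strength=1 the pipeline starts from the maximum timestep.
t_max = num_train_timesteps - 1
image_coeff = alphas_cumprod[t_max].sqrt().item()        # weight of the source image
noise_coeff = (1.0 - alphas_cumprod[t_max]).sqrt().item()  # weight of the fresh noise

print(f"image coefficient at t={t_max}: {image_coeff:.4f}")
print(f"noise coefficient at t={t_max}: {noise_coeff:.4f}")
```

The image coefficient is small (well under 0.1) but non-zero, which would explain a faint color bias leaking from the original background even at strength=1.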

Reproduction

The code below uses img_red_bg_url, and the output has a red background (which shouldn't happen because strength=1). If you switch it to img, which has a pink background, the output also has a pink background, again showing that the original background carries some weight in the generation.

import urllib.request

import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, DDIMScheduler, DiffusionPipeline
from IPython.display import display
from PIL import Image
from torch import autocast

device = "cuda"

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    custom_pipeline="stable_diffusion_controlnet_inpaint_img2img",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16
).to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

img_url = "https://i.ibb.co/0ZK7yL0/img.png"
img_red_bg_url = "https://i.ibb.co/zHHZTkZ/img-red-bg.png"
canny_url = "https://i.ibb.co/rp9FYCX/canny.png"
mask_url = "https://i.ibb.co/FK4DNNK/mask.png"

def decode_image(image_url):
    print("Decode image: " + image_url)
    req = urllib.request.urlopen(image_url)
    arr = np.array(bytearray(req.read()), dtype=np.uint8)
    imageBGR = cv2.imdecode(arr, cv2.IMREAD_UNCHANGED)
    # The test PNGs carry an alpha channel, hence BGRA2RGB;
    # use cv2.COLOR_BGR2RGB for 3-channel images instead.
    imageRGB = cv2.cvtColor(imageBGR, cv2.COLOR_BGRA2RGB)

    return Image.fromarray(imageRGB)

img = decode_image(img_url)
img_red_bg = decode_image(img_red_bg_url)
canny = decode_image(canny_url)
mask = decode_image(mask_url)

with autocast("cuda"), torch.inference_mode():
    output = pipe(
        prompt="product in a forest",
        negative_prompt="",
        image=img_red_bg,
        mask_image=mask,
        controlnet_conditioning_image=canny,
        height=512,
        width=512,
        num_images_per_prompt=1,
        num_inference_steps=20,
        guidance_scale=7.5,
        strength=1
    )

    for image in output.images:
        display(image)
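As a workaround (my own suggestion, not something the pipeline documents), you can pre-fill the masked region with a flat color before passing the image in, so the original background cannot contribute regardless of how the latents are initialized. neutralize_masked_region below is a hypothetical helper:

```python
import numpy as np
from PIL import Image


def neutralize_masked_region(image, mask, fill=(127, 127, 127)):
    # Replace masked pixels with a flat color before inpainting so nothing
    # from the original background can leak into the repainted area.
    # Assumes a white-on-black mask where white marks the region to repaint.
    img = np.array(image.convert("RGB"))
    repaint = np.array(mask.convert("L")) > 127
    img[repaint] = fill
    return Image.fromarray(img)


# Example: a pink image whose right half is masked for repainting.
pink = Image.new("RGB", (64, 64), (255, 105, 180))
mask = Image.new("L", (64, 64), 0)
mask.paste(255, (32, 0, 64, 64))
neutral = neutralize_masked_region(pink, mask)
```

In the reproduction above you would call it as `image=neutralize_masked_region(img_red_bg, mask)`; the unmasked subject is untouched, while the masked background becomes a neutral gray that carries no color bias.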

Logs

No response

System Info

  • diffusers version: 0.17.0.dev0
  • Platform: Linux-5.15.0-58-generic-x86_64-with-glibc2.31
  • Python version: 3.10.10
  • PyTorch version (GPU?): 1.13.1+cu117 (True)
  • Huggingface_hub version: 0.14.1
  • Transformers version: 4.27.3
  • Accelerate version: 0.17.1
  • xFormers version: 0.0.16
  • Using GPU in script?: y
  • Using distributed or parallel set-up in script?: n

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), stale (Issues that haven't received updates)
