Describe the bug
Let's say you have an image with a subject and a background. You put a mask over the background to repaint it and set strength=1, so in theory the original background should be completely ignored. But that isn't what happens: the background still carries some weight in the regeneration.
The effect is subtle, but it becomes obvious if you use, for example, an image with a pink background and then try to generate a new background that usually isn't pink (e.g. a forest).
I don't know enough about the internals, but from the example below I can see that some information from the original background is being passed into the inpainted area when it shouldn't be.
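For context, here is a minimal sketch of how strength is typically mapped to the denoising schedule in diffusers img2img-style pipelines (simplified from the get_timesteps logic; the exact behaviour of this custom pipeline may differ):

```python
# Simplified sketch of the strength -> timestep mapping used by
# img2img-style pipelines (based on diffusers' get_timesteps logic;
# the custom pipeline used below may differ in detail).
def steps_used(strength, num_inference_steps=20):
    # How far back the init image is noised before denoising starts.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    # Number of denoising steps actually run on the noised init latents.
    return num_inference_steps - t_start

# With strength=1 the init latents are noised all the way to the start of
# the schedule, so in theory the original image should contribute nothing.
print(steps_used(1.0))  # full schedule: 20 steps
print(steps_used(0.5))  # half schedule: 10 steps
```

If this mapping is respected, strength=1 should leave no trace of the original background, which is why the behaviour below looks like a bug.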
Reproduction
The code below uses img_red_bg_url, and the output has a red background (which shouldn't happen because strength=1). If you change it to use img, which has a pink background, the output has a pink background instead, again showing that the original background carries some weight in the generation.
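One way to quantify the leak, rather than eyeballing it: compare the mean colour of the pixels under the mask in the input image against the same region of the output. This helper is hypothetical (not part of the reproduction) and assumes the mask is white over the background region:

```python
import numpy as np
from PIL import Image

def masked_mean_color(image, mask):
    """Mean RGB colour of the pixels where the mask is white (> 127)."""
    arr = np.asarray(image.convert("RGB"), dtype=np.float64)
    m = np.asarray(mask.convert("L")) > 127
    return arr[m].mean(axis=0)

# e.g. compare masked_mean_color(img_red_bg, mask) against
# masked_mean_color(output.images[0], mask): a red-shifted output
# mean would confirm the original background is leaking through.
```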
import urllib.request

import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, DDIMScheduler, DiffusionPipeline
from PIL import Image
from torch import autocast

device = "cuda"

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    custom_pipeline="stable_diffusion_controlnet_inpaint_img2img",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16,
).to(device)
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)

img_url = "https://i.ibb.co/0ZK7yL0/img.png"
img_red_bg_url = "https://i.ibb.co/zHHZTkZ/img-red-bg.png"
canny_url = "https://i.ibb.co/rp9FYCX/canny.png"
mask_url = "https://i.ibb.co/FK4DNNK/mask.png"

def decode_image(image_url):
    print("Decode image: " + image_url)
    req = urllib.request.urlopen(image_url)
    arr = np.array(bytearray(req.read()), dtype=np.uint8)
    image_bgr = cv2.imdecode(arr, cv2.IMREAD_UNCHANGED)
    image_rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGRA2RGB)
    return Image.fromarray(image_rgb)

img = decode_image(img_url)
img_red_bg = decode_image(img_red_bg_url)
canny = decode_image(canny_url)
mask = decode_image(mask_url)

with autocast("cuda"), torch.inference_mode():
    output = pipe(
        prompt="product in a forest",
        negative_prompt="",
        image=img_red_bg,
        mask_image=mask,
        controlnet_conditioning_image=canny,
        height=512,
        width=512,
        num_images_per_prompt=1,
        num_inference_steps=20,
        guidance_scale=7.5,
        strength=1,
    )

for image in output.images:
    display(image)
Logs
No response
System Info
- diffusers version: 0.17.0.dev0
- Platform: Linux-5.15.0-58-generic-x86_64-with-glibc2.31
- Python version: 3.10.10
- PyTorch version (GPU?): 1.13.1+cu117 (True)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.27.3
- Accelerate version: 0.17.1
- xFormers version: 0.0.16
- Using GPU in script?: y
- Using distributed or parallel set-up in script?: n