[ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL #4694
Conversation
Wow, I really need this. Does it work now? I always get black pictures with it. Can you post the API usage? Thanks a lot!
I discovered some issues today, but it should generate sensible images rather than black ones... Let me complete this within the week. Feel free to add my discord: harutatsuakiyama
I fixed the issue yesterday. The code should work as expected.
I use the following pipeline, but it still generates a black image:

```python
def inpaint_with_controlnet():
    import torch
    from diffusers import StableDiffusionXLInpaintPipeline
    from diffusers.utils import load_image
    from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
    from pipeline_controlnet_inpaint_sd_xl import StableDiffusionXLControlNetInpaintPipeline

    img_url = "https://user-images.githubusercontent.com/8084808/262496067-e01fb3c9-aece-4560-ae64-6354fdd789d7.png"
    mask_url = "https://user-images.githubusercontent.com/8084808/262496139-234e0049-43ab-415b-ae6d-4cbb96055f6d.png"
    control_image_url = img_url

    # Compute the openpose conditioning image.
    from controlnet_aux import OpenposeDetector

    openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
    control_image = openpose(load_image(control_image_url))

    controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
    pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        controlnet=controlnet,
        torch_dtype=torch.float16,
    )
    pipe.to("cuda")

    init_image = load_image(img_url).convert("RGB")
    mask_image = load_image(mask_url).convert("RGB")

    prompt = "hand"
    strength = 0.5
    controlnet_conditioning_scale = 1.0

    image = pipe(
        prompt=prompt,
        image=init_image,
        mask_image=mask_image,
        control_image=control_image,
        controlnet_conditioning_scale=controlnet_conditioning_scale,
        strength=strength,
    ).images[0]
    image.save("result.jpg")
```
Thank you for the code! You need to use torch.float32 instead of torch.float16. I tested the following code; it should work. Feel free to add my discord and we can discuss there.
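A minimal sketch of the suggested dtype change, applied to the example above (same model IDs as before; the fp16 SDXL VAE is known to overflow to NaNs, which show up as black images):

```python
import torch
from diffusers import ControlNetModel
from pipeline_controlnet_inpaint_sd_xl import StableDiffusionXLControlNetInpaintPipeline

# Load everything in float32 instead of float16, as suggested above.
controlnet = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float32
)
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float32,
).to("cuda")
```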
Very cool PR! @yiyixuxu can you give this a look? :-)
yiyixuxu left a comment
Thanks! Excellent work!
I think the 2 main things left are:
- Refactor with a mask_image_processor: https://github.com/huggingface/diffusers/pull/4444/files
- Add MultiControlnet support (see the sketch below)
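For reference, a hedged sketch of what MultiControlnet support typically looks like on the caller side in diffusers: a list of ControlNets with matching lists of conditioning images and scales (the canny model ID below is illustrative, and the pipeline call is shown commented out because it depends on this PR):

```python
import torch
from diffusers import ControlNetModel

# Two SDXL ControlNets combined; the lists must line up one-to-one.
controlnets = [
    ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float32),
    ControlNetModel.from_pretrained("diffusers/controlnet-canny-sdxl-1.0", torch_dtype=torch.float32),
]
# pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(..., controlnet=controlnets)
# image = pipe(
#     prompt=prompt,
#     image=init_image,
#     mask_image=mask_image,
#     control_image=[openpose_image, canny_image],  # one per controlnet
#     controlnet_conditioning_scale=[1.0, 0.5],     # one per controlnet
# ).images[0]
```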
```python
# Copied from diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion_inpaint.prepare_mask_and_masked_image
def prepare_mask_and_masked_image(image, mask, height, width, return_image=False):
```
We just deprecated this function :) in this PR: #4444 (comment)
Let's update this PR too.
Updated
```python
self.image_processor = VaeImageProcessor(vae_scale_factor=self.vae_scale_factor)
self.mask_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_normalize=False, do_binarize=True, do_convert_grayscale=True
)
self.control_image_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False
)
```

```python
self.control_image_processor = VaeImageProcessor(
    vae_scale_factor=self.vae_scale_factor, do_convert_rgb=True, do_normalize=False
)
self.watermark = StableDiffusionXLWatermarker()
```
add a mask_processor here
Done
```python
generator = torch.Generator(device=device).manual_seed(seed)

controlnet_embedder_scale_factor = 2
control_image = randn_tensor(
```
I think we accept image tensors in the [0, 1] range, so we should not use randn_tensor here.
Thank you! Corrected.
```python
control_image = (
    floats_tensor(
        (1, 3, 32 * controlnet_embedder_scale_factor, 32 * controlnet_embedder_scale_factor),
        rng=random.Random(seed),
    )
    .to(device)
    .cpu()
)
```

```python
init_image = init_image.cpu().permute(0, 2, 3, 1)[0]

controlnet_embedder_scale_factor = 2
image = Image.fromarray(np.uint8(init_image)).convert("RGB").resize((64, 64))
```
The dummy image and mask_image are just 2 black images here.
Let's do something similar to https://github.com/huggingface/diffusers/pull/4536/files#diff-b65a24df736726ca6f92c71567b77c2a9832ee6142ee2dcbdb08e9addcb6da4b
Followed the linked code:

```python
image = floats_tensor((1, 3, 32, 32), rng=random.Random(seed)).to(device)
image = image.cpu().permute(0, 2, 3, 1)[0]
mask_image = torch.ones_like(image)
controlnet_embedder_scale_factor = 2
control_image = (
    floats_tensor(
        (1, 3, 32 * controlnet_embedder_scale_factor, 32 * controlnet_embedder_scale_factor),
        rng=random.Random(seed),
    )
    .to(device)
    .cpu()
)
```

```python
assert np.abs(image_slice_1.flatten() - image_slice_3.flatten()).max() > 1e-4

# Ignore float16 for SDXL
def test_float16_inference(self):
```
why do we disable this?
This was unintentional. Removed the disabling.
Thank you @yiyixuxu and @patrickvonplaten. I will work on the comments this week.
Borrowing ideas from PR #4811. Work in progress.
Hey @viiika, could we maybe work on this PR together? @harutatsuakiyama, can you invite @viiika as a collaborator on your fork for this PR so that we can work here? @viiika, it's quite rare that we have two PRs about the same feature popping up almost at the same time - very sorry for the potentially duplicated work. Would it be ok to continue on this PR?
That would be very nice if we could collaborate here 🙏
```python
    return mask


def prepare_mask_and_masked_image(image, mask, height, width, return_image: bool = False):
```
Can we remove this function and instead use the new mask processor logic: #4444
@harutatsuakiyama I think you can delete this function now if it's not used?
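For context, a hedged sketch of the #4444-style replacement, using the image_processor and mask_processor instances set up in __init__ (the helper name prepare_inputs is illustrative):

```python
import torch

def prepare_inputs(pipe, image, mask_image, height, width):
    # image_processor normalizes the init image to [-1, 1]; mask_processor
    # binarizes and converts the mask to grayscale without normalizing.
    init_image = pipe.image_processor.preprocess(image, height=height, width=width)
    mask = pipe.mask_processor.preprocess(mask_image, height=height, width=width)
    # Zero out the region to inpaint to obtain the masked image the unet conditions on.
    masked_image = init_image * (mask < 0.5)
    return init_image, mask, masked_image
```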
I still insist that #4811 already supports some of the new features mentioned in #4694, like MultiControlnet, the API usage, no randn_tensor for control_image, and even the refactor with a mask_image_processor you mentioned just now. Its coding style is also more consistent with pipeline_stable_diffusion_xl_inpaint, compared to a StableDiffusionControlNetInpaintPipeline adapted from StableDiffusionInpaintPipeline. I believe #4811 requires almost no effort to review, because it and the latest pipeline_stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint are updated synchronously. That said, which PR to merge is up to you. And I believe that if you choose #4811, it may take less than a day for us to merge.
Also, if you still insist we should continue with #4694, that's fine with me and I will try my best to help fix the problems. I just think merging #4694 will take a few weeks to handle the many issues, and might introduce some design inconsistencies. A lot of current research relies on this pipeline, so I just hope it gets merged soon.
Hi @yiyixuxu. Thanks for the review. I have addressed the review comments.
My local tests show no issues. Please let me know if further changes are required :-)
```python
] = None,
height: Optional[int] = None,
width: Optional[int] = None,
strength: float = 1.0,
```
```diff
- strength: float = 1.0,
+ strength: float = 0.9999,
```
Changed, but why?
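A hedged reading of the suggestion: the SDXL img2img/inpaint pipelines derive the denoising window from strength roughly as sketched below (a paraphrase of their get_timesteps helper), so at strength=1.0 the schedule starts from pure noise and the encoded init image contributes nothing; a default just under 1.0 keeps the image in play while still denoising nearly the full schedule.

```python
# Rough paraphrase of get_timesteps in the SDXL img2img/inpaint pipelines.
def get_timesteps(scheduler, num_inference_steps: int, strength: float):
    # Run int(num_inference_steps * strength) steps, starting that far
    # from the end of the schedule.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = max(num_inference_steps - init_timestep, 0)
    # strength == 1.0 -> t_start == 0: full schedule from pure noise.
    # strength == 0.9999 -> t_start == 1: the (heavily noised) init-image
    # latents survive into the first denoising step.
    timesteps = scheduler.timesteps[t_start * scheduler.order :]
    return timesteps, num_inference_steps - t_start
```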
```python
    The height in pixels of the generated image.
width (`int`, *optional*, defaults to self.unet.config.sample_size * self.vae_scale_factor):
    The width in pixels of the generated image.
strength (`float`, *optional*, defaults to 1.):
```
```diff
- strength (`float`, *optional*, defaults to 1.):
+ strength (`float`, *optional*, defaults to 0.9999):
```
Changed. Out of curiosity, can I ask why?
```python
    control_image = control_images
else:
    assert False
```
```diff
- assert False
+ raise ValueError(f"{controlnet.__class__} is not supported.")
```
Changed
patrickvonplaten left a comment
Good to merge once @yiyixuxu is ok with it :-)
@viiika could you maybe drop your email here so that we can add you as a co-author via https://docs.github.com/en/pull-requests/committing-changes-to-your-project/creating-and-editing-commits/creating-a-commit-with-multiple-authors
Sure. My primary GitHub email for this account is 1355864570@qq.com. Thank you very much!
@harutatsuakiyama could you add @viiika as an author here? That would be very nice ❤️
Co-authored-by: Jiabin Bai <1355864570@qq.com>
Hi @yiyixuxu, @patrickvonplaten, and @viiika, I have addressed the new code review comments.
As for the failing tests, the previous failure seems to have been due to network issues (500 Bad Gateway). My local tests pass. Please let me know if further changes are required.
Thank you @yiyixuxu. I just realized that diffusers.utils.dummy_torch_and_transformers_objects.py has some style problems. I have fixed them. The following shows the outputs of the style checks. Let me know if anything else is required.
```
python utils/check_copies.py --fix_and_overwrite
python utils/check_dummies.py --fix_and_overwrite
black examples scripts src tests utils
All done! ✨ 🍰 ✨
613 files left unchanged.
ruff examples scripts src tests utils --fix
examples/community/lpw_stable_diffusion_xl.py:1141:42: E721 Do not compare types, use `isinstance()`
examples/community/stable_diffusion_xl_reference.py:703:42: E721 Do not compare types, use `isinstance()`
src/diffusers/experimental/rl/value_guided_sampling.py:79:12: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/audio_diffusion/pipeline_audio_diffusion.py:181:12: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl.py:827:42: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_img2img.py:909:20: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py:1132:20: E721 Do not compare types, use `isinstance()`
src/diffusers/pipelines/t2i_adapter/pipeline_stable_diffusion_xl_adapter.py:877:42: E721 Do not compare types, use `isinstance()`
tests/pipelines/consistency_models/test_consistency_models.py:190:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:112:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:548:12: E721 Do not compare types, use `isinstance()`
tests/pipelines/unidiffuser/test_unidiffuser.py:651:12: E721 Do not compare types, use `isinstance()`
Found 12 errors.
make: *** [Makefile:59: style] Error 1
```
Ahh I see, I need to run the doc-builder check. Let me do that; I hope that will be the last failing test. Sorry for the failing tests again. Can I ask for hints on how to fix this error? @yiyixuxu Also, could we get access to run the CI tests, for more efficient debugging? I have tried it locally, and everything seems to be correct ...
```python
>>> mask_image = load_image(mask_url).convert("RGB")

>>> original_width, original_height = init_image.size
>>> new_width = int(original_width / 2)
```
why do we resize?
This is to save CUDA memory. Removed in the new code.
```python
self,
prompt: Union[str, List[str]] = None,
prompt_2: Optional[Union[str, List[str]]] = None,
image: Union[
```
let's use a custom type PipelineImageInput (was recently introduced):

diffusers/src/diffusers/pipelines/stable_diffusion_xl/pipeline_stable_diffusion_xl_inpaint.py, line 891 in 5eeedd9:

```python
image: PipelineImageInput = None,
```
```python
    List[PIL.Image.Image],
    List[np.ndarray],
] = None,
mask_image: Union[torch.FloatTensor, PIL.Image.Image] = None,
```
I think mask_image should be of the same type as image, no? i.e. PipelineImageInput.
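A hedged sketch of the signature after both suggestions (PipelineImageInput is the union alias from diffusers.image_processor covering PIL images, numpy arrays, torch tensors, and lists of each; typing control_image the same way is an assumption here):

```python
from typing import List, Optional, Union
from diffusers.image_processor import PipelineImageInput

def __call__(
    self,
    prompt: Union[str, List[str]] = None,
    prompt_2: Optional[Union[str, List[str]]] = None,
    image: PipelineImageInput = None,
    mask_image: PipelineImageInput = None,  # same alias as image, per the comment above
    control_image: PipelineImageInput = None,
    # ... remaining parameters unchanged
):
    ...
```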
```python
latent_model_input = torch.cat([latent_model_input, mask, masked_image_latents], dim=1)

# predict the noise residual
added_cond_kwargs = {"text_embeds": add_text_embeds, "time_ids": add_time_ids}
```
I don't think this line is needed? It has not changed from line 1452.
```python
    projection_class_embeddings_input_dim=80,  # 6 * 8 + 32
    cross_attention_dim=64,
)
torch.manual_seed(0)
```
Why do we need to fix the seed here? I don't think we have any randomness here, no?
```python
image_latents_params = TEXT_TO_IMAGE_IMAGE_PARAMS

def get_dummy_components(self):
    torch.manual_seed(0)
```
is this needed?
```python
    projection_class_embeddings_input_dim=80,  # 6 * 8 + 32
    cross_attention_dim=64,
)
torch.manual_seed(0)
```
same, needed?
Similarly, follow the test here: https://github.com/huggingface/diffusers/blob/main/tests/pipelines/controlnet/test_controlnet_sdxl.py
Regarding the quality test, are you sure you are up to date? cc @DN6, we need help with the tests here!
I found the test issue: some lines in the docstring are too long.
Hi @yiyixuxu. I removed it. For now, I strongly believe the code should be able to pass the tests (fingers crossed 🙏).
Hi @yiyixuxu, thanks for the new review round. I have addressed the comments.
Also, I strongly believe the code should be able to pass the tests (fingers crossed 🙏). Let me know if further changes are required.
…uggingface#4694) * [ControlNet SDXL Inpainting] Support inpainting of ControlNet SDXL Co-authored-by: Jiabin Bai 1355864570@qq.com --------- Co-authored-by: Harutatsu Akiyama <kf.zy.qin@gmail.com>
Overview:
This PR introduces the implementation of the inference pipeline for ControlNet with SDXL and inpainting.

Files Modified/Added:
- srcs/pipelines/controlnet/pipeline_control_inpaint_sd_xl.py
- tests/pipelines/controlnet/test_controlnet_inpaint_sdx.py

Visualizations:
To better understand the impact and functionality of the implemented pipeline, the following visualizations are provided:
Example Usage
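A minimal sketch assembled from the discussion above (openpose conditioning, float32 weights as recommended during review; assumes the pipeline is importable from diffusers once this PR is merged):

```python
import torch
from controlnet_aux import OpenposeDetector
from diffusers import ControlNetModel, StableDiffusionXLControlNetInpaintPipeline
from diffusers.utils import load_image

img_url = "https://user-images.githubusercontent.com/8084808/262496067-e01fb3c9-aece-4560-ae64-6354fdd789d7.png"
mask_url = "https://user-images.githubusercontent.com/8084808/262496139-234e0049-43ab-415b-ae6d-4cbb96055f6d.png"

init_image = load_image(img_url).convert("RGB")
mask_image = load_image(mask_url).convert("RGB")

# Openpose conditioning image derived from the init image.
openpose = OpenposeDetector.from_pretrained("lllyasviel/ControlNet")
control_image = openpose(init_image)

controlnet = ControlNetModel.from_pretrained("thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float32)
pipe = StableDiffusionXLControlNetInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", controlnet=controlnet, torch_dtype=torch.float32
).to("cuda")

image = pipe(
    prompt="hand",
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    controlnet_conditioning_scale=1.0,
    strength=0.9999,
).images[0]
image.save("result.png")
```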
Features