controlnet with sdxl infer black images even after rebasing #4038  #4185

@yutongli

Describe the bug

I have been closely following #4038. I pulled the new code and trained for 10000 steps. Training runs, but the validation images are all black. I assumed #4038 had fixed the black image issue. Any clues?

Reproduction

Here's my training config:

export MODEL_DIR="stabilityai/stable-diffusion-xl-base-0.9"
export VAE_DIR="madebyollin/sdxl-vae-fp16-fix"
export OUTPUT_DIR="product_train_output_extract_1stbatch_100k_sdxl0.9"

accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet_sdxl.py \
--pretrained_model_name_or_path=$MODEL_DIR \
--output_dir=$OUTPUT_DIR \
--pretrained_vae_model_name_or_path=$VAE_DIR \
--dataset_name=all_training_full_extract \
--image_column="target" \
--conditioning_image_column="source" \
--caption_column="prompt" \
--resolution=768 \
--learning_rate=2e-5 \
--validation_image "./val1_extract_source.jpg" "./val2_extract_source.jpg" "./val3_extract_source.jpg" "./popchange.png" \
--validation_prompt "a white trash can sitting on a table next to a plant" "a bottle of liquid with flower in it" "a rack with a bunch of shoes on it" "a doll in galaxy" \
--train_batch_size=1 \
--gradient_accumulation_steps=8 \
--tracker_project_name="product_train_output_extract_1stbatch_100k_sdxl0.9" \
--num_train_epochs=20 \
--report_to=wandb \
--resume_from_checkpoint="latest"

Results:

  1. Validation images during training are black after about 500 training steps.
  2. I then continued training for 10000 steps and ran inference with a checkpoint, using our example code at https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md#inference. The inference images are also black.
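
To narrow down where the black output comes from, it may help to check whether the values feeding the final image contain NaNs, which is a common symptom of fp16 overflow. That cause is an assumption here and is not confirmed by this report. The helper below is a minimal, hypothetical sketch (`summarize_pixels` is not part of diffusers) that takes a flat sequence of numbers and reports whether any value is NaN and whether every non-NaN value is at or below a black threshold.

```python
import math
from typing import Iterable, Tuple


def summarize_pixels(values: Iterable[float], black_threshold: float = 0.0) -> Tuple[bool, bool]:
    """Return (has_nan, all_black) for a flat iterable of pixel or latent values.

    has_nan   -- True if at least one value is NaN.
    all_black -- True if every non-NaN value is <= black_threshold
                 (also True for an empty input).
    """
    has_nan = False
    all_black = True
    for v in values:
        if math.isnan(v):
            has_nan = True
        elif v > black_threshold:
            all_black = False
    return has_nan, all_black
```

For a PyTorch tensor, one way to call it would be `summarize_pixels(tensor.float().flatten().tolist())`. If NaNs are already present in the latents or decoded output, the problem is upstream of image conversion rather than in the validation images themselves.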

Logs

No response

System Info

  • diffusers version: 0.19.0.dev0
  • Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.29
  • Python version: 3.8.10
  • PyTorch version (GPU?): 2.0.1+cu117 (True)
  • Huggingface_hub version: 0.16.4
  • Transformers version: 4.31.0
  • Accelerate version: 0.21.0
  • xFormers version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Single GPU

Who can help?

@sayakpaul @patrickvonplaten

Labels: bug
