Description
Describe the bug
I have been closely following #4038. I pulled the new code and trained for 10000 steps. Training runs without errors, but the validation images are all black. I assumed #4038 had fixed the black image issue. Any clues?
Reproduction
Here's my training config:
export MODEL_DIR="stabilityai/stable-diffusion-xl-base-0.9"
export VAE_DIR="madebyollin/sdxl-vae-fp16-fix"
export OUTPUT_DIR="product_train_output_extract_1stbatch_100k_sdxl0.9"
accelerate launch --mixed_precision="fp16" --multi_gpu train_controlnet_sdxl.py \
  --pretrained_model_name_or_path=$MODEL_DIR \
  --output_dir=$OUTPUT_DIR \
  --pretrained_vae_model_name_or_path=$VAE_DIR \
  --dataset_name=all_training_full_extract \
  --image_column="target" \
  --conditioning_image_column="source" \
  --caption_column="prompt" \
  --resolution=768 \
  --learning_rate=2e-5 \
  --validation_image "./val1_extract_source.jpg" "./val2_extract_source.jpg" "./val3_extract_source.jpg" "./popchange.png" \
  --validation_prompt "a white trash can sitting on a table next to a plant" "a bottle of liquid with flower in it" "a rack with a bunch of shoes on it" "a doll in galaxy" \
  --train_batch_size=1 \
  --gradient_accumulation_steps=8 \
  --tracker_project_name="product_train_output_extract_1stbatch_100k_sdxl0.9" \
  --num_train_epochs=20 \
  --report_to=wandb \
  --resume_from_checkpoint="latest"
Results:
- Validation images during training turn black after about 500 training steps.
- I then continued training to 10000 steps and ran inference with a checkpoint, using the example code at https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md#inference. The inference images are also black.
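
To help narrow this down, here is a minimal sketch of a check that separates two cases: output that contains NaN or inf values (which can happen under fp16 and is only one possible cause, not a confirmed diagnosis here) and output that is finite but uniformly zero. The helper name `classify_pixels` and the `eps` threshold are hypothetical. It assumes pixel values have already been flattened to plain Python floats in a range where 0 means black, such as [0, 1] or [0, 255].

```python
import math
from typing import Iterable


def classify_pixels(values: Iterable[float], eps: float = 1e-6) -> str:
    """Classify a flat sequence of pixel values.

    Returns "non-finite" if any value is NaN or inf, "all-black" if every
    value is within eps of zero, and "ok" otherwise.
    Assumes 0 represents black (hypothetical helper, not part of diffusers).
    """
    seen_any = False
    all_dark = True
    for v in values:
        seen_any = True
        # NaN/inf anywhere is reported first, since it points at a numeric
        # problem rather than a genuinely dark image.
        if math.isnan(v) or math.isinf(v):
            return "non-finite"
        if abs(v) > eps:
            all_dark = False
    if not seen_any:
        raise ValueError("no pixel values given")
    return "all-black" if all_dark else "ok"
```

With a PyTorch tensor, the values could be passed as `tensor.float().flatten().tolist()`. A "non-finite" result on the decoded output before it is converted to an 8-bit image would suggest a numeric overflow somewhere upstream, while "all-black" on finite values would point elsewhere.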
Logs
No response
System Info
- diffusers version: 0.19.0.dev0
- Platform: Linux-5.15.0-52-generic-x86_64-with-glibc2.29
- Python version: 3.8.10
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.16.4
- Transformers version: 4.31.0
- Accelerate version: 0.21.0
- xFormers version: not installed
- Using GPU in script?: Yes
- Using distributed or parallel set-up in script?: Single GPU