### Describe the bug
Both `examples/dreambooth/train_dreambooth_lora_sdxl.py` and `examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py` appear to have an issue when resuming training from a previously saved checkpoint.
Training and saving checkpoints seem to work correctly; however, when resuming from a previously saved checkpoint, the following messages are produced at script startup:
```
Resuming from checkpoint checkpoint-10
12/27/2023 16:29:22 - INFO - accelerate.accelerator - Loading states from xqc/checkpoint-10
Loading unet.
12/27/2023 16:29:22 - INFO - peft.tuners.tuners_utils - Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
Loading text_encoder.
12/27/2023 16:29:23 - INFO - peft.tuners.tuners_utils - Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
```
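That warning is PEFT's generic message for adding an adapter to a model that already carries one. A minimal illustration (not the training script's actual code) of how a model ends up with two coexisting adapters, which would also explain the `_1`-suffixed weight names further below:

```python
# Minimal sketch: injecting a second LoRA adapter into a model that already
# has one leaves both adapters in place instead of restoring the first.
import torch.nn as nn
from peft import LoraConfig, get_peft_model

base = nn.Sequential(nn.Linear(8, 8))
config = LoraConfig(r=4, target_modules=["0"])  # "0" = the Linear's module name

model = get_peft_model(base, config)    # first adapter ("default")
model.add_adapter("default_0", config)  # second adapter; both now coexist
print(list(model.peft_config))          # ['default', 'default_0']
```

If something like this happens on resume, the suffixes suggest the resume path re-adds LoRA layers rather than loading weights into the existing ones.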
Training appears to continue normally; however, all new checkpoints saved after this point are significantly larger than the previous ones:
```
(xl) localhost /media/nvme/xl/diffusers/examples/dreambooth # du -sch xqc/*
87M xqc/checkpoint-10
110M xqc/checkpoint-15
110M xqc/checkpoint-20
110M xqc/checkpoint-25
87M xqc/checkpoint-5
88K xqc/logs
494M total
```
Once training resumed from a checkpoint completes, a large dump of layer names is printed along with a message saying that the model contains layers that do not match (full error message below).
To me, this looks like the checkpoint is being loaded incorrectly and ignored: a new adapter is trained from scratch, and then both versions, old and new, are saved in the final LoRA.
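This hypothesis can be checked by diffing the keys stored in a pre-resume vs. a post-resume checkpoint. A rough sketch, assuming each `checkpoint-*` directory contains a `pytorch_lora_weights.safetensors` file (adjust paths/names if yours differ):

```python
# Rough diagnostic sketch: compare LoRA keys between a pre-resume and a
# post-resume checkpoint. If a second adapter was added on resume, the
# post-resume file should contain extra "_1"-suffixed keys.
from safetensors.torch import load_file

before = load_file("xqc/checkpoint-10/pytorch_lora_weights.safetensors")
after = load_file("xqc/checkpoint-15/pytorch_lora_weights.safetensors")

extra = sorted(set(after) - set(before))
print(f"{len(before)} keys before resume, {len(after)} after")
print("sample of extra keys:", extra[:5])
```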
### Reproduction
To reproduce this issue, follow these steps:
- Run either `train_dreambooth_lora_sdxl*.py` script with appropriate parameters, including `--checkpointing_steps` (preferably set to a low number to reproduce this issue quickly).
- After at least one or two checkpoints have been saved, either stop the script or wait for it to complete.
- Rerun the same script, but also include `--resume_from_checkpoint latest` or `--resume_from_checkpoint checkpoint-x`.
- Observe the effects listed above (PEFT warning message on startup, larger sizes of later checkpoint files).
- After resumed training is completed, attempt to load the finished LoRA (inference will succeed, but LoRA performance does not seem correct).
- Observe the error message produced.
### Logs
My full command line with all arguments looks like this:

```
python train_dreambooth_lora_sdxl.py --pretrained_model_name_or_path ../../../models/colossus_v5.3 --instance_data_dir /media/nvme/datasets/combined/ --output_dir xqc --resolution 1024 --instance_prompt 'a photo of hxq' --train_text_encoder --num_train_epochs 1 --train_batch_size 1 --gradient_checkpointing --checkpointing_steps 5 --gradient_accumulation_steps 1 --learning_rate 0.0001 --resume_from_checkpoint latest
```

Error produced during inference with the affected LoRA (truncated because of length):

```
Loading adapter weights from state_dict led to unexpected keys not found in the model: ['down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.lora_B_1.default_0.weight', 
'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.lora_A_1.default_0.weight',
*** TRUNCATED HERE ***
'mid_block.attentions.0.transformer_blocks.8.attn1.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_q.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_q.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_q.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_q.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_q.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_q.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.lora_B_1.default_0.weight'].
Loading adapter weights from None led to unexpected keys not found in the model: ['text_model.encoder.layers.0.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.q_proj.lora_B_1.default_0.weight', 
'text_model.encoder.layers.5.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.q_proj.lora_A_1.default_0.weight', 
'text_model.encoder.layers.11.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.out_proj.lora_B_1.default_0.weight'].
```
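If a usable file is needed before this is fixed, a possible (unverified) workaround is to strip the duplicated adapter out of the final LoRA. A sketch, assuming the duplicates are exactly the keys containing `lora_A_1`/`lora_B_1` as in the error above, and that the script's default output filename `pytorch_lora_weights.safetensors` is used:

```python
# Unverified workaround sketch: drop the "_1"-suffixed duplicate adapter
# weights from the final LoRA file. Back up the original file first; key
# names in the saved file may differ from the ones reported at load time.
from safetensors.torch import load_file, save_file

state_dict = load_file("xqc/pytorch_lora_weights.safetensors")
cleaned = {
    k: v
    for k, v in state_dict.items()
    if "lora_A_1" not in k and "lora_B_1" not in k
}
print(f"dropped {len(state_dict) - len(cleaned)} duplicate keys")
save_file(cleaned, "xqc/pytorch_lora_weights_cleaned.safetensors")
```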
### System Info
Latest diffusers - master branch pulled on 2023/12/27.
OS - Linux 6.1.9
```
(xl) localhost /media/nvme/xl # uname -a
Linux localhost 6.1.9-noinitramfs #4 SMP PREEMPT_DYNAMIC Fri Feb 10 03:01:14 -00 2023 x86_64 Intel(R) Core(TM) i5-9500T CPU @ 2.20GHz GenuineIntel GNU/Linux
```
python - Python 3.10.9
```
(xl) localhost /media/nvme/xl # diffusers-cli env

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `diffusers` version: 0.25.0.dev0
- Platform: Linux-6.1.9-noinitramfs-x86_64-Intel-R-_Core-TM-i5-9500T_CPU@_2.20GHz-with-glibc2.36
- Python version: 3.10.9
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Huggingface_hub version: 0.20.1
- Transformers version: 4.36.2
- Accelerate version: 0.23.0
- xFormers version: 0.0.23.post1
- Using GPU in script?: No (however, I believe it will occur on GPU as well)
- Using distributed or parallel set-up in script?: No
```
### Who can help?
_No response_