For the last few months, we have been collaborating with our contributors to ensure we support LoRA effectively and efficiently in Diffusers:
1. Training support
✅ DreamBooth (letting users perform LoRA fine-tuning of both the UNet and the text encoder). There were some issues in the text-encoder part, which are being fixed in #3437. Thanks to @takuma104.
✅ Vanilla text-to-image fine-tuning. Here we purposefully support LoRA fine-tuning of only the UNet, since we assume the number of image-caption pairs is larger than what is typically used for DreamBooth, and text-encoder fine-tuning is therefore probably overkill.
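For context, both scripts rely on the same underlying mechanism: LoRA attention processors are attached to the UNet, and only their parameters are handed to the optimizer while the base weights stay frozen. Below is a minimal sketch of that mechanism (not the full training script; the model ID and the rank of 4 are just examples):

```python
from diffusers import UNet2DConditionModel
from diffusers.loaders import AttnProcsLayers
from diffusers.models.attention_processor import LoRAAttnProcessor

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

lora_attn_procs = {}
for name in unet.attn_processors.keys():
    # attn1 is self-attention (no text conditioning), attn2 is cross-attention.
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    else:  # down_blocks
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size, cross_attention_dim=cross_attention_dim, rank=4
    )

unet.set_attn_processor(lora_attn_procs)

# Only these (small) LoRA layers are passed to the optimizer; the base UNet stays frozen.
lora_layers = AttnProcsLayers(unet.attn_processors)
```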
2. Interoperability
With #3437, we're introducing limited support for loading A1111 CivitAI checkpoints with pipeline.load_lora_weights(). This has been a widely requested feature (see #3064 as an example).
We also provide a convert_lora_safetensor_to_diffusers.py script that converts A1111 LoRA checkpoints (coverage is potentially non-exhaustive) and merges them into the text encoder and the UNet of a DiffusionPipeline. Unlike the current approach in Diffusers, however, merging doesn't allow switching the attention processors back to the default ones afterwards. Check out https://huggingface.co/docs/diffusers/main/en/training/lora for more details. For inference-only, definitive workflows (where one doesn't need to switch attention processors), it still caters to many use cases.
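As a rough sketch of what the new loading path enables (the model ID and LoRA file name below are placeholders, and the call assumes the load_lora_weights signature introduced with #3437):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# An A1111-style LoRA file downloaded from CivitAI; the file name is a placeholder.
pipe.load_lora_weights(".", weight_name="some_civitai_lora.safetensors")

image = pipe(
    "masterpiece, best quality, mountain landscape", num_inference_steps=30
).images[0]
```

Because the LoRA layers stay separate instead of being merged into the base weights, they can later be detached again by resetting the attention processors, which is exactly what the merge-based conversion script cannot offer.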
3. xformers support for efficient inference
Once LoRA parameters are loaded into a pipeline, xformers should work seamlessly. There was a problem with this, which has been fixed in #3556.
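A quick sketch of the expected flow (the LoRA repository ID is just an example):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")  # example LoRA repo

# With #3556, memory-efficient attention should respect the loaded LoRA layers.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a pokemon with blue eyes", num_inference_steps=25).images[0]
```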
4. PT 2.0 SDPA optimization
See: #3594
5. torch.compile() compatibility with LoRA
Once 4. is settled, we should be able to take advantage of torch.compile().
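The intended usage would look roughly like this once the pieces are in place (the LoRA repository ID is again just an example, and this is a sketch of the goal rather than something guaranteed to work today):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")  # example LoRA repo

# Compile the UNet; the first call pays the compilation cost, subsequent calls are faster.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead")

image = pipe("a pokemon with blue eyes", num_inference_steps=25).images[0]
```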
6. Introduction of scale to control the contribution of the text encoder LoRA
See #3480. We already support passing scale as a part of cross_attention_kwargs for the UNet LoRA.
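For reference, this is how the existing UNet-side scale works at inference time; #3480 would extend the same knob to the text encoder LoRA (repository ID below is an example):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")  # example LoRA repo

# scale=0.0 ignores the LoRA update entirely; scale=1.0 applies it fully.
image = pipe(
    "a pokemon with blue eyes",
    num_inference_steps=25,
    cross_attention_kwargs={"scale": 0.5},
).images[0]
```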
7. Supporting multiple LoRAs
@takuma104 proposed a hook-based design here: #3064 (comment)
I hope this helps provide a consolidated view of where we're at regarding LoRA support in Diffusers.