For the last few months, we have been collaborating with our contributors to ensure we support LoRA effectively and efficiently in Diffusers:
1. Training support
✅ DreamBooth (letting users perform LoRA fine-tuning of both the UNet and the text encoder). There were some issues in the text-encoder part, which are being fixed in #3437. Thanks to @takuma104.
✅ Vanilla text-to-image fine-tuning. Here we purposefully support LoRA fine-tuning of only the UNet, since we assume the number of image-caption pairs is larger than what is typically used for DreamBooth, and text-encoder fine-tuning is therefore probably overkill.
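For context, both scripts rely on the same underlying mechanism: LoRA attention processors are attached to the UNet, and only their parameters are handed to the optimizer while the base weights stay frozen. Below is a minimal sketch of that mechanism (not the full training script; the model ID and the rank of 4 are just examples):

```python
from diffusers import UNet2DConditionModel
from diffusers.loaders import AttnProcsLayers
from diffusers.models.attention_processor import LoRAAttnProcessor

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
)

lora_attn_procs = {}
for name in unet.attn_processors.keys():
    # attn1 is self-attention (no text conditioning), attn2 is cross-attention.
    cross_attention_dim = (
        None if name.endswith("attn1.processor") else unet.config.cross_attention_dim
    )
    if name.startswith("mid_block"):
        hidden_size = unet.config.block_out_channels[-1]
    elif name.startswith("up_blocks"):
        block_id = int(name[len("up_blocks.")])
        hidden_size = list(reversed(unet.config.block_out_channels))[block_id]
    else:  # down_blocks
        block_id = int(name[len("down_blocks.")])
        hidden_size = unet.config.block_out_channels[block_id]

    lora_attn_procs[name] = LoRAAttnProcessor(
        hidden_size=hidden_size, cross_attention_dim=cross_attention_dim, rank=4
    )

unet.set_attn_processor(lora_attn_procs)

# Only these (small) LoRA layers are passed to the optimizer; the base UNet stays frozen.
lora_layers = AttnProcsLayers(unet.attn_processors)
```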
2. Interoperability
With #3437, we're introducing limited support for loading A1111 CivitAI checkpoints with pipeline.load_lora_weights(). This has been a widely requested feature (see #3064 as an example).
We also provide a convert_lora_safetensor_to_diffusers.py script that converts A1111 LoRA checkpoints (coverage is potentially non-exhaustive) and merges them into the text encoder and the UNet of a DiffusionPipeline. Unlike the current approach in Diffusers, however, merging doesn't allow switching the attention processors back to the default ones afterwards. Check out https://huggingface.co/docs/diffusers/main/en/training/lora for more details. For inference-only, definitive workflows (where one doesn't need to switch attention processors), it still caters to many use cases.
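As a rough sketch of what the new loading path enables (the model ID and LoRA file name below are placeholders, and the call assumes the load_lora_weights signature introduced with #3437):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# An A1111-style LoRA file downloaded from CivitAI; the file name is a placeholder.
pipe.load_lora_weights(".", weight_name="some_civitai_lora.safetensors")

image = pipe(
    "masterpiece, best quality, mountain landscape", num_inference_steps=30
).images[0]
```

Because the LoRA layers stay separate instead of being merged into the base weights, they can later be detached again by resetting the attention processors, which is exactly what the merge-based conversion script cannot offer.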
3. xformers support for efficient inference
Once LoRA parameters are loaded into a pipeline, xformers should work seamlessly. There was a problem with this, which has been fixed in #3556.
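A quick sketch of the expected flow (the LoRA repository ID is just an example):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")  # example LoRA repo

# With #3556, memory-efficient attention should respect the loaded LoRA layers.
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a pokemon with blue eyes", num_inference_steps=25).images[0]
```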
4. PT 2.0 SDPA optimization
See: #3594
5. torch.compile() compatibility with LoRA
Once 4. is settled, we should be able to take advantage of torch.compile().
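The intended usage would look roughly like this once the pieces are in place (the LoRA repository ID is again just an example, and this is a sketch of the goal rather than something guaranteed to work today):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")  # example LoRA repo

# Compile the UNet; the first call pays the compilation cost, subsequent calls are faster.
pipe.unet = torch.compile(pipe.unet, mode="reduce-overhead")

image = pipe("a pokemon with blue eyes", num_inference_steps=25).images[0]
```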
6. Introduction of scale to control the contribution of the text encoder LoRA
See #3480. We already support passing scale as a part of cross_attention_kwargs for the UNet LoRA.
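For reference, this is how the existing UNet-side scale works at inference time; #3480 would extend the same knob to the text encoder LoRA (repository ID below is an example):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("sayakpaul/sd-model-finetuned-lora-t4")  # example LoRA repo

# scale=0.0 ignores the LoRA update entirely; scale=1.0 applies it fully.
image = pipe(
    "a pokemon with blue eyes",
    num_inference_steps=25,
    cross_attention_kwargs={"scale": 0.5},
).images[0]
```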
7. Supporting multiple LoRAs
@takuma104 proposed a hook-based design here: #3064 (comment)
I hope this helps provide a consolidated view of where we're at regarding LoRA support in Diffusers.