
Conversation

@stevhliu
Contributor

This PR describes how bnb supports FSDP-QLoRA - mainly through the selectable quantization storage parameter - and provides code examples for setting up training with Transformers/PEFT/TRL. The docs are fairly lightweight since the topic is covered in greater depth in Answer.AI's technical blog post.
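For context, a minimal sketch of what the storage parameter looks like in practice (assuming a recent transformers/bitsandbytes release with FSDP-QLoRA support; the model id is a placeholder, not taken from this PR):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Quantize to 4-bit NF4, but store the packed quantized weights in bfloat16
# so they match the dtype of the model's non-quantized parameters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # the selectable storage dtype
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)
```

Setting `bnb_4bit_quant_storage` to match the dtype of the rest of the model is what makes the quantized model shardable with FSDP.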

@github-actions

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@Titus-von-Koeller
Collaborator

Ok, just finished proof-reading. Looks super good, no corrections needed!

For a moment, I was thinking it might be good to add a mention of FSDP's FlatParameters and how they require a uniform dtype across the tensors being wrapped for sharding. The more uniform your weight dtypes are, the larger the groups of parameters in the module tree that can be wrapped in a single FlatParameter, which is key to an efficient sharding process.

On second thought, this doesn't really add anything for the average user, and if they were interested, the references provided explain it in ample detail.
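For anyone reading along anyway, a minimal sketch of the dtype-uniformity point (same hypothetical setup as the sketch above; the model id is a placeholder):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Store the packed 4-bit weights as bfloat16, matching the other parameters.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",  # placeholder model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# Every parameter now reports the same dtype, so FSDP can group large,
# contiguous parts of the module tree into a single FlatParameter.
print({p.dtype for p in model.parameters()})  # expected: {torch.bfloat16}
# With the default uint8 storage, the quantized weights would surface as a
# different dtype and force finer-grained, less efficient wrapping.
```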

Thanks a lot for this very good work 🤗! Really thorough and at the right level of detail for the circumstances.

@Titus-von-Koeller merged commit 0c64a0d into bitsandbytes-foundation:main on Apr 5, 2024
@stevhliu deleted the fsdp-qlora branch on April 8, 2024