[doc] add lazy init docs #4808 (Merged)
# Lazy initialization

Author: [Hongxiu Liu](https://github.com/ver217)
**Prerequisite:**
- [Train with booster](../basics/booster_api.md)
## Introduction

Lazy initialization defers model initialization: parameters are recorded but not allocated until they are actually materialized. This saves memory when initializing large models.

If your model has `N` billion parameters and your memory (or GPU memory) is `M` GB, we recommend lazy initialization when `4N >= M`, i.e. when the fp32 weights alone (about 4 bytes per parameter) would roughly fill your memory. Otherwise, it is optional.
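The rule of thumb above can be sketched as a tiny helper. Note this function is purely illustrative and not part of the ColossalAI API:

```python
def should_use_lazy_init(n_billion_params: float, mem_gb: float) -> bool:
    """Hypothetical helper for the rule of thumb above: fp32 weights take
    about 4 bytes per parameter, so N billion parameters need roughly
    4*N GB just to hold the weights."""
    return 4 * n_billion_params >= mem_gb

# e.g. a 7B-parameter model on a 24 GB GPU: ~28 GB of fp32 weights won't fit,
# so lazy initialization is recommended; a 0.1B model fits easily.
print(should_use_lazy_init(7, 24))    # True
print(should_use_lazy_init(0.1, 24))  # False
```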
## Usage

Lazy initialization must be used together with a booster.
### API reference

{{ autodoc:colossalai.lazy.LazyInitContext }}
### Example

```python
import colossalai
from colossalai.lazy import LazyInitContext
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin

from transformers import LlamaForCausalLM, LlamaConfig, BertForPreTraining

colossalai.launch({})
plugin = GeminiPlugin()
booster = Booster(plugin)

# 1. Initialize a model from scratch.
# Initializing on CUDA accelerates initialization but uses more GPU memory.
with LazyInitContext(default_device="cuda"):
    model = LlamaForCausalLM(LlamaConfig(hidden_size=64, intermediate_size=172, num_hidden_layers=4, num_attention_heads=4))
model, *_ = booster.boost(model)

# 2. Initialize a model from a pretrained checkpoint.
with LazyInitContext():
    model = BertForPreTraining.from_pretrained("prajjwal1/bert-tiny")
model, *_ = booster.boost(model)
```

> ⚠️ Lazy initialization from pretrained weights requires `colossalai>0.3.3` or the main branch.
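Conceptually, lazy initialization records parameter shapes and dtypes without allocating storage, then materializes the weights later, once the plugin knows how to place or shard them. A minimal sketch of this idea using PyTorch's built-in meta device (PyTorch ≥ 2.0) — this illustrates the general technique only, not ColossalAI's actual `LazyInitContext` implementation:

```python
import torch
import torch.nn as nn

# Construct a module on the "meta" device: shapes and dtypes are recorded,
# but no memory is allocated for the parameter values.
with torch.device("meta"):
    model = nn.Linear(1024, 1024)

assert model.weight.is_meta  # no real storage yet

# Materialize later, e.g. once the training plugin knows the target device:
model = model.to_empty(device="cpu")  # allocate real (uninitialized) storage
model.reset_parameters()              # now actually initialize the values
```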
## Limitations

As noted above, lazy initialization must be used with a booster, and only some plugins support it.
| Plugin          | Supported | Remarks      |
|-----------------|-----------|--------------|
| Gemini          | Yes       |              |
| Hybrid Parallel | Yes       |              |
| Low Level Zero  | No        | No need      |
| Torch DDP       | No        | Incompatible |
| Torch FSDP      | No        | Incompatible |
Not all models can be lazily initialized. In some cases, a subset of the parameters/buffers may be initialized eagerly, but this usually accounts for only a small fraction of the whole model.

Some models are not supported at all and will raise an error. We tested models from torchvision, diffusers, timm, transformers, torchaudio, and torchrec; the following models are not supported:
| Model                         | Category     |
|-------------------------------|--------------|
| wav2vec2_base                 | torchaudio   |
| hubert_base                   | torchaudio   |
| ViTModel                      | transformers |
| ViTForMaskedImageModeling     | transformers |
| ViTForImageClassification     | transformers |
| Blip2Model                    | transformers |
| Blip2ForConditionalGeneration | transformers |

<!-- doc-test-command: torchrun --standalone --nproc_per_node=2 lazy_init.py -->