26 changes: 13 additions & 13 deletions colossalai/shardformer/README.md
@@ -116,18 +116,18 @@ We will follow this roadmap to develop Shardformer:

| model | tensor parallel | pipeline parallel | lazy initialization | xformer | flash attn2 | jit fused operator | fused layernorm | sequence parallel | overlap |
| :------: | :-----: | :-----: | :--------: | :---------: | :------: | :-----: | :-----: | :--------: | :---------: |
| bert | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
| t5 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [ ] | [ ] |
| llama V1/V2 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [ ] | [ ] |
| gpt2 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
| opt | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [ ] | [ ] |
| bloom | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
| chatglm2 | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] | [x] |
| vit | [x] | [x] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
| whisper | [x] | [x] | [x] | [x] | [x] | [ ] | [x] | [ ] | [ ] |
| sam | [x] | [ ] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
| blip2 | [x] | [ ] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
| falcon | [x] | [x] | [x] | [x] | [x] | [ ] | [x] | [ ] | [ ] |
| bert | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| t5 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | [ ] | [ ] |
| llama V1/V2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | [ ] | [ ] |
| gpt2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| opt | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | [ ] | [ ] |
| bloom | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| chatglm2 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| vit | ✅ | ✅ | [ ] | ✅ | ✅ | ✅ | ✅ | [ ] | [ ] |
| whisper | ✅ | ✅ | ✅ | ✅ | ✅ | [ ] | ✅ | [ ] | [ ] |
| sam | ✅ | [ ] | [ ] | ✅ | ✅ | ✅ | ✅ | [ ] | [ ] |
| blip2 | ✅ | [ ] | [ ] | ✅ | ✅ | ✅ | ✅ | [ ] | [ ] |
| falcon | ✅ | ✅ | ✅ | ✅ | ✅ | [ ] | ✅ | [ ] | [ ] |
| roberta | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
| albert | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
| ernie | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
@@ -137,7 +137,7 @@ We will follow this roadmap to develop Shardformer:
| swin | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
| swin V2 | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
| qwen | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] | [ ] |
| mistral | [x] | [ ] | [ ] | [x] | [x] | [x] | [x] | [ ] | [ ] |
| mistral | ✅ | [ ] | [ ] | ✅ | ✅ | ✅ | ✅ | [ ] | [ ] |


## 💡 API Design
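The table above tracks which Shardformer features each model supports. As a minimal sketch of how a few of those columns might be switched on, the snippet below uses `ShardConfig` and `ShardFormer`; the class and flag names are assumptions based on the Shardformer API rather than content taken from this diff, and may differ between ColossalAI releases.

```python
# Minimal sketch (not from this PR): enabling a few of the feature columns
# above through Shardformer. Flag names are assumptions and may vary by release.
import colossalai
import torch.distributed as dist
from colossalai.shardformer import ShardConfig, ShardFormer
from transformers import BertForSequenceClassification

# Assumes launch under `torchrun` / `colossalai run`; the `config` argument
# is dropped in newer ColossalAI releases.
colossalai.launch_from_torch(config={})

shard_config = ShardConfig(
    tensor_parallel_process_group=dist.group.WORLD,  # "tensor parallel" column
    enable_fused_normalization=True,                 # "fused layernorm" column
    enable_flash_attention=True,                     # "flash attn2" column
    enable_jit_fused=True,                           # "jit fused operator" column
)
shard_former = ShardFormer(shard_config=shard_config)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased")
# optimize() returns the sharded model plus any parameters shared across shards.
sharded_model, shared_params = shard_former.optimize(model)
```

Pipeline parallelism, sequence parallelism, and overlap from the remaining columns would need additional `ShardConfig` arguments and a pipeline schedule, which are omitted from this sketch.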
4 changes: 1 addition & 3 deletions examples/language/llama2/README.md
@@ -6,7 +6,6 @@
</p>

- 70 billion parameter LLaMA2 model training accelerated by 195%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/examples/language/llama2)
[[blog]](https://www.hpc-ai.tech/blog/70b-llama2-training)

### LLaMA1
@@ -15,7 +14,6 @@
</p>

- 65-billion-parameter large model pretraining accelerated by 38%
[[code]](https://github.com/hpcaitech/ColossalAI/tree/example/llama/examples/language/llama)
[[blog]](https://www.hpc-ai.tech/blog/large-model-pretraining)

## Dataset
@@ -123,7 +121,7 @@ Here we will show an example of how to run training
llama pretraining with `gemini, batch_size=16, sequence_length=4096, gradient_checkpoint=True, flash_attn=True`.

#### a. Running environment
This experiment was performed on 4 computing nodes with 32 A800 GPUs in total for LLaMA-1 65B. The nodes are
This experiment was performed on 4 computing nodes with 32 A800/H800 80GB GPUs in total for LLaMA-1 65B or LLaMA-2 70B. The nodes are
connected with RDMA and GPUs within one node are fully connected with NVLink.

#### b. Running command
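To make the configuration named above concrete, here is a hedged Python sketch of a Gemini-plugin training loop with `batch_size=16`, 4096-token sequences, and gradient checkpointing. The toy model size, synthetic data, and optimizer settings are placeholders; this is not the example's actual benchmark script or its command-line interface.

```python
# Sketch only: a Gemini-plugin training loop mirroring the settings above
# (batch_size=16, sequence_length=4096, gradient_checkpoint=True).
# The tiny model config and random data are placeholders, not the benchmark.
import colossalai
import torch
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin
from torch.utils.data import DataLoader, TensorDataset
from transformers import LlamaConfig, LlamaForCausalLM

colossalai.launch_from_torch(config={})  # `config` is dropped in newer releases

# Toy stand-in for LLaMA-1 65B / LLaMA-2 70B so the sketch runs on one GPU.
model = LlamaForCausalLM(LlamaConfig(
    hidden_size=512, intermediate_size=1024, num_hidden_layers=4,
    num_attention_heads=8, max_position_embeddings=4096,
))
model.gradient_checkpointing_enable()  # gradient_checkpoint=True
# flash_attn=True is assumed to be handled by the example's own flag and is omitted here.

optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
input_ids = torch.randint(0, 32000, (64, 4096))  # synthetic 4096-token sequences
dataloader = DataLoader(TensorDataset(input_ids), batch_size=16)

booster = Booster(plugin=GeminiPlugin())  # ZeRO-style heterogeneous memory management
model, optimizer, _, dataloader, _ = booster.boost(model, optimizer, dataloader=dataloader)

for (batch,) in dataloader:
    loss = model(input_ids=batch, labels=batch).loss
    booster.backward(loss, optimizer)  # Gemini requires booster.backward, not loss.backward
    optimizer.step()
    optimizer.zero_grad()
```

Launched under `torchrun` or `colossalai run`, this sort of setup is what the elided running command would drive at full scale on the multi-node cluster described above.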