hpcaitech · binmakeswell · Nov 27, 2023 · Nov 27, 2023
@@ -5,20 +5,31 @@
 </div>
 
 ## Table of Contents
+- [Table of Contents](#table-of-contents)
 - [News](#news)
 - [Colossal-LLaMA-2-7B](#colossal-llama-2-7b)
-    - [Performance Evaluation](#performance-evaluation)
-    - [Examples](#examples)
-    - [Training Logs](#training-logs)
-    - [Import from Transformers](#import-from-transformers)
+  - [Performance Evaluation](#performance-evaluation)
+  - [Examples](#examples)
+  - [Training Logs](#training-logs)
+  - [Import from Transformers (Inference)](#import-from-transformers-inference)
 - [Usage](#usage)
-    - [Install](#install)
-    - [How to run](#how-to-run)
-- [Technical Insight](#technical-insights)
-    - [Data](#data)
-    - [Tokenizer](#tokenizer)
-    - [Training Strategy](#training-strategy)
-    - [Bridging Any Domain-specific Large Models](#bridging-any-domain-specific-large-models)
+  - [Install](#install)
+    - [0. Pre-requisite](#0-pre-requisite)
+    - [1. Install required packages](#1-install-required-packages)
+    - [2. Install `xentropy`, `layer_norm` and `rotary`](#2-install-xentropy-layer_norm-and-rotary)
+  - [How to run](#how-to-run)
+    - [1. Init Tokenizer Preparation](#1-init-tokenizer-preparation)
+    - [2. Init Model Preparation](#2-init-model-preparation)
+    - [3. Data Preparation](#3-data-preparation)
+    - [4. Command Line Arguments for Training](#4-command-line-arguments-for-training)
+    - [5. Running Command](#5-running-command)
+- [Technical Insights](#technical-insights)
+  - [Data](#data)
+  - [Tokenizer](#tokenizer)
+  - [Training Strategy](#training-strategy)
+    - [Multi-stage Training](#multi-stage-training)
+    - [Bucket-based Training](#bucket-based-training)
+  - [Bridging Any Domain-specific Large Models](#bridging-any-domain-specific-large-models)
 - [Citations](#citations)
 
 ## News
@@ -260,7 +271,7 @@ Here is details about CLI arguments:
 * Booster plugin: `--plugin`. `gemini`, `gemini_auto`, `zero2`, `zero2_cpu` and `3d` are supported. For more details, please refer to [Booster plugins](https://colossalai.org/docs/basics/booster_plugins/).
 * Intermediate checkpoint to load: `--load_checkpoint`. Path to the intermediate checkpoint. Saved checkpoint contains the states for `lr_scheduler`, `optimizer`,`running_states.json` and `modelling`. If `load_checkpoint` points to the `modelling` folder, only the model weights will be loaded without any other states to support multi-stage training.
 * Save interval: `--save_interval`. The interval (steps) of saving checkpoints. The default value is 1000.
-* Checkpoint directory: `--save_dir`. The directoty path to save checkpoint and intermediate states. Intermediate states include `lr_scheduler`, `optimizer`,`running_states.json` and `modelling`.
+* Checkpoint directory: `--save_dir`. The directory path to save checkpoint and intermediate states. Intermediate states include `lr_scheduler`, `optimizer`,`running_states.json` and `modelling`.
 * Tensorboard directory: `--tensorboard_dir`. The path to save tensorboard logs.
 * Configuration file: `--config_file`. The path to save the configuration file.
 * Number of epochs: `--num_epochs`. Number of training epochs. The default value is 1.
@@ -404,5 +415,4 @@ Applying the above process to perform knowledge transfer in any field allows for
     author={Dao, Tri},
     year={2023}
 }
-}
 ```
@@ -23,7 +23,7 @@
 
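 The Booster plugin is an important component for managing parallel configurations (e.g., the gemini plugin encapsulates the gemini acceleration solution). The currently supported plugins are:
 
-**_HybridParallelPlugin:_** The HybirdParallelPlugin plugin encapsulates the hybrid-parallel acceleration solution. Its interface allows any combination of tensor parallelism, pipeline parallelism, and two data-parallel methods (DDP, Zero).
+**_HybridParallelPlugin:_** The HybridParallelPlugin plugin encapsulates the hybrid-parallel acceleration solution. Its interface allows any combination of tensor parallelism, pipeline parallelism, and two data-parallel methods (DDP, Zero).
 
 **_GeminiPlugin:_** The GeminiPlugin plugin encapsulates the gemini acceleration solution, i.e., a ZeRO optimization scheme with chunk-based memory management.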
 

@@ -103,7 +103,7 @@ Here is details about CLI arguments:
 - Max length: `-l`, `--max_length`. The default value is 4096.
 - Mixed precision: `-x`, `--mixed_precision`. The default value is "fp16". "fp16" and "bf16" are supported.
 - Save interval: `-i`, `--save_interval`. The interval (steps) of saving checkpoints. The default value is 1000.
-- Checkpoint directory: `-o`, `--save_dir`. The directoty path to save checkpoints. The default value is `checkpoint`.
+- Checkpoint directory: `-o`, `--save_dir`. The directory path to save checkpoints. The default value is `checkpoint`.
 - Checkpoint to load: `-f`, `--load`. The checkpoint path to load. The default value is `None`.
 - Gradient clipping: `--gradient_clipping`. The default value is 1.0.
 - Tensorboard log directory: `-t`, `--tensorboard_dir`. The directory path to save tensorboard logs. The default value is `tb_logs`.
@@ -217,7 +217,7 @@ Here is details about CLI arguments:
 - Max length: `-l`, `--max_length`. The default value is 4096.
 - Mixed precision: `-x`, `--mixed_precision`. The default value is "fp16". "fp16" and "bf16" are supported.
 - Save interval: `-i`, `--save_interval`. The interval (steps) of saving checkpoints. The default value is 1000.
-- Checkpoint directory: `-o`, `--save_dir`. The directoty path to save checkpoints. The default value is `checkpoint`.
+- Checkpoint directory: `-o`, `--save_dir`. The directory path to save checkpoints. The default value is `checkpoint`.
 - Checkpoint to load: `-f`, `--load`. The checkpoint path to load. The default value is `None`.
 - Gradient clipping: `--gradient_clipping`. The default value is 1.0.
 - Tensorboard log directory: `-t`, `--tensorboard_dir`. The directory path to save tensorboard logs. The default value is `tb_logs`.
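
To check the defaults documented in the hunk above, the listed flags can be sketched as an `argparse` parser. This is a minimal illustration that assumes only the flags and defaults visible in this hunk. The function name `build_parser` is hypothetical, and the actual training script may declare these arguments differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring only the flags shown in the hunk above."""
    parser = argparse.ArgumentParser(description="Sketch of the documented CLI arguments")
    # Maximum sequence length; documented default is 4096.
    parser.add_argument("-l", "--max_length", type=int, default=4096)
    # Only "fp16" and "bf16" are documented as supported.
    parser.add_argument("-x", "--mixed_precision", default="fp16", choices=["fp16", "bf16"])
    # Interval (in steps) between checkpoint saves.
    parser.add_argument("-i", "--save_interval", type=int, default=1000)
    # Directory path where checkpoints are saved.
    parser.add_argument("-o", "--save_dir", type=str, default="checkpoint")
    # Checkpoint path to load; None means nothing is loaded.
    parser.add_argument("-f", "--load", type=str, default=None)
    parser.add_argument("--gradient_clipping", type=float, default=1.0)
    # Directory path where tensorboard logs are saved.
    parser.add_argument("-t", "--tensorboard_dir", type=str, default="tb_logs")
    return parser
```

Calling `build_parser().parse_args([])` yields the documented defaults, and passing `["-o", "out"]` overrides only the checkpoint directory.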