-[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline. [[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat) [[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b) [[demo]](https://chat.colossalai.org)
+[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline.
+[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
+[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
+[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
+[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
+
+- Up to 10 times faster for RLHF PPO Stage3 Training
@@ -352,6 +362,22 @@ If you want to install and enable CUDA kernel fusion (compulsory installation wh
CUDA_EXT=1 pip install .
```
+For users with CUDA 10.2, you can still build ColossalAI from source, but you need to manually download the cub library and copy it to the corresponding directory.
+
+```bash
+# clone the repository
+git clone https://github.com/hpcaitech/ColossalAI.git
+cd ColossalAI
+
+# download the cub library
+wget https://github.com/NVIDIA/cub/archive/refs/tags/1.8.0.zip
+unzip 1.8.0.zip
+cp -r cub-1.8.0/cub/ colossalai/kernel/cuda_native/csrc/kernels/include/
+
+# install
+CUDA_EXT=1 pip install .
+```
+
## Use Docker
diff --git a/applications/Chat/README.md b/applications/Chat/README.md
index 9ba831973b6c..29cd581d7cc9 100644
--- a/applications/Chat/README.md
+++ b/applications/Chat/README.md
@@ -67,13 +67,24 @@ More details can be found in the latest news.
* [2023/02] [Open Source Solution Replicates ChatGPT Training Process! Ready to go with only 1.6GB GPU Memory](https://www.hpc-ai.tech/blog/colossal-ai-chatgpt)
## Online demo
-You can experience the performance of Coati7B on this page.
+
-[chat.colossalai.org](https://chat.colossalai.org/)
+[ColossalChat](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat): An open-source solution for cloning [ChatGPT](https://openai.com/blog/chatgpt/) with a complete RLHF pipeline.
+[[code]](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat)
+[[blog]](https://medium.com/@yangyou_berkeley/colossalchat-an-open-source-solution-for-cloning-chatgpt-with-a-complete-rlhf-pipeline-5edf08fb538b)
+[[demo]](https://www.youtube.com/watch?v=HcTiHzApHm0)
+[[tutorial]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
+
-Due to resource constraints, we will only provide this service from 29th Mar 2023 to 5 April 2023. However, we have provided the inference code in the [inference](./inference/) folder. The WebUI will be open-sourced soon as well.
+> DeepSpeedChat performance is taken from its blog published on April 12, 2023. ColossalChat performance can be reproduced on an AWS p4d.24xlarge node with 8 A100-40G GPUs using the following command: torchrun --standalone --nproc_per_node 8 benchmark_opt_lora_dummy.py --max_timesteps 1 --update_timesteps 1 --use_kernels --strategy colossalai_zero2 --experience_batch_size 64 --train_batch_size 32
-> Warning: Due to model and dataset size limitations, Coati is just a baby model, Coati7B may output incorrect information and lack the ability for multi-turn dialogue. There is still significant room for improvement.
## Install
### Install the environment
@@ -112,12 +123,14 @@ Here is how we collected the data
Stage1 is supervised instruction fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.
You can run `examples/train_sft.sh` to start supervised instruction fine-tuning.
+[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
### RLHF Training Stage2 - Training reward model
Stage2 trains a reward model: human annotators rank different outputs for the same prompt, and these rankings are used to supervise the training of the reward model (see the loss sketch below).
You can run `examples/train_rm.sh` to start reward model training.
+[[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo)
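To make the supervision signal concrete: each human ranking is typically converted into chosen/rejected answer pairs, and the reward model is trained with a pairwise ranking loss that pushes the chosen answer's score above the rejected one's. A minimal sketch of that standard loss (not necessarily Coati's exact trainer code):

```python
import torch
import torch.nn.functional as F

def pairwise_reward_loss(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> torch.Tensor:
    """Standard pairwise ranking loss: push r(chosen) above r(rejected)."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Toy example: scalar rewards for 3 chosen/rejected completion pairs.
chosen = torch.tensor([0.8, 0.1, 1.2])
rejected = torch.tensor([0.2, -0.3, 0.9])
print(pairwise_reward_loss(chosen, rejected))   # loss shrinks as chosen pulls further ahead of rejected
```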
### RLHF Training Stage3 - Training model with reinforcement learning by human feedback
@@ -128,6 +141,7 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of
You can run the `examples/train_prompts.sh` to start training PPO with human feedback.
+[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
For more details, see [`examples/`](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples).
diff --git a/applications/Chat/coati/dataset/prompt_dataset.py b/applications/Chat/coati/dataset/prompt_dataset.py
index f8ab2346c4b7..5858052c836a 100644
--- a/applications/Chat/coati/dataset/prompt_dataset.py
+++ b/applications/Chat/coati/dataset/prompt_dataset.py
@@ -45,7 +45,7 @@ def __init__(self,
self.keyed_prompt[k].extend(tensor.to(torch.cuda.current_device()).unbind())
def __len__(self):
- return len(self.keyed_prompt)
+ return len(self.keyed_prompt["input_ids"])
def __getitem__(self, i) -> Dict[str, torch.Tensor]:
return {k: v[i] for k, v in self.keyed_prompt.items()}
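The `__len__` fix above matters because `keyed_prompt` maps field names to per-sample data, so taking `len()` of the dict counts fields rather than prompts. A toy illustration (the `attention_mask` key is only an assumed example of a second field):

```python
# keyed_prompt maps field names to lists of per-sample tensors; plain lists stand in for tensors here.
keyed_prompt = {
    "input_ids": ["p0", "p1", "p2"],        # 3 prompts
    "attention_mask": ["m0", "m1", "m2"],   # assumed second field, also 3 prompts
}

print(len(keyed_prompt))                 # 2 -- old __len__: number of keys
print(len(keyed_prompt["input_ids"]))    # 3 -- fixed __len__: number of prompts
```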
diff --git a/applications/Chat/coati/dataset/reward_dataset.py b/applications/Chat/coati/dataset/reward_dataset.py
index faa1c94d2728..5dacf7e81464 100644
--- a/applications/Chat/coati/dataset/reward_dataset.py
+++ b/applications/Chat/coati/dataset/reward_dataset.py
@@ -6,7 +6,7 @@
from .utils import is_rank_0
-# Dahaos/rm-static
+# Dahoas/rm-static
class RmStaticDataset(Dataset):
"""
Dataset for reward model
diff --git a/applications/Chat/coati/ray/src/detached_replay_buffer.py b/applications/Chat/coati/ray/src/detached_replay_buffer.py
index 855eee48c5a5..18c8db388e88 100644
--- a/applications/Chat/coati/ray/src/detached_replay_buffer.py
+++ b/applications/Chat/coati/ray/src/detached_replay_buffer.py
@@ -34,7 +34,7 @@ def __init__(self, sample_batch_size: int, tp_world_size: int = 1, limit : int =
'''
Workers in the same tp group share this buffer and need same sample for one step.
Therefore a held_sample should be returned tp_world_size times before it could be dropped.
- worker_state records wheter a worker got the held_sample
+ worker_state records whether a worker got the held_sample
'''
self.tp_world_size = tp_world_size
self.worker_state = [False] * self.tp_world_size
diff --git a/applications/Chat/coati/ray/src/experience_maker_holder.py b/applications/Chat/coati/ray/src/experience_maker_holder.py
index 94e4a3d537a5..0ae4e3125b70 100644
--- a/applications/Chat/coati/ray/src/experience_maker_holder.py
+++ b/applications/Chat/coati/ray/src/experience_maker_holder.py
@@ -22,7 +22,7 @@
class ExperienceMakerHolder:
'''
Args:
- detached_trainer_name_list: str list to get ray actor handleskkk
+ detached_trainer_name_list: str list to get ray actor handles
strategy:
experience_batch_size: batch size of generated experience
kl_coef: the coefficient of kl divergence loss
diff --git a/applications/Chat/coati/ray/src/pipeline_strategy.py b/applications/Chat/coati/ray/src/pipeline_strategy.py
index 1780839c62ee..7ecb5d7d86d6 100644
--- a/applications/Chat/coati/ray/src/pipeline_strategy.py
+++ b/applications/Chat/coati/ray/src/pipeline_strategy.py
@@ -26,7 +26,7 @@
class PipelineModel(torch.nn.Module):
'''
Actor has 2 kinds of jobs: forward and generate.
- better to just pipelinize the inner model
+ better to just pipeline the inner model
'''
def __init__(self,
model: torch.nn.Module,
diff --git a/applications/Chat/evaluate/README.md b/applications/Chat/evaluate/README.md
index 7ace4bfe6d18..1e86eadf1c33 100644
--- a/applications/Chat/evaluate/README.md
+++ b/applications/Chat/evaluate/README.md
@@ -1,182 +1,252 @@
-# Evaluation
-
-In this directory, we introduce how you can evaluate your model with GPT-4.
-
-## Evaluation Pipeline
-
-The whole evaluation process undergoes the following three steps:
-1. Prepare the questions following the internal data structure in the data format section (described below).
-2. Generate answers from different models:
- * Generate answers using GPT-3.5: [`generate_gpt35_answers.py`](generate_gpt35_answers.py).
- * Generate answers using your own models: [`generate_answers.py`](generate_answers.py).
-3. Evaluate models using GPT-4: [`evaluate.py`](evaluate.py).
-
-### Generate Answers
-#### Generate Answers Using GPT-3.5
-You can provide your own OpenAI key to generate answers from GPT-3.5 using [`generate_gpt35_answers.py`](./generate_gpt35_answers.py).
-
-An example script is provided as follows:
-```shell
-python generate_gpt35_answers.py \
- --dataset "path to the question dataset" \
- --answer_path "path to answer folder" \
- --num_workers 4 \
- --openai_key "your openai key" \
- --max_tokens 512 \
-```
-
-#### Generate Answers Using our Own Model
-You can also generate answers using your own models. The generation process is divided into two stages:
-1. Generate answers using multiple GPUs (optional) with batch processing: [`generate_answers.py`](./generate_answers.py).
-2. Merge multiple shards and output a single file: [`merge.py`](./merge.py).
-
-An example script is given as follows:
-
-```shell
-device_number=number of your devices
-model_name="name of your model"
-model_path="path to your model"
-dataset="path to the question dataset"
-answer_path="path to save the model answers"
-
-torchrun --standalone --nproc_per_node=$device_number generate_answers.py \
- --model 'llama' \
- --strategy ddp \
- --model_path $model_path \
- --model_name $model_name \
- --dataset $dataset \
- --batch_size 8 \
- --max_datasets_size 80 \
- --answer_path $answer_path \
- --max_length 512
-
-python merge.py \
- --model_name $model_name \
- --shards $device_number \
- --answer_path $answer_path \
-
-for (( i=0; i
+| Roleplay | **Example 1:** 我想让你担任Android开发工程师面试官。我将成为候选人,您将向我询问Android开发工程师职位的面试问题。我希望你只作为面试官回答。不要一次写出所有的问题。我希望你只对我进行采访。问我问题,等待我的回答。不要写解释。像面试官一样一个一个问我,等我回答。我的第一句话是“面试官你好”。
**Example 2:** 我想让你扮演讲故事的角色。你会想出引人入胜、富有想象力和吸引观众的有趣故事。它可以是童话故事、教育故事或任何其他类型的有潜力的故事以吸引人们的注意力和想象力。根据目标受众,您可以为您的讲故事环节选择特定的主题或主题,例如,如果是儿童,那么您可以谈论动物;如果是成人,那么基于历史的故事可能会更好地吸引他们等。我的第一个请求是我需要一个关于毅力的有趣故事。 | **Example 1:** I want you to act as a journalist. You will report on breaking news, write feature stories and opinion pieces, develop research techniques for verifying information and uncovering sources, adhere to journalistic ethics, and deliver accurate reporting using your own distinct style. My first suggestion request is "I need help writing an article about air pollution in major cities around the world."
**Example 2:** I want you to act as a debater. I will provide you with some topics related to current events and your task is to research both sides of the debates, present valid arguments for each side, refute opposing points of view, and draw persuasive conclusions based on evidence. Your goal is to help people come away from discussion with increased knowledge and insight into the topic at hand. My first request is "I want an opinion piece about Deno." |
+| Chat | **Example 1:** 基于以下角色信息完成一段对话。小王,20岁,是一名大学生;小张,22岁,是一名大学生。 小王:小张,我们一起去看电影吧? 小张:好啊,你想看什么电影呢? 小王:我听说最近有一部很好看的科幻电影。你觉得怎么样? 小张:
**Example 2:** 基于以下角色信息完成一段对话。小明是一名医生,一位老年病患者想要停药,但他对病情有所忽视并有担忧;王叔叔是老年病患者的儿子,希望能够听取医生的建议。 小明:你好,王叔叔,我了解你想要让你父亲停药。 王叔叔:是的,我父亲已经吃了那么久的药,我担心药物对他的身体会有副作用。 小明: | **Example 1:** Human: Who is the highest try scorer in the current Sydney Roosters team? AI: As an AI language model, I don't have access to current statistics. However, you can check the current Sydney Roosters team's official website or relevant sports news websites to find the answer. Human: Thanks for the suggestion, but can you recommend a reliable sports news website that I can check? AI:
**Example 2:** Complete a dialogue based on the following role information. A: Elementary student B: Teacher B: Good morning, Student A. Today we're going to learn about addition and subtraction. A: Teacher, I already know this very well. Why do I need to learn it again? B: |
+| Open QA | **Example 1:** 请问万有引力定律由谁提出的?
**Example 2:** 哪些国家参与了第一次世界大战? | **Example 1:** Who are the indigenous people of New Zealand?
**Example 2:** How do you take the derivative of the sin function? |
+| Closed QA | **Example 1:** 请从以下选项中选择正确答案。以下哪个是世界上最高山峰? A. 长城 B. 泰山 C. 珠穆朗玛峰 D. 黄山
**Example 2:** 请从以下选项中选择一个最佳答案回答下面的问题。问题:非洲最高的山是哪座山? 选项: A. 麦金利山 B. 喜马拉雅山 C. 乞力马扎罗山 | **Example 1:** Answer the following question: What shape is the Earth? A) A circle B) A sphere C) An ellipse D) A plane
**Example 2:** Choose the correct classification for the following question: "What type of restaurant is 'Burger King'"? fast food family style formal dining buffet |
+| Brainstorming | **Example 1:** 请介绍一下人工智能的多个领域。
**Example 2:** 请给出管理家庭财务的3个小技巧。 | **Example 1:** What are 10 science fiction books I should read next?
**Example 2:** List five ideas for how to regain enthusiasm for my career. |
+| Generation | **Example 1:** 请撰写一篇文章,介绍如何通过改善生活习惯来预防疾病和延长寿命。
**Example 2:** 请根据以下情节撰写一篇短篇小说:一名年轻人被困在一个荒岛上,他必须想办法生存下去直到被救援。但他很快发现自己并不孤单。 | **Example 1:** Can you help me write a formal email to a potential business partner proposing a joint venture?
**Example 2:** Please use the appropriate format to write a formal letter of recommendation for a student applying to a prestigious computer science graduate program at a university. |
+| Rewriting | **Example 1:** 将以下句子改为被动语态: "他们正在洗车"
**Example 2:** 将以下文本翻译成英语: “这个周末我要去海边玩” | **Example 1:** Translate the following text into English: "我最喜欢的季节是春天,因为我可以看到美丽的花朵。"
**Example 2:** Please correct the following sentences and give them the correct sentence. "Their going to the party there." |
+| Classification | **Example 1:** 新闻标题:今日立夏,有一上联,立夏万物并秀,下联怎么对? 请根据以上新闻标题判断新闻所属的分类,你需要从文化,娱乐,体育,财经,房产,教育,科技,旅游,游戏,军事这十类中选择一个答案。
**Example 2:** 新闻标题:赵丽颖很久没有登上微博热搜了,但你们别急,她只是在憋大招而已。 请根据新闻标题判断新闻所属的分类,你需要从文化,娱乐,体育,财经,房产,教育,科技,旅游,游戏,军事这十类中选择一个答案。 | **Example 1:** Classify the given email as spam or non-spam. "Hello, this is an email reminding you to pay your property fees"
**Example 2:** Classify the following text as news, ads or forum posts "The latest iPhone 13 is now available, shop now!" |
+| Extraction | **Example 1:** 根据以下新闻文本,提取新闻报道时间,例如回答时按照格式“新闻报道时间:2007年8月10日” 新闻文本如下:2007-4-7中新网4月7日电据中国消防在线消息,4月4日晚上7时30分左右,湖南长潭高速公路上发生一起6车连环相撞失火事故。长株潭三地消防部门共出动消防车21台,警力100余人。经过消防官兵近2个小时奋力扑救,大火被成功扑灭。据初步调查,有1人在此次事故中死亡。
**Example 2:** 根据以下新闻文本,提取新闻报道时间,例如回答时按照格式“新闻报道时间:2007年8月10日” 新闻文本如下:2014年1月15日,据外媒《俄罗斯报》报道称,位于北半球的澳大利亚现在正处于炎热的夏季,而近日也到了高温酷暑的时候,当地时间1月14日晚,澳大利亚南部一夜间发生至少250起火灾。受炎热天气及雷雨天气影响,澳大利亚南部一夜间发生至少250起火灾,灾情多集中在维多利亚州。火灾发生后,救援人员立即展开救灾行动。目前,大部分起火点火势已被控制。 | **Example 1:** Extract all phenotypes of the following text: "The 55-year-old patient has fever and hypertension."
**Example 2:** Extract the location mentioned in the following text: "The student graduated from Harvard university, which is located in Boston" |
+| Summarization | **Example 1:** 请简要总结概括以下段落材料。 新华社快讯:斯里兰卡政府部门21日说,首都科伦坡包括教堂、酒店等多个地点当天发生的爆炸已导致至少70人死亡,另有260多人受伤。
**Example 2:** 请简要总结概括以下段落材料。 近期,参与京雄高铁站站房建设的中铁十二局,因在施工过程中存在环境违法行为被雄安新区公开通报。通报发出后,引起社会广泛关注。近日,人民网记者从雄安新区相关部门及中铁十二局获悉,新区有关部门已经集中约谈了中铁十二局等24个参与雄安建设的项目单位。对于约谈内容和结果,中铁十二局有关宣传负责人回应:“具体内容不清楚,最好找雄安新区相关部门了解情况。”新区有关部门负责人表示,此前涉及的环境违法行为,中铁十二局已基本整改到位,但约谈内容和结果暂不公开,接下来,将按部就班推进环境治理工作。(原题为《雄安新区:中铁十二局涉环境违法已基本整改到位》) | **Example 1:** Please provide a summary based on the following news: "China plans to launch its first space station core module in 2022, an important development in the country's space program. The space station, called Tianhe, will include three modules: a core module, an experiment module and an astronomy module. The first launch of the core module will be used to test and verify the basic functions of the station, as well as to conduct related scientific research and technology experiments. "
**Example 2:** What information is provided in the table below? Summarize the core information in it? "Ranking, Player Name, Team, Position, Salary (in millions of dollars) 1, LeBron James, Los Angeles Lakers, SF, 45.0 2, Stephen Curry, Golden State Warriors, PG, 43.5" |
+
+
+### Evaluation Metrics
+#### GPT Evaluation
+We use GPT-3.5 to evaluate the predictions of different models, with evaluation metrics pre-defined for each category. There are 11 pre-defined evaluation metrics, listed in the table below:
+
+| Evaluation Metric | Prompt Words | CoT |
+|:-----------------------:|:-------------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| Language organization | 语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。 | 1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。 2.检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说 3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。 4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。 5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。 6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。 |
+| Relevance | 切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。 | 1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。 2. 阅读答案,确认答案是否直接回答了题目所问的问题。 3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。 4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。 |
+| Creativity | 创意性(1-5):某些头脑风暴问题可能需要答案具有创意,提出新的思路。 | 1. 仔细阅读所提供的头脑风暴问题,确保你理解问题的要点和背景。 2. 根据你的知识和经验,判断所提供的答案是否可行。如果答案不可行,则创意性评分可能会受到影响。 3. 考虑答案中是否包含新颖的想法或独特的思路。答案可能与已知的解决方案有所重叠,但仍然可以被认为是有创意的,只要它提供了新的角度或方法来解决问题。 4. 根据答案的创意性,给出一个1到5的评分。如果答案缺乏创意,则应给出一个较低的评分。如果答案具有创意并提供了新的思路,应给出一个较高的评分。 |
+| Practicality | 实用性(1-5):某些头脑风暴问题可能需要答案提出实用的建议或解决方法。 | 1. 仔细阅读所提供的头脑风暴问题,确保你理解问题的要点和背景。 2. 根据你的知识和经验,判断所提供的答案是否可行。如果答案不可行,则实用性评分可能会受到影响。 3. 考虑答案中提出的建议或解决方法是否实用并可行。答案可能看起来很好,但如果无法实现或应用,则实用性评分可能会受到影响。 4. 根据答案的实用性,给出一个1到5的评分。如果答案缺乏实用性,则应给出一个较低的评分。如果答案提出了实用的建议或解决方法,并且可以很好地解决问题,则应给出一个较高的评分。 |
+| Correctness | 正确性(1-5):答案应该符合常识、生活实际等等 | 1. 仔细阅读所提供的头脑风暴问题,确保你理解问题的要点和背景。 2. 根据你的知识和经验,判断所提供的答案是否可行。如果答案不可行,则正确性评分可能会受到影响。 3. 考虑答案中所提供的信息是否正确、符合常识、生活实际等等。如果答案中存在明显的错误或不合理之处,则正确性评分可能会受到影响。 4. 根据答案的正确性,给出一个1到5的评分。如果答案存在明显的错误或不合理之处,则应给出一个较低的评分。如果答案正确、符合常识、生活实际等等,则应给出一个较高的评分。 |
+| Naturalness | 自然(1-5):答案是否自然,并且符合问题给定的身份。 | 1. 阅读题目,确定题目提供的身份信息。 2. 检查答案内容是否符合题目给定的身份。 3. 根据以上因素,对该回答的自然性进行打分,分数从1到5,其中1表示不自然,5表示非常自然,并符合问题给定的身份。 |
+| Engagingness | 参与感(1-5):答案是否对前面的对话内容做出了恰当的反应,是否理解对话的语境和背景。 | 1. 阅读题目,确定对话的语境和背景。 2. 检查答案是否充分理解对话的语境和背景,能否自然地融入到对话中而不显得突兀。 3. 根据以上因素,对该回答的参与感进行打分,分数从1到5,其中1表示没有参与感,5表示非常有参与感,并且恰当地理解了对话的语境和背景。 |
+| Reasonableness | 合理性(1-5):答案是否能够与前面的对话内容形成逻辑上的衔接,是否符合常理,能否在这个上下文中合理存在。 | 1. 阅读题目,确定对话的主题以及问题期望的回答方向。 2. 判断答案是否能够与前面的对话内容形成逻辑上的衔接,是否符合常理,能否在这个上下文中合理存在。 3. 根据以上因素,对该回答的合理性进行打分,分数从1到5,其中1表示不合理,5表示非常合理,并且能够与前面的对话内容形成逻辑上的衔接,并符合常理。 |
+| Diversity | 多样性(1-5):答案使用语言是否优美,具有有一定的创造性和想象力。然而,回答也应该保持合理和适度,不要过于夸张或离题。 | 1. 仔细阅读整个回答,确保完全理解回答所表达的内容和主题。 2. 在阅读回答的同时,注意语言的质量,例如措辞是否正确,语言是否生动等。 3. 检查回答的创造性和想象力,看看回答是否能够吸引人阅读下去。 4. 检查回答的合理性和适度,看看回答是否夸张或离题。5. 将多样性的评分打分在1到5之间,5分表示回答的质量很好,能够吸引人阅读,1分表示回答的内容生硬或者有离题的问题。 |
+| Fidelity | 保真度(1-5):答案是否能够严格遵守角色的设定回答给定的请求。 | 1. 仔细阅读问题,了解角色在问题中的设定和表现,包括职业、背景、观点、性格等方面。 阅读题目的请求,确认回答请求时需要注意的细节。 3. 对比提供的回答与该角色的设定,评估回答是否能够严格遵守角色的设定。 4. 结合以上评估结果给出保真度的评分,范围从1到5分,其中1分表示回答与角色设定完全不符,5分表示回答完全符合角色设定且满足给定请求。 |
+| Conciseness | 简明扼要(1-5):答案是否简明扼要,没有冗余内容。 | 1. 阅读题目,提取出材料的重点。 2. 阅读该总结,并注意其中的主要观点和信息。 3. 评估总结的长度。一个简明扼要的总结通常应该在几句话或几段文字内传达关键信息,而不是冗长的段落或文章。 4. 检查总结是否包含与主要观点无关的信息或冗余信息。 5. 确定总结涵盖了材料中的关键信息,并且没有忽略任何重要细节。 6. 给总结打出1-5的分数,其中5表示总结简明扼要,没有冗余内容,而1表示总结冗长或包含不必要的信息,难以理解或记忆。根据您的判断,打出适当的得分。 |
+
+GPT-3.5 evaluates the quality of model predictions based on the given prompt words and gives a score from 1 to 5.
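The exact prompt construction and score parsing live in `gpt_evaluate.py` (part of that module appears later in this patch). As a rough sketch of the per-metric scoring idea, assuming the Chinese prompt words and CoT strings from the table above are passed in as plain strings (the prompt layout and score parsing below are illustrative, not the repository's exact code):

```python
import re

import openai


def gpt35_metric_score(metric_prompt: str, cot: str, question: str, answer: str) -> float:
    """Ask GPT-3.5 to rate one answer on one metric (1-5), following the metric's CoT steps."""
    user_prompt = (
        f"问题:{question}\n回答:{answer}\n"
        f"评分标准:{metric_prompt}\n评分步骤:{cot}\n"
        "请只输出一个1到5之间的整数分数。"
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": user_prompt}],
        temperature=0.2,
    )
    text = response["choices"][0]["message"]["content"]
    match = re.search(r"[1-5]", text)                 # take the first in-range digit as the score
    return float(match.group()) if match else -1.0    # -1 marks an unparsable reply
```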
+
+#### Automatic Evaluation
+Automatic metrics evaluate the capability of a model by comparing model predictions with reference answers.
+There are two ways to obtain reference answers:
+* For instructions coming from human-designed problems (e.g., roleplay, chat), the reference answers are generated by GPT-3.5.
+* For instructions related to classic NLP problems (e.g., classification, extraction, summarization), the reference answers are collected from open-source datasets with target answers.
+
+There are 5 types of automatic evaluation metrics listed in the table below:
+
+| Automatic Evaluation Metric | Description |
+|:---------------------------:|:----------------------------------------------------------------------------------------------------------------------------------------------|
+| BLEU-n | Measures the n-gram overlap between prediction and reference. BLEU-1 (unigram) evaluates word-level accuracy, while higher-order BLEU-n evaluates sentence-level fluency. |
+| ROUGE | ROUGE-N measures the number of matching n-grams between prediction and reference. ROUGE-L measures the longest common subsequence (LCS) between prediction and reference. |
+| Distinct | Measures the diversity of the generated text by counting unique n-grams. |
+| BERTScore | Measures the semantic similarity between prediction and reference tokens using BERT. |
+| Precision, Recall, F1 Score | Measure the overlap between prediction and reference (designed for the classification and extraction categories). |
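As a concrete example of the simplest of these metrics, a minimal Distinct-n computation could look like the sketch below; the repository's own implementation lives in `metrics.py` (imported by `evaluator.py` later in this patch) and may differ in tokenization and other details:

```python
from typing import List


def distinct_n(preds: List[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams over all predictions (higher means more diverse)."""
    total = 0
    unique = set()
    for pred in preds:
        tokens = pred.split()    # for Chinese text a character- or jieba-based split would be used instead
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total > 0 else 0.0


# Two near-duplicate answers give a low distinct-2 score; diverse answers score closer to 1.
print(distinct_n(["the cat sat on the mat", "the cat sat on the rug"]))
```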
+
+## Evaluation Process
+### Data Format
+#### Target Answers / Predictions
+A JSON file contains one list. Each element in the list is a target answer / prediction record for one instruction / question.
+An element should have the following fields:
+
+* `category` (str, compulsory): The category of the instruction / question.
+* `instruction` (str, compulsory): The instruction / question for the LLM.
+* `input` (str, optional): The additional context of the instruction / question.
+* `output` (str, optional): The sample output of the instruction (default: GPT-3.5).
+* `target` (str, optional): The target answer for the instruction.
+* `id` (int, compulsory): The ID of the instruction / question.
+
+If the record already has a `target` answer, the `output` can be empty. Otherwise, we generate an answer from GPT-3.5 as the `output` and leave the `target` field empty.
+
+Example:
+```
+[
+ {
+ "category": "brainstorming",
+ "instruction": "请介绍一下人工智能的多个领域。",
+ "input": "",
+ "output": "{GPT-3.5 Answers}",
+ "target": "",
+ "id": 1
+ },
+ {
+ "category": "classification",
+ "instruction": "新闻标题:为什么电影《倩女幽魂》中燕赤霞一个道士却拿着金刚经?请根据新闻标题判断新闻所属的分类,你需要从文化,娱乐,体育,财经,房产,教育,科技,旅游,游戏,军事这十类中选择一个答案。",
+ "input": "",
+ "output": "",
+ "target": "{target answer}",
+ "id": 2
+ }
+]
+```
+
+#### Model Answers / Predictions
+
+A JSON file contains one list. Each element in the list is a model answer / prediction record for one instruction / question.
+
+An element should have the following fields:
+
+* `category` (str, compulsory): The category of the instruction / question.
+* `instruction` (str, compulsory): The instruction / question for the LLM.
+* `input` (str, optional): The additional context of the instruction / question.
+* `output` (str, compulsory): The output from the LLM.
+* `target` (str, optional): The target answer for the instruction.
+* `id` (int, compulsory): The ID of the instruction / question.
+
+Example:
+```
+[
+ {
+ "category": "brainstorming",
+ "instruction": "请介绍一下人工智能的多个领域。",
+ "input": "",
+ "output": "{Model Answers / Predictions}",
+ "target": "",
+ "id": 1
+ },
+ {
+ "category": "classification",
+ "instruction": "新闻标题:为什么电影《倩女幽魂》中燕赤霞一个道士却拿着金刚经?请根据新闻标题判断新闻所属的分类,你需要从文化,娱乐,体育,财经,房产,教育,科技,旅游,游戏,军事这十类中选择一个答案。",
+ "input": "",
+ "output": "{Model Answers / Predictions}",
+ "target": "{target answer}",
+ "id": 2
+ }
+]
+```
+
+### Evaluation
+#### Configuration
+The configuration file `config_cn.json` controls how the performance of the model is evaluated.
+The following is an example showing the config structure:
+```
+{
+ "language": "cn",
+ "category": {
+ "brainstorming": {
+ "GPT-3.5": ["relevance", "creativity", "practicality", "correctness"],
+ "Metrics": ["Distinct"]
+ },
+ "chat": {
+ "GPT-3.5": [ "relevance", "naturalness", "engagingness", "reasonableness"],
+ "Metrics": ["Distinct"]
+ }
+ }
+}
+```
+`"language"`: evaluate the model capability in which language, we only support Chinese `"cn"` for now.
+`"category"`: evaluate the model capability in which category/categories.
+`"GPT-3.5"`: config metrics for GPT-3.5 evaluation.
+`"Metrics"`: config metrics for automatic metrics evaluation.
+
+You can create your own config file based on the available settings listed in the following table.
+
+| "category" | "GPT-3.5" | "Metrics" |
+|:----------------:|:-----------------------:|:-----------:|
+| "brainstorming" | "language organization" | "BLEU" |
+| "chat" | "relevance" | "ROUGE" |
+| "classification" | "creativity" | "Distinct" |
+| "closed_qa" | "practicality" | "BERTScore" |
+| "extraction" | "correctness" | "Precision" |
+| "generation" | "naturalness" | "Recall" |
+| "open_qa" | "engagingness" | "F1 score" |
+| "rewriting" | "reasonableness" |
+| "roleplay" | "diversity" |
+| "summarization" | "fidelity" |
+| | "conciseness" |
+
+#### Evaluate
+After setting the configuration file, you can evaluate the model using `eval.py`.
+
+
+An example script is provided as follows:
+```shell
+python eval.py \
+ --config_file "path to the config file" \
+ --battle_prompt_file "path to the prompt file for battle" \
+ --gpt_evaluation_prompt_file "path to the prompt file for gpt evaluation" \
+ --target_file "path to the target answer file" \
+ --answer_file_list "path to the answer files of at most 2 models" \
+ --model_name_list "the names of at most 2 models" \
+ --save_path "path to save results" \
+ --openai_key "your openai key" \
+```
+
+## To Do
+- [ ] Add evaluation for English capability
+- [ ] Support UniEval
+- [ ] Support GPT-4 evaluation
+
+## Citations
+
+```bibtex
+@misc{vicuna2023,
+ title = {Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90\%* ChatGPT Quality},
+ url = {https://vicuna.lmsys.org},
+ author = {Chiang, Wei-Lin and Li, Zhuohan and Lin, Zi and Sheng, Ying and Wu, Zhanghao and Zhang, Hao and Zheng, Lianmin and Zhuang, Siyuan and Zhuang, Yonghao and Gonzalez, Joseph E. and Stoica, Ion and Xing, Eric P.},
+ month = {March},
+ year = {2023}
+}
+
+@misc{ouyang2022training,
+ title={Training language models to follow instructions with human feedback},
+ author={Long Ouyang and Jeff Wu and Xu Jiang and Diogo Almeida and Carroll L. Wainwright and Pamela Mishkin and Chong Zhang and Sandhini Agarwal and Katarina Slama and Alex Ray and John Schulman and Jacob Hilton and Fraser Kelton and Luke Miller and Maddie Simens and Amanda Askell and Peter Welinder and Paul Christiano and Jan Leike and Ryan Lowe},
+ year={2022},
+ eprint={2203.02155},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+}
+
+@misc{liu2023geval,
+ title={G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment},
+ author={Yang Liu and Dan Iter and Yichong Xu and Shuohang Wang and Ruochen Xu and Chenguang Zhu},
+ year={2023},
+ eprint={2303.16634},
+ archivePrefix={arXiv},
+ primaryClass={cs.CL}
+}
+```
diff --git a/applications/Chat/evaluate/config/config_cn.json b/applications/Chat/evaluate/config/config_cn.json
new file mode 100644
index 000000000000..a7293f111a81
--- /dev/null
+++ b/applications/Chat/evaluate/config/config_cn.json
@@ -0,0 +1,123 @@
+{
+ "language": "cn",
+ "category": {
+ "brainstorming": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "creativity",
+ "practicality",
+ "correctness"
+ ],
+ "Metrics": [
+ "Distinct"
+ ]
+ },
+ "chat": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "naturalness",
+ "engagingness",
+ "reasonableness"
+ ],
+ "Metrics": [
+ "Distinct"
+ ]
+ },
+ "classification": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "correctness"
+ ],
+ "Metrics": [
+ "Precision",
+ "Recall",
+ "F1 score"
+ ]
+ },
+ "closed_qa": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "correctness"
+ ],
+ "Metrics": [
+ "BLEU",
+ "ROUGE",
+ "BERTScore"
+ ]
+ },
+ "extraction": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "correctness"
+ ],
+ "Metrics": [
+ "Precision",
+ "Recall",
+ "F1 score"
+ ]
+ },
+ "generation": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "diversity"
+ ],
+ "Metrics": [
+ "BLEU",
+ "ROUGE",
+ "BERTScore"
+ ]
+ },
+ "open_qa": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "correctness"
+ ],
+ "Metrics": [
+ "Distinct"
+ ]
+ },
+ "rewriting": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "correctness"
+ ],
+ "Metrics": [
+ "BLEU",
+ "ROUGE",
+ "BERTScore"
+ ]
+ },
+ "roleplay": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "fidelity",
+ "creativity"
+ ],
+ "Metrics": [
+ "Distinct"
+ ]
+ },
+ "summarization": {
+ "GPT-3.5": [
+ "language organization",
+ "relevance",
+ "correctness",
+ "conciseness"
+ ],
+ "Metrics": [
+ "BLEU",
+ "ROUGE",
+ "BERTScore"
+ ]
+ }
+ }
+}
diff --git a/applications/Chat/evaluate/eval.py b/applications/Chat/evaluate/eval.py
new file mode 100644
index 000000000000..69f2c272a116
--- /dev/null
+++ b/applications/Chat/evaluate/eval.py
@@ -0,0 +1,98 @@
+import argparse
+import json
+import os
+
+import openai
+from evaluator import Evaluator
+from utils import jload
+
+
+def main(args):
+ assert len(args.answer_file_list) == len(
+ args.model_name_list), "The number of answer files and model names should be equal!"
+
+ # load config
+ config = jload(args.config_file)
+
+ if config["language"] == "cn":
+ # get metric settings for all categories
+ metrics_per_category = {}
+ for category in config["category"].keys():
+ metrics_all = {}
+ for metric_type, metrics in config["category"][category].items():
+ metrics_all[metric_type] = metrics
+ metrics_per_category[category] = metrics_all
+
+ battle_prompt = None
+ if args.battle_prompt_file:
+ battle_prompt = jload(args.battle_prompt_file)
+
+ gpt_evaluation_prompt = None
+ if args.gpt_evaluation_prompt_file:
+ gpt_evaluation_prompt = jload(args.gpt_evaluation_prompt_file)
+
+ if len(args.model_name_list) == 2 and not battle_prompt:
+ raise Exception("No prompt file for battle provided. Please specify the prompt file for battle!")
+
+ if len(args.model_name_list) == 1 and not gpt_evaluation_prompt:
+ raise Exception(
+ "No prompt file for gpt evaluation provided. Please specify the prompt file for gpt evaluation!")
+
+ # initialize evaluator
+ evaluator = Evaluator(metrics_per_category, battle_prompt, gpt_evaluation_prompt)
+ if len(args.model_name_list) == 2:
+ answers1 = jload(args.answer_file_list[0])
+ answers2 = jload(args.answer_file_list[1])
+
+ assert len(answers1) == len(answers2), "The number of answers for two models should be equal!"
+
+ evaluator.battle(answers1=answers1, answers2=answers2)
+ evaluator.save(args.save_path, args.model_name_list)
+ elif len(args.model_name_list) == 1:
+ targets = jload(args.target_file)
+ answers = jload(args.answer_file_list[0])
+
+ assert len(targets) == len(answers), "The number of target answers and model answers should be equal!"
+
+ evaluator.evaluate(answers=answers, targets=targets)
+ evaluator.save(args.save_path, args.model_name_list)
+ else:
+ raise ValueError("Unsupported number of answer files and model names!")
+ else:
+ raise ValueError(f'Unsupported language {config["language"]}!')
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser(description='ColossalAI LLM evaluation pipeline.')
+    parser.add_argument('--config_file',
+                        type=str,
+                        default=None,
+                        required=True,
+                        help='path to the config file')
+ parser.add_argument('--battle_prompt_file', type=str, default=None, help='path to the prompt file for battle')
+ parser.add_argument('--gpt_evaluation_prompt_file',
+ type=str,
+ default=None,
+ help='path to the prompt file for gpt evaluation')
+ parser.add_argument('--target_file', type=str, default=None, help='path to the target answer (ground truth) file')
+ parser.add_argument('--answer_file_list',
+ type=str,
+ nargs='+',
+ default=[],
+ required=True,
+ help='path to the answer files of at most 2 models')
+ parser.add_argument('--model_name_list',
+ type=str,
+ nargs='+',
+ default=[],
+ required=True,
+ help='the names of at most 2 models')
+ parser.add_argument('--save_path', type=str, default="results", help='path to save evaluation results')
+ parser.add_argument('--openai_key', type=str, default=None, required=True, help='Your openai key')
+ args = parser.parse_args()
+
+ if args.openai_key is not None:
+ os.environ["OPENAI_API_KEY"] = args.openai_key
+ openai.api_key = os.getenv("OPENAI_API_KEY")
+
+ main(args)
diff --git a/applications/Chat/evaluate/eval.sh b/applications/Chat/evaluate/eval.sh
new file mode 100755
index 000000000000..f5729e6ee5c7
--- /dev/null
+++ b/applications/Chat/evaluate/eval.sh
@@ -0,0 +1,9 @@
+python eval.py \
+ --config_file "path to the config file" \
+ --battle_prompt_file "path to the prompt file for battle" \
+ --gpt_evaluation_prompt_file "path to the prompt file for gpt evaluation" \
+ --target_file "path to the target answer file" \
+ --answer_file_list "path to the answer files of at most 2 models" \
+ --model_name_list "the names of at most 2 models" \
+ --save_path "path to save results" \
+ --openai_key "your openai key" \
diff --git a/applications/Chat/evaluate/evaluate.py b/applications/Chat/evaluate/evaluate.py
deleted file mode 100644
index 2f9c9ce8e10d..000000000000
--- a/applications/Chat/evaluate/evaluate.py
+++ /dev/null
@@ -1,256 +0,0 @@
-# Adapted form https://github.com/lm-sys/FastChat/blob/main/fastchat/eval/eval_gpt_review.py
-# Copyright 2023 LM-SYS@FastChat
-
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-
-# http://www.apache.org/licenses/LICENSE-2.0
-
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-
-import argparse
-import json
-import os
-import time
-import re
-import concurrent.futures
-
-import openai
-import tqdm
-import shortuuid
-import logging
-
-from utils import jload, jdump, get_json_list
-
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger(__name__)
-
-MAX_API_RETRY = 3
-
-
-def get_eval(sys_prompt, user_prompt: str, answer_id: int, max_tokens: int, model: str):
- logging.basicConfig(level=logging.INFO)
- for _ in range(MAX_API_RETRY):
- try:
- response = openai.ChatCompletion.create(
- model=model,
- messages=[{
- 'role': 'system',
- 'content': sys_prompt
- }, {
- 'role': 'user',
- 'content': user_prompt,
- }],
- temperature=0.2,
- max_tokens=max_tokens,
- )
- review = response['choices'][0]['message']['content']
- return {"review": review, 'id': answer_id}
- except Exception as e:
- logger.error(e)
- time.sleep(1)
- logger.error(f' Review {answer_id} failed after {MAX_API_RETRY} retries.')
- return 'error'
-
-
-def parse_score(review):
- try:
- pattern = re.compile('([0-9]|10) out of 10')
- sp = re.findall(pattern, review)
- if len(re.findall(pattern, review)) == 2:
- return [float(sp[0]), float(sp[1])]
-
- pattern = re.compile('a score of ([0-9]|10)')
- sp = re.findall(pattern, review)
- if len(re.findall(pattern, review)) == 2:
- return [float(sp[0]), float(sp[1])]
-
- pattern = re.compile('([0-9]|10)/10')
- sp = re.findall(pattern, review)
- if len(re.findall(pattern, review)) == 2:
- return [float(sp[0]), float(sp[1])]
-
- score_pair = review.split('\n')[0]
- score_pair = score_pair.replace(',', ' ')
- sp = score_pair.split(' ')
- if len(sp) == 2:
- return [float(sp[0]), float(sp[1])]
- else:
- raise Exception('Invalid score pair.')
- except Exception as e:
- return [-1, -1]
-
-
-def gen_prompt(reviewer_jsons, prompt_jsons, cat, ques, ans1, ans2):
- reviewer_idx = 0
- for idx, reviewer in enumerate(reviewer_jsons):
- if reviewer['category'] == cat:
- reviewer_idx = idx
- break
- prompt_id = reviewer_jsons[reviewer_idx]['prompt_id']
- prompt_json = prompt_jsons[prompt_id-1]
- assert prompt_json['prompt_id'] == prompt_id
-
- sys_prompt = prompt_json['system_prompt']
- prompt_template = prompt_json['prompt_template']
- defaults = prompt_json['defaults']
- prompt = prompt_template.format(
- question=ques, answer_1=ans1, answer_2=ans2, **defaults)
-
- return sys_prompt, prompt, reviewer_idx+1
-
-
-def evaluate(args):
- answer1_jsons = jload(args.answer_file_list[0])
- answer2_jsons = jload(args.answer_file_list[1])
- reviewer_jsons = get_json_list(args.reviewer_file)
- prompt_jsons = get_json_list(args.prompt_file)
-
- assert len(answer1_jsons) == len(answer2_jsons)
-
- handles = []
- review_jsons = []
-
- total_len = len(answer1_jsons)
- question_idx_list = list(range(total_len))
-
- logger.info(
- f' Total number of answers: {len(answer2_jsons)}.')
-
- reviews = []
- with concurrent.futures.ThreadPoolExecutor(max_workers=args.num_workers) as executor:
- futures = []
- for i in question_idx_list:
- assert answer1_jsons[i]['id'] == answer2_jsons[i]['id']
- answer_id = answer1_jsons[i]['id']
-
- ques = answer1_jsons[i]['instruction'] if answer1_jsons[i]['input'] == "" else answer1_jsons[i]['instruction'] + \
- " " + answer1_jsons[i]['input']
- cat = answer1_jsons[i]['category']
- ans1 = answer1_jsons[i]['output']
- ans2 = answer2_jsons[i]['output']
-
- sys_prompt, prompt, reviewer_id = gen_prompt(
- reviewer_jsons, prompt_jsons, cat, ques, ans1, ans2)
-
- review_id = shortuuid.uuid()
- review_jsons.append({
- 'review_id': review_id,
- 'id': answer_id,
- 'reviewer_id': reviewer_id,
- 'metadata': {}
- })
-
- future = executor.submit(
- get_eval, sys_prompt, prompt, answer_id, args.max_tokens, args.model)
- futures.append(future)
-
- for future in tqdm.tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
- reviews.append(future.result())
-
- reviews.sort(key=lambda x: x['id'])
- review_jsons.sort(key=lambda x: x['id'])
-
- ans1_score = 0
- ans2_score = 0
- better_count = 0
- worse_count = 0
- tie_count = 0
- invalid_count = 0
-
- better_file = []
- worse_file = []
- tie_file = []
- invalid_file = []
- output_review_file = []
-
- for idx, review in enumerate(reviews):
- scores = parse_score(review['review'])
- review_jsons[idx]['review'] = review['review']
- review_jsons[idx]['score'] = scores
-
- if scores[0] == -1 and scores[1] == -1:
- invalid_count += 1
- invalid_file.append(review_jsons[idx])
- logger.info(f' Invalid score pair: {review_jsons[idx]["id"]}.')
- else:
- if scores[0] > scores[1]:
- worse_count += 1
- worse_file.append(review_jsons[idx])
- elif scores[0] < scores[1]:
- better_count += 1
- better_file.append(review_jsons[idx])
- else:
- tie_count += 1
- tie_file.append(review_jsons[idx])
- ans1_score += scores[0]
- ans2_score += scores[1]
-
- output_review_file.append(review_jsons[idx])
-
- better_file.sort(key=lambda x: x['id'])
- worse_file.sort(key=lambda x: x['id'])
- tie_file.sort(key=lambda x: x['id'])
- invalid_file.sort(key=lambda x: x['id'])
- output_review_file.sort(key=lambda x: x['id'])
-
- name1 = os.path.basename(args.answer_file_list[0]).split("_answers")[0]
- name2 = os.path.basename(args.answer_file_list[1]).split("_answers")[0]
- prefix = f"{name1}_vs_{name2}"
-
- jdump(better_file, os.path.join(
- args.output_folder, prefix, f"{prefix}_better.json"))
- jdump(worse_file, os.path.join(
- args.output_folder, prefix, f"{prefix}_worse.json"))
- jdump(tie_file, os.path.join(
- args.output_folder, prefix, f"{prefix}_tie.json"))
- jdump(invalid_file, os.path.join(
- args.output_folder, prefix, f"{prefix}_invalid.json"))
- jdump(output_review_file, os.path.join(
- args.output_folder, prefix, f"{prefix}_review.json"))
-
- if os.path.exists(os.path.join(args.output_folder, "results.json")):
- results = jload(os.path.join(args.output_folder, "results.json"))
- else:
- results = {}
- results[prefix] = {'model': [name1, name2], 'better': better_count, 'worse': worse_count, 'tie': tie_count, 'win_rate': better_count /
- (len(reviews)-invalid_count), 'score': [ans1_score/(len(reviews)-invalid_count), ans2_score/(len(reviews)-invalid_count)]}
- jdump(results, os.path.join(args.output_folder, "results.json"))
-
- logger.info(f' Total {invalid_count} invalid score pair(s).')
- logger.info(f' Model {name2} has {better_count} better answer(s).')
- logger.info(f' Model {name2} has {worse_count} worse answer(s).')
- logger.info(f' {tie_count} answer(s) play(s) to a tie.')
- logger.info(
- f' Win rate of model {name2}: {better_count/(len(reviews)-invalid_count):.2f}')
- logger.info(
- f' Model {name1} average score: {ans1_score/(len(reviews)-invalid_count):.2f}')
- logger.info(
- f' Model {name2} average score: {ans2_score/(len(reviews)-invalid_count):.2f}')
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser(
- description='Model evaluation.')
- parser.add_argument('--answer_file_list', nargs='+', default=[])
- parser.add_argument('--prompt_file')
- parser.add_argument('--reviewer_file')
- parser.add_argument('--output_folder', type=str, default="./output")
- parser.add_argument('--openai_key', type=str, default=None)
- parser.add_argument('--model', type=str, default="gpt-4")
- parser.add_argument('--num_workers', type=int, default=8)
- parser.add_argument('--max_tokens', type=int, default=512,
- help='maximum number of tokens produced in the output')
- args = parser.parse_args()
-
- if args.openai_key is not None:
- os.environ["OPENAI_API_KEY"] = args.openai_key
- openai.api_key = os.getenv("OPENAI_API_KEY")
-
- evaluate(args)
diff --git a/applications/Chat/evaluate/evaluate.sh b/applications/Chat/evaluate/evaluate.sh
deleted file mode 100755
index c51aa941019e..000000000000
--- a/applications/Chat/evaluate/evaluate.sh
+++ /dev/null
@@ -1,9 +0,0 @@
-python evaluate.py \
- --answer_file_list "path to answers of model 1" "path to answers of model 2" \
- --prompt_file "path to prompt file" \
- --reviewer_file "path to reviewer file" \
- --output_folder "path to output folder" \
- --openai_key "your openai key" \
- --model "gpt-4" \
- --num_workers 8 \
- --max_tokens 512 \
diff --git a/applications/Chat/evaluate/evaluator.py b/applications/Chat/evaluate/evaluator.py
new file mode 100644
index 000000000000..d3d1c038bfb8
--- /dev/null
+++ b/applications/Chat/evaluate/evaluator.py
@@ -0,0 +1,130 @@
+import os
+from typing import Any, Dict, List
+
+import gpt_evaluate
+import metrics
+import pandas as pd
+from utils import get_data_per_category, jdump
+
+
+class Evaluator(object):
+ """
+    A class named Evaluator that wraps GPT-3.5/GPT-4 evaluation
+    and automatic evaluation.
+
+ """
+
+ def __init__(self, params: Dict[str, Any], battle_prompt: Dict[str, Any], gpt_evaluation_prompt: Dict[str,
+ Any]) -> None:
+ self.params = params
+ self.battle_prompt = battle_prompt
+ self.gpt_evaluation_prompt = gpt_evaluation_prompt
+ self.automatic_metric_stats = dict()
+ self.gpt35_evaluation_results = dict()
+ self.battle_results = []
+
+ def battle(self, answers1: List[Dict], answers2: List[Dict]) -> None:
+ """
+ Comparison between two models using GPT-4 as the reviewer.
+ """
+
+ self.battle_results = gpt_evaluate.battle(answers1, answers2, self.battle_prompt)
+
+ def evaluate(self, answers: List[Dict], targets: List[Dict]) -> None:
+ """
+ A comprehensive evaluation of the answers from the model.
+ The function evaluates the model's performance from different perspectives
+ using GPT-3.5, GPT-4, and off-the-shelf evaluation metrics.
+
+ The metrics will be decided by the config file.
+
+ """
+
+ def switch(metric):
+ if metric == "BLEU":
+ return metrics.bleu_score(preds=predicts_list, targets=targets_list)
+ elif metric == "ROUGE":
+ return metrics.rouge_cn_score(preds=predicts_list, targets=targets_list)
+ elif (metric == "Distinct"):
+ return metrics.distinct_score(preds=predicts_list)
+ elif (metric == "BERTScore"):
+ return metrics.bert_score(preds=predicts_list, targets=targets_list)
+ elif (metric == "Precision"):
+ return metrics.precision(preds=predicts_list, targets=targets_list)
+ elif (metric == "Recall"):
+ return metrics.recall(preds=predicts_list, targets=targets_list)
+ elif (metric == "F1 score"):
+ return metrics.F1_score(preds=predicts_list, targets=targets_list)
+            else:
+                raise ValueError(f"Unexpected metric {metric}.")
+
+ answers_per_category = get_data_per_category(answers, list(self.params.keys()))
+ targets_per_category = get_data_per_category(targets, list(self.params.keys()))
+
+ # automatic evaluation
+ for category in self.params:
+ category_metrics = self.params[category]["Metrics"]
+ self.automatic_metric_stats[category] = {}
+
+ targets_list = [
+ target["target"] if target["target"] else target["output"] for target in targets_per_category[category]
+ ]
+ predicts_list = [answer["output"] for answer in answers_per_category[category]]
+
+ for metric in category_metrics:
+ self.automatic_metric_stats[category].update(switch(metric=metric))
+
+ # gpt35 evaluation
+ for category in self.params:
+ category_metrics = self.params[category]["GPT-3.5"]
+
+ prompt = self.gpt_evaluation_prompt.get(category, None)
+ if prompt is None:
+ print(f"No prompt for category {category}! Use prompt for category general now.")
+ prompt = self.gpt_evaluation_prompt["general"]
+
+ self.gpt35_evaluation_results[category] = gpt_evaluate.gpt35_evaluate(answers_per_category[category],
+ prompt, category_metrics, category)
+
+ def save(self, path: str, model_name_list: List[str]) -> None:
+ """
+ Save evaluation results of GPT-3.5, GPT-4, and off-the-shelf evaluation metrics.
+
+ """
+
+ if len(model_name_list) == 2:
+ save_path = os.path.join(path, "gpt_evaluate", "battle_results")
+ gpt_evaluate.save_battle_results(self.battle_results, model_name_list[0], model_name_list[1], save_path)
+ else:
+ # save evaluation results for automatic metrics
+ automatic_df = pd.DataFrame(self.automatic_metric_stats)
+
+ automatic_results_save_path = os.path.join(path, "automatic_results")
+ if not os.path.exists(automatic_results_save_path):
+ os.makedirs(automatic_results_save_path)
+ automatic_df.to_csv(os.path.join(automatic_results_save_path, f"{model_name_list[0]}.csv"), index=True)
+
+ # Save evaluation results for GPT-3.5 evaluation metrics.
+ all_evaluations = []
+ base_save_path = os.path.join(path, "gpt_evaluate", "gpt35_evaluate_results")
+ evaluation_results_save_path = os.path.join(base_save_path, "evaluation_results")
+
+ for category, evaluations in self.gpt35_evaluation_results.items():
+ jdump(
+ evaluations,
+ os.path.join(evaluation_results_save_path, model_name_list[0],
+ f"{category}_evaluation_results.json"))
+ all_evaluations.extend(evaluations)
+
+ jdump(all_evaluations,
+ os.path.join(evaluation_results_save_path, f"{model_name_list[0]}_evaluation_results.json"))
+
+ # Start to calculate scores and save statistics.
+ evaluation_statistics_save_path = os.path.join(base_save_path, "evaluation_statistics")
+ gpt_evaluate.save_gpt35_evaluation_statistics(model_name_list[0], all_evaluations,
+ evaluation_statistics_save_path)
+
+ # Save charts and csv.
+ evaluation_analyses_save_path = os.path.join(base_save_path, "evaluation_analyses")
+ gpt_evaluate.analyze_gpt35_evaluation_statistics(evaluation_statistics_save_path,
+ evaluation_analyses_save_path)
diff --git a/applications/Chat/evaluate/generate_answers.py b/applications/Chat/evaluate/generate_answers.py
deleted file mode 100644
index fbebf5c5e6f6..000000000000
--- a/applications/Chat/evaluate/generate_answers.py
+++ /dev/null
@@ -1,173 +0,0 @@
-import argparse
-import os
-import random
-import copy
-import math
-from tqdm import tqdm
-
-import torch
-import torch.distributed as dist
-import transformers
-
-from coati.models.bloom import BLOOMActor
-from coati.models.gpt import GPTActor
-from coati.models.opt import OPTActor
-from coati.models.roberta import RoBERTaActor
-from coati.models.llama import LlamaActor
-from coati.trainer.strategies import ColossalAIStrategy, DDPStrategy, NaiveStrategy
-from transformers import AutoTokenizer, RobertaTokenizer
-from transformers.models.gpt2.tokenization_gpt2 import GPT2Tokenizer
-
-from colossalai.logging import get_dist_logger
-
-from utils import jload, jdump, is_rank_0
-
-
-logger = get_dist_logger()
-
-PROMPT_DICT = {
- "prompt_input":
- ("Below is an instruction that describes a task, paired with an input that provides further context. "
- "Write a response that appropriately completes the request.\n\n"
- "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:"),
- "prompt_no_input": ("Below is an instruction that describes a task. "
- "Write a response that appropriately completes the request.\n\n"
- "### Instruction:\n{instruction}\n\n### Response:"),
-}
-
-
-def generate(args):
- # torch.cuda.set_per_process_memory_fraction(0.4)
- if args.strategy == 'naive':
- strategy = NaiveStrategy()
- elif args.strategy == 'ddp':
- strategy = DDPStrategy()
- elif args.strategy == 'colossalai_gemini':
- strategy = ColossalAIStrategy(stage=3, placement_policy='cuda')
- elif args.strategy == 'colossalai_zero2':
- strategy = ColossalAIStrategy(stage=2, placement_policy='cuda')
- elif args.strategy == 'colossalai_zero2_cpu':
- strategy = ColossalAIStrategy(stage=2, placement_policy='cpu')
- else:
- raise ValueError(f'Unsupported strategy "{args.strategy}"')
-
- world_size = dist.get_world_size()
- rank = dist.get_rank()
-
- with strategy.model_init_context():
- if args.model == 'gpt2':
- actor = GPTActor(pretrained=args.model_path).to(
- torch.cuda.current_device())
- elif args.model == 'bloom':
- actor = BLOOMActor(pretrained=args.model_path).to(
- torch.cuda.current_device())
- elif args.model == 'opt':
- actor = OPTActor(pretrained=args.model_path).to(
- torch.cuda.current_device())
- elif args.model == 'roberta':
- actor = RoBERTaActor(pretrained=args.model_path).to(
- torch.cuda.current_device())
- elif args.model == 'llama':
- actor = LlamaActor(pretrained=args.model_path).to(
- torch.float16).to(torch.cuda.current_device())
- else:
- raise ValueError(f'Unsupported model "{args.model}"')
-
- if args.model == 'gpt2':
- tokenizer = GPT2Tokenizer.from_pretrained('gpt2')
- tokenizer.pad_token = tokenizer.eos_token
- elif args.model == 'bloom':
- tokenizer = AutoTokenizer.from_pretrained('bigscience/bloom-560m')
- tokenizer.pad_token = tokenizer.eos_token
- elif args.model == 'opt':
- tokenizer = AutoTokenizer.from_pretrained('facebook/opt-350m')
- elif args.model == 'roberta':
- tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
- elif args.model == 'llama':
- tokenizer = AutoTokenizer.from_pretrained(args.model_path,
- padding_side="right",
- use_fast=False,
- )
- tokenizer.eos_token = '<\s>'
- else:
- raise ValueError(f'Unsupported model "{args.model}"')
-
- questions = []
- if args.max_datasets_size is not None:
- questions = random.sample(jload(args.dataset), args.max_datasets_size)
- if is_rank_0():
- logger.info(
- f"Limiting dataset to {args.max_datasets_size} examples.")
- questions = questions[rank:args.max_datasets_size:world_size]
-
- answers = copy.deepcopy(questions)
-
- prompt_input, prompt_no_input = PROMPT_DICT["prompt_input"], PROMPT_DICT["prompt_no_input"]
- sources = [
- prompt_input.format_map(example) if example.get(
- "input", "") != "" else prompt_no_input.format_map(example)
- for example in questions
- ]
-
- if is_rank_0():
- logger.info("Tokenizing inputs... This may take some time...")
-
- input_ids_list = []
-
- for string in sources:
- input_ids = tokenizer.encode(string, return_tensors='pt').squeeze(0)
- input_ids_list.append(input_ids)
-
- bar = tqdm(range(math.ceil(len(input_ids_list)/args.batch_size)),
- desc=f'steps', disable=not is_rank_0())
-
- actor.eval()
- with torch.no_grad():
- for i in range(0, len(input_ids_list), args.batch_size):
- batch = input_ids_list[i:i+args.batch_size]
- batch = [i.flip(dims=[0]) for i in batch]
- batch = torch.nn.utils.rnn.pad_sequence(batch,
- batch_first=True,
- padding_value=tokenizer.pad_token_id if tokenizer.pad_token_id is not None else 0).to(torch.cuda.current_device())
- batch = batch.flip(dims=[1])
- attention_mask = batch.ne(tokenizer.pad_token_id if tokenizer.pad_token_id is not None else 0)
-
- outputs = actor.model.generate(batch, attention_mask=attention_mask,
- max_length=args.max_length,
- do_sample=True,
- top_k=50,
- top_p=0.95,
- num_return_sequences=1)
-
- outputs = tokenizer.batch_decode(outputs, skip_special_tokens=True)
- for j in range(batch.size(0)):
- answers[i +
- j]['output'] = outputs[j].split("### Response:")[1].strip()
-
- bar.update()
-
- jdump(answers, os.path.join(args.answer_path,
- f'{args.model_name}_answers_rank{rank}.json'))
-
- if is_rank_0():
- logger.info(
- f'Peak CUDA mem: {torch.cuda.max_memory_allocated()/1024**3:.3f} GB')
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.add_argument('--strategy',
- choices=['naive', 'ddp', 'colossalai_gemini',
- 'colossalai_zero2', 'colossalai_zero2_cpu'],
- default='naive')
- parser.add_argument('--model', default='gpt2',
- choices=['gpt2', 'bloom', 'opt', 'roberta', 'llama'])
- parser.add_argument('--model_path', type=str, default=None)
- parser.add_argument('--model_name', type=str, default='model')
- parser.add_argument('--dataset', type=str, default=None)
- parser.add_argument('--batch_size', type=int, default=1)
- parser.add_argument('--max_datasets_size', type=int, default=None)
- parser.add_argument('--answer_path', type=str, default="answer")
- parser.add_argument('--max_length', type=int, default=1024)
- args = parser.parse_args()
- generate(args)
diff --git a/applications/Chat/evaluate/generate_answers.sh b/applications/Chat/evaluate/generate_answers.sh
deleted file mode 100755
index 36881f5f4f29..000000000000
--- a/applications/Chat/evaluate/generate_answers.sh
+++ /dev/null
@@ -1,25 +0,0 @@
-device_number=number of your devices
-model_name="name of your model"
-model_path="path to your model"
-dataset="path to the question dataset"
-answer_path="path to save the model answers"
-
-torchrun --standalone --nproc_per_node=$device_number generate_answers.py \
- --model 'llama' \
- --strategy ddp \
- --model_path $model_path \
- --model_name $model_name \
- --dataset $dataset \
- --batch_size 8 \
- --max_datasets_size 80 \
- --answer_path $answer_path \
- --max_length 512
-
-python merge.py \
- --model_name $model_name \
- --shards $device_number \
- --answer_path $answer_path \
-
-for (( i=0; i
+def get_battle_result(sys_prompt: str, user_prompt: str, id: int, max_tokens: int) -> Dict[str, Any]:
+ """
+ Get evaluation from GPT-4.
+
+ Args:
+ sys_prompt: prompt for the system.
+ user_prompt: prompt for the user.
+ id: id of the answers for comparison.
+ max_tokens: the maximum number of tokens to generate in the chat completion.
+
+ Returns:
+ An evaluation of one comparison.
+ """
+
+ MAX_API_RETRY = 3
+ for _ in range(MAX_API_RETRY):
+ try:
+ response = openai.ChatCompletion.create(
+ model="gpt-4",
+ messages=[
+ {
+ "role": "system",
+ "content": sys_prompt
+ },
+ {
+ "role": "user",
+ "content": user_prompt,
+ },
+ ],
+ temperature=0.2,
+ max_tokens=max_tokens,
+ )
+ evaluation = response["choices"][0]["message"]["content"]
+ return {"evaluation": evaluation, "id": id}
+ except Exception as e:
+ print(e)
+ time.sleep(1)
+ print(f" Evaluation {id} failed after {MAX_API_RETRY} retries.")
+ return {"evaluation": "", "id": id}
+
+
+def parse_battle_score(evaluation: str) -> List[float]:
+ """
+ Parse evaluation from GPT-4 and get the scores of model 1 and 2.
+
+ Args:
+ evaluation: evaluation from GPT-4.
+
+ Returns:
+ A score pair of two different model answers.
+ """
+
+ try:
+ pattern = re.compile("([0-9]|10) out of 10")
+ sp = re.findall(pattern, evaluation)
+ if len(re.findall(pattern, evaluation)) == 2:
+ return [float(sp[0]), float(sp[1])]
+
+ pattern = re.compile("a score of ([0-9]|10)")
+ sp = re.findall(pattern, evaluation)
+ if len(re.findall(pattern, evaluation)) == 2:
+ return [float(sp[0]), float(sp[1])]
+
+ pattern = re.compile("([0-9]|10)/10")
+ sp = re.findall(pattern, evaluation)
+ if len(re.findall(pattern, evaluation)) == 2:
+ return [float(sp[0]), float(sp[1])]
+
+ score_pair = evaluation.split("\n")[0]
+ score_pair = score_pair.replace(",", " ")
+ sp = score_pair.split(" ")
+ if len(sp) == 2:
+ return [float(sp[0]), float(sp[1])]
+ else:
+ raise Exception(f"Invalid score pair. Got {evaluation}.")
+ except Exception as e:
+ return [-1, -1]
+
+
+def battle(answer1: List[Dict], answer2: List[Dict], prompt_dict: Dict[str, Any]) -> List[Dict]:
+ """
+ Use GPT-4 to compare answers of two different models.
+
+ Args:
+ answer1: answers of model 1.
+ answer2: answers of model 2.
+ prompt_dict: prompt for battle.
+
+ Returns:
+ Evaluations of all comparison pairs.
+ """
+
+ assert len(answer1) == len(answer2)
+
+ handles = []
+ evaluation_file = []
+
+ total_len = len(answer1)
+ question_idx_list = list(range(total_len))
+
+ print(f" Total number of answers: {len(answer1)}.")
+
+ evaluations = []
+ with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
+ futures = []
+ for i in question_idx_list:
+ assert answer1[i]["id"] == answer2[i]["id"]
+ answer_id = answer1[i]["id"]
+
+ ques = answer1[i]["instruction"] if answer1[i][
+ "input"] == "" else answer1[i]["instruction"] + " " + answer1[i]["input"]
+ cat = answer1[i]["category"]
+ ans1 = answer1[i]["output"]
+ ans2 = answer2[i]["output"]
+
+ sys_prompt = prompt_dict["system_prompt"]
+ prompt_template = prompt_dict["prompt_template"]
+ prompt = prompt_template.format(
+ question=ques,
+ answer_1=ans1,
+ answer_2=ans2,
+ prompt=prompt_dict["prompt"],
+ )
+
+ future = executor.submit(get_battle_result, sys_prompt, prompt, answer_id, 2048)
+ futures.append(future)
+
+ for future in tqdm.tqdm(concurrent.futures.as_completed(futures), total=len(futures)):
+ evaluations.append(future.result())
+
+ evaluations.sort(key=lambda x: x["id"])
+
+ return evaluations
+
+
+def save_battle_results(evaluations: List[Dict], name1: str, name2: str, save_path: str) -> None:
+ """
+ Save evaluation results (model 1 vs model 2) from GPT-4.
+
+ Args:
+ evaluations: evaluation results from GPT-4.
+        name1: name of model 1.
+        name2: name of model 2.
+ save_path: path to save battle results.
+ """
+
+ evaluation_file = deepcopy(evaluations)
+
+ ans1_score = 0
+ ans2_score = 0
+ better_count = 0
+ worse_count = 0
+ tie_count = 0
+ invalid_count = 0
+
+ better_file = []
+ worse_file = []
+ tie_file = []
+ invalid_file = []
+
+ for idx, evaluation in enumerate(evaluations):
+ scores = parse_battle_score(evaluation["evaluation"])
+ evaluation_file[idx]["score"] = scores
+
+ if scores[0] == -1 and scores[1] == -1:
+ invalid_count += 1
+ invalid_file.append(evaluation_file[idx])
+ print(f'Invalid score pair: {evaluation_file[idx]["id"]}.')
+ else:
+ if scores[0] > scores[1]:
+ worse_count += 1
+ worse_file.append(evaluation_file[idx])
+ elif scores[0] < scores[1]:
+ better_count += 1
+ better_file.append(evaluation_file[idx])
+ else:
+ tie_count += 1
+ tie_file.append(evaluation_file[idx])
+ ans1_score += scores[0]
+ ans2_score += scores[1]
+
+ prefix = f"{name1}_vs_{name2}"
+
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+
+ jdump(better_file, os.path.join(save_path, prefix, f"{name2}_better.json"))
+ jdump(worse_file, os.path.join(save_path, prefix, f"{name2}_worse.json"))
+ jdump(tie_file, os.path.join(save_path, prefix, f"{prefix}_tie.json"))
+ jdump(invalid_file, os.path.join(save_path, prefix, f"{prefix}_invalid.json"))
+ jdump(evaluation_file, os.path.join(save_path, prefix, f"{prefix}_evaluations.json"))
+
+ if os.path.exists(os.path.join(save_path, "battle_results.json")):
+ results = jload(os.path.join(save_path, "battle_results.json"))
+ else:
+ results = {}
+
+ results[prefix] = {
+ "model": [name1, name2],
+ "better": better_count,
+ "worse": worse_count,
+ "tie": tie_count,
+ "win_rate": better_count / (len(evaluations) - invalid_count),
+ "score": [
+ ans1_score / (len(evaluations) - invalid_count),
+ ans2_score / (len(evaluations) - invalid_count),
+ ],
+ }
+ jdump(results, os.path.join(save_path, "battle_results.json"))
+
+ print(f"Total {invalid_count} invalid score pair(s).")
+ print(f"Model {name2} has {better_count} better answer(s).")
+ print(f"Model {name2} has {worse_count} worse answer(s).")
+ print(f"{tie_count} answer(s) play(s) to a tie.")
+ print(f"Win rate of model {name2}: {better_count/(len(evaluations)-invalid_count):.2f}")
+ print(f"Model {name1} average score: {ans1_score/(len(evaluations)-invalid_count):.2f}")
+ print(f"Model {name2} average score: {ans2_score/(len(evaluations)-invalid_count):.2f}")
+
+
+def get_gpt35_evaluation(prompt: Dict[str, Any],
+ inst: Dict[str, Any],
+ metrics: List[str],
+ max_tokens: int = 2048) -> Dict[str, Any]:
+ """
+ Use GPT-3.5 to evaluate one model answer.
+
+ Args:
+ prompt: a dictionary including prompt template, CoT and metrics.
+        inst: the instruction to be evaluated.
+ metrics: the metrics for evaluation.
+ max_tokens: the maximum number of tokens to generate in the completion.
+
+ Returns:
+ An evaluation of one answer.
+ """
+
+ MAX_API_RETRY = 3
+
+ question = (inst["instruction"] if inst["input"] == "" else inst["instruction"] + " " + inst["input"])
+ answer = inst["output"]
+ inst["evaluation"] = {}
+
+ for metric in metrics:
+ if prompt["metrics"].get(metric, None) is None:
+ raise Exception(
+ f"Unsupported metric {metric} for category {inst['category']}! You should add this metric in the prompt file!"
+ )
+ for i in range(MAX_API_RETRY):
+ try:
+ response = openai.Completion.create(
+ model="text-davinci-003",
+ prompt=prompt["prompt"].format(
+ question=question,
+ answer=answer,
+ metric=prompt["metrics"][metric],
+ steps=prompt["CoT"][metric],
+ ),
+ logprobs=5,
+ temperature=0,
+ max_tokens=max_tokens,
+ )
+ inst["evaluation"][metric] = {
+ "response": response["choices"][0]["text"],
+ "logprobs": response["choices"][0]["logprobs"]["top_logprobs"],
+ }
+ break
+ except Exception as e:
+ print(e)
+ time.sleep(1)
+ return inst
+
+
+def gpt35_evaluate(
+ answers: List[Dict],
+ prompt: Dict[str, Any],
+ metrics: List[str],
+ category: str,
+) -> List[Dict]:
+ """
+ Use GPT-3.5 to evaluate model answers and save evaluation results.
+
+ Args:
+ answers: model answers.
+ prompt: prompt for GPT-3.5 evaluation.
+ metrics: metrics for GPT-3.5 evaluation.
+ category: the category of the model answers for evaluation.
+
+ Returns:
+ Evaluations of the given answers.
+ """
+
+ print(f"The number of instances of category {category}'s is {len(answers)}.")
+
+ evaluations = []
+
+ metrics_str = ", ".join(x for x in metrics)
+ print(f"Category {category}'s metrics are {metrics_str}.")
+
+ with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
+ futures = []
+ for inst in answers:
+ future = executor.submit(get_gpt35_evaluation, prompt, inst, metrics, 1)
+ futures.append(future)
+
+ for future in tqdm.tqdm(
+ concurrent.futures.as_completed(futures),
+ desc=f"{category}: ",
+ total=len(futures),
+ ):
+ evaluations.append(future.result())
+
+ evaluations.sort(key=lambda x: x["id"])
+
+ print(f"{category} done.")
+
+ return evaluations
+
+
+def calculate_scores_form_logprobs(logprobs: Dict[str, Any]) -> float:
+ """
+ Calculate score from log probabilities returned by text-davinci-003.
+ Only openai.Completion can return logprobs.
+
+ Calculation formula:
+        score = sum(score_i * exp(value)), where score_i is the score corresponding to the key (the predicted token) and value is its log probability.
+
+ Ref: https://arxiv.org/abs/2303.16634
+    This paper proposes NLG evaluation methods using GPT-3.5 (logprobs returned by the OpenAI API) and GPT-4 (logprobs estimated by sampling).
+
+ Args:
+ logprobs: logprobs returned by openai.Completion.
+
+ Returns:
+ Score of one answer.
+ """
+
+    # GPT-3.5 only returns scores from 1 to 5.
+ prob = np.zeros(5)
+
+ for key, value in logprobs.items():
+ # Sometimes the key will be one byte of a unicode character which takes the form of "bytes:\\xe7".
+ # It is meaningless and thus we don't calculate probability.
+ if "bytes" in key:
+ continue
+ # results[0] is the score which corresponds to the key(predicted token).
+ # For example, key "5" corresponds to score 5.
+ results = re.findall(r"\d", key)
+ if len(results) == 1:
+ prob[int(results[0]) - 1] = prob[int(results[0]) - 1] + np.exp(value)
+
+ score = np.dot(np.arange(1, 6), prob)
+
+ return score
+
+
+def save_gpt35_evaluation_statistics(model_name: str, evaluations: List[Dict], save_path: str) -> None:
+ """
+ Generate statistics for one model.
+
+ Args:
+ model_name: name of the model for saving statistics.
+ evaluations: evaluations for all of the model answers.
+ save_path: path to save GPT-3.5 evaluation statistics.
+ """
+
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+
+ data_per_category = {}
+ for evaluation in evaluations:
+ category = evaluation["category"]
+ if evaluation["category"] in data_per_category.keys():
+ data_per_category[category].append(evaluation)
+ else:
+ data_per_category[category] = [evaluation]
+
+ all_statistics = {}
+ for category, data in data_per_category.items():
+ metrics = data[0]["evaluation"].keys()
+ scores = {metric: [] for metric in metrics}
+ for evaluation in data:
+ for metric in metrics:
+ scores[metric].append(calculate_scores_form_logprobs(evaluation["evaluation"][metric]["logprobs"][0]))
+
+ statistics = {}
+ for metric in metrics:
+ arg_sort = np.argsort(scores[metric])
+ statistics[metric] = {}
+ statistics[metric]["avg_score"] = sum(scores[metric]) / len(data)
+ statistics[metric]["best_3"] = {data[i]["id"]: scores[metric][i] for i in arg_sort[-3:][::-1]}
+ statistics[metric]["worst_3"] = {data[i]["id"]: scores[metric][i] for i in arg_sort[:3]}
+
+ all_statistics[category] = statistics
+
+ jdump(
+ all_statistics,
+ os.path.join(save_path, f"{model_name}_evaluation_statistics.json"),
+ )
+
+
+def analyze_gpt35_evaluation_statistics(statistics_path: str, save_path: str) -> None:
+ """
+ Analyze and visualize all GPT-3.5 evaluation statistics in the given directory.
+
+ Args:
+ statistics_path: path to all the models' statistics.
+ save_path: path to save table and visualization results.
+ """
+
+ if not os.path.exists(statistics_path):
+ raise Exception(f'The given directory "{statistics_path}" doesn\'t exist! No statistics found!')
+
+ all_statistics = {}
+
+ for file_name in os.listdir(statistics_path):
+ if file_name.endswith("_evaluation_statistics.json"):
+ model_name = file_name.split("_evaluation_statistics.json")[0]
+ all_statistics[model_name] = jload(os.path.join(statistics_path, file_name))
+
+ if len(list(all_statistics.keys())) == 0:
+ raise Exception(f'There are no statistics in the given directory "{statistics_path}"!')
+
+ frame_all = {
+ "model": [],
+ "category": [],
+ "metric": [],
+ "avg_score": [],
+ "best_3": [],
+ "worst_3": [],
+ }
+ frame_per_category = {}
+ for model_name, model_statistics in all_statistics.items():
+ for category, category_statistics in model_statistics.items():
+ if frame_per_category.get(category) is None:
+ frame_per_category[category] = {
+ "model": [],
+ "metric": [],
+ "avg_score": [],
+ "best_3": [],
+ "worst_3": [],
+ }
+
+ for metric, metric_statistics in category_statistics.items():
+ frame_all["model"].append(model_name)
+ frame_all["category"].append(category)
+ frame_all["metric"].append(metric)
+ frame_all["avg_score"].append(metric_statistics["avg_score"])
+ frame_all["best_3"].append(metric_statistics["best_3"])
+ frame_all["worst_3"].append(metric_statistics["worst_3"])
+
+ frame_per_category[category]["model"].append(model_name)
+ frame_per_category[category]["metric"].append(metric)
+ frame_per_category[category]["avg_score"].append(metric_statistics["avg_score"])
+ frame_per_category[category]["best_3"].append(metric_statistics["best_3"])
+ frame_per_category[category]["worst_3"].append(metric_statistics["worst_3"])
+
+ if not os.path.exists(save_path):
+ os.makedirs(save_path)
+
+ frame_all = pd.DataFrame(frame_all)
+ frame_all.to_csv(os.path.join(save_path, "gpt35_evaluation_statistics.csv"))
+
+ for category in tqdm.tqdm(
+ frame_per_category.keys(),
+ desc=f"category: ",
+ total=len(frame_per_category.keys()),
+ ):
+ data = pd.DataFrame(frame_per_category[category])
+
+ sns.set()
+ fig = plt.figure(figsize=(16, 10))
+ plt.ylim((0, 5))
+
+ fig = sns.barplot(x="metric", y="avg_score", hue="model", data=data, dodge=True)
+ fig.set_title(f"Comparison between Different Models for Category {category.title()}")
+ plt.xlabel("Evaluation Metric")
+ plt.ylabel("Average Score")
+
+ figure = fig.get_figure()
+ figure.savefig(os.path.join(save_path, f"{category}.png"), dpi=400)
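
To make the logprob-weighted scoring performed by `calculate_scores_form_logprobs` concrete, here is a minimal, self-contained sketch of the same calculation; the `top_logprobs` dictionary and every number in it are invented for illustration, not real `text-davinci-003` output.

```python
import re

import numpy as np

# Hypothetical contents of response["choices"][0]["logprobs"]["top_logprobs"][0]:
# keys are candidate next tokens, values are natural-log probabilities (made up).
top_logprobs = {"5": -0.105, "4": -2.4, "3": -5.1, " good": -6.0, "bytes:\\xe7": -7.0}

prob = np.zeros(5)
for token, logprob in top_logprobs.items():
    if "bytes" in token:               # skip raw byte fragments of unicode characters
        continue
    digits = re.findall(r"\d", token)
    if len(digits) == 1:               # token encodes a single score from 1 to 5
        prob[int(digits[0]) - 1] += np.exp(logprob)

score = float(np.dot(np.arange(1, 6), prob))
print(f"{score:.2f}")                  # ~4.88, a probability-weighted score in [1, 5]
```

Because the score is a probability-weighted average, a confident "5" with some residual mass on "4" yields a value close to but below 5, which is what `save_gpt35_evaluation_statistics` then averages per metric.
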
diff --git a/applications/Chat/evaluate/merge.py b/applications/Chat/evaluate/merge.py
deleted file mode 100644
index 295dd7fa7cb3..000000000000
--- a/applications/Chat/evaluate/merge.py
+++ /dev/null
@@ -1,25 +0,0 @@
-import argparse
-import os
-
-from utils import jload, jdump
-
-
-def generate(args):
- dataset = []
- for i in range(args.shards):
- shard = jload(os.path.join(args.answer_path,
- f'{args.model_name}_answers_rank{i}.json'))
- dataset.extend(shard)
-
- dataset.sort(key=lambda x: x['id'])
- jdump(dataset, os.path.join(args.answer_path,
- f'{args.model_name}_answers.json'))
-
-
-if __name__ == '__main__':
- parser = argparse.ArgumentParser()
- parser.add_argument('--model_name', type=str, default='model')
- parser.add_argument('--shards', type=int, default=4)
- parser.add_argument('--answer_path', type=str, default="answer")
- args = parser.parse_args()
- generate(args)
diff --git a/applications/Chat/evaluate/metrics.py b/applications/Chat/evaluate/metrics.py
new file mode 100644
index 000000000000..5e657234c61a
--- /dev/null
+++ b/applications/Chat/evaluate/metrics.py
@@ -0,0 +1,169 @@
+import statistics
+
+import jieba
+from bert_score import score
+from nltk.translate.bleu_score import sentence_bleu
+from rouge_chinese import Rouge as Rouge_cn
+from sklearn.metrics import f1_score, precision_score, recall_score
+
+
+def bleu_score(preds: list, targets: list) -> dict:
+ """Calculate BLEU Score Metric
+
+    The calculation includes BLEU-1 for unigrams, BLEU-2 for bigrams,
+    BLEU-3 for trigrams and BLEU-4 for 4-grams. The unigram score reflects
+    word-level accuracy, while the higher-order n-gram scores reflect
+    sentence-level fluency.
+ """
+ bleu_scores = {"bleu1": 0, "bleu2": 0, "bleu3": 0, "bleu4": 0}
+ cumulative_bleu = [0] * 4
+ weights = [(1. / 1., 0., 0., 0.), (1. / 2., 1. / 2., 0., 0.), (1. / 3., 1. / 3., 1. / 3., 0.),
+ (1. / 4., 1. / 4., 1. / 4., 1. / 4.)]
+
+ for pred, target in zip(preds, targets):
+ pred_list = (' '.join(jieba.cut(pred))).split()
+ target_list = [(' '.join(jieba.cut(target))).split()]
+
+ bleu = sentence_bleu(target_list, pred_list, weights=weights)
+ cumulative_bleu = [a + b for a, b in zip(cumulative_bleu, bleu)]
+
+ for i in range(len(cumulative_bleu)):
+ bleu_scores[f"bleu{i+1}"] = cumulative_bleu[i] / len(preds)
+
+ return bleu_scores
+
+
+def rouge_cn_score(preds: list, targets: list) -> dict:
+ """Calculate Chinese ROUGE Score Metric
+
+ The calculation includes ROUGE-1 for unigram, ROUGE-2 for bigram
+ and ROUGE-L. ROUGE-N evaluates the number of matching n-grams between
+    the preds and targets. ROUGE-L measures the longest common
+    subsequence (LCS) between preds and targets.
+ """
+ rouge_scores = {"rouge1": {}, "rouge2": {}, "rougeL": {}}
+ all_preds = []
+ all_targets = []
+
+ for pred, target in zip(preds, targets):
+ pred_list = ' '.join(jieba.cut(pred))
+ target_list = ' '.join(jieba.cut(target))
+ all_preds.append(pred_list)
+ all_targets.append(target_list)
+
+ rouge_cn = Rouge_cn()
+ rouge_avg = rouge_cn.get_scores(all_preds, all_targets, avg=True)
+
+ rouge_scores["rouge1"] = rouge_avg["rouge-1"]["f"]
+ rouge_scores["rouge2"] = rouge_avg["rouge-2"]["f"]
+ rouge_scores["rougeL"] = rouge_avg["rouge-l"]["f"]
+
+ return rouge_scores
+
+
+def distinct_score(preds: list) -> dict:
+ """Calculate Distinct Score Metric
+
+ This metric refers to https://arxiv.org/abs/1510.03055.
+    It evaluates the diversity of generated text by counting
+ the unique n-grams.
+ """
+ distinct_score = {"distinct": 0}
+ cumulative_distinct = []
+
+ for pred in preds:
+ pred_seg_list = list(' '.join(jieba.cut(pred)))
+ count_segs = len(pred_seg_list)
+ unique_segs = set(pred_seg_list)
+ count_unique_chars = len(unique_segs)
+
+ cumulative_distinct.append(count_unique_chars / count_segs)
+
+ distinct_score["distinct"] = statistics.mean(cumulative_distinct)
+
+ return distinct_score
+
+
+def bert_score(preds: list, targets: list) -> dict:
+ """Calculate BERTScore Metric
+
+ The BERTScore evaluates the semantic similarity between
+ tokens of preds and targets with BERT.
+ """
+ bert_score = {"bert_score": 0}
+ pred_list = []
+ target_list = []
+
+ for pred, target in zip(preds, targets):
+ pred_list.append(' '.join(jieba.cut(pred)))
+ target_list.append(' '.join(jieba.cut(target)))
+
+ _, _, F = score(pred_list, target_list, lang="zh", verbose=True)
+
+ bert_score["bert_score"] = F.mean().item()
+
+ return bert_score
+
+
+def calculate_precision_recall_f1(preds: list, targets: list) -> dict:
+ """Precision, Recall and F1-Score Calculation
+
+    Precision, recall and F1-score are computed by counting character-level
+    overlaps between the preds and targets. The comparison length is limited
+    by the shorter of the preds and targets. This design mainly targets the
+    classification and extraction categories.
+ """
+ precision_recall_f1 = {"precision": 0, "recall": 0, "f1_score": 0}
+ precision_scores = []
+ recall_scores = []
+ f1_scores = []
+
+ for pred, target in zip(preds, targets):
+ pred_list = [char for char in pred]
+ target_list = [char for char in target]
+
+ target_labels = [1] * min(len(target_list), len(pred_list))
+ pred_labels = [int(pred_list[i] == target_list[i]) for i in range(0, min(len(target_list), len(pred_list)))]
+
+ precision_scores.append(precision_score(target_labels, pred_labels, zero_division=0))
+ recall_scores.append(recall_score(target_labels, pred_labels, zero_division=0))
+ f1_scores.append(f1_score(target_labels, pred_labels, zero_division=0))
+
+ precision_recall_f1["precision"] = statistics.mean(precision_scores)
+ precision_recall_f1["recall"] = statistics.mean(recall_scores)
+ precision_recall_f1["f1_score"] = statistics.mean(f1_scores)
+
+ return precision_recall_f1
+
+
+def precision(preds: list, targets: list) -> dict:
+ """Calculate Precision Metric
+    (designed for classification and extraction categories)
+
+    Precision is calculated by counting the overlaps between the preds and targets.
+ """
+ precision = {"precision": 0}
+ precision["precision"] = calculate_precision_recall_f1(preds, targets)["precision"]
+ return precision
+
+
+def recall(preds: list, targets: list) -> dict:
+ """Calculate Recall Metric
+    (designed for classification and extraction categories)
+
+    Recall is calculated by counting the overlaps between the preds and targets.
+ """
+ recall = {"recall": 0}
+ recall["recall"] = calculate_precision_recall_f1(preds, targets)["recall"]
+ return recall
+
+
+def F1_score(preds: list, targets: list) -> dict:
+ """Calculate F1-score Metric
+    (designed for classification and extraction categories)
+
+    F1-score is calculated by counting the overlaps between the preds and targets.
+ """
+ f1 = {"f1_score": 0}
+ f1["f1_score"] = calculate_precision_recall_f1(preds, targets)["f1_score"]
+ return f1
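
As a quick illustration of how the helpers in this new `metrics.py` are meant to be called, here is a hedged sketch; the sentence pair is invented toy data, so the absolute scores are meaningless, and it assumes the module is importable as `metrics` with its dependencies (jieba, nltk, rouge_chinese, scikit-learn) installed.

```python
from metrics import bleu_score, calculate_precision_recall_f1, distinct_score, rouge_cn_score

# One made-up prediction/reference pair (Chinese, since the metrics segment with jieba).
preds = ["今天天气很好,适合出门散步。"]
targets = ["今天天气不错,适合外出散步。"]

print(bleu_score(preds, targets))                     # {'bleu1': ..., 'bleu2': ..., 'bleu3': ..., 'bleu4': ...}
print(rouge_cn_score(preds, targets))                 # {'rouge1': ..., 'rouge2': ..., 'rougeL': ...}
print(distinct_score(preds))                          # {'distinct': ...}
print(calculate_precision_recall_f1(preds, targets))  # {'precision': ..., 'recall': ..., 'f1_score': ...}
```
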
diff --git a/applications/Chat/evaluate/prompt/battle_prompt/battle_prompt_cn.json b/applications/Chat/evaluate/prompt/battle_prompt/battle_prompt_cn.json
new file mode 100644
index 000000000000..ca66afd7e464
--- /dev/null
+++ b/applications/Chat/evaluate/prompt/battle_prompt/battle_prompt_cn.json
@@ -0,0 +1,6 @@
+{
+ "id": 1,
+ "system_prompt": "你是一个检查回答质量的好助手。",
+ "prompt_template": "[问题]\n{question}\n\n[1号AI助手的答案]\n{answer_1}\n\n[1号AI助手答案终止]\n\n[2号AI助手的答案]\n{answer_2}\n\n[2号AI助手答案终止]\n\n[要求]\n{prompt}\n\n",
+ "prompt": "我们需要你评价这两个AI助手回答的性能。\n请对他们的回答的有用性、相关性、准确性、详细程度进行评分。每个AI助手都会得到一个1到10分的总分,分数越高表示整体表现越好。\n请首先输出一行,该行只包含两个数值,分别表示1号和2号AI助手的分数。这两个分数之间要有一个空格。在随后的一行中,请对你的评价作出全面的解释,避免任何潜在的偏见,并确保AI助手回答的顺序不会影响您的判断。"
+}
diff --git a/applications/Chat/evaluate/prompt/evaluation_prompt/evaluation_prompt_cn.json b/applications/Chat/evaluate/prompt/evaluation_prompt/evaluation_prompt_cn.json
new file mode 100644
index 000000000000..d4b8d143eadf
--- /dev/null
+++ b/applications/Chat/evaluate/prompt/evaluation_prompt/evaluation_prompt_cn.json
@@ -0,0 +1,179 @@
+[
+ {
+ "id": 1,
+ "category": "brainstorming",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "creativity": "创意性(1-5):某些头脑风暴问题可能需要答案具有创意,提出新的思路。",
+ "practicality": "实用性(1-5):某些头脑风暴问题可能需要答案提出实用的建议或解决方法。",
+ "correctness": "正确性(1-5):答案应该符合常识、生活实际等等。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "creativity": "1. 仔细阅读所提供的头脑风暴问题,确保你理解问题的要点和背景。\n2. 根据你的知识和经验,判断所提供的答案是否可行。如果答案不可行,则创意性评分可能会受到影响。\n3. 考虑答案中是否包含新颖的想法或独特的思路。答案可能与已知的解决方案有所重叠,但仍然可以被认为是有创意的,只要它提供了新的角度或方法来解决问题。\n4. 根据答案的创意性,给出一个1到5的评分。如果答案缺乏创意,则应给出一个较低的评分。如果答案具有创意并提供了新的思路,应给出一个较高的评分。\n\n创意性:",
+ "practicality": "1. 仔细阅读所提供的头脑风暴问题,确保你理解问题的要点和背景。\n2. 根据你的知识和经验,判断所提供的答案是否可行。如果答案不可行,则实用性评分可能会受到影响。\n3. 考虑答案中提出的建议或解决方法是否实用并可行。答案可能看起来很好,但如果无法实现或应用,则实用性评分可能会受到影响。\n4. 根据答案的实用性,给出一个1到5的评分。如果答案缺乏实用性,则应给出一个较低的评分。如果答案提出了实用的建议或解决方法,并且可以很好地解决问题,则应给出一个较高的评分。\n\n实用性:",
+ "correctness": "1. 仔细阅读所提供的头脑风暴问题,确保你理解问题的要点和背景。\n2. 根据你的知识和经验,判断所提供的答案是否可行。如果答案不可行,则正确性评分可能会受到影响。\n3. 考虑答案中所提供的信息是否正确、符合常识、生活实际等等。如果答案中存在明显的错误或不合理之处,则正确性评分可能会受到影响。\n4. 根据答案的正确性,给出一个1到5的评分。如果答案存在明显的错误或不合理之处,则应给出一个较低的评分。如果答案正确、符合常识、生活实际等等,则应给出一个较高的评分。\n\n正确性:"
+ },
+ "prompt": "你是一个好助手。请你为下面“头脑风暴”问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 2,
+ "category": "chat",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "naturalness": "自然(1-5):答案是否自然,并且符合问题给定的身份。",
+ "engagingness": "参与感(1-5):答案是否对前面的对话内容做出了恰当的反应,是否理解对话的语境和背景。",
+ "reasonableness": "合理性(1-5):答案是否能够与前面的对话内容形成逻辑上的衔接,是否符合常理,能否在这个上下文中合理存在。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "naturalness": "1. 阅读题目,确定题目提供的身份信息。\n2. 检查答案内容是否符合题目给定的身份。\n3. 根据以上因素,对该回答的自然性进行打分,分数从1到5,其中1表示不自然,5表示非常自然,并符合问题给定的身份。\n\n自然:",
+ "engagingness": "1. 阅读题目,确定对话的语境和背景。\n2. 检查答案是否充分理解对话的语境和背景,能否自然地融入到对话中而不显得突兀。\n3. 根据以上因素,对该回答的参与感进行打分,分数从1到5,其中1表示没有参与感,5表示非常有参与感,并且恰当地理解了对话的语境和背景。\n\n参与感:",
+ "reasonableness": "1. 阅读题目,确定对话的主题以及问题期望的回答方向。\n2. 判断答案是否能够与前面的对话内容形成逻辑上的衔接,是否符合常理,能否在这个上下文中合理存在。\n3. 根据以上因素,对该回答的合理性进行打分,分数从1到5,其中1表示不合理,5表示非常合理,并且能够与前面的对话内容形成逻辑上的衔接,并符合常理。\n\n合理性:"
+ },
+ "prompt": "你是一个好助手。请你为下面的“补全对话”问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 3,
+ "category": "classification",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "correctness": "正确性(1-5):答案是否正确。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "correctness": "1. 仔细阅读题目,尝试自己回答该问题。\n2. 检查答案的准确性。您可以使用已知的事实或研究来验证答案是否正确。如果答案是正确的,则可以将正确性得分为5分。如果答案是部分正确的,则可以给予适当的得分,例如2分、3分或4分。如果答案完全不正确,则只得1分。\n\n正确性:"
+ },
+ "prompt": "你是一个好助手。请你为下面的“分类“问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 4,
+ "category": "closed_qa",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "correctness": "正确性(1-5):答案是否正确。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "correctness": "1. 仔细阅读题目,尝试自己回答该问题。\n2. 检查答案的准确性。您可以使用已知的事实或研究来验证答案是否正确。如果答案是正确的,则可以将正确性得分为5分。如果答案是部分正确的,则可以给予适当的得分,例如2分、3分或4分。如果答案完全不正确,则只得1分。\n\n正确性:"
+ },
+ "prompt": "你是一个好助手。请你为下面问题的答案打分。\n\n问题如下:\n\n{question}\n\n需要你评分的答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 5,
+ "category": "extraction",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "correctness": "准确性(1-5):回答应该准确无误地提取出所需信息,不应该包含任何错误或误导性信息。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "correctness": "1. 仔细阅读问题并确定需要从材料中提取的信息。\n2. 仔细阅读回答并确保它涵盖了所有需要提取的信息。\n3. 使用所提供的材料来验证回答的准确性。如果回答不准确或包含错误或误导性信息,则无法给出高分。\n4. 检查回答是否包含所有要求提取的信息,不要漏掉任何重要细节。\n5. 根据回答的准确性和完整性,给出一个介于1和5之间的分数,5分表示回答非常准确且完整,1分表示回答几乎没有提取出所需信息。\n\n准确性:"
+ },
+ "prompt": "你是一个好助手。请你为下面的“提取”问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 6,
+ "category": "generation",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "diversity": "多样性(1-5):答案使用语言是否优美,具有有一定的创造性和想象力。然而,回答也应该保持合理和适度,不要过于夸张或离题。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "diversity": "1. 仔细阅读整个回答,确保完全理解回答所表达的内容和主题。\n2. 在阅读回答的同时,注意语言的质量,例如措辞是否正确,语言是否生动等。\n3. 检查回答的创造性和想象力,看看回答是否能够吸引人阅读下去。\n4. 检查回答的合理性和适度,看看回答是否夸张或离题。\n5. 将多样性的评分打分在1到5之间,5分表示回答的质量很好,能够吸引人阅读,1分表示回答的内容生硬或者有离题的问题。\n\n多样性:"
+ },
+ "prompt": "你是一个好助手。请你为下面的“生成”问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 7,
+ "category": "open_qa",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "correctness": "正确性(1-5):答案是否正确。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "correctness": "1. 仔细阅读题目,尝试自己回答该问题。\n2. 检查答案的准确性。您可以使用已知的事实或研究来验证答案是否正确。如果答案是正确的,则可以将正确性得分为5分。如果答案是部分正确的,则可以给予适当的得分,例如2分、3分或4分。如果答案完全不正确,则只得1分。\n\n正确性:"
+ },
+ "prompt": "你是一个好助手。请你为下面的问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 8,
+ "category": "rewriting",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "correctness": "正确性(1-5):答案是否正确。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "correctness": "1. 仔细阅读题目,尝试自己回答该问题。\n2. 检查答案的准确性。您可以使用已知的事实或研究来验证答案是否正确。如果答案是正确的,则可以将正确性得分为5分。如果答案是部分正确的,则可以给予适当的得分,例如2分、3分或4分。如果答案完全不正确,则只得1分。\n\n正确性:"
+ },
+ "prompt": "你是一个好助手。请你为下面的问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 9,
+ "category": "roleplay",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "fidelity": "保真度(1-5):答案是否能够严格遵守角色的设定回答给定的请求。",
+ "creativity": "创意性(1-5):角色扮演问题的回答需要具有一定创意,但同时需要遵守角色的设定。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "fidelity": "1. 仔细阅读问题,了解角色在问题中的设定和表现,包括职业、背景、观点、性格等方面。\n2. 阅读题目的请求,确认回答请求时需要注意的细节。\n3. 对比提供的回答与该角色的设定,评估回答是否能够严格遵守角色的设定。\n4. 结合以上评估结果给出保真度的评分,范围从1到5分,其中1分表示回答与角色设定完全不符,5分表示回答完全符合角色设定且满足给定请求。\n\n保真度:",
+ "creativity": "1. 仔细阅读问题,了解角色在问题中的设定和表现,包括职业、背景、观点、性格等方面。\n2. 评估回答是否具有独特的思路和建议,是否能够给提问者带来新的想法和启示。\n3. 对比回答中的创意和该角色的设定,评估回答是否遵守了该角色的设定和基本特征。\n4. 对回答的质量进行总体评估,并结合以上评估结果给出创意性的评分,范围从1到5分,其中1分表示回答缺乏创意,5分表示回答具有独特的思路和建议,并且能够遵守该角色的设定。\n\n创意性:"
+ },
+ "prompt": "你是一个好助手。请你为下面的“角色扮演”问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 10,
+ "category": "summarization",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "correctness": "准确性(1-5):回答应该准确无误地总结出材料的重点。",
+ "conciseness": "简明扼要(1-5):答案是否简明扼要,没有冗余内容。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "correctness": "1. 仔细阅读问题给的材料,理解其内容和要点。\n2. 评估回答是否准确地总结出原始材料的重点。\n3. 评估回答是否包含原始材料中的所有关键信息。\n4. 根据以上步骤,给出一个1-5的分数,其中1表示回答不能准确地总结出材料的重点,5表示回答完全准确地总结出材料的重点。\n\n准确性:",
+ "conciseness": "1. 阅读题目,提取出材料的重点。\n2. 阅读该总结,并注意其中的主要观点和信息。\n3. 评估总结的长度。一个简明扼要的总结通常应该在几句话或几段文字内传达关键信息,而不是冗长的段落或文章。\n4. 检查总结是否包含与主要观点无关的信息或冗余信息。\n5.确定总结涵盖了材料中的关键信息,并且没有忽略任何重要细节。\n6.给总结打出1-5的分数,其中5表示总结简明扼要,没有冗余内容,而1表示总结冗长或包含不必要的信息,难以理解或记忆。根据您的判断,打出适当的得分。\n\n简明扼要:"
+ },
+ "prompt": "你是一个好助手。请你为下面的“总结”问题的答案打分。\n\n问题如下:\n\n{question}\n\n答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ },
+ {
+ "id": 11,
+ "category": "general",
+ "metrics": {
+ "language organization": "语言组织(1-5):答案语言是否流畅、连贯,使用正确的语法,具有一定逻辑性,使用恰当的连接词、过渡词等等。",
+ "relevance": "切题(1-5):答案内容是否切题,不答非所问,并且严格遵照题目要求。",
+ "correctness": "正确性(1-5):答案是否正确。"
+ },
+ "CoT": {
+ "language organization": "1. 阅读答案,并检查是否有语法错误、用词不当或其他显著的错误。\n2. 检查答案是否具有逻辑性,能够按照合理的顺序传达信息并且能够自圆其说。\n3. 确定答案是否与问题或主题相关,并且能够传达清晰的信息。\n4. 检查答案是否连贯,是否使用适当的转换和过渡来保持句子和段落之间的连贯性。\n5. 检查答案是否具有明确的结构和组织方式,使得读者可以轻松理解信息的层次和结构。\n6. 根据以上因素综合评估答案的语言组织,并给出一个1到5的分数,其中5表示语言组织非常好,而1表示语言组织非常差。\n\n语言组织:",
+ "relevance": "1. 阅读题目,确定题目所问的问题是什么,以及需要回答哪些方面的问题。\n2. 阅读答案,确认答案是否直接回答了题目所问的问题。\n3. 检查答案是否严格遵照了题目的要求,包括答题方式、答题长度、答题格式等等。\n4. 根据以上因素综合评估答案的切题程度,并给出一个1到5的分数,其中5表示答案非常切题,而1表示答案完全没有切题。\n\n切题:",
+ "correctness": "1. 仔细阅读题目,尝试自己回答该问题。\n2. 检查答案的准确性。您可以使用已知的事实或研究来验证答案是否正确。如果答案是正确的,则可以将正确性得分为5分。如果答案是部分正确的,则可以给予适当的得分,例如2分、3分或4分。如果答案完全不正确,则只得1分。\n\n正确性:"
+ },
+ "prompt": "你是一个好助手。请你为下面问题的答案打分。\n\n问题如下:\n\n{question}\n\n需要你评分的答案如下:\n\n{answer}\n\n评分的指标如下:\n\n{metric}\n\n请你遵照以下的评分步骤:\n\n{steps}"
+ }
+]
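
To show how the fields of each entry in this prompt file (`metrics`, `CoT`, `prompt`) are stitched together at evaluation time, here is a hedged sketch mirroring what `get_gpt35_evaluation` in `gpt_evaluate.py` does; the file path, entry index and placeholder question/answer are illustrative assumptions.

```python
import json

# Load the category-specific prompt entries (path relative to the evaluate/ folder is assumed).
with open("prompt/evaluation_prompt/evaluation_prompt_cn.json", encoding="utf-8") as f:
    prompts = json.load(f)

entry = prompts[0]                       # e.g. the "brainstorming" entry
metric = "relevance"                     # must be a key of entry["metrics"]

full_prompt = entry["prompt"].format(
    question="...",                      # instruction (plus input, if any) from the answer file
    answer="...",                        # the model output being scored
    metric=entry["metrics"][metric],     # the metric definition shown to the judge model
    steps=entry["CoT"][metric],          # the chain-of-thought scoring steps for that metric
)
print(full_prompt)
```
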
diff --git a/applications/Chat/evaluate/requirements.txt b/applications/Chat/evaluate/requirements.txt
new file mode 100644
index 000000000000..b0301c2f17f8
--- /dev/null
+++ b/applications/Chat/evaluate/requirements.txt
@@ -0,0 +1,10 @@
+jieba
+bert-score
+rouge_chinese
+scikit-learn
+nltk
+openai
+seaborn
+pandas
+matplotlib
+numpy
diff --git a/applications/Chat/evaluate/sample/questions.json b/applications/Chat/evaluate/sample/questions.json
deleted file mode 100644
index e9ef9f8b1c66..000000000000
--- a/applications/Chat/evaluate/sample/questions.json
+++ /dev/null
@@ -1,9 +0,0 @@
-[
- {
- "id": 0,
- "instruction": "Help me summarize the following news?",
- "input": "National Commercial Bank (NCB), Saudi Arabia's largest lender by assets, agreed to buy rival Samba Financial Group for $15 billion in the biggest banking takeover this year.NCB will pay 28.45 riyals ($7.58) for each Samba share, according to a statement on Sunday, valuing it at about 55.7 billion riyals. NCB will offer 0.739 new shares for each Samba share, at the lower end of the 0.736-0.787 ratio the banks set when they signed an initial framework agreement in June.The offer is a 3.5% premium to Samba's Oct. 8 closing price of 27.50 riyals and about 24% higher than the level the shares traded at before the talks were made public. Bloomberg News first reported the merger discussions.The new bank will have total assets of more than $220 billion, creating the Gulf region's third-largest lender. The entity's $46 billion market capitalization nearly matches that of Qatar National Bank QPSC, which is still the Middle East's biggest lender with about $268 billion of assets.",
- "output": "NCB to pay 28.45 riyals for each Samba share. Deal will create Gulf region's third-largest lender",
- "category": "closed qa"
- }
-]
\ No newline at end of file
diff --git a/applications/Chat/evaluate/utils.py b/applications/Chat/evaluate/utils.py
index 692ee007c080..e855cd45221c 100644
--- a/applications/Chat/evaluate/utils.py
+++ b/applications/Chat/evaluate/utils.py
@@ -2,10 +2,6 @@
import json
import os
-import torch.distributed as dist
-
-def is_rank_0() -> bool:
- return not dist.is_initialized() or dist.get_rank() == 0
def _make_w_io_base(f, mode: str):
if not isinstance(f, io.IOBase):
@@ -15,11 +11,13 @@ def _make_w_io_base(f, mode: str):
f = open(f, mode=mode)
return f
+
def _make_r_io_base(f, mode: str):
if not isinstance(f, io.IOBase):
f = open(f, mode=mode)
return f
+
def jdump(obj, f, mode="w", indent=4, default=str):
"""Dump a str or dictionary to a file in json format.
Args:
@@ -38,6 +36,7 @@ def jdump(obj, f, mode="w", indent=4, default=str):
raise ValueError(f"Unexpected type: {type(obj)}")
f.close()
+
def jload(f, mode="r"):
"""Load a .json file into a dictionary."""
f = _make_r_io_base(f, mode)
@@ -45,9 +44,19 @@ def jload(f, mode="r"):
f.close()
return jdict
+
def get_json_list(file_path):
with open(file_path, 'r') as f:
json_list = []
for line in f:
json_list.append(json.loads(line))
return json_list
+
+
+def get_data_per_category(data, categories):
+ data_per_category = {category: [] for category in categories}
+ for item in data:
+ category = item["category"]
+ data_per_category[category].append(item)
+
+ return data_per_category
diff --git a/applications/Chat/examples/README.md b/applications/Chat/examples/README.md
index 2a2128e25a62..72810738d017 100644
--- a/applications/Chat/examples/README.md
+++ b/applications/Chat/examples/README.md
@@ -48,6 +48,7 @@ The following pic shows how we collected the data.
## Stage1 - Supervised instructs tuning
Stage1 is supervised instructs fine-tuning, which uses the datasets mentioned earlier to fine-tune the model.
+[[Stage1 tutorial video]](https://www.youtube.com/watch?v=-qFBZFmOJfg)
You can run the `examples/train_sft.sh` to start a supervised instructs fine-tuning.
@@ -83,6 +84,7 @@ torchrun --standalone --nproc_per_node=4 train_sft.py \
## Stage2 - Training reward model
We train a reward model in stage 2, which obtains corresponding scores by manually ranking different outputs for the same prompt and supervises the training of the reward model.
+[[Stage2 tutorial video]](https://www.youtube.com/watch?v=gMx2CApKhuo)
You can run the `examples/train_rm.sh` to start a reward model training.
@@ -141,6 +143,7 @@ Stage3 uses reinforcement learning algorithm, which is the most complex part of
You can run the `examples/train_prompts.sh` to start PPO training.
You can also use the cmd following to start PPO training.
+[[Stage3 tutorial video]](https://www.youtube.com/watch?v=Z8wwSHxPL9g)
```
torchrun --standalone --nproc_per_node=4 train_prompts.py \
@@ -153,7 +156,7 @@ torchrun --standalone --nproc_per_node=4 train_prompts.py \
--rm_path /your/rm/model/path
```
-Prompt dataset: the instruction dataset mentioned in the above figure which includes the instructions, e.g. you can use the [script](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/example_data_reformat.py) to reformat [seed_prompts_ch.jsonl](https://github.com/XueFuzhao/InstructionWild/blob/main/data/seed_prompts_ch.jsonl) or [seed_prompts_en.jsonl](https://github.com/XueFuzhao/InstructionWild/blob/main/data/seed_prompts_en.jsonl) in InstructionWild.
+Prompt dataset: the instruction dataset mentioned in the above figure which includes the instructions, e.g. you can use the [script](https://github.com/hpcaitech/ColossalAI/tree/main/applications/Chat/examples/generate_prompt_dataset.py) which samples `instinwild_en.json` or `instinwild_ch.json` in [InstructionWild](https://github.com/XueFuzhao/InstructionWild/tree/main/data#instructwild-data) to generate the prompt dataset.
Pretrain dataset: the pretrain dataset including the instruction and corresponding response, e.g. you can use the [InstructWild Data](https://github.com/XueFuzhao/InstructionWild/tree/main/data) in stage 1 supervised instructs tuning.
### Arg List
diff --git a/applications/Chat/examples/community/peft/README.md b/applications/Chat/examples/community/peft/README.md
index eabb56fd8294..844bfd3d22c3 100644
--- a/applications/Chat/examples/community/peft/README.md
+++ b/applications/Chat/examples/community/peft/README.md
@@ -18,7 +18,7 @@ For SFT training, just call train_peft_sft.py
Its arguments are almost identical to train_sft.py instead adding a new eval_dataset if you have a eval_dataset file. The data file is just a plain datafile, please check the format in the easy_dataset.py.
For stage-3 rlhf training, call train_peft_prompts.py.
-Its arguments are almost idential to train_prompts.py. The only difference is that I use text files to indicate the prompt and pretrained data file. The models are included in easy_models.py. Currently only bloom models are tested, but technically gpt2/opt/llama should be supported.
+Its arguments are almost identical to train_prompts.py. The only difference is that I use text files to indicate the prompt and pretrained data file. The models are included in easy_models.py. Currently only bloom models are tested, but technically gpt2/opt/llama should be supported.
# Dataformat
Please refer the formats in test_sft.txt, test_prompts.txt, test_pretrained.txt.
diff --git a/applications/Chat/examples/example_data_reformat.py b/applications/Chat/examples/example_data_reformat.py
deleted file mode 100644
index dc83b29b525b..000000000000
--- a/applications/Chat/examples/example_data_reformat.py
+++ /dev/null
@@ -1,12 +0,0 @@
-jsonl_file = 'seed_prompts_xx.jsonl' # seed_prompts_en.jsonl or seed_prompts_ch.json from InstructionWild
-reformat_file = 'prompts_xx.jsonl' # reformat jsonl file used as Prompt dataset in Stage3
-
-data = ''
-with open(jsonl_file, 'r', encoding="utf-8") as f1:
- for jsonstr in f1.readlines():
- jsonstr = '\t' + jsonstr.strip('\n') + ',\n'
- data = data + jsonstr
- data = '[\n' + data + ']'
-
-with open(reformat_file, 'w') as f2:
- f2.write(data)
\ No newline at end of file
diff --git a/applications/Chat/examples/generate_prompt_dataset.py b/applications/Chat/examples/generate_prompt_dataset.py
new file mode 100644
index 000000000000..95e40fefe7ff
--- /dev/null
+++ b/applications/Chat/examples/generate_prompt_dataset.py
@@ -0,0 +1,30 @@
+import argparse
+
+import random
+import json
+
+random.seed(42)
+
+
+def sample(args):
+ with open(args.dataset_path, mode='r') as f:
+ dataset_list = json.load(f)
+
+ sampled_dataset = [{"instruction": sample["instruction"], "id":idx}
+ for idx, sample in enumerate(random.sample(dataset_list, args.sample_size))]
+
+ with open(args.save_path, mode='w') as f:
+ json.dump(sampled_dataset, f, indent=4,
+ default=str, ensure_ascii=False)
+
+
+if __name__ == '__main__':
+ parser = argparse.ArgumentParser()
+ parser.add_argument('--dataset_path', type=str, default=None,
+ required=True, help="path to the pretrain dataset")
+ parser.add_argument('--save_path', type=str, default='prompt.json',
+ help="path to save the prompt dataset")
+ parser.add_argument('--sample_size', type=int,
+ default=16384, help="size of the prompt dataset")
+ args = parser.parse_args()
+ sample(args)
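
For reference, a hedged sketch of driving the new sampling script programmatically instead of from the command line; the file names and sample size below are placeholders, with `instinwild_en.json` referring to the InstructionWild data mentioned in the examples README.

```python
from argparse import Namespace

from generate_prompt_dataset import sample

# Placeholder paths; the source file must contain at least `sample_size` entries.
args = Namespace(dataset_path="instinwild_en.json",
                 save_path="prompt.json",
                 sample_size=1024)
sample(args)
```
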
diff --git a/applications/Chat/inference/README.md b/applications/Chat/inference/README.md
index 434677c98fa5..4848817e0fd1 100644
--- a/applications/Chat/inference/README.md
+++ b/applications/Chat/inference/README.md
@@ -75,7 +75,7 @@ E.g. you can set `export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH`.
Please ensure you have downloaded HF-format model weights of LLaMA models first.
-Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight convertion script.
+Then you can follow [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). This lib provides efficient CUDA kernels and weight conversion script.
After installing this lib, we may convert the original HF-format LLaMA model weights to 4-bit version.
diff --git a/applications/Chat/inference/benchmark.py b/applications/Chat/inference/benchmark.py
index 59cd1eeea2aa..a8485f588705 100644
--- a/applications/Chat/inference/benchmark.py
+++ b/applications/Chat/inference/benchmark.py
@@ -123,7 +123,7 @@ def evaluate(
start = time()
for instruction in instructions:
print(f"Instruction: {instruction}")
- resp, tokens = evaluate(model, tokenizer, instruction, temparature=0.2, num_beams=1)
+ resp, tokens = evaluate(model, tokenizer, instruction, temperature=0.2, num_beams=1)
total_tokens += tokens
print(f"Response: {resp}")
print('\n----------------------------\n')
diff --git a/colossalai/amp/torch_amp/_grad_scaler.py b/colossalai/amp/torch_amp/_grad_scaler.py
index 7b78998fb8c2..ed4b8e484436 100644
--- a/colossalai/amp/torch_amp/_grad_scaler.py
+++ b/colossalai/amp/torch_amp/_grad_scaler.py
@@ -240,7 +240,7 @@ def _unscale_grads_(self, optimizer, inv_scale, found_inf, allow_fp16):
for grads in per_dtype_grads.values():
torch._amp_foreach_non_finite_check_and_unscale_(grads, per_device_found_inf.get(device),
per_device_inv_scale.get(device))
- # For tensor parallel paramters it should be all-reduced over tensor parallel process group
+ # For tensor parallel parameters it should be all-reduced over tensor parallel process group
if gpc.is_initialized(ParallelMode.MODEL) and gpc.get_world_size(ParallelMode.MODEL) > 1:
vals = [val for val in per_device_found_inf._per_device_tensors.values()]
coalesced = _flatten_dense_tensors(vals)
diff --git a/colossalai/auto_parallel/README.md b/colossalai/auto_parallel/README.md
index 8e47e1bb0b4a..f011ec8ccbd7 100644
--- a/colossalai/auto_parallel/README.md
+++ b/colossalai/auto_parallel/README.md
@@ -16,8 +16,8 @@ A *symbolic profiler* for collecting computing and memory overhead related to st
### Solver
**Solver** is designed to find the optimal execution plan for a given computation graph and cluster in two stages:
-1) *Intra-op parallelism stage* is to find the plan with the minimum total execution time of all nodes with respect to the constraint of the memory budget. The optimaztion goal of intra-op parallelism solver is modified from Alpa 's intra-op parallelsim ILP solver.
-2) *Activation checkpoint stage* is to search for the fastest execution plan that meets the memory budget on the computation graph after inserting the communication nodes by the intra-op parallelism stage. The algorithm to find optimial activation checkpoint is modified from Rotor . The reason we use two-stage optimization is that if the two tasks are formulated together, the solving time will be significantly increased, which will greatly affect the user experience of the system. On the contrary, solving in two hierarchical levels has many advantages. Firstly, compared with the computation graph with activation checkpointing, the original graph has fewer nodes, which can reduce the solving cost of intra-op parallelism solver. In addition, a more optimal solution can be found by adding the communication overhead into the activation checkpoint modeling.
+1) *Intra-op parallelism stage* is to find the plan with the minimum total execution time of all nodes with respect to the constraint of the memory budget. The optimization goal of intra-op parallelism solver is modified from Alpa 's intra-op parallelism ILP solver.
+2) *Activation checkpoint stage* is to search for the fastest execution plan that meets the memory budget on the computation graph after inserting the communication nodes by the intra-op parallelism stage. The algorithm to find optimal activation checkpoint is modified from Rotor . The reason we use two-stage optimization is that if the two tasks are formulated together, the solving time will be significantly increased, which will greatly affect the user experience of the system. On the contrary, solving in two hierarchical levels has many advantages. Firstly, compared with the computation graph with activation checkpointing, the original graph has fewer nodes, which can reduce the solving cost of intra-op parallelism solver. In addition, a more optimal solution can be found by adding the communication overhead into the activation checkpoint modeling.
### Generator
**Generator** applies the searched execution plan to the computation graph and recompiles the computation graph to optimized PyTorch code. It has *a series compile pass* to insert a communication node or do the kernel substitution as the intra-op parallelism solver required. Additionally, we implement a *code generation* feature to recognize the annotation from the activation checkpoint solver and inject the activation checkpoint block following annotation instructions.
diff --git a/colossalai/auto_parallel/meta_profiler/meta_registry/linear.py b/colossalai/auto_parallel/meta_profiler/meta_registry/linear.py
index 7697fc6c383d..94dd9143e0ae 100644
--- a/colossalai/auto_parallel/meta_profiler/meta_registry/linear.py
+++ b/colossalai/auto_parallel/meta_profiler/meta_registry/linear.py
@@ -325,7 +325,7 @@ def matmul_meta_info(*args, **kwargs) -> Tuple[TrainCycleItem, TrainCycleItem, L
else:
_is_batch_dims_same = False
- # retireve dimensions
+ # retrieve dimensions
input_dim_00 = input_tensors[0].shape[-2]
input_dim_01 = input_tensors[0].shape[-1]
input_dim_10 = input_tensors[1].shape[-2]
diff --git a/colossalai/auto_parallel/passes/meta_info_prop.py b/colossalai/auto_parallel/passes/meta_info_prop.py
index bc0960483980..0673b767de7b 100644
--- a/colossalai/auto_parallel/passes/meta_info_prop.py
+++ b/colossalai/auto_parallel/passes/meta_info_prop.py
@@ -148,7 +148,7 @@ def node_handler(self, node: Node) -> None:
graph_info.fwd_tmp = buffer_tensors
graph_info.fwd_out = output_tensors
- # fetch other memory informations
+ # fetch other memory information
memory_cost = meta_info.memory_cost
graph_info.fwd_mem_tmp = memory_cost.fwd.temp
graph_info.fwd_mem_out = memory_cost.fwd.activation
diff --git a/colossalai/auto_parallel/passes/runtime_apply_pass.py b/colossalai/auto_parallel/passes/runtime_apply_pass.py
index a473bb6e973d..2049a06187d2 100644
--- a/colossalai/auto_parallel/passes/runtime_apply_pass.py
+++ b/colossalai/auto_parallel/passes/runtime_apply_pass.py
@@ -219,7 +219,7 @@ def _comm_spec_apply(gm: torch.fx.GraphModule):
return gm
-def _act_annotataion_pass(gm: torch.fx.GraphModule):
+def _act_annotation_pass(gm: torch.fx.GraphModule):
"""
This pass is used to add the act annotation to the new inserted nodes.
"""
diff --git a/colossalai/auto_parallel/passes/runtime_preparation_pass.py b/colossalai/auto_parallel/passes/runtime_preparation_pass.py
index 08af846b221d..9a2314826448 100644
--- a/colossalai/auto_parallel/passes/runtime_preparation_pass.py
+++ b/colossalai/auto_parallel/passes/runtime_preparation_pass.py
@@ -54,7 +54,7 @@ def size_processing(size: Union[int, torch.Size],
return size
-def solution_annotatation_pass(gm: torch.fx.GraphModule, solution: List[int],
+def solution_annotation_pass(gm: torch.fx.GraphModule, solution: List[int],
strategies_constructor: StrategiesConstructor):
"""
This method is used to stick the solution strategy to the nodes and add the information
@@ -169,7 +169,7 @@ def _post_processing(node, size_processing_node):
This function is used to process the dependency between the size node and its users after
inserting the size_process_node.
'''
- # store original node and processing node pair in node_pairs dictioanry
+ # store original node and processing node pair in node_pairs dictionary
# It will be used to replace the original node with processing node in slice object
node_pairs[node] = size_processing_node
size_processing_node._meta_data = node._meta_data
@@ -388,7 +388,7 @@ def module_params_sharding_pass(gm: torch.fx.GraphModule, device_mesh: DeviceMes
"""
mod_graph = gm.graph
nodes = tuple(mod_graph.nodes)
- # This stream is created for overlaping the communication and computation.
+ # This stream is created for overlapping the communication and computation.
reduction_stream = torch.cuda.Stream()
def _add_hook_for_grad_communication(node, param, name=None):
@@ -496,7 +496,7 @@ def runtime_preparation_pass(gm: torch.fx.GraphModule,
device_mesh: DeviceMesh,
strategies_constructor: StrategiesConstructor,
overlap=False):
- gm, sharding_spec_convert_dict, origin_node_sharding_spec_dict, comm_actions_dict = solution_annotatation_pass(
+ gm, sharding_spec_convert_dict, origin_node_sharding_spec_dict, comm_actions_dict = solution_annotation_pass(
gm, solution, strategies_constructor)
gm = size_value_converting_pass(gm, device_mesh)
gm = node_args_converting_pass(gm, device_mesh)
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/embedding_handler.py b/colossalai/auto_parallel/tensor_shard/node_handler/embedding_handler.py
index e154105b672d..112ee194b4ec 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/embedding_handler.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/embedding_handler.py
@@ -155,7 +155,7 @@ def post_process(self, strategy: ShardingStrategy) -> Union[ShardingStrategy, Li
Convert the sharding spec from the logical shape to the physical shape.
"""
# create multiple sharding strategies for the inputs
- # as input can be multi-dimensinal and the partition dim is only 2D,
+ # as input can be multi-dimensional and the partition dim is only 2D,
# we need to map the partition at logical dim 0 to one of the first few dimensions of the input and output
strategies = _convert_logical_sharding_to_physical_sharding_spec_for_embedding(strategy=strategy,
input_name=str(
@@ -221,7 +221,7 @@ def post_process(self, strategy: ShardingStrategy):
Convert the sharding spec from the logical shape to the physical shape.
"""
# create multiple sharding strategies for the inputs
- # as input can be multi-dimensinal and the partition dim is only 2D,
+ # as input can be multi-dimensional and the partition dim is only 2D,
# we need to map the partition at logical dim 0 to one of the first few dimensions of the input and output
strategies = _convert_logical_sharding_to_physical_sharding_spec_for_embedding(strategy=strategy,
input_name=str(
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/linear_handler.py b/colossalai/auto_parallel/tensor_shard/node_handler/linear_handler.py
index 59091dab519f..ea541e434009 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/linear_handler.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/linear_handler.py
@@ -23,7 +23,7 @@ def _update_sharding_spec_for_transposed_weight_for_linear(strategy: ShardingStr
weight_name: str) -> ShardingStrategy:
"""
This function is a helper function used by both module node handler and function node handler. This function will
- convert the sharding spec for the transposed weight to the correct partititon spec.
+ convert the sharding spec for the transposed weight to the correct partition spec.
Args:
strategy (ShardingStrategy): the strategy generated by the strategy generator.
@@ -197,7 +197,7 @@ def post_process(self, strategy: ShardingStrategy) -> Union[ShardingStrategy, Li
strategy = _update_sharding_spec_for_transposed_weight_for_linear(strategy=strategy, weight_name='weight')
# create multiple sharding strategies for the inputs
- # as input can be multi-dimensinal and the partition dim is only 2D,
+ # as input can be multi-dimensional and the partition dim is only 2D,
# we need to map the partition at dim 0 to one of the first few dimensions of the input
strategies = _convert_logical_sharding_to_physical_sharding_spec_for_linear(strategy=strategy,
input_name=str(self.node.args[0]),
@@ -267,7 +267,7 @@ def post_process(self, strategy: ShardingStrategy):
strategy = _update_sharding_spec_for_transposed_weight_for_linear(strategy=strategy,
weight_name=str(self.node.args[1]))
# create multiple sharding strategies for the inputs
- # as input can be multi-dimensinal and the partition dim is only 2D,
+ # as input can be multi-dimensional and the partition dim is only 2D,
# we need to map the partition at dim 0 to one of the first few dimensions of the input
strategies = _convert_logical_sharding_to_physical_sharding_spec_for_linear(strategy=strategy,
input_name=str(self.node.args[0]),
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/matmul_handler.py b/colossalai/auto_parallel/tensor_shard/node_handler/matmul_handler.py
index f3c9d0cbf826..fa51114a5c94 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/matmul_handler.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/matmul_handler.py
@@ -48,8 +48,8 @@ def get_matmul_type(input_dim: int, other_dim: int):
Determine which type of matmul operation should be executed for the given tensor dimensions.
Args:
- input_dim (int): the number of dimensions for the input tenosr
- other_dim (int): the number of dimensions for the other tenosr
+ input_dim (int): the number of dimensions for the input tensor
+ other_dim (int): the number of dimensions for the other tensor
"""
if input_dim == 1 and other_dim == 1:
matmul_type = MatMulType.DOT
@@ -206,7 +206,7 @@ def _remove_sharding_on_broadcast_dim(key, strategy):
# e.g. [1, 2, 4] x [4, 4, 8] -> [4, 2, 8]
# the dim 0 of [1, 2, 4] is multiplied to 4
tensor_shape[dim_idx] = 1
- elif broadcast_type == BroadcastType.PADDDING:
+ elif broadcast_type == BroadcastType.PADDING:
# if the dim is padded
# we remove its sharding
tensor_shape[dim_idx] = None
@@ -268,13 +268,13 @@ def _update_sharding_spec(key, strategy, physical_batch_dim):
dim_partition_dict = sharding_spec.dim_partition_dict
entire_shape = sharding_spec.entire_shape
- # upddate the dimension index for the matrix dimensions
+ # update the dimension index for the matrix dimensions
if 2 in dim_partition_dict:
dim_partition_dict[len(self.batch_dims_before_view) + 1] = dim_partition_dict.pop(2)
if 1 in dim_partition_dict:
dim_partition_dict[len(self.batch_dims_before_view)] = dim_partition_dict.pop(1)
- # map the logical batch dim to phyiscal batch dim
+ # map the logical batch dim to physical batch dim
if 0 in dim_partition_dict:
batch_dim_shard = dim_partition_dict.pop(0)
dim_partition_dict[physical_batch_dim] = batch_dim_shard
@@ -414,7 +414,7 @@ def _get_logical_shape_for_dot(self):
def _get_logical_shape_for_mm(self):
"""
- We need to handle the input tensor for a matrix-matrix multiplcation as the input
+ We need to handle the input tensor for a matrix-matrix multiplication as the input
tensor can be a 1D or 2D tensor. If it is a 1D tensor, 1 will be prepended to its shape
(e.g. [4] -> [1, 4]).
"""
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py b/colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py
index ab391ebfaf80..4262d76173e4 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/node_handler.py
@@ -75,7 +75,7 @@ def update_resharding_cost(self, strategy: ShardingStrategy) -> None:
prev_strategy.get_sharding_spec_by_name(node_name) for prev_strategy in prev_strategy_vector
]
- # create data structrure to store costs
+ # create data structure to store costs
if node not in resharding_costs:
resharding_costs[node] = []
@@ -212,7 +212,7 @@ def register_strategy(self, compute_resharding_cost: bool = True) -> StrategiesV
return self.strategies_vector
def post_process(self, strategy: ShardingStrategy) -> Union[ShardingStrategy, List[ShardingStrategy]]:
- # tranform the strategy generated
+ # transform the strategy generated
# e.g. to process the sharding strategy for the transposed weights
return strategy
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/batch_norm_generator.py b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/batch_norm_generator.py
index 1f3812429fc2..416dc9c29cad 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/batch_norm_generator.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/batch_norm_generator.py
@@ -24,7 +24,7 @@ class BatchNormStrategyGenerator(StrategyGenerator):
To keep the math consistency, there are two way to do BatchNorm if the input
shards on batch dimension:
1. We gather the input partitions through batch dimension, then do the normal BatchNorm.
- 2. We do the SyncBatchNorm on the each input partition seperately, the SyncBN op will help
+ 2. We do the SyncBatchNorm on each input partition separately, and the SyncBN op will help
us to keep the computing correctness.
In this generator, both methods will be considered.
"""
@@ -44,7 +44,7 @@ def update_compute_cost(self, strategy: ShardingStrategy):
'''
Compute the computation cost per device with this specific strategy.
- Note: compute_cost need to be devided by TFLOPS, now it just shows the computation size.
+ Note: compute_cost needs to be divided by TFLOPS; for now it just shows the computation size.
'''
# TODO: a constant coefficient need to be added.
# 1D: (L) * N * Cin
@@ -212,7 +212,7 @@ def split_input_batch(self, mesh_dim_0):
# set communication action
# For SyncBN case, we don't need to do communication for weight and bias.
- # TODO: the communication happens interally at SyncBN operation. We need to replace the BN operation
+ # TODO: the communication happens internally at SyncBN operation. We need to replace the BN operation
# to SyncBN operation instead of inserting a communication node.
output_comm_action = self.get_communication_action(
sharding_spec=sharding_spec_mapping["output"],
@@ -250,7 +250,7 @@ def split_input_batch_1d(self, mesh_dim_0, mesh_dim_1):
# set communication action
# For SyncBN case, we don't need to do communication for gradients of weight and bias.
- # TODO: the communication happens interally at SyncBN operation. We need to replace the BN operation
+ # TODO: the communication happens internally at SyncBN operation. We need to replace the BN operation
# to SyncBN operation instead of inserting a communication node.
output_comm_action = self.get_communication_action(
sharding_spec=sharding_spec_mapping["output"],
@@ -298,7 +298,7 @@ def split_input_both_dim(self, mesh_dim_0, mesh_dim_1):
# set communication action
# For SyncBN case, we don't need to do communication for gradients of weight and bias.
- # TODO: the communication happens interally at SyncBN operation. We need to replace the BN operation
+ # TODO: the communication happens internally at SyncBN operation. We need to replace the BN operation
# to SyncBN operation instead of inserting a communication node.
output_comm_action = self.get_communication_action(
sharding_spec=sharding_spec_mapping["output"],
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/binary_elementwise_generator.py b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/binary_elementwise_generator.py
index fd7f811c8972..d27cc046eaf3 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/binary_elementwise_generator.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/binary_elementwise_generator.py
@@ -51,7 +51,7 @@ def update_memory_cost(self, strategy: ShardingStrategy) -> ShardingStrategy:
# compute fwd memory cost in bytes
# as the elementwise ops are not memory-intensive
- # we approximate the fwd memroy cost to be the output
+ # we approximate the fwd memory cost to be the output
# and the backward memory cost to be grad of input and other
input_bytes = self._compute_size_in_bytes(strategy, 'input')
other_bytes = self._compute_size_in_bytes(strategy, 'other')
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/conv_strategy_generator.py b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/conv_strategy_generator.py
index c2154b3104d3..e605a68a326b 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/conv_strategy_generator.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/conv_strategy_generator.py
@@ -38,9 +38,9 @@ def update_compute_cost(self, strategy: ShardingStrategy):
'''
Compute the computation cost per device with this specific strategy.
- Note: compute_cost need to be devided by TFLOPS, now it just shows the computation size.
+ Note: compute_cost needs to be divided by TFLOPS; for now it just shows the computation size.
'''
- # TODO: compute_cost need to be devided by TFLOPS, now it just shows the computation size.
+ # TODO: compute_cost needs to be divided by TFLOPS; for now it just shows the computation size.
# 1D: (L) * N * Cout * Cin * kernel
# 2D: (H * W) * N * Cout * Cin * kernel
# 3D: (H * W * D) * N * Cout * Cin * kernel
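For orientation, the 1D/2D/3D cost comments above all follow the same pattern: spatial size times N * Cout * Cin * kernel. Below is a minimal sketch of the 2D case only for illustration; the function name, the stride-1/same-padding assumption, and the example shapes are assumptions here, not part of ConvStrategyGenerator.

```python
# Illustration of the 2D cost comment above: (H * W) * N * Cout * Cin * kernel.
# Assumes output spatial size equals input spatial size (stride 1, same padding).
from typing import Sequence


def conv2d_compute_cost(sharded_input_shape: Sequence[int],
                        sharded_weight_shape: Sequence[int]) -> int:
    """Per-device forward computation size for a 2D convolution (hypothetical helper)."""
    n, c_in, h, w = sharded_input_shape          # (N, Cin, H, W) after sharding
    c_out, _, k_h, k_w = sharded_weight_shape    # (Cout, Cin, kH, kW) after sharding
    return (h * w) * n * c_out * c_in * (k_h * k_w)


if __name__ == "__main__":
    # batch of 8, 64 -> 128 channels, 3x3 kernel on a 56x56 feature map
    print(conv2d_compute_cost((8, 64, 56, 56), (128, 64, 3, 3)))
```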
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/layer_norm_generator.py b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/layer_norm_generator.py
index fbb6070f7e82..65b173bbf65d 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/layer_norm_generator.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/layer_norm_generator.py
@@ -34,9 +34,9 @@ def update_compute_cost(self, strategy: ShardingStrategy):
'''
Compute the computation cost per device with this specific strategy.
- Note: compute_cost need to be devided by TFLOPS, now it just shows the computation size.
+ Note: compute_cost needs to be divided by TFLOPS; for now it just shows the computation size.
'''
- # TODO: compute_cost need to be devided by TFLOPS, now it just shows the computation size.
+ # TODO: compute_cost needs to be divided by TFLOPS; for now it just shows the computation size.
# TODO: a constant coefficient need to be added.
sharded_input_shape = strategy.sharding_specs[self.op_data['input']].get_sharded_shape_per_device()
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/normal_pooling_generator.py b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/normal_pooling_generator.py
index 9df6d2fbfa12..b7db42f8f67e 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/normal_pooling_generator.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/normal_pooling_generator.py
@@ -17,7 +17,7 @@ class NormalPoolStrategyGenerator(StrategyGenerator):
"""
NormalPoolStrategyGenerator is a generic class to generate strategies for pool operation like MaxPoolxd.
The reason we call this normal pool is AvgPoolxd and MaxPoolxd are taking the kernel size element from image,
- and reduce them depening on the operation type.
+ and reduce them depending on the operation type.
"""
def validate(self) -> bool:
@@ -35,9 +35,9 @@ def update_compute_cost(self, strategy: ShardingStrategy) -> TrainCycleItem:
'''
Compute the computation cost per device with this specific strategy.
- Note: compute_cost need to be devided by TFLOPS, now it just shows the computation size.
+ Note: compute_cost needs to be divided by TFLOPS; for now it just shows the computation size.
'''
- # TODO: compute_cost need to be devided by TFLOPS, now it just shows the computation size.
+ # TODO: compute_cost needs to be divided by TFLOPS; for now it just shows the computation size.
# 1D: (Lout) * N * C * kernel
# 2D: (H * W) * N * Cout * Cin * kernel
# 3D: (H * W * D) * N * Cout * Cin * kernel
diff --git a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/strategy_generator.py b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/strategy_generator.py
index 6d68521aaea7..d42429745c61 100644
--- a/colossalai/auto_parallel/tensor_shard/node_handler/strategy/strategy_generator.py
+++ b/colossalai/auto_parallel/tensor_shard/node_handler/strategy/strategy_generator.py
@@ -225,7 +225,7 @@ def _compute_size_in_bytes_helper(sharding_spec, meta_data):
if isinstance(meta_data, torch.Tensor):
element_bytes = _compute_size_in_bytes_helper(sharding_spec, meta_data)
else:
- # if meta_data is not a tensor, we count the memroy as 0
+ # if meta_data is not a tensor, we count the memory as 0
element_bytes = 0
total_bytes += element_bytes
@@ -233,7 +233,7 @@ def _compute_size_in_bytes_helper(sharding_spec, meta_data):
if isinstance(op_data.data, torch.Tensor):
total_bytes = _compute_size_in_bytes_helper(strategy.sharding_specs[op_data], op_data.data)
else:
- # if op_data.data is not a tensor, we count the memroy as 0
+ # if op_data.data is not a tensor, we count the memory as 0
total_bytes = 0
return total_bytes
diff --git a/colossalai/auto_parallel/tensor_shard/solver/cost_graph.py b/colossalai/auto_parallel/tensor_shard/solver/cost_graph.py
index 74290453ca0c..1b2d3ad57407 100644
--- a/colossalai/auto_parallel/tensor_shard/solver/cost_graph.py
+++ b/colossalai/auto_parallel/tensor_shard/solver/cost_graph.py
@@ -9,7 +9,7 @@ class CostGraph:
1. To feed the quadratic resharding costs into solver, we need to linearize it. We build edge_cost in
CostGraph, and it stored every combinations of strategies for a src-dst node pair in an 1D list.
2. To reduce the searching space, we merge computationally-trivial operators, such as
- element-wise operators, transpose, and reduction, into their following nodes. The merging infomation will
+ element-wise operators, transpose, and reduction, into their following nodes. The merging information will
be given by the StrategiesVector depending on the type of target node and following nodes.
Argument:
@@ -90,7 +90,7 @@ def _check_tensor_in_node(data):
if self.simplify and strategies_vector.check_merge():
for followed_node in strategies_vector.predecessor_nodes:
# we only merge node pairs which src node has a tensor element inside.
- # This is necessay because the node without a tensor element inside will not
+ # This is necessary because the node without a tensor element inside will not
# be assigned any strategy.
if _check_tensor_in_node(followed_node._meta_data):
self.merge_pair.append((followed_node, dst_node))
diff --git a/colossalai/auto_parallel/tensor_shard/solver/graph_analysis.py b/colossalai/auto_parallel/tensor_shard/solver/graph_analysis.py
index be39a74cb237..171aa8b3399f 100644
--- a/colossalai/auto_parallel/tensor_shard/solver/graph_analysis.py
+++ b/colossalai/auto_parallel/tensor_shard/solver/graph_analysis.py
@@ -83,7 +83,7 @@ def graph(self) -> Graph:
def liveness_analysis(self) -> List[LiveStage]:
"""
- Analyse the graph to obtain the variable liveness information. This function returns
+ Analyses the graph to obtain the variable liveness information. This function returns
an ordered dictionary where the key is the compute stage ID and the value is a LivenessStage object.
"""
compute_nodes = self.graph.nodes
@@ -91,7 +91,7 @@ def liveness_analysis(self) -> List[LiveStage]:
# checked: record all variables created since the first stage
# all: record the live variables only exist until the current stage.
- # this can be different from the `checked list`` as some varialbes may be destroyed prior to this stage.
+ # this can be different from the `checked list` as some variables may be destroyed prior to this stage.
# unique: record the unique live variables only exist until the current stage.
# this is different from `all list` as some variables are duplicated.
checked_variables = LiveVariableVector()
@@ -103,7 +103,7 @@ def liveness_analysis(self) -> List[LiveStage]:
# find new living variables #
#############################
# detect whether the current op is an in-place op
- # if it is an in-place op, we would deem it as a duplciate var
+ # if it is an in-place op, we would deem it as a duplicate var
is_inplace = False
if node.op == 'call_function':
# check if this is an inplace op such as torch.nn.functional.relu(x, inplace=True)
diff --git a/colossalai/auto_parallel/tensor_shard/solver/solver.py b/colossalai/auto_parallel/tensor_shard/solver/solver.py
index f5c6663dce80..564c5f09220c 100644
--- a/colossalai/auto_parallel/tensor_shard/solver/solver.py
+++ b/colossalai/auto_parallel/tensor_shard/solver/solver.py
@@ -44,7 +44,7 @@ def __init__(self,
graph: The computing graph to be optimized.
strategies_constructor: It will provide all the possible strategies for each node in the computing graph.
cost_graph: A graph data structure to simplify the edge cost graph.
- graph_analyser: graph_analyser will analyse the graph to obtain the variable liveness information, which will be used to generate memory constraints.
+ graph_analyser: graph_analyser will analyze the graph to obtain the variable liveness information, which will be used to generate memory constraints.
memory_budget: Memory constraint for the solution.
solution_numbers: If solution_numbers is larger than one, the solver will use a series of solutions based on different memory budgets.
memory_increasing_coefficient: If solution_numbers is larger than one, we will use this coefficient to generate new memory budget.
diff --git a/colossalai/auto_parallel/tensor_shard/utils/broadcast.py b/colossalai/auto_parallel/tensor_shard/utils/broadcast.py
index 28aa551328d7..307348ea1eaf 100644
--- a/colossalai/auto_parallel/tensor_shard/utils/broadcast.py
+++ b/colossalai/auto_parallel/tensor_shard/utils/broadcast.py
@@ -21,7 +21,7 @@
class BroadcastType(Enum):
EQUAL = auto()
- PADDDING = auto()
+ PADDING = auto()
MULTIPLE = auto()
@@ -69,18 +69,18 @@ def get_broadcast_dim_info(logical_shape, physical_shape):
for i in range(logical_num_dims):
# get the trailing dim size
logical_dim_idx = logical_num_dims - i - 1
- phyiscal_dim_idx = physical_num_dims - i - 1
+ physical_dim_idx = physical_num_dims - i - 1
logical_dim_size = logical_shape[logical_dim_idx]
- if phyiscal_dim_idx >= 0:
- physical_dim_size = physical_shape[phyiscal_dim_idx]
+ if physical_dim_idx >= 0:
+ physical_dim_size = physical_shape[physical_dim_idx]
if physical_dim_size == logical_dim_size:
logical_dim_broadcast_info[logical_dim_idx] = BroadcastType.EQUAL
elif physical_dim_size == 1 and physical_dim_size != logical_dim_size:
logical_dim_broadcast_info[logical_dim_idx] = BroadcastType.MULTIPLE
else:
- logical_dim_broadcast_info[logical_dim_idx] = BroadcastType.PADDDING
+ logical_dim_broadcast_info[logical_dim_idx] = BroadcastType.PADDING
return logical_dim_broadcast_info
@@ -117,7 +117,7 @@ def recover_sharding_spec_for_broadcast_shape(logical_sharding_spec: ShardingSpe
for shape_dim, mesh_dim in logical_dim_partition.items():
logical_broadcast_type = logical_dim_broadcast_info[shape_dim]
- if logical_broadcast_type == BroadcastType.PADDDING or logical_broadcast_type == BroadcastType.MULTIPLE:
+ if logical_broadcast_type == BroadcastType.PADDING or logical_broadcast_type == BroadcastType.MULTIPLE:
removed_dims.extend(mesh_dim)
else:
# get the corresponding physical dim
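The renamed `BroadcastType.PADDING` member is driven by the trailing-dimension comparison in `get_broadcast_dim_info` above. Here is a standalone sketch of that classification for readers who want to try it outside the library; the enum members mirror the hunk, while the handling of logical dims that fall off the physical shape is an assumption of this sketch.

```python
# Standalone sketch of the trailing-dim comparison shown in get_broadcast_dim_info above.
from enum import Enum, auto
from typing import Dict, Sequence


class BroadcastType(Enum):
    EQUAL = auto()      # logical and physical dim sizes match
    PADDING = auto()    # the dim effectively only exists in the logical (broadcast) shape
    MULTIPLE = auto()   # a physical dim of size 1 was expanded


def classify_broadcast_dims(logical_shape: Sequence[int],
                            physical_shape: Sequence[int]) -> Dict[int, BroadcastType]:
    info = {}
    logical_ndims, physical_ndims = len(logical_shape), len(physical_shape)
    for i in range(logical_ndims):
        # compare trailing dimensions, since broadcasting aligns shapes from the right
        logical_idx = logical_ndims - i - 1
        physical_idx = physical_ndims - i - 1
        logical_size = logical_shape[logical_idx]
        if physical_idx >= 0:
            physical_size = physical_shape[physical_idx]
            if physical_size == logical_size:
                info[logical_idx] = BroadcastType.EQUAL
            elif physical_size == 1 and physical_size != logical_size:
                info[logical_idx] = BroadcastType.MULTIPLE
            else:
                info[logical_idx] = BroadcastType.PADDING
        else:
            # assumption: dims beyond the physical rank are treated as padded
            info[logical_idx] = BroadcastType.PADDING
    return info


if __name__ == "__main__":
    # [4] broadcast against logical shape [2, 3, 4]: dim 2 is EQUAL, dims 0 and 1 are PADDING
    print(classify_broadcast_dims([2, 3, 4], [4]))
```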
diff --git a/colossalai/auto_parallel/tensor_shard/utils/factory.py b/colossalai/auto_parallel/tensor_shard/utils/factory.py
index 05331e560001..347c10aa102d 100644
--- a/colossalai/auto_parallel/tensor_shard/utils/factory.py
+++ b/colossalai/auto_parallel/tensor_shard/utils/factory.py
@@ -30,7 +30,7 @@ def generate_sharding_spec(input_: Union[Node, torch.Tensor], device_mesh: Devic
"""
if isinstance(input_, Node):
- assert hasattr(input_, '_meta_data'), f'The given node has no attribte _meta_data'
+ assert hasattr(input_, '_meta_data'), f'The given node has no attribute _meta_data'
meta_tensor = input_._meta_data
assert meta_tensor is not None, "The given node's _meta_data attribute is None"
shape = meta_tensor.shape
diff --git a/colossalai/auto_parallel/tensor_shard/utils/reshape.py b/colossalai/auto_parallel/tensor_shard/utils/reshape.py
index a32a14bf7d57..d0ebbd7e8b1b 100644
--- a/colossalai/auto_parallel/tensor_shard/utils/reshape.py
+++ b/colossalai/auto_parallel/tensor_shard/utils/reshape.py
@@ -6,12 +6,12 @@
class PreviousStatus(Enum):
"""
- This class shows the status of previous comparision.
+ This class shows the status of previous comparison.
"""
RESET = 0
- # ORIGIN means the dimension size of original tensor is larger in the previous comparision.
+ # ORIGIN means the dimension size of original tensor is larger in the previous comparison.
ORIGIN = 1
- # TGT means the dimension size of target tensor is larger in the previous comparision.
+ # TGT means the dimension size of target tensor is larger in the previous comparison.
TGT = 2
@@ -91,7 +91,7 @@ def detect_reshape_mapping(origin_shape: torch.Size, tgt_shape: torch.Size) -> D
tgt_index += 1
if previous_label == PreviousStatus.TGT:
- # if the target dimension size is larger in the previous comparision, which means
+ # if the target dimension size is larger in the previous comparison, which means
# the origin dimension size has already accumulated larger than target dimension size, so
# we need to offload the origin dims and tgt dims into the reshape_mapping_dict.
reshape_mapping_dict[tuple(origin_dims)] = tuple(tgt_dims)
@@ -111,7 +111,7 @@ def detect_reshape_mapping(origin_shape: torch.Size, tgt_shape: torch.Size) -> D
origin_index += 1
if previous_label == PreviousStatus.ORIGIN:
- # if the origin element is larger in the previous comparision, which means
+ # if the origin element is larger in the previous comparison, which means
# the target element has already accumulated larger than origin element, so
# we need to offload the origin dims and tgt dims into the reshape_mapping_dict.
reshape_mapping_dict[tuple(origin_dims)] = tuple(tgt_dims)
@@ -139,7 +139,7 @@ def check_keep_sharding_status(input_dim_partition_dict: Dict[int, List[int]],
Rule:
For a sharded dimension of input tensor, if it is not the minimum element of the input tuple,
the function will return false.
- To illustrate this issue, there are two cases to analyse:
+ To illustrate this issue, there are two cases to analyze:
1. no sharded dims in the input tuple: we could do the reshape operation safely just as the normal
operation without distributed tensor.
2. sharded dims in the input tuple: the sharded dim must be the minimum element, then during shape
diff --git a/colossalai/autochunk/autochunk_codegen.py b/colossalai/autochunk/autochunk_codegen.py
index d0a467254d72..cc98c1570b4a 100644
--- a/colossalai/autochunk/autochunk_codegen.py
+++ b/colossalai/autochunk/autochunk_codegen.py
@@ -40,7 +40,7 @@ def _gen_chunk_slice_dim(chunk_dim: int, chunk_indice_name: str, shape: List) ->
return new_shape
-def _gen_loop_start(chunk_input: List[Node], chunk_output: List[Node], chunk_ouput_dim: int, chunk_size=2) -> str:
+def _gen_loop_start(chunk_input: List[Node], chunk_output: List[Node], chunk_output_dim: int, chunk_size=2) -> str:
"""
Generate chunk loop start
@@ -52,7 +52,7 @@ def _gen_loop_start(chunk_input: List[Node], chunk_output: List[Node], chunk_oup
Args:
chunk_input (List[Node]): chunk input node
chunk_output (Node): chunk output node
- chunk_ouput_dim (int): chunk output node chunk dim
+ chunk_output_dim (int): chunk output node chunk dim
chunk_size (int): chunk size. Defaults to 2.
Returns:
@@ -74,7 +74,7 @@ def _gen_loop_start(chunk_input: List[Node], chunk_output: List[Node], chunk_oup
input_node.name, input_node.name)
out_shape = get_node_shape(chunk_output[0])
- chunk_shape = out_shape[chunk_ouput_dim[0]]
+ chunk_shape = out_shape[chunk_output_dim[0]]
context += "chunk_size = %d\nfor chunk_idx in range(0, %d, chunk_size):\n" % (chunk_size, chunk_shape)
return context
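`_gen_loop_start` emits the chunked loop as source text; the final concatenation in the hunk shows the prologue format. Below is a tiny hedged sketch of just that last step, with the buffer-allocation lines emitted by the real codegen omitted.

```python
# Illustrative only: mimics the final string concatenation shown in the hunk above
# ("chunk_size = %d\nfor chunk_idx in range(0, %d, chunk_size):\n").
def gen_loop_start(chunk_shape: int, chunk_size: int = 2) -> str:
    return "chunk_size = %d\nfor chunk_idx in range(0, %d, chunk_size):\n" % (chunk_size, chunk_shape)


if __name__ == "__main__":
    # For an output whose chunked dimension has length 128 this prints:
    # chunk_size = 2
    # for chunk_idx in range(0, 128, chunk_size):
    print(gen_loop_start(128), end="")
```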
diff --git a/colossalai/autochunk/trace_flow.py b/colossalai/autochunk/trace_flow.py
index db25267e9b42..a1080fda1541 100644
--- a/colossalai/autochunk/trace_flow.py
+++ b/colossalai/autochunk/trace_flow.py
@@ -64,7 +64,7 @@ def check_index_compute(self, start_idx, end_dim, end_node, end_idx):
return False
return True
- def _assgin_single_node_flow(
+ def _assign_single_node_flow(
self,
arg_node: Node,
start_idx: int,
@@ -177,7 +177,7 @@ def _get_all_node_info(self, end_dim, start_idx, end_idx):
if get_node_shape(arg) is None:
continue
arg_list.append(arg)
- flow_flag = self._assgin_single_node_flow(
+ flow_flag = self._assign_single_node_flow(
arg,
start_idx,
end_idx,
@@ -315,7 +315,7 @@ def _get_prepose_nodes(self, all_node_info: Dict, start_idx: int, end_idx: int,
chunk_info["args"]["prepose_nodes"] = prepose_nodes
def _get_non_chunk_inputs(self, chunk_info, start_idx, end_idx):
- # we need to log input nodes to avoid deleteing them in the loop
+ # we need to log input nodes to avoid deleting them in the loop
chunk_node_list = self.node_mgr.get_node_slice_by_idx(start_idx, end_idx + 1)
# also need to get some prepose node's arg out of non_chunk_inputs
for n in chunk_info["args"]["prepose_nodes"]:
@@ -366,8 +366,8 @@ def flow_search(self, start_idx, start_dim, end_idx, end_dim):
# find non chunk inputs
chunk_info = self._get_non_chunk_inputs(chunk_info, start_idx, end_idx)
- # reassgin reshape size, some size may have changed due to chunk
- chunk_info = self._reassgin_reshape_size(chunk_info)
+ # reassign reshape sizes, as some sizes may have changed due to chunking
+ chunk_info = self._reassign_reshape_size(chunk_info)
return chunk_info
@@ -428,10 +428,10 @@ def _update_chunk_info(self, chunk_info: Dict, new_all_node_info: Dict, output:
chunk_info["outputs_dim"].append(output_dim)
return True
- def _reassgin_reshape_size(self, chunk_info):
+ def _reassign_reshape_size(self, chunk_info):
"""
Some shape args in reshape may have changed due to chunk
- reassgin those changed shape
+ reassign those changed shapes
"""
chunk_region = chunk_info["region"]
reshape_size = {}
diff --git a/colossalai/autochunk/trace_indice.py b/colossalai/autochunk/trace_indice.py
index c7fce4c8bee1..fbe0741b8827 100644
--- a/colossalai/autochunk/trace_indice.py
+++ b/colossalai/autochunk/trace_indice.py
@@ -18,7 +18,7 @@ class TraceIndice(object):
dim(x1)=dim(x2)=dim(x3)=[a, b, c]
This class will record every node's dims' indice, compute and source.
- Attibutes:
+ Attributes:
node_list (List)
indice_trace_list (List): [{"indice": [...], "compute": [...], "source": [...]}, {...}]
indice_view_list (Dict): not used for now
@@ -397,7 +397,7 @@ def _assign_conv2d_indice(self, node: Node, node_idx: int) -> None:
input_node = node.args[0]
assert len(get_node_shape(input_node)) == 4
- # assgin index
+ # assign index
self._assign_indice_as_input(node, node_idx, input_node)
self._del_dim(node_idx, 1)
self._add_dim(node_idx, 1)
@@ -415,7 +415,7 @@ def _assign_interpolate_indice(self, node: Node, node_idx: int) -> None:
assert node.kwargs['size'] is None
assert len(get_node_shape(node)) == 4
- # assgin index
+ # assign index
self._assign_indice_as_input(node, node_idx)
self._mark_computation(node, node_idx, [-1, -2])
@@ -461,7 +461,7 @@ def _assign_elementwise_indice(self, node, idx):
nodes_in.append(node_in)
self._inherit_more_indice_from_node_with_exclude(node_in, node)
- def _assgin_no_change_indice(self, node, idx):
+ def _assign_no_change_indice(self, node, idx):
self._assign_indice_as_input(node, idx)
for node_in in node.args:
if type(node_in) == type(node):
@@ -792,7 +792,7 @@ def _assign_view_reshape_indice(self, node: Node, node_idx: int) -> None:
self._add_dim(node_idx, i)
dim_from.reverse()
- # inheirt indice from current node
+ # inherit indice from current node
if len(dim_from) != 0 and len(dim_to) != 0:
if dim_diff == 1:
if origin_shape[dim_from[0]] == 1:
@@ -852,7 +852,7 @@ def trace_indice(self) -> None:
elif "split" == node_name:
self._assign_split_indice(node, idx)
elif any(i == node_name for i in ["to", "contiguous", "clone", "type", "float"]):
- self._assgin_no_change_indice(node, idx)
+ self._assign_no_change_indice(node, idx)
elif "new_ones" == node_name:
self._assign_all_indice(node, idx)
elif "flatten" == node_name:
@@ -914,7 +914,7 @@ def trace_indice(self) -> None:
elif "conv2d" == node_name:
self._assign_conv2d_indice(node, idx)
elif "identity" == node_name:
- self._assgin_no_change_indice(node, idx)
+ self._assign_no_change_indice(node, idx)
elif any(n == node_name for n in ["sigmoid", "dropout", "relu", "silu", "gelu"]):
self._assign_elementwise_indice(node, idx)
else:
diff --git a/colossalai/booster/booster.py b/colossalai/booster/booster.py
index c14e602deaf5..61d912157449 100644
--- a/colossalai/booster/booster.py
+++ b/colossalai/booster/booster.py
@@ -23,27 +23,28 @@ class Booster:
training with different precision, accelerator, and plugin.
Examples:
- >>> colossalai.launch(...)
- >>> plugin = GeminiPlugin(stage=3, ...)
- >>> booster = Booster(precision='fp16', plugin=plugin)
- >>>
- >>> model = GPT2()
- >>> optimizer = Adam(model.parameters())
- >>> dataloader = Dataloader(Dataset)
- >>> lr_scheduler = LinearWarmupScheduler()
- >>> criterion = GPTLMLoss()
- >>>
- >>> model, optimizer, lr_scheduler, dataloader = booster.boost(model, optimizer, lr_scheduler, dataloader)
- >>>
- >>> for epoch in range(max_epochs):
- >>> for input_ids, attention_mask in dataloader:
- >>> outputs = model(input_ids, attention_mask)
- >>> loss = criterion(outputs.logits, input_ids)
- >>> booster.backward(loss, optimizer)
- >>> optimizer.step()
- >>> lr_scheduler.step()
- >>> optimizer.zero_grad()
-
+ ```python
+ colossalai.launch(...)
+ plugin = GeminiPlugin(stage=3, ...)
+ booster = Booster(precision='fp16', plugin=plugin)
+
+ model = GPT2()
+ optimizer = Adam(model.parameters())
+ dataloader = Dataloader(Dataset)
+ lr_scheduler = LinearWarmupScheduler()
+ criterion = GPTLMLoss()
+
+ model, optimizer, lr_scheduler, dataloader = booster.boost(model, optimizer, lr_scheduler, dataloader)
+
+ for epoch in range(max_epochs):
+ for input_ids, attention_mask in dataloader:
+ outputs = model(input_ids, attention_mask)
+ loss = criterion(outputs.logits, input_ids)
+ booster.backward(loss, optimizer)
+ optimizer.step()
+ lr_scheduler.step()
+ optimizer.zero_grad()
+ ```
Args:
device (str or torch.device): The device to run the training. Default: 'cuda'.
@@ -130,6 +131,12 @@ def boost(
return model, optimizer, criterion, dataloader, lr_scheduler
def backward(self, loss: torch.Tensor, optimizer: Optimizer) -> None:
+ """Backward pass.
+
+ Args:
+ loss (torch.Tensor): The loss to be backpropagated.
+ optimizer (Optimizer): The optimizer to be updated.
+ """
# TODO: implement this method with plugin
optimizer.backward(loss)
@@ -146,11 +153,29 @@ def execute_pipeline(self,
pass
def no_sync(self, model: nn.Module) -> contextmanager:
+ """Context manager to disable gradient synchronization across DP process groups.
+
+ Args:
+ model (nn.Module): The model whose gradient synchronization is to be disabled.
+
+ Returns:
+ contextmanager: Context to disable gradient synchronization.
+ """
assert self.plugin is not None, f'no_sync is only enabled when a plugin is provided and the plugin supports no_sync.'
assert self.plugin.support_no_sync, f'The plugin {self.plugin.__class__.__name__} does not support no_sync.'
return self.plugin.no_sync(model)
def load_model(self, model: nn.Module, checkpoint: str, strict: bool = True):
+ """Load model from checkpoint.
+
+ Args:
+ model (nn.Module): A model boosted by Booster.
+ checkpoint (str): Path to the checkpoint. It must be a local path.
+ It should be a directory path if the checkpoint is sharded. Otherwise, it should be a file path.
+ strict (bool, optional): whether to strictly enforce that the keys
+ in :attr:`state_dict` match the keys returned by this module's
+ :meth:`~torch.nn.Module.state_dict` function. Defaults to True.
+ """
self.checkpoint_io.load_model(model, checkpoint, strict)
def save_model(self,
@@ -159,16 +184,58 @@ def save_model(self,
prefix: str = None,
shard: bool = False,
size_per_shard: int = 1024):
- self.checkpoint_io.save_model(model, checkpoint, prefix, shard, size_per_shard)
+ """Save model to checkpoint.
+
+ Args:
+ model (nn.Module): A model boosted by Booster.
+ checkpoint (str): Path to the checkpoint. It must be a local path.
+ It is a file path if ``shard=False``. Otherwise, it is a directory path.
+ prefix (str, optional): A prefix added to parameter and buffer
+ names to compose the keys in state_dict. Defaults to None.
+ shard (bool, optional): Whether to save the checkpoint in a sharded way.
+ If true, the checkpoint will be a folder. Otherwise, it will be a single file. Defaults to False.
+ size_per_shard (int, optional): Maximum size of checkpoint shard file in MB. This is useful only when ``shard=True``. Defaults to 1024.
+ """
+ self.checkpoint_io.save_model(model, checkpoint=checkpoint, prefix=prefix, shard=shard, size_per_shard=size_per_shard)
def load_optimizer(self, optimizer: Optimizer, checkpoint: str):
+ """Load optimizer from checkpoint.
+
+ Args:
+ optimizer (Optimizer): An optimizer boosted by Booster.
+ checkpoint (str): Path to the checkpoint. It must be a local path.
+ It should be a directory path if the checkpoint is sharded. Otherwise, it should be a file path.
+ """
self.checkpoint_io.load_optimizer(optimizer, checkpoint)
def save_optimizer(self, optimizer: Optimizer, checkpoint: str, shard: bool = False, size_per_shard: int = 1024):
+ """Save optimizer to checkpoint.
+ Warning: Saving sharded optimizer checkpoint is not supported yet.
+
+ Args:
+ optimizer (Optimizer): An optimizer boosted by Booster.
+ checkpoint (str): Path to the checkpoint. It must be a local path.
+ It is a file path if ``shard=False``. Otherwise, it is a directory path.
+ shard (bool, optional): Whether to save the checkpoint in a sharded way.
+ If true, the checkpoint will be a folder. Otherwise, it will be a single file. Defaults to False.
+ size_per_shard (int, optional): Maximum size of checkpoint shard file in MB. This is useful only when ``shard=True``. Defaults to 1024.
+ """
self.checkpoint_io.save_optimizer(optimizer, checkpoint, shard, size_per_shard)
def save_lr_scheduler(self, lr_scheduler: LRScheduler, checkpoint: str):
+ """Save lr scheduler to checkpoint.
+
+ Args:
+ lr_scheduler (LRScheduler): A lr scheduler boosted by Booster.
+ checkpoint (str): Path to the checkpoint. It must be a local file path.
+ """
self.checkpoint_io.save_lr_scheduler(lr_scheduler, checkpoint)
def load_lr_scheduler(self, lr_scheduler: LRScheduler, checkpoint: str):
+ """Load lr scheduler from checkpoint.
+
+ Args:
+ lr_scheduler (LRScheduler): A lr scheduler boosted by Booster.
+ checkpoint (str): Path to the checkpoint. It must be a local file path.
+ """
self.checkpoint_io.load_lr_scheduler(lr_scheduler, checkpoint)
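The new docstrings above describe the full checkpoint surface of `Booster`. As a hedged usage sketch, assuming `Booster` is importable from `colossalai.booster` and that the objects have already been passed through `booster.boost(...)`, the calls compose roughly like this (all paths are placeholders):

```python
from torch import nn
from torch.optim import Optimizer

from colossalai.booster import Booster  # assumed import path


def save_all(booster: Booster, model: nn.Module, optimizer: Optimizer, lr_scheduler) -> None:
    """Save everything produced by booster.boost(...) using the methods documented above."""
    booster.save_model(model, 'ckpt/model', shard=True, size_per_shard=1024)  # sharded -> folder
    booster.save_optimizer(optimizer, 'ckpt/optimizer.pt')                    # single file
    booster.save_lr_scheduler(lr_scheduler, 'ckpt/lr_scheduler.pt')


def load_all(booster: Booster, model: nn.Module, optimizer: Optimizer, lr_scheduler) -> None:
    booster.load_model(model, 'ckpt/model', strict=True)
    booster.load_optimizer(optimizer, 'ckpt/optimizer.pt')
    booster.load_lr_scheduler(lr_scheduler, 'ckpt/lr_scheduler.pt')
```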
diff --git a/colossalai/booster/mixed_precision/__init__.py b/colossalai/booster/mixed_precision/__init__.py
index 3cf0ad28cdbe..0df9d84159f9 100644
--- a/colossalai/booster/mixed_precision/__init__.py
+++ b/colossalai/booster/mixed_precision/__init__.py
@@ -1,17 +1,19 @@
from .bf16 import BF16MixedPrecision
from .fp8 import FP8MixedPrecision
from .fp16_apex import FP16ApexMixedPrecision
+from .fp16_naive import FP16NaiveMixedPrecision
from .fp16_torch import FP16TorchMixedPrecision
from .mixed_precision_base import MixedPrecision
__all__ = [
'MixedPrecision', 'mixed_precision_factory', 'FP16_Apex_MixedPrecision', 'FP16_Torch_MixedPrecision',
- 'FP32_MixedPrecision', 'BF16_MixedPrecision', 'FP8_MixedPrecision'
+ 'FP32_MixedPrecision', 'BF16_MixedPrecision', 'FP8_MixedPrecision', 'FP16NaiveMixedPrecision'
]
_mixed_precision_mapping = {
'fp16': FP16TorchMixedPrecision,
'fp16_apex': FP16ApexMixedPrecision,
+ 'fp16_naive': FP16NaiveMixedPrecision,
'bf16': BF16MixedPrecision,
'fp8': FP8MixedPrecision
}
diff --git a/colossalai/booster/mixed_precision/fp16_apex.py b/colossalai/booster/mixed_precision/fp16_apex.py
index 266a750734b1..e184271e932a 100644
--- a/colossalai/booster/mixed_precision/fp16_apex.py
+++ b/colossalai/booster/mixed_precision/fp16_apex.py
@@ -1,5 +1,38 @@
+from typing import Any, Optional, Union
+
+import torch
+
from .mixed_precision_base import MixedPrecision
class FP16ApexMixedPrecision(MixedPrecision):
- pass
+ """
+ Precision for mixed precision training in FP16 using apex AMP.
+
+ Args:
+ opt_level(str, optional, default="O1"): Pure or mixed precision optimization level. Accepted values are "O0", "O1", "O2", and "O3", explained in detail in the Apex AMP documentation.
+ cast_model_type (torch.dtype, optional, default=None): Casts your model's parameters and buffers to the desired type.
+ patch_torch_functions (bool, optional, default=None): Patch all Torch functions and Tensor methods to perform Tensor Core-friendly ops like GEMMs and convolutions in FP16, and any ops that benefit from FP32 precision in FP32.
+ keep_batchnorm_fp32 (bool or str, optional, default=None): To enhance precision and enable cudnn batchnorm (which improves performance), it's often beneficial to keep batchnorm weights in FP32 even if the rest of the model is FP16.
+ master_weights (bool, optional, default=None): Maintain FP32 master weights to accompany any FP16 model weights. FP32 master weights are stepped by the optimizer to enhance precision and capture small gradients.
+ loss_scale (float or str, optional, default=None): If loss_scale is a float value, use this value as the static (fixed) loss scale. If loss_scale is the string "dynamic", adaptively adjust the loss scale over time. Dynamic loss scale adjustments are performed by Amp automatically.
+ cast_model_outputs (torch.dtype, optional, default=None): Option to ensure that the outputs of your model(s) are always cast to a particular type regardless of opt_level.
+ num_losses(int, optional, default=1): Option to tell AMP in advance how many losses/backward passes you plan to use. When used in conjunction with the loss_id argument to `amp.scale_loss`, enables Amp to use a different loss scale per loss/backward pass, which can improve stability. If num_losses is left to 1, Amp will still support multiple losses/backward passes, but use a single global loss scale for all of them.
+ verbosity(int, default=1): Set to 0 to suppress Amp-related output.
+ min_loss_scale(float, default=None): Sets a floor for the loss scale values that can be chosen by dynamic loss scaling. The default value of None means that no floor is imposed. If dynamic loss scaling is not used, min_loss_scale is ignored.
+ max_loss_scale(float, default=2.**24): Sets a ceiling for the loss scale values that can be chosen by dynamic loss scaling. If dynamic loss scaling is not used, max_loss_scale is ignored.
+ """
+
+ def __init__(self,
+ opt_level: Optional[str] = "O1",
+ cast_model_type: torch.dtype = None,
+ patch_torch_functions: bool = None,
+ keep_batchnorm_fp32: Union[bool, str] = None,
+ master_weights: bool = None,
+ loss_scale: Union[float, str] = None,
+ cast_model_outputs: Any = None,
+ num_losses: Optional[int] = 1,
+ verbosity: int = 1,
+ min_loss_scale: float = None,
+ max_loss_scale: float = 2.**24) -> None:
+ pass
diff --git a/colossalai/booster/mixed_precision/fp16_naive.py b/colossalai/booster/mixed_precision/fp16_naive.py
new file mode 100644
index 000000000000..5d0d815257f3
--- /dev/null
+++ b/colossalai/booster/mixed_precision/fp16_naive.py
@@ -0,0 +1,26 @@
+from .mixed_precision_base import MixedPrecision
+
+
+class FP16NaiveMixedPrecision(MixedPrecision):
+ """
+ Precision for mixed precision training in FP16 using naive AMP.
+
+ Args:
+ log_num_zeros_in_grad(bool): return number of zeros in the gradients.
+ initial_scale(int): initial scale of gradient scaler.
+ growth_factor(int): the growth rate of loss scale.
+ backoff_factor(float): the decrease rate of loss scale.
+ hysteresis(int): delay shift in dynamic loss scaling.
+ max_scale(int): maximum loss scale allowed.
+ verbose(bool): if set to `True`, will print debug info.
+ """
+
+ def __init__(self,
+ log_num_zeros_in_grad: bool,
+ initial_scale: int,
+ growth_factor: int,
+ backoff_factor: float,
+ hysteresis: int,
+ max_scale: int,
+ verbose: bool = None) -> None:
+ pass
diff --git a/colossalai/booster/plugin/__init__.py b/colossalai/booster/plugin/__init__.py
index aa45bcb59ad7..a3b87b5f11d3 100644
--- a/colossalai/booster/plugin/__init__.py
+++ b/colossalai/booster/plugin/__init__.py
@@ -4,3 +4,10 @@
from .torch_ddp_plugin import TorchDDPPlugin
__all__ = ['Plugin', 'TorchDDPPlugin', 'GeminiPlugin', 'LowLevelZeroPlugin']
+
+import torch
+from packaging import version
+
+if version.parse(torch.__version__) >= version.parse('1.12.0'):
+ from .torch_fsdp_plugin import TorchFSDPPlugin
+ __all__.append('TorchFSDPPlugin')
diff --git a/colossalai/booster/plugin/gemini_plugin.py b/colossalai/booster/plugin/gemini_plugin.py
index a3789a39d94b..adbf4803eefe 100644
--- a/colossalai/booster/plugin/gemini_plugin.py
+++ b/colossalai/booster/plugin/gemini_plugin.py
@@ -52,8 +52,16 @@ def save_unsharded_optimizer(self, optimizer: Optimizer, checkpoint: str, gather
Save optimizer to checkpoint but only on master process.
"""
# TODO(ver217): optimizer state dict is sharded
+ warnings.warn('GeminiPlugin does not support saving the full optimizer checkpoint yet. It will be saved on every process instead.')
+ checkpoint = f'{checkpoint}.rank{self.coordinator.rank}'
super().save_unsharded_optimizer(optimizer, checkpoint, gather_dtensor)
+ def load_optimizer(self, optimizer: Optimizer, checkpoint: str):
+ warnings.warn(
+ 'GeminiPlugin can only load optimizer checkpoint saved by itself with the same number of processes.')
+ checkpoint = f'{checkpoint}.rank{self.coordinator.rank}'
+ super().load_optimizer(optimizer, checkpoint)
+
def save_lr_scheduler(self, lr_scheduler: LRScheduler, checkpoint: str):
"""
Save model to checkpoint but only on master process.
@@ -171,7 +179,7 @@ class GeminiPlugin(DPPluginBase):
Users can provide this argument to speed up searching.
If users do not know this argument before training, it is ok. We will use a default value 1024.
min_chunk_size_mb (float, optional): the minimum chunk size in MegaByte.
- If the aggregate size of parameters is still samller than the minimum chunk size,
+ If the aggregate size of parameters is still smaller than the minimum chunk size,
all parameters will be compacted into one small chunk.
memstats (MemStats, optional) the memory statistics collector by a runtime memory tracer.
gpu_margin_mem_ratio (float, optional): The ratio of GPU remaining memory (after the first forward-backward)
diff --git a/colossalai/booster/plugin/low_level_zero_plugin.py b/colossalai/booster/plugin/low_level_zero_plugin.py
index edc0b7679686..5d93cf0e33be 100644
--- a/colossalai/booster/plugin/low_level_zero_plugin.py
+++ b/colossalai/booster/plugin/low_level_zero_plugin.py
@@ -9,7 +9,7 @@
from torch.utils._pytree import tree_map
from torch.utils.data import DataLoader
-from colossalai.checkpoint_io import CheckpointIO
+from colossalai.checkpoint_io import CheckpointIO, GeneralCheckpointIO
from colossalai.interface import ModelWrapper, OptimizerWrapper
from colossalai.utils import get_current_device
from colossalai.zero import zero_model_wrapper, zero_optim_wrapper
@@ -32,8 +32,17 @@ def save_unsharded_optimizer(self, optimizer: Optimizer, checkpoint: str, gather
"""
Save optimizer to checkpoint but only on master process.
"""
- # TODO(ver217): optimizer state dict is sharded
- super().save_unsharded_optimizer(optimizer, checkpoint, gather_dtensor)
+ # TODO(ver217): optimizer state dict is sharded, and cannot get full state dict now
+ warnings.warn(
+ 'LowLevelZeroPlugin does not support saving the full optimizer checkpoint yet. It will be saved on every process instead.')
+ checkpoint = f'{checkpoint}.rank{self.coordinator.rank}'
+ GeneralCheckpointIO.save_unsharded_optimizer(self, optimizer, checkpoint, gather_dtensor)
+
+ def load_optimizer(self, optimizer: Optimizer, checkpoint: str):
+ warnings.warn(
+ 'LowLevelZeroPlugin can only load optimizer checkpoint saved by itself with the same number of processes.')
+ checkpoint = f'{checkpoint}.rank{self.coordinator.rank}'
+ super().load_optimizer(optimizer, checkpoint)
class LowLevelZeroModel(ModelWrapper):
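Both the Gemini and LowLevelZero changes above append a `.rank{rank}` suffix before delegating, so each process writes and reads its own optimizer state file. A small illustrative helper (hypothetical, not part of either plugin) makes the naming convention and its consequence explicit:

```python
# Illustration of the per-rank checkpoint naming used by the two plugins above.
# rank_suffixed_path is a hypothetical helper, not part of colossalai.
def rank_suffixed_path(checkpoint: str, rank: int) -> str:
    """Each process writes/reads its own shard of the optimizer state."""
    return f'{checkpoint}.rank{rank}'


if __name__ == "__main__":
    # With 4 processes, saving to "optim.pt" produces optim.pt.rank0 ... optim.pt.rank3;
    # loading therefore only works with the same number of processes.
    print([rank_suffixed_path('optim.pt', r) for r in range(4)])
```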
diff --git a/colossalai/booster/plugin/torch_ddp_plugin.py b/colossalai/booster/plugin/torch_ddp_plugin.py
index 99cd2f7791d3..b317ccf48ad9 100644
--- a/colossalai/booster/plugin/torch_ddp_plugin.py
+++ b/colossalai/booster/plugin/torch_ddp_plugin.py
@@ -1,4 +1,4 @@
-from typing import Callable, Iterator, List, Tuple, Union
+from typing import Callable, Iterator, List, Optional, Tuple, Union
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
@@ -50,6 +50,16 @@ def save_lr_scheduler(self, lr_scheduler: LRScheduler, checkpoint: str):
if self.coordinator.is_master():
super().save_lr_scheduler(lr_scheduler, checkpoint)
+ def save_sharded_model(self,
+ model: nn.Module,
+ checkpoint_path: str,
+ gather_dtensor: bool = False,
+ variant: Optional[str] = None,
+ max_shard_size: int = 1024,
+ use_safetensors: bool = False):
+ if self.coordinator.is_master():
+ super().save_sharded_model(model, checkpoint_path, gather_dtensor, variant, max_shard_size, use_safetensors)
+
class TorchDDPModel(ModelWrapper):
diff --git a/colossalai/booster/plugin/torch_fsdp_plugin.py b/colossalai/booster/plugin/torch_fsdp_plugin.py
new file mode 100644
index 000000000000..8d534ea4c061
--- /dev/null
+++ b/colossalai/booster/plugin/torch_fsdp_plugin.py
@@ -0,0 +1,221 @@
+from pathlib import Path
+from typing import Callable, Iterable, Iterator, List, Optional, Tuple, Union
+
+import torch
+import torch.nn as nn
+import warnings
+from packaging import version
+from torch.distributed import ProcessGroup
+
+if version.parse(torch.__version__) >= version.parse('1.12.0'):
+ from torch.distributed.fsdp import FullStateDictConfig
+ from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
+ from torch.distributed.fsdp import StateDictType
+ from torch.distributed.fsdp.fully_sharded_data_parallel import (
+ BackwardPrefetch,
+ CPUOffload,
+ FullStateDictConfig,
+ MixedPrecision,
+ ShardingStrategy,
+ )
+else:
+ raise RuntimeError("FSDP is not supported while torch version under 1.12.0.")
+
+from torch.optim import Optimizer
+from torch.optim.lr_scheduler import _LRScheduler as LRScheduler
+from torch.utils.data import DataLoader
+
+from colossalai.checkpoint_io import CheckpointIO, GeneralCheckpointIO, utils
+from colossalai.cluster import DistCoordinator
+from colossalai.interface import ModelWrapper, OptimizerWrapper
+
+from .dp_plugin_base import DPPluginBase
+
+__all__ = ['TorchFSDPPlugin']
+
+
+class TorchFSDPCheckpointIO(GeneralCheckpointIO):
+
+ def __init__(self) -> None:
+ super().__init__()
+ self.coordinator = DistCoordinator()
+
+ def load_unsharded_model(self, model: nn.Module, checkpoint: str, strict: bool):
+ checkpoint = utils.load_state_dict(checkpoint)
+ model.load_state_dict(checkpoint)
+
+ def load_unsharded_optimizer(self, optimizer: Optimizer, checkpoint: Path):
+ checkpoint = utils.load_state_dict(checkpoint)
+ fsdp_model = optimizer.unwrap_model()
+ sharded_osd = FSDP.scatter_full_optim_state_dict(checkpoint, fsdp_model)
+ optimizer.load_state_dict(sharded_osd)
+
+ def save_unsharded_model(self, model: nn.Module, checkpoint: str, gather_dtensor: bool, use_safetensors: bool):
+ """
+ Save model to checkpoint but only on master process.
+ """
+ # the model should be unwrapped in self.load_model via ModelWrapper.unwrap
+ cfg = FullStateDictConfig(offload_to_cpu=True, rank0_only=True)
+ with FSDP.state_dict_type(model, StateDictType.FULL_STATE_DICT, cfg):
+ full_model_state = model.state_dict()
+ utils.save_state_dict(full_model_state, checkpoint_file_path=checkpoint, use_safetensors=use_safetensors)
+
+ def save_unsharded_optimizer(self, optimizer: Optimizer, checkpoint: str, gather_dtensor: bool):
+ """
+ Save optimizer to checkpoint but only on master process.
+ """
+ assert isinstance(optimizer, FSDPOptimizerWrapper)
+ fsdp_model = optimizer.unwrap_model()
+ full_optimizer_state = FSDP.full_optim_state_dict(fsdp_model, optim=optimizer, rank0_only=True)
+ utils.save_state_dict(full_optimizer_state, checkpoint_file_path=checkpoint, use_safetensors=False)
+
+ def save_sharded_model(self, model: nn.Module, checkpoint: str, gather_dtensor: bool, variant: Optional[str],
+ size_per_shard: int, use_safetensors: bool):
+ """
+ Save model to checkpoint but only on master process.
+ """
+ raise NotImplementedError("Sharded model checkpoint is not supported yet.")
+
+ def load_sharded_model(self,
+ model: nn.Module,
+ checkpoint_index_file: Path,
+ strict: bool = False,
+ use_safetensors: bool = False,
+ load_sub_module: bool = True):
+ """
+ Load model from a sharded checkpoint.
+ """
+ raise NotImplementedError("Sharded model checkpoint is not supported yet.")
+
+ def save_sharded_optimizer(self, optimizer: Optimizer, checkpoint: str, gather_dtensor: bool):
+ """
+ Save optimizer to checkpoint but only on master process.
+ """
+ raise NotImplementedError("Sharded optimizer checkpoint is not supported yet.")
+
+ def load_sharded_optimizer(self, optimizer: Optimizer, index_file_path: str, prefix: str, size_per_shard: int):
+ """
+ Load optimizer from a sharded checkpoint.
+ """
+ raise NotImplementedError("Sharded optimizer checkpoint is not supported yet.")
+
+ def save_lr_scheduler(self, lr_scheduler: LRScheduler, checkpoint: str):
+ """
+ Save lr scheduler to checkpoint but only on master process.
+ """
+ if self.coordinator.is_master():
+ super().save_lr_scheduler(lr_scheduler, checkpoint)
+
+
+class TorchFSDPModel(ModelWrapper):
+
+ def __init__(self, module: nn.Module, *args, **kwargs) -> None:
+ super().__init__(module)
+ self.module = FSDP(module, *args, **kwargs)
+
+ def unwrap(self):
+ return self.module
+
+
+class FSDPOptimizerWrapper(OptimizerWrapper):
+
+ def __init__(self, optimizer: Optimizer, model: nn.Module):
+ self.model = model
+ super().__init__(optimizer)
+
+ def unwrap_model(self) -> nn.Module:
+ return self.model
+
+
+class TorchFSDPPlugin(DPPluginBase):
+ """
+ Plugin for PyTorch FSDP.
+
+ Example:
+ >>> from colossalai.booster import Booster
+ >>> from colossalai.booster.plugin import TorchFSDPPlugin
+ >>>
+ >>> model, train_dataset, optimizer, criterion = ...
+ >>> plugin = TorchFSDPPlugin()
+
+ >>> train_dataloader = plugin.prepare_train_dataloader(train_dataset, batch_size=8)
+ >>> booster = Booster(plugin=plugin)
+ >>> model, optimizer, train_dataloader, criterion = booster.boost(model, optimizer, train_dataloader, criterion)
+
+ Args:
+ See https://pytorch.org/docs/stable/fsdp.html for details.
+ """
+
+ if version.parse(torch.__version__) >= version.parse('1.12.0'):
+
+ def __init__(
+ self,
+ process_group: Optional[ProcessGroup] = None,
+ sharding_strategy: Optional[ShardingStrategy] = None,
+ cpu_offload: Optional[CPUOffload] = None,
+ auto_wrap_policy: Optional[Callable] = None,
+ backward_prefetch: Optional[BackwardPrefetch] = None,
+ mixed_precision: Optional[MixedPrecision] = None,
+ ignored_modules: Optional[Iterable[torch.nn.Module]] = None,
+ param_init_fn: Optional[Callable[[nn.Module], None]] = None,
+ sync_module_states: bool = False,
+ ):
+ super().__init__()
+ self.fsdp_kwargs = dict(process_group=process_group,
+ sharding_strategy=sharding_strategy,
+ cpu_offload=cpu_offload,
+ auto_wrap_policy=auto_wrap_policy,
+ backward_prefetch=backward_prefetch,
+ mixed_precision=mixed_precision,
+ ignored_modules=ignored_modules,
+ param_init_fn=param_init_fn,
+ sync_module_states=sync_module_states)
+ else:
+ raise RuntimeError("FSDP is not supported while torch version under 1.12.0.")
+
+ def support_no_sync(self) -> bool:
+ return False
+
+ def no_sync(self, model: nn.Module) -> Iterator[None]:
+ raise NotImplementedError("Torch fsdp no_sync func not supported yet.")
+
+ def control_precision(self) -> bool:
+ return True
+
+ def supported_precisions(self) -> List[str]:
+ return ['fp16', 'bf16']
+
+ def control_device(self) -> bool:
+ return True
+
+ def supported_devices(self) -> List[str]:
+ return ['cuda']
+
+ def configure(
+ self,
+ model: nn.Module,
+ optimizer: Optimizer,
+ criterion: Callable = None,
+ dataloader: DataLoader = None,
+ lr_scheduler: LRScheduler = None,
+ ) -> Tuple[Union[nn.Module, OptimizerWrapper, LRScheduler, DataLoader]]:
+
+ # wrap the model with PyTorch FSDP
+ fsdp_model = TorchFSDPModel(model, device_id=torch.cuda.current_device(), **self.fsdp_kwargs)
+
+ if len(optimizer.param_groups) > 1:
+ warnings.warn(
+ 'TorchFSDPPlugin does not support optimizers that use multiple param groups. The results may not be as expected if used.'
+ )
+ optimizer.__init__(fsdp_model.parameters(), **optimizer.defaults)
+
+ if not isinstance(optimizer, FSDPOptimizerWrapper):
+ optimizer = FSDPOptimizerWrapper(optimizer, fsdp_model)
+
+ return fsdp_model, optimizer, criterion, dataloader, lr_scheduler
+
+ def control_checkpoint_io(self) -> bool:
+ return True
+
+ def get_checkpoint_io(self) -> CheckpointIO:
+ return TorchFSDPCheckpointIO()
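As a hedged end-to-end sketch of the new plugin, following the docstring example in the file above: it assumes torch >= 1.12, a distributed environment launched via the usual `colossalai.launch_from_torch` entry point, and placeholder model/optimizer objects.

```python
# A minimal sketch of wiring the new plugin into Booster. Requires torch >= 1.12 and a
# launched distributed environment; the model, optimizer and criterion are placeholders.
import torch.nn as nn
from torch.optim import Adam

import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import TorchFSDPPlugin

colossalai.launch_from_torch(config={})

model = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 2))
optimizer = Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

plugin = TorchFSDPPlugin()    # FSDP kwargs from the constructor above can be passed here
booster = Booster(plugin=plugin)
# boost() returns (model, optimizer, criterion, dataloader, lr_scheduler)
model, optimizer, criterion, _, _ = booster.boost(model, optimizer, criterion)
```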
diff --git a/colossalai/checkpoint_io/checkpoint_io_base.py b/colossalai/checkpoint_io/checkpoint_io_base.py
index 9cf344ecc41b..fbc8fc5429ad 100644
--- a/colossalai/checkpoint_io/checkpoint_io_base.py
+++ b/colossalai/checkpoint_io/checkpoint_io_base.py
@@ -1,7 +1,6 @@
from abc import ABC, abstractmethod
from pathlib import Path
-from typing import Union
-from typing import Optional
+from typing import Optional, Union
import torch
import torch.nn as nn
@@ -84,9 +83,8 @@ def load_model(self,
# containing no distributed tensors, dtensor -> full tensor conversion
# should be done offline via our CLI
# the existence of index file means it is a sharded checkpoint
- ckpt_path = Path(checkpoint)
index_file_exists, index_file_path = has_index_file(checkpoint)
-
+
# return the origin model instead of the unwrapped model
origin_model = model
diff --git a/colossalai/checkpoint_io/general_checkpoint_io.py b/colossalai/checkpoint_io/general_checkpoint_io.py
index 96a883fdb42a..2cc9c3faa12b 100644
--- a/colossalai/checkpoint_io/general_checkpoint_io.py
+++ b/colossalai/checkpoint_io/general_checkpoint_io.py
@@ -1,26 +1,26 @@
-from pathlib import Path
+import gc
+import logging
+import os
from functools import reduce
+from pathlib import Path
+from typing import Iterator, Optional, OrderedDict, Tuple
import torch.nn as nn
from torch.optim import Optimizer
-import logging
-import os
-import gc
-from typing import Optional, Iterator, OrderedDict, Tuple
from .checkpoint_io_base import CheckpointIO
from .index_file import CheckpointIndexFile
from .utils import (
- has_index_file,
- load_state_dict,
- save_state_dict,
+ get_base_filenames,
+ get_shard_filename,
+ has_index_file,
is_safetensors_available,
- shard_checkpoint,
load_shard_state_dict,
+ load_state_dict,
load_state_dict_into_model,
- get_shard_filename,
- get_base_filenames
- )
+ save_state_dict,
+ shard_checkpoint,
+)
__all__ = ['GeneralCheckpointIO']
@@ -29,6 +29,7 @@ class GeneralCheckpointIO(CheckpointIO):
"""
Checkpoint IO
"""
+
def load_unsharded_model(self, model: nn.Module, checkpoint: str, strict: bool):
checkpoint = load_state_dict(checkpoint)
model.load_state_dict(checkpoint, strict=strict)
@@ -69,19 +70,23 @@ def save_unsharded_optimizer(
# TODO(FrankLeeeee): handle distributed tensors
save_state_dict(optimizer.state_dict(), checkpoint, use_safetensors=False)
-
- def save_sharded_model(self, model: nn.Module, checkpoint_path: str, gather_dtensor:bool = False,
- variant: Optional[str] = None, max_shard_size: int = 1024, use_safetensors: bool = False):
- """
+ def save_sharded_model(self,
+ model: nn.Module,
+ checkpoint_path: str,
+ gather_dtensor: bool = False,
+ variant: Optional[str] = None,
+ max_shard_size: int = 1024,
+ use_safetensors: bool = False):
+ """
implement this method as it can be supported by Huggingface model,
save shard model, save model to multiple files
"""
if os.path.isfile(checkpoint_path):
logging.error(f"Provided path ({checkpoint_path}) should be a directory, not a file")
return
-
+
Path(checkpoint_path).mkdir(parents=True, exist_ok=True)
-
+
# shard checkpoint
state_dict = model.state_dict()
state_dict_shard = shard_checkpoint(state_dict, max_shard_size=max_shard_size)
@@ -95,21 +100,22 @@ def save_sharded_model(self, model: nn.Module, checkpoint_path: str, gather_dten
total_size = total_size + shard_pair[1]
for key in shard.keys():
index_file.append_weight_map(key, shard_file)
-
+
checkpoint_file_path = os.path.join(checkpoint_path, shard_file)
save_state_dict(shard, checkpoint_file_path, use_safetensors)
-
+
index_file.append_meta_data("total_size", total_size)
index_file.write_index_file(save_index_file)
- logging.info(
- f"The model is going to be split to checkpoint shards. "
- f"You can find where each parameters has been saved in the "
- f"index located at {save_index_file}."
- )
-
-
- def load_sharded_model(self, model: nn.Module, checkpoint_index_file: Path, strict: bool = False,
- use_safetensors: bool = False, load_sub_module: bool = True):
+ logging.info(f"The model is going to be split to checkpoint shards. "
+ f"You can find where each parameters has been saved in the "
+ f"index located at {save_index_file}.")
+
+ def load_sharded_model(self,
+ model: nn.Module,
+ checkpoint_index_file: Path,
+ strict: bool = False,
+ use_safetensors: bool = False,
+ load_sub_module: bool = True):
"""
load shard model, load model from multiple files
"""
@@ -119,7 +125,7 @@ def load_sharded_model(self, model: nn.Module, checkpoint_index_file: Path, stri
if use_safetensors and not is_safetensors_available():
raise ImportError("`safe_serialization` requires the `safetensors` library: `pip install safetensors`.")
-
+
# read checkpoint index file
ckpt_index_file = CheckpointIndexFile.from_file(checkpoint_index_file)
checkpoint_files, _ = ckpt_index_file.get_checkpoint_fileanames()
@@ -134,10 +140,7 @@ def load_sharded_model(self, model: nn.Module, checkpoint_index_file: Path, stri
if strict:
remain_keys = reduce(lambda a, b: a & b, map(set, missing_keys))
if len(remain_keys) > 0:
- error_msgs = 'Missing key(s) in state_dict: {}. '.format(
- ', '.join('"{}"'.format(k) for k in missing_keys))
+ error_msgs = 'Missing key(s) in state_dict: {}. '.format(', '.join(
+ '"{}"'.format(k) for k in missing_keys))
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
- self.__class__.__name__, "\n\t".join(error_msgs)))
-
-
-
+ self.__class__.__name__, "\n\t".join(error_msgs)))
diff --git a/colossalai/checkpoint_io/index_file.py b/colossalai/checkpoint_io/index_file.py
index 15a6d09f3b5e..334ecbc04738 100644
--- a/colossalai/checkpoint_io/index_file.py
+++ b/colossalai/checkpoint_io/index_file.py
@@ -159,7 +159,7 @@ def get_all_param_names(self):
def write_index_file(self, save_index_file):
"""
- Wriete index file.
+ Write index file.
"""
save_index_file = os.path.join(self.root_path, save_index_file)
index = {"metadata": self.metadata, "weight_map": self.weight_map}
diff --git a/colossalai/checkpoint_io/utils.py b/colossalai/checkpoint_io/utils.py
index 16e41631f0d5..435feda4ac6a 100644
--- a/colossalai/checkpoint_io/utils.py
+++ b/colossalai/checkpoint_io/utils.py
@@ -1,10 +1,12 @@
# coding=utf-8
+import re
from pathlib import Path
+from typing import Iterator, List, Mapping, Optional, OrderedDict, Tuple
+
import torch
import torch.nn as nn
-from typing import List, Mapping, OrderedDict, Optional, Tuple, Iterator
+
from colossalai.tensor.d_tensor.d_tensor import DTensor
-import re
SAFE_WEIGHTS_NAME = "model.safetensors"
WEIGHTS_NAME = "pytorch_model.bin"
@@ -15,19 +17,21 @@
# General helper functions
# ======================================
+
def calculate_tensor_size(tensor: torch.Tensor) -> float:
"""
Calculate the size of a parameter in MB. Used to compute whether a group of params exceed the shard size.
If so, a new shard should be created.
Args:
- tenosr (torch.Tensor): the tensor to calculate size for.
+ tensor (torch.Tensor): the tensor to calculate size for.
Returns:
float: size of the tensor in MB.
"""
return tensor.numel() * tensor.element_size() / 1024 / 1024
+
def is_safetensors_available() -> bool:
"""
Check whether safetensors is available.
@@ -78,7 +82,6 @@ def is_safetensor_checkpoint(checkpoint_file_path: str) -> bool:
# Helper functions for saving shard file
# ======================================
def shard_checkpoint(state_dict: torch.Tensor, max_shard_size: int = 1024) -> Iterator[Tuple[OrderedDict, int]]:
-
"""
Splits a model state dictionary in sub-checkpoints so that the final size of each sub-checkpoint does not exceed a
given size.
@@ -100,35 +103,39 @@ def shard_checkpoint(state_dict: torch.Tensor, max_shard_size: int = 1024) -> It
current_block_size = 0
current_block[key] = weight
current_block_size += weight_size
-
+
if ret_block != None:
yield ret_block, ret_block_size
yield current_block, current_block_size
-def load_shard_state_dict(checkpoint_file: Path, use_safetensors: bool =False):
+def load_shard_state_dict(checkpoint_file: Path, use_safetensors: bool = False):
"""
load shard state dict into model
"""
if use_safetensors and not checkpoint_file.suffix == ".safetensors":
raise Exception("load the model using `safetensors`, but no file endwith .safetensors")
if use_safetensors:
- from safetensors.torch import safe_open
from safetensors.torch import load_file as safe_load_file
+ from safetensors.torch import safe_open
with safe_open(checkpoint_file, framework="pt") as f:
metadata = f.metadata()
if metadata["format"] != "pt":
raise NotImplementedError(
- f"Conversion from a {metadata['format']} safetensors archive to PyTorch is not implemented yet."
- )
+ f"Conversion from a {metadata['format']} safetensors archive to PyTorch is not implemented yet.")
return safe_load_file(checkpoint_file)
else:
return torch.load(checkpoint_file)
-
-def load_state_dict_into_model(model: nn.Module, state_dict: torch.Tensor, missing_keys: List, strict: bool = False, load_sub_module: bool = True):
+
+
+def load_state_dict_into_model(model: nn.Module,
+ state_dict: torch.Tensor,
+ missing_keys: List,
+ strict: bool = False,
+ load_sub_module: bool = True):
r"""Copies parameters and buffers from :attr:`state_dict` into
- this module and its descendants.
+ this module and its descendants.
Args:
state_dict (dict): a dict containing parameters and
@@ -166,11 +173,12 @@ def load(module: nn.Module, state_dict, prefix="", load_sub_module: bool = True)
if strict:
if len(unexpected_keys) > 0:
- error_msgs = 'Unexpected key(s) in state_dict: {}. '.format(
- ', '.join('"{}"'.format(k) for k in unexpected_keys))
+ error_msgs = 'Unexpected key(s) in state_dict: {}. '.format(', '.join(
+ '"{}"'.format(k) for k in unexpected_keys))
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
- model.__class__.__name__, "\n\t".join(error_msgs)))
-
+ model.__class__.__name__, "\n\t".join(error_msgs)))
+
+
# ======================================
# Helper functions for saving state dict
# ======================================
@@ -350,6 +358,8 @@ def has_index_file(checkpoint_path: str) -> Tuple[bool, Optional[Path]]:
return True, index_files[0]
else:
return False, None
+ else:
+ raise RuntimeError(f'Invalid checkpoint path {checkpoint_path}. Expected a file or a directory.')
def load_state_dict(checkpoint_file_path: Path):
@@ -380,7 +390,6 @@ def load_state_dict(checkpoint_file_path: Path):
else:
# load with torch
return torch.load(checkpoint_file_path)
-
def add_variant(weights_name: str, variant: Optional[str] = None) -> str:
@@ -392,17 +401,18 @@ def add_variant(weights_name: str, variant: Optional[str] = None) -> str:
return weights_name
-def get_base_filenames(variant: str=None, use_safetensors: bool=False):
- """
- generate base weight filenames
- """
- weights_name = SAFE_WEIGHTS_NAME if use_safetensors else WEIGHTS_NAME
- weights_name = add_variant(weights_name, variant)
+def get_base_filenames(variant: str = None, use_safetensors: bool = False):
+ """
+ generate base weight filenames
+ """
+ weights_name = SAFE_WEIGHTS_NAME if use_safetensors else WEIGHTS_NAME
+ weights_name = add_variant(weights_name, variant)
+
+ save_index_file = SAFE_WEIGHTS_INDEX_NAME if use_safetensors else WEIGHTS_INDEX_NAME
+ save_index_file = add_variant(save_index_file, variant)
- save_index_file = SAFE_WEIGHTS_INDEX_NAME if use_safetensors else WEIGHTS_INDEX_NAME
- save_index_file = add_variant(save_index_file, variant)
+ return weights_name, save_index_file
- return weights_name, save_index_file
def get_shard_filename(weights_name: str, idx: int):
"""
@@ -410,4 +420,4 @@ def get_shard_filename(weights_name: str, idx: int):
"""
shard_file = weights_name.replace(".bin", f"-{idx+1:05d}.bin")
shard_file = shard_file.replace(".safetensors", f"-{idx + 1:05d}.safetensors")
- return shard_file
\ No newline at end of file
+ return shard_file
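For the loading path handled by `load_sharded_model` / `load_state_dict_into_model`, the inverse of the saving sketch earlier reads the index file, loads each shard, and copies it into the module with `strict=False` so keys belonging to other shards are not reported as missing. The file layout is the same assumed layout as in that sketch, not ColossalAI's actual on-disk format.

```python
import json
import os

import torch
import torch.nn as nn


def load_sharded(model: nn.Module, save_dir: str) -> None:
    # read the weight-name -> shard-file mapping written at save time
    with open(os.path.join(save_dir, "pytorch_model.bin.index.json")) as f:
        index = json.load(f)

    missing = set(model.state_dict().keys())
    for shard_file in sorted(set(index["weight_map"].values())):
        shard = torch.load(os.path.join(save_dir, shard_file), map_location="cpu")
        # strict=False: each shard only covers a subset of the parameters
        model.load_state_dict(shard, strict=False)
        missing -= set(shard.keys())

    if missing:
        raise RuntimeError(f"Missing key(s) in sharded checkpoint: {sorted(missing)}")


model = nn.Linear(8, 8)
load_sharded(model, "/tmp/sharded_ckpt")
```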
diff --git a/colossalai/cli/check/check_installation.py b/colossalai/cli/check/check_installation.py
index cb3dbbc09301..4a481f3bd122 100644
--- a/colossalai/cli/check/check_installation.py
+++ b/colossalai/cli/check/check_installation.py
@@ -31,7 +31,7 @@ def check_installation():
found_aot_cuda_ext = _check_aot_built_cuda_extension_installed()
cuda_version = _check_cuda_version()
torch_version, torch_cuda_version = _check_torch_version()
- colossalai_verison, prebuilt_torch_version_required, prebuilt_cuda_version_required = _parse_colossalai_version()
+ colossalai_version, prebuilt_torch_version_required, prebuilt_cuda_version_required = _parse_colossalai_version()
# if cuda_version is None, that means either
# CUDA_HOME is not found, thus cannot compare the version compatibility
@@ -57,7 +57,7 @@ def check_installation():
click.echo(f'#### Installation Report ####')
click.echo(f'\n------------ Environment ------------')
- click.echo(f"Colossal-AI version: {to_click_output(colossalai_verison)}")
+ click.echo(f"Colossal-AI version: {to_click_output(colossalai_version)}")
click.echo(f"PyTorch version: {to_click_output(torch_version)}")
click.echo(f"System CUDA version: {to_click_output(cuda_version)}")
click.echo(f"CUDA version required by PyTorch: {to_click_output(torch_cuda_version)}")
@@ -137,7 +137,7 @@ def _parse_colossalai_version():
# 1. X.X.X+torchX.XXcuXX.X (when colossalai is installed with CUDA extensions)
# 2. X.X.X (when colossalai is not installed with CUDA extensions)
# where X represents an integer.
- colossalai_verison = colossalai.__version__.split('+')[0]
+ colossalai_version = colossalai.__version__.split('+')[0]
try:
torch_version_for_aot_build = colossalai.__version__.split('torch')[1].split('cu')[0]
@@ -145,7 +145,7 @@ def _parse_colossalai_version():
except:
torch_version_for_aot_build = None
cuda_version_for_aot_build = None
- return colossalai_verison, torch_version_for_aot_build, cuda_version_for_aot_build
+ return colossalai_version, torch_version_for_aot_build, cuda_version_for_aot_build
def _check_aot_built_cuda_extension_installed():
diff --git a/colossalai/cluster/dist_coordinator.py b/colossalai/cluster/dist_coordinator.py
index 99dde810e112..3ee364ec3364 100644
--- a/colossalai/cluster/dist_coordinator.py
+++ b/colossalai/cluster/dist_coordinator.py
@@ -181,7 +181,7 @@ def on_master_only(self, process_group: ProcessGroup = None):
"""
is_master = self.is_master(process_group)
- # define an inner functiuon
+ # define an inner function
def decorator(func):
@functools.wraps(func)
diff --git a/colossalai/device/alpha_beta_profiler.py b/colossalai/device/alpha_beta_profiler.py
index af2b10928c6f..f8b20de9bc37 100644
--- a/colossalai/device/alpha_beta_profiler.py
+++ b/colossalai/device/alpha_beta_profiler.py
@@ -381,7 +381,7 @@ def _extract_alpha_beta(pg, pg_handler):
first_latency, first_bandwidth = _extract_alpha_beta(first_axis, first_axis_process_group)
second_latency, second_bandwidth = _extract_alpha_beta(second_axis, second_axis_process_group)
mesh_alpha = [first_latency, second_latency]
- # The beta values have been enlarged by 1e10 times temporarilly because the computation cost
+ # The beta values have been enlarged by 1e10 times temporarily because the computation cost
# is still estimated in the unit of TFLOPs instead of time. We will remove this factor in future.
mesh_beta = [1e10 / first_bandwidth, 1e10 / second_bandwidth]
diff --git a/colossalai/engine/schedule/_pipeline_schedule.py b/colossalai/engine/schedule/_pipeline_schedule.py
index 38175fe0941c..9fc301a26559 100644
--- a/colossalai/engine/schedule/_pipeline_schedule.py
+++ b/colossalai/engine/schedule/_pipeline_schedule.py
@@ -152,9 +152,9 @@ def _get_data_slice(self, data, offset):
raise TypeError(f"Expected data to be of type torch.Tensor, list, tuple, or dict, but got {type(data)}")
def load_micro_batch(self):
- mciro_batch_data = self._get_data_slice(self.batch_data, self.microbatch_offset)
+ micro_batch_data = self._get_data_slice(self.batch_data, self.microbatch_offset)
self.microbatch_offset += self.microbatch_size
- return self._move_to_device(mciro_batch_data)
+ return self._move_to_device(micro_batch_data)
def pre_processing(self, engine):
from colossalai.zero.legacy import ShardedModelV2
diff --git a/colossalai/engine/schedule/_pipeline_schedule_v2.py b/colossalai/engine/schedule/_pipeline_schedule_v2.py
index 28c58bd82b5c..89e45c7aacec 100644
--- a/colossalai/engine/schedule/_pipeline_schedule_v2.py
+++ b/colossalai/engine/schedule/_pipeline_schedule_v2.py
@@ -84,7 +84,7 @@ def forward_backward_step(self,
'The argument \'return_loss\' has to be True when \'forward_only\' is False, but got False.'
self.load_batch(data_iter)
- # num_warmup_microbatches is the step when not all the processers are working
+ # num_warmup_microbatches is the step when not all the processes are working
num_warmup_microbatches = \
(gpc.get_world_size(ParallelMode.PIPELINE)
- gpc.get_local_rank(ParallelMode.PIPELINE) - 1)
diff --git a/colossalai/fx/codegen/activation_checkpoint_codegen.py b/colossalai/fx/codegen/activation_checkpoint_codegen.py
index 5a72cb9ca923..33b164800262 100644
--- a/colossalai/fx/codegen/activation_checkpoint_codegen.py
+++ b/colossalai/fx/codegen/activation_checkpoint_codegen.py
@@ -523,7 +523,7 @@ def emit_code_with_activation_checkpoint(body, ckpt_func, nodes, emit_node_func,
# append code text to body
for idx, node in enumerate(node_list):
# if this is the first node of the ckpt region
- # append the ckpt function defition
+ # append the ckpt function definition
if idx in start_idx:
label = start_idx.index(idx)
ckpt_fn_def = _gen_ckpt_fn_def(label, input_vars[label])
diff --git a/colossalai/fx/passes/adding_split_node_pass.py b/colossalai/fx/passes/adding_split_node_pass.py
index 2c7b842b530c..245ba5d776da 100644
--- a/colossalai/fx/passes/adding_split_node_pass.py
+++ b/colossalai/fx/passes/adding_split_node_pass.py
@@ -206,7 +206,7 @@ def avgcompute_split_pass(gm: torch.fx.GraphModule, pp_size: int):
def avgnode_split_pass(gm: torch.fx.GraphModule, pp_size: int):
"""
- In avgnode_split_pass, simpliy split graph by node number.
+ In avgnode_split_pass, simply split graph by node number.
"""
mod_graph = gm.graph
avg_num_node = len(mod_graph.nodes) // pp_size
diff --git a/colossalai/fx/passes/experimental/adding_shape_consistency_pass.py b/colossalai/fx/passes/experimental/adding_shape_consistency_pass.py
index f28d65e2668a..4571bd93a790 100644
--- a/colossalai/fx/passes/experimental/adding_shape_consistency_pass.py
+++ b/colossalai/fx/passes/experimental/adding_shape_consistency_pass.py
@@ -16,7 +16,7 @@ def apply(*args, **kwargs):
return shape_consistency_manager.apply(*args, **kwargs)
-def solution_annotatation_pass(gm: torch.fx.GraphModule, solution: List[int], device_mesh):
+def solution_annotation_pass(gm: torch.fx.GraphModule, solution: List[int], device_mesh):
mod_graph = gm.graph
nodes = tuple(mod_graph.nodes)
diff --git a/colossalai/fx/passes/meta_info_prop.py b/colossalai/fx/passes/meta_info_prop.py
index 2b4a8749cfd7..ab203dfd7440 100644
--- a/colossalai/fx/passes/meta_info_prop.py
+++ b/colossalai/fx/passes/meta_info_prop.py
@@ -31,7 +31,7 @@ class TensorMetadata(NamedTuple):
numel: int
is_tensor: bool
# TODO: we can add a list of sharding spec here, and record the sharding
- # behaviour by appending sharding spec into list.
+ # behavior by appending sharding spec into list.
def _extract_tensor_metadata(result: torch.Tensor) -> TensorMetadata:
diff --git a/colossalai/fx/passes/passes_for_gpt2_test.py b/colossalai/fx/passes/passes_for_gpt2_test.py
index abc1a089e9a9..efdd34a01fe0 100644
--- a/colossalai/fx/passes/passes_for_gpt2_test.py
+++ b/colossalai/fx/passes/passes_for_gpt2_test.py
@@ -230,7 +230,7 @@ def record_cross_partition_use(def_node: torch.fx.node.Node,
use_partition.partitions_dependent_on.setdefault(def_partition_name)
node_process_list = list(m.graph.nodes)
- # split nodes into parititons
+ # split nodes into partitions
while node_process_list:
node = node_process_list.pop(0)
orig_nodes[node.name] = node
@@ -277,7 +277,7 @@ def record_cross_partition_use(def_node: torch.fx.node.Node,
if len(sorted_partitions) != len(partitions):
raise RuntimeError("cycle exists between partitions!")
- # add placeholders to parititons
+ # add placeholders to partitions
for partition_name in sorted_partitions:
partition = partitions[partition_name]
for input in partition.inputs:
diff --git a/colossalai/fx/passes/split_module.py b/colossalai/fx/passes/split_module.py
index 5ce5b969cbde..61ed037ab7a1 100644
--- a/colossalai/fx/passes/split_module.py
+++ b/colossalai/fx/passes/split_module.py
@@ -29,8 +29,8 @@ def __repr__(self) -> str:
f" nodes: {self.node_names},\n" \
f" inputs: {self.inputs},\n" \
f" outputs: {self.outputs},\n" \
- f" partitions depenent on: {self.partitions_dependent_on},\n" \
- f" parition dependents: {self.partition_dependents}"
+ f" partitions dependent on: {self.partitions_dependent_on},\n" \
+ f" partition dependents: {self.partition_dependents}"
# Creates subgraphs out of main graph
diff --git a/colossalai/nn/optimizer/cpu_adam.py b/colossalai/nn/optimizer/cpu_adam.py
index 54036973e1e3..bb561a106515 100644
--- a/colossalai/nn/optimizer/cpu_adam.py
+++ b/colossalai/nn/optimizer/cpu_adam.py
@@ -13,7 +13,7 @@
class CPUAdam(NVMeOptimizer):
"""Implements Adam algorithm.
- Supports parameters updating on both GPU and CPU, depanding on the device of paramters.
+ Supports parameters updating on both GPU and CPU, depending on the device of parameters.
But the parameters and gradients should on the same device:
* Parameters on CPU and gradients on CPU is allowed.
* Parameters on GPU and gradients on GPU is allowed.
diff --git a/colossalai/nn/optimizer/hybrid_adam.py b/colossalai/nn/optimizer/hybrid_adam.py
index 1d0fb92de499..be6311c6c29f 100644
--- a/colossalai/nn/optimizer/hybrid_adam.py
+++ b/colossalai/nn/optimizer/hybrid_adam.py
@@ -13,19 +13,19 @@
class HybridAdam(NVMeOptimizer):
"""Implements Adam algorithm.
- Supports parameters updating on both GPU and CPU, depanding on the device of paramters.
+ Supports parameters updating on both GPU and CPU, depending on the device of parameters.
But the parameters and gradients should on the same device:
* Parameters on CPU and gradients on CPU is allowed.
* Parameters on GPU and gradients on GPU is allowed.
* Parameters on GPU and gradients on CPU is **not** allowed.
- `HybriadAdam` requires CUDA extensions which can be built during installation or runtime.
+ `HybridAdam` requires CUDA extensions which can be built during installation or runtime.
This version of Hybrid Adam is an hybrid of CPUAdam and FusedAdam.
* For parameters updating on CPU, it uses CPUAdam.
* For parameters updating on GPU, it uses FusedAdam.
- * Hybird precision calculation of fp16 and fp32 is supported, eg fp32 parameters and fp16 gradients.
+ * Hybrid precision calculation of fp16 and fp32 is supported, e.g. fp32 parameters and fp16 gradients.
:class:`colossalai.nn.optimizer.HybridAdam` may be used as a drop-in replacement for ``torch.optim.AdamW``,
or ``torch.optim.Adam`` with ``adamw_mode=False``
@@ -131,7 +131,7 @@ def step(self, closure=None, div_scale: float = -1):
assert state['exp_avg'].device.type == 'cuda', "exp_avg should stay on cuda"
assert state['exp_avg_sq'].device.type == 'cuda', "exp_avg should stay on cuda"
- # record the state by gruop and update at once
+ # record the state by group and update at once
g_l.append(p.grad.data)
p_l.append(p.data)
m_l.append(state['exp_avg'])
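Per the docstring touched above, `HybridAdam` is intended as a drop-in replacement for `torch.optim.AdamW` (or `torch.optim.Adam` with `adamw_mode=False`). A hedged usage sketch follows; it assumes ColossalAI is installed with its CUDA extension and that a GPU is available.

```python
import torch
import torch.nn as nn

from colossalai.nn.optimizer import HybridAdam

model = nn.Linear(64, 64).cuda()
# drop-in replacement for torch.optim.AdamW; adamw_mode=False gives plain Adam
optimizer = HybridAdam(model.parameters(), lr=1e-3, adamw_mode=True)

loss = model(torch.randn(4, 64, device='cuda')).sum()
loss.backward()
optimizer.step()
```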
diff --git a/colossalai/nn/parallel/layers/cache_embedding/cache_mgr.py b/colossalai/nn/parallel/layers/cache_embedding/cache_mgr.py
index da043df368ae..a6159856dcce 100644
--- a/colossalai/nn/parallel/layers/cache_embedding/cache_mgr.py
+++ b/colossalai/nn/parallel/layers/cache_embedding/cache_mgr.py
@@ -20,8 +20,8 @@ def _wait_for_data(t, stream: Optional[torch.cuda.streams.Stream]) -> None:
return
torch.cuda.current_stream().wait_stream(stream)
# As mentioned in https://pytorch.org/docs/stable/generated/torch.Tensor.record_stream.html,
- # PyTorch uses the "caching allocator" for memroy allocation for tensors. When a tensor is
- # freed, its memory is likely to be reused by newly constructed tenosrs. By default,
+ # PyTorch uses the "caching allocator" for memory allocation for tensors. When a tensor is
+ # freed, its memory is likely to be reused by newly constructed tensors. By default,
# this allocator traces whether a tensor is still in use by only the CUDA stream where it
# was created. When a tensor is used by additional CUDA streams, we need to call record_stream
# to tell the allocator about all these streams. Otherwise, the allocator might free the
@@ -294,7 +294,7 @@ def print_comm_stats(self):
print(
f"CPU->CUDA BWD {self._cpu_to_cuda_numel * self.elem_size_in_byte / 1e6 / elapsed} MB/s {self._cpu_to_cuda_numel / 1e6} M elem"
)
- print(f'cpu_to_cuda_elpase {elapsed} sec')
+ print(f'cpu_to_cuda_elapsed {elapsed} sec')
for k, v in self._elapsed_dict.items():
print(f'{k}: {v}')
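The comment fixed above concerns PyTorch's caching allocator and multi-stream use. A minimal sketch of the pattern it describes is below (assuming a CUDA device): a tensor produced on a side stream is consumed on the default stream, so the consumer first waits on the producer stream and then calls `record_stream` so the allocator does not recycle the memory too early.

```python
import torch

if torch.cuda.is_available():
    copy_stream = torch.cuda.Stream()

    with torch.cuda.stream(copy_stream):
        # allocated and written on the side stream
        staging = torch.empty(1024, device='cuda').normal_()

    # the default stream must not read before the side stream has finished writing
    torch.cuda.current_stream().wait_stream(copy_stream)
    out = staging * 2

    # tell the caching allocator that `staging` is also in use on the default
    # stream, so its memory is not reused while that work is still in flight
    staging.record_stream(torch.cuda.current_stream())
```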
diff --git a/colossalai/testing/pytest_wrapper.py b/colossalai/testing/pytest_wrapper.py
index a472eb3723ec..b264b009028a 100644
--- a/colossalai/testing/pytest_wrapper.py
+++ b/colossalai/testing/pytest_wrapper.py
@@ -33,7 +33,7 @@ def test_for_something():
assert isinstance(name, str)
flag = os.environ.get(name.upper(), '0')
- reason = f'Environment varialbe {name} is {flag}'
+ reason = f'Environment variable {name} is {flag}'
if flag == '1':
return pytest.mark.skipif(False, reason=reason)
else:
diff --git a/colossalai/testing/utils.py b/colossalai/testing/utils.py
index 6583eeb12bf4..a4370a8d4933 100644
--- a/colossalai/testing/utils.py
+++ b/colossalai/testing/utils.py
@@ -167,10 +167,10 @@ def test_something():
"""
# check version
torch_version = version.parse(torch.__version__)
- assert torch_version.major == 1
+ assert torch_version.major >= 1
# only torch >= 1.8 has ProcessRaisedException
- if torch_version.minor >= 8:
+ if torch_version >= version.parse("1.8.0"):
exception = torch.multiprocessing.ProcessRaisedException
else:
exception = Exception
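The change above replaces a minor-version check with a full parsed-version comparison, which is what keeps the test helper working on torch 2.x (where `minor` drops back to 0). A small illustration using `packaging.version`:

```python
from packaging import version

for ver in ("1.7.1", "1.13.0", "2.0.1"):
    parsed = version.parse(ver)
    minor_only = parsed.minor >= 8                    # wrongly rejects 2.0.1
    full_compare = parsed >= version.parse("1.8.0")   # accepts 1.13.0 and 2.0.1
    print(f"{ver}: minor_only={minor_only}, full_compare={full_compare}")
```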
diff --git a/colossalai/utils/common.py b/colossalai/utils/common.py
index 95b3b8014af1..8022e84dc24b 100644
--- a/colossalai/utils/common.py
+++ b/colossalai/utils/common.py
@@ -324,7 +324,7 @@ def clip_grad_norm_fp32(parameters, max_norm, norm_type=2):
norm_type = float(norm_type)
# Parameters can be on CPU or CUDA
- # If parameters are on CPU, disable CUDA kernerls
+ # If parameters are on CPU, disable CUDA kernels
# Calculate norm.
if norm_type == inf:
diff --git a/colossalai/utils/tensor_detector/readme.md b/colossalai/utils/tensor_detector/readme.md
index 840dc8f4eca6..d6852ea55b54 100644
--- a/colossalai/utils/tensor_detector/readme.md
+++ b/colossalai/utils/tensor_detector/readme.md
@@ -46,7 +46,7 @@ detector.detect()
I have made some comments on the right of the output for your understanding.
-Note that the total `Mem` of all the tensors and parameters is not equal to `Total GPU Memery Allocated`. PyTorch's memory management is really complicated, and for models of a large scale, it's impossible to figure out clearly.
+Note that the total `Mem` of all the tensors and parameters is not equal to `Total GPU Memory Allocated`. PyTorch's memory management is really complicated, and for models of a large scale, it's impossible to figure out clearly.
**The order of print is not equal to the order the tensor creates, but they are really close.**
@@ -61,7 +61,7 @@ Note that the total `Mem` of all the tensors and parameters is not equal to `Tot
+ mlp.2.bias cuda:0 (32,) True torch.float32 128 B
------------------------------------------------------------------------------------------------------------
Detect Location: "test_tensor_detector.py" line 27
-Totle GPU Memery Allocated on cuda:0 is 4.5 KB
+Total GPU Memory Allocated on cuda:0 is 4.5 KB
------------------------------------------------------------------------------------------------------------
@@ -72,7 +72,7 @@ Totle GPU Memery Allocated on cuda:0 is 4.5 KB
+ Tensor cuda:0 (32,) True torch.float32 128 B # output
------------------------------------------------------------------------------------------------------------
Detect Location: "test_tensor_detector.py" line 30
-Totle GPU Memery Allocated on cuda:0 is 5.5 KB
+Total GPU Memory Allocated on cuda:0 is 5.5 KB
------------------------------------------------------------------------------------------------------------
@@ -82,7 +82,7 @@ Totle GPU Memery Allocated on cuda:0 is 5.5 KB
+ Tensor cuda:0 () True torch.float32 4 B # loss
------------------------------------------------------------------------------------------------------------
Detect Location: "test_tensor_detector.py" line 32
-Totle GPU Memery Allocated on cuda:0 is 6.0 KB
+Total GPU Memory Allocated on cuda:0 is 6.0 KB
------------------------------------------------------------------------------------------------------------
@@ -103,7 +103,7 @@ Totle GPU Memery Allocated on cuda:0 is 6.0 KB
- Tensor cuda:0 (8,) True torch.float32 32 B # deleted activation
------------------------------------------------------------------------------------------------------------
Detect Location: "test_tensor_detector.py" line 34
-Totle GPU Memery Allocated on cuda:0 is 10.0 KB
+Total GPU Memory Allocated on cuda:0 is 10.0 KB
------------------------------------------------------------------------------------------------------------
@@ -117,7 +117,7 @@ Totle GPU Memery Allocated on cuda:0 is 10.0 KB
+ Tensor cuda:0 (32,) False torch.float32 128 B
------------------------------------------------------------------------------------------------------------
Detect Location: "test_tensor_detector.py" line 36
-Totle GPU Memery Allocated on cuda:0 is 14.0 KB
+Total GPU Memory Allocated on cuda:0 is 14.0 KB
------------------------------------------------------------------------------------------------------------
```
diff --git a/colossalai/utils/tensor_detector/tensor_detector.py b/colossalai/utils/tensor_detector/tensor_detector.py
index a8186f76834c..cfcd4e47b4cb 100644
--- a/colossalai/utils/tensor_detector/tensor_detector.py
+++ b/colossalai/utils/tensor_detector/tensor_detector.py
@@ -55,7 +55,7 @@ def get_tensor_mem(self, tensor):
return self.mem_format(memory_size)
def mem_format(self, real_memory_size):
- # format the tensor memory into a reasonal magnitude
+ # format the tensor memory into a reasonable magnitude
if real_memory_size >= 2**30:
return str(real_memory_size / (2**30)) + ' GB'
if real_memory_size >= 2**20:
@@ -71,7 +71,7 @@ def collect_tensors_state(self):
if (not self.include_cpu) and obj.device == torch.device('cpu'):
continue
self.detected.append(id(obj))
- # skip paramters we had added in __init__ when module is an instance of nn.Module for the first epoch
+ # skip parameters we had added in __init__ when module is an instance of nn.Module for the first epoch
if id(obj) not in self.tensor_info:
name = type(obj).__name__
@@ -84,7 +84,7 @@ def collect_tensors_state(self):
name = par_name + ' (with grad)'
else:
# with no grad attached
- # there will be no new paramters created during running
+ # there will be no new parameters created during running
# so it must be in saved_tensor_info
continue
# we can also marked common tensors as tensor(with grad)
@@ -155,7 +155,7 @@ def print_tensors_state(self):
if device == torch.device('cpu'):
continue
gpu_mem_alloc = self.mem_format(torch.cuda.memory_allocated(device))
- self.info += f"Totle GPU Memery Allocated on {device} is {gpu_mem_alloc}\n"
+ self.info += f"Total GPU Memory Allocated on {device} is {gpu_mem_alloc}\n"
self.info += LINE
self.info += '\n\n'
if self.show_info:
diff --git a/colossalai/zero/gemini/chunk/manager.py b/colossalai/zero/gemini/chunk/manager.py
index d85df0b00476..77368d06d255 100644
--- a/colossalai/zero/gemini/chunk/manager.py
+++ b/colossalai/zero/gemini/chunk/manager.py
@@ -102,7 +102,7 @@ def access_chunk(self, chunk: Chunk) -> None:
"""
if chunk in self.accessed_chunks:
return
- self.__sub_memroy_usage(chunk.memory_usage)
+ self.__sub_memory_usage(chunk.memory_usage)
if chunk.device_type == 'cpu':
chunk.shard_move(get_current_device())
self.__add_accessed_chunk(chunk)
@@ -114,7 +114,7 @@ def release_chunk(self, chunk: Chunk) -> None:
if chunk not in self.accessed_chunks:
return
if chunk.can_release:
- self.__sub_memroy_usage(chunk.memory_usage)
+ self.__sub_memory_usage(chunk.memory_usage)
self.__sub_accessed_chunk(chunk)
self.__add_memory_usage(chunk.memory_usage)
@@ -123,7 +123,7 @@ def move_chunk(self, chunk: Chunk, device: torch.device, force_copy: bool = Fals
"""
if not chunk.can_move or chunk.device_type == device.type:
return
- self.__sub_memroy_usage(chunk.memory_usage)
+ self.__sub_memory_usage(chunk.memory_usage)
chunk.shard_move(device, force_copy)
self.__add_memory_usage(chunk.memory_usage)
@@ -138,7 +138,7 @@ def reduce_chunk(self, chunk: Chunk) -> bool:
"""
if not chunk.can_reduce:
return False
- self.__sub_memroy_usage(chunk.memory_usage)
+ self.__sub_memory_usage(chunk.memory_usage)
chunk.reduce()
self.__sub_accessed_chunk(chunk)
self.__add_memory_usage(chunk.memory_usage)
@@ -228,11 +228,11 @@ def __get_chunk_group(self, group_name: str) -> Deque:
return self.chunk_groups[group_name]
def __close_one_chunk(self, chunk: Chunk):
- self.__sub_memroy_usage(chunk.memory_usage)
+ self.__sub_memory_usage(chunk.memory_usage)
chunk.close_chunk()
self.__add_memory_usage(chunk.memory_usage)
- def __sub_memroy_usage(self, usage: Dict[str, int]):
+ def __sub_memory_usage(self, usage: Dict[str, int]):
for k, v in usage.items():
self.total_mem[k] -= v
diff --git a/colossalai/zero/gemini/chunk/search_utils.py b/colossalai/zero/gemini/chunk/search_utils.py
index da58e038c879..881ceb0b3b97 100644
--- a/colossalai/zero/gemini/chunk/search_utils.py
+++ b/colossalai/zero/gemini/chunk/search_utils.py
@@ -85,7 +85,7 @@ def classify_params_by_dp_degree(param_order: OrderedParamGenerator,
Classify the parameters by their dp degree
Args:
- param_order (OrderedParamGenerator): the order of param be visied
+ param_order (OrderedParamGenerator): the order in which the params are visited
strict_ddp_flag (bool, optional): whether to enable the strict ddp mode. Defaults to False.
Returns:
diff --git a/colossalai/zero/gemini/memory_tracer/memory_stats.py b/colossalai/zero/gemini/memory_tracer/memory_stats.py
index 9a45034ee27e..41d7e5754e96 100644
--- a/colossalai/zero/gemini/memory_tracer/memory_stats.py
+++ b/colossalai/zero/gemini/memory_tracer/memory_stats.py
@@ -59,7 +59,7 @@ def increase_preop_step(self, param_list: List[torch.nn.Parameter]):
time step.
Args:
- param_list (List[torch.nn.Parameter]): a list of torch paramters.
+ param_list (List[torch.nn.Parameter]): a list of torch parameters.
"""
for p in param_list:
if p not in self._param_step_dict:
diff --git a/docker/Dockerfile b/docker/Dockerfile
index 49ff9b344268..2c7bafd9604c 100644
--- a/docker/Dockerfile
+++ b/docker/Dockerfile
@@ -8,14 +8,19 @@ LABEL org.opencontainers.image.base.name = "docker.io/library/hpcaitech/cuda-con
# install torch
RUN conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
+# install ninja
+RUN apt-get install -y --no-install-recommends ninja-build
+
# install apex
RUN git clone https://github.com/NVIDIA/apex && \
cd apex && \
+ git checkout 91fcaa && \
pip install packaging && \
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" ./
# install colossalai
-RUN git clone https://github.com/hpcaitech/ColossalAI.git \
+ARG VERSION=1
+RUN git clone -b ${VERSION} https://github.com/hpcaitech/ColossalAI.git \
&& cd ./ColossalAI \
&& CUDA_EXT=1 pip install -v --no-cache-dir .
diff --git a/docs/README-zh-Hans.md b/docs/README-zh-Hans.md
index 9d5bcfe3f974..1dde7a816676 100644
--- a/docs/README-zh-Hans.md
+++ b/docs/README-zh-Hans.md
@@ -121,12 +121,22 @@ Colossal-AI 为您提供了一系列并行组件。我们的目标是让您的
### ColossalChat