[evaluate] support gpt evaluation#3807

Merged
chengeharrison merged 1 commit into hpcaitech:dev/evaluation from chengeharrison:dev/evaluate
May 23, 2023

Conversation

@chengeharrison
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

#3714
Some legacy code was not formatted using pre-commit.

📝 What does this PR do?


Add support for gpt evaluation.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

bert-score
rouge_chinese
scikit-metrics
nltk
Contributor Author

@chengeharrison chengeharrison May 23, 2023


Requirements for evaluation should be included in Chat/evaluate/requirements.txt instead of Chat/requirements.txt.

Contributor


Is the evaluation a part of Chat? If we include these in Chat/evaluate/requirements.txt, won't we need to install them separately from Chat when we want to use the evaluate feature?

Contributor Author


Chat/examples and Chat/inference have their own requirements.txt.

Contributor

@TongLi3701 TongLi3701 left a comment


Thanks Yuanchen.

Overall looks fine. Just have some questions.

  1. Are we going to add a sample config file later?
  2. Do we support GPT-4? I assume we only need to add an argument to control whether to use GPT-4 or GPT-3.5, so could the function be renamed accordingly?

Comment on lines +1 to +10
jieba
bert-score
rouge_chinese
scikit-metrics
nltk
openai
seaborn
pandas
matplotlib
numpy
Contributor


We do not have an additional setup.py for evaluation, so adding a requirements.txt here cannot install all these packages.

All of these should be moved back to applications/Chat/requirements.txt, since setup.py fetches the file from there.

Contributor Author


But Chat/examples and Chat/inference have their own requirements.txt.

Contributor


I assume what we did before was wrong? If you check the setup.py file, it only fetches the requirements.txt in the same directory.

Any comments, or is my understanding wrong? @ver217
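The fetching behavior described above can be sketched as follows — a minimal illustration of a setup.py helper that resolves a requirements file relative to a given directory. The helper name and signature are hypothetical, not the actual Colossal-AI setup.py:

```python
import os


def fetch_requirements(filename, base_dir):
    """Read a pip requirements file located in base_dir, skipping blanks and comments."""
    reqs = []
    with open(os.path.join(base_dir, filename)) as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                reqs.append(stripped)
    return reqs
```

Because the path is resolved against setup.py's own directory, a requirements.txt placed in a subdirectory such as Chat/evaluate/ is never picked up by the package install.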

Contributor


requirements.txt is essential for using coati. examples/requirements.txt holds the extra requirements needed to run the examples, so I think that one is OK. As for inference, the installation process of GPTQ is very difficult, and users must follow its readme and install it manually.

As for evaluation, do you think it's a part of coati? If so, move the evaluation directory to coati/ and add its requirements to requirements.txt. Otherwise, keep them apart.

Contributor


coati is regarded as a library, so if evaluation is a part of coati, you'd better provide a CLI.
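If evaluation did become part of the coati library, a CLI entry point could look roughly like this — every flag name below is hypothetical, sketched purely for illustration:

```python
import argparse


def build_parser():
    # Hypothetical flags; the real options would come from the evaluation config.
    parser = argparse.ArgumentParser(
        prog="coati-evaluate",
        description="Run GPT-based evaluation on model answers.",
    )
    parser.add_argument("--answer-file", required=True, help="JSON file of model answers")
    parser.add_argument("--config", required=True, help="evaluation config file")
    parser.add_argument("--model", default="gpt-3.5-turbo", help="OpenAI model used as the judge")
    return parser
```

Such an entry point would then be registered under console_scripts in setup.py so it is installed alongside the library.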

Contributor


Ok, sure.

We can keep it separate, but make sure to add a setup section with pip install -r requirements.txt to the readme file.

@chengeharrison @Camille7777

Contributor Author


Ok.

Contributor


noted

Comment on lines +77 to +87
# gpt35 evaluation
for category in self.params:
    category_metrics = self.params[category]["GPT-3.5"]

    prompt = self.gpt_evaluation_prompt.get(category, None)
    if prompt is None:
        print(f"No prompt for category {category}! Use prompt for category general now.")
        prompt = self.gpt_evaluation_prompt["general"]

    self.gpt35_evaluation_results[category] = gpt_evaluate.gpt35_evaluate(
        answers_per_category[category], prompt, category_metrics, category
    )
Contributor


Do we support GPT-4? It seems that all functions are designed for GPT-3.5?

Contributor Author

@chengeharrison chengeharrison May 23, 2023


For evaluation, we currently only support GPT-3.5. Evaluation using GPT-4 requires sampling, which costs a lot, and we cannot test it because we do not have access to GPT-4 at the moment.
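The rename suggested earlier in the review could make the judge model a parameter instead of part of the function name. A hedged sketch — the wrapper and its signature are hypothetical, and the real code would send the request through the OpenAI API rather than return it:

```python
def evaluate_with_gpt(answers, prompt, metrics, category, model="gpt-3.5-turbo"):
    """Build a chat-completion request for the chosen judge model.

    Passing model="gpt-4" would enable GPT-4 evaluation once API access
    is available; the request dict is returned here instead of sent.
    """
    return {
        "model": model,
        "messages": [{
            "role": "user",
            # Fill the category prompt template with the answers and metrics.
            "content": prompt.format(answers=answers, metrics=metrics),
        }],
    }
```

With this shape, supporting GPT-4 later is a one-argument change at the call site rather than a new function.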

Contributor


Sure. We can add it to the TODO list.

@TongLi3701 TongLi3701 self-requested a review May 23, 2023 08:15
Contributor

@TongLi3701 TongLi3701 left a comment


Thanks Yuanchen.

@chengeharrison chengeharrison merged commit 0a0a069 into hpcaitech:dev/evaluation May 23, 2023