[evaluate] support gpt evaluation#3807

Merged
chengeharrison merged 1 commit into hpcaitech:dev/evaluation from chengeharrison:dev/evaluate
May 23, 2023

Conversation

@chengeharrison
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

Link this PR to your issue with words like fixed to automatically close the linked issue upon merge

e.g. fixed #1234, closed #1234, resolved #1234

#3714
Some legacy code was not formatted using pre-commit.

📝 What does this PR do?


Add support for gpt evaluation.

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

bert-score
rouge_chinese
scikit-metrics
nltk
Contributor Author

@chengeharrison chengeharrison May 23, 2023


Requirements for evaluation should be included in Chat/evaluate/requirements.txt instead of Chat/requirements.txt.

Contributor


Is the evaluation a part of Chat? If we include these in Chat/evaluate/requirements.txt, won't we need to install them separately from Chat when we want to use the evaluate feature?

Contributor Author


Chat/examples and Chat/inference have their own requirements.txt.

Contributor

@TongLi3701 TongLi3701 left a comment


Thanks Yuanchen.

Overall looks fine. Just have some questions.

  1. Are we going to add a sample config file later?
  2. Do we support GPT-4? I assume we only need to add an argument to control whether to use GPT-4 or GPT-3.5, so could the function be renamed accordingly?

Comment on lines +1 to +10
jieba
bert-score
rouge_chinese
scikit-metrics
nltk
openai
seaborn
pandas
matplotlib
numpy
Contributor


We do not have an additional setup.py for evaluation, so adding a requirements.txt here cannot install all these packages.

All of these should be moved back to applications/Chat/requirements.txt, since setup.py fetches the file from there.

Contributor Author


But Chat/examples and Chat/inference have their own requirements.txt.

Contributor


I assume what we did before was wrong? If you check the setup.py file, it only fetches the requirements.txt in the same directory.

Any comments, or is my understanding wrong? @ver217
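The fetching behavior described above can be sketched as follows — a minimal illustration of a setup.py helper that resolves a requirements file relative to a given directory. The helper name and signature are hypothetical, not the actual Colossal-AI setup.py:

```python
import os


def fetch_requirements(filename, base_dir):
    """Read a pip requirements file located in base_dir, skipping blanks and comments."""
    reqs = []
    with open(os.path.join(base_dir, filename)) as f:
        for line in f:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                reqs.append(stripped)
    return reqs
```

Because the path is resolved against setup.py's own directory, a requirements.txt placed in a subdirectory such as Chat/evaluate/ is never picked up by the package install.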

Contributor


requirements.txt is essential for using coati. examples/requirements.txt holds the extra requirements needed to run the examples, so I think that one is OK. As for inference, the installation process of GPTQ is very difficult, and users must follow its readme and install it manually.

As for evaluation, do you think it's a part of coati? If so, move the evaluation directory to coati/ and add its requirements to requirements.txt. Otherwise, keep them apart.

Contributor


coati is regarded as a library, so if evaluation is a part of coati, you'd better provide a CLI.
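If evaluation did become part of the coati library, a CLI entry point could look roughly like this — every flag name below is hypothetical, sketched purely for illustration:

```python
import argparse


def build_parser():
    # Hypothetical flags; the real options would come from the evaluation config.
    parser = argparse.ArgumentParser(
        prog="coati-evaluate",
        description="Run GPT-based evaluation on model answers.",
    )
    parser.add_argument("--answer-file", required=True, help="JSON file of model answers")
    parser.add_argument("--config", required=True, help="evaluation config file")
    parser.add_argument("--model", default="gpt-3.5-turbo", help="OpenAI model used as the judge")
    return parser
```

Such an entry point would then be registered under console_scripts in setup.py so it is installed alongside the library.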

Contributor


Ok, sure.

We can keep it separate, but make sure to add a setup section with pip install -r requirements.txt to the readme file.

@chengeharrison @Camille7777

Contributor Author


Ok.

Contributor


noted

Comment on lines +77 to +87
# gpt35 evaluation
for category in self.params:
    category_metrics = self.params[category]["GPT-3.5"]

    prompt = self.gpt_evaluation_prompt.get(category, None)
    if prompt is None:
        print(f"No prompt for category {category}! Use prompt for category general now.")
        prompt = self.gpt_evaluation_prompt["general"]

    self.gpt35_evaluation_results[category] = gpt_evaluate.gpt35_evaluate(
        answers_per_category[category], prompt, category_metrics, category
    )
Contributor


Do we support GPT-4? It seems that all functions are designed for GPT-3.5?

Contributor Author

@chengeharrison chengeharrison May 23, 2023


For evaluation, we currently only support GPT-3.5. Evaluation using GPT-4 requires sampling, which costs a lot, and we cannot test it because we do not have access to GPT-4 at the moment.
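The rename suggested earlier in the review could make the judge model a parameter instead of part of the function name. A hedged sketch — the wrapper and its signature are hypothetical, and the real code would send the request through the OpenAI API rather than return it:

```python
def evaluate_with_gpt(answers, prompt, metrics, category, model="gpt-3.5-turbo"):
    """Build a chat-completion request for the chosen judge model.

    Passing model="gpt-4" would enable GPT-4 evaluation once API access
    is available; the request dict is returned here instead of sent.
    """
    return {
        "model": model,
        "messages": [{
            "role": "user",
            # Fill the category prompt template with the answers and metrics.
            "content": prompt.format(answers=answers, metrics=metrics),
        }],
    }
```

With this shape, supporting GPT-4 later is a one-argument change at the call site rather than a new function.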

Contributor


Sure. We can add it to the TODO list.

@TongLi3701 TongLi3701 self-requested a review May 23, 2023 08:15
Contributor

@TongLi3701 TongLi3701 left a comment


Thanks Yuanchen.

@chengeharrison chengeharrison merged commit 0a0a069 into hpcaitech:dev/evaluation May 23, 2023