[Chat] fix the tokenizer "int too big to convert" error in SFT training by Camille7777 · Pull Request #3453 · hpcaitech/ColossalAI

Camille7777 · 2023-04-05T12:20:21Z

📌 Checklist before creating the PR

I have created an issue for this PR for traceability
The title follows the standard format: [doc/gemini/tensor/...]: A concise description
I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

fixed #3438

📝 What does this PR do?

Add a max_length setting to _tokenize_fn when using SupervisedDataset, fixing the tokenizer "int too big to convert" error during SFT training with Bloom and OPT.
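For illustration only, here is a minimal sketch of why an explicit max_length avoids the error. It is not the ColossalAI code. It assumes two things this PR does not state. First, the Bloom and OPT tokenizers report transformers' sentinel VERY_LARGE_INTEGER = int(1e30) as model_max_length when their config sets no limit. Second, the Rust-backed fast tokenizer stores the truncation length as a 64-bit usize, so a value that large cannot be converted. The helper tokenizer_kwargs and the default of 512 are hypothetical.

```python
# Minimal sketch, not the ColossalAI source.
# Assumption: mirrors transformers' VERY_LARGE_INTEGER sentinel, which becomes
# tokenizer.model_max_length when a tokenizer config sets no length limit.
VERY_LARGE_INTEGER = int(1e30)
# Assumption: the fast tokenizer keeps truncation lengths in a 64-bit Rust usize.
USIZE_MAX = 2**64 - 1


def fits_in_usize(n: int) -> bool:
    """True if n can be converted to a 64-bit unsigned integer without overflow."""
    return 0 <= n <= USIZE_MAX


def tokenizer_kwargs(max_length: int = 512) -> dict:
    """Hypothetical helper: keyword arguments for tokenizer(text, **kwargs) inside
    an Alpaca-style _tokenize_fn, using an explicit, bounded max_length."""
    if not fits_in_usize(max_length):
        raise ValueError(f"max_length={max_length} cannot be passed to the fast tokenizer")
    return {
        "return_tensors": "pt",
        "padding": "longest",
        "max_length": max_length,
        "truncation": True,
    }


# Assumed pre-fix behaviour: max_length taken from model_max_length (the sentinel).
print(fits_in_usize(VERY_LARGE_INTEGER))  # False
# With an explicit setting, the value handed to truncation stays in range.
print(tokenizer_kwargs(512)["max_length"])  # 512
```

With transformers installed, these arguments would be passed per string as tokenizer(text, **tokenizer_kwargs(512)). Rejecting out-of-range values up front gives a clear Python error instead of an overflow deep inside the Rust binding.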

💥 Checklist before requesting a review

I have linked my PR to an issue (instruction)
My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
I have performed a self-review of my code
I have added thorough tests.
I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

🌝 Yes, I do.
🌚 No, I don't.



Camille7777 added 14 commits April 6, 2023 09:25

Add RoBERTa for RLHF Stage 2 & 3 (test)

5c69dc1

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

548e499

This reverts commit 06741d8.

Add RoBERTa for RLHF stage 2 & 3

026c363

1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

a6aa921

Revert "Update test_ci.sh"

96dda63

This reverts commit 9c7352b.

Add RoBERTa for RLHF Stage 2 & 3 (test)

a4a860f

RoBERTa for RLHF Stage 2 & 3 (still in testing)

Revert "Add RoBERTa for RLHF Stage 2 & 3 (test)"

2825ec4

This reverts commit 06741d8.

Add RoBERTa for RLHF stage 2 & 3

b1403ea

1. add roberta folder under model folder
2. add roberta option in train_reward_model.py
3. add some test in testci

Update test_ci.sh

ff52770

Revert "Update test_ci.sh"

512f70e

This reverts commit 9c7352b.

update roberta with coati

d643e91

chat ci update

7c4dad6

Revert "chat ci update"

7327cf3

This reverts commit 17ae7ae.

[Chat] fix the tokenizer "int too big to convert" error in SFT training

98b0742

fix the tokenizer error during SFT training using Bloom and OPT

Camille7777 force-pushed the hotfix/chat branch from 2e56071 to 98b0742 on April 6, 2023 01:25

Fazziekey approved these changes Apr 6, 2023


Fazziekey merged commit 72cb4dd into hpcaitech:main Apr 6, 2023

Fazziekey merged 14 commits into hpcaitech:main from Camille7777:hotfix/chat