[chat]: update rm, add wandb and fix bugs #4471
Merged
34 commits
99e0fd0 feat: modify forward fn of critic and reward model (cwher)
482e1e4 feat: modify calc_action_log_probs (cwher)
3c28e62 to: add wandb in sft and rm trainer (cwher)
69da25b feat: update train_sft (cwher)
a4ab376 feat: update train_rm (cwher)
11babfa style: modify type annotation and add warning (cwher)
88a5409 feat: pass tokenizer to ppo trainer (cwher)
2097139 to: modify trainer base and maker base (cwher)
18d664c feat: add wandb in ppo trainer (cwher)
1dd3269 feat: pass tokenizer to generate (cwher)
5af5e58 test: update generate fn tests (cwher)
a0e32aa test: update train tests (cwher)
aa94aa3 fix: remove action_mask (cwher)
c1c8026 feat: remove unused code (cwher)
b160a26 fix: fix wrong ignore_index (cwher)
a81f004 fix: fix mock tokenizer (cwher)
18f879b chore: update requirements (cwher)
f4cd1a5 revert: modify make_experience (cwher)
640867e fix: fix inference (cwher)
a43f481 fix: add padding side (cwher)
6c9fa1d style: modify _on_learn_batch_end (cwher)
d0166ec test: use mock tokenizer (cwher)
756f84a fix: use bf16 to avoid overflow (cwher)
dde4b13 fix: fix workflow (cwher)
e8d1b7b [chat] fix gemini strategy (flybird11111)
d1084e4 [chat] fix (flybird11111)
93caf5a sync: update colossalai strategy (cwher)
50488a0 fix: fix args and model dtype (cwher)
2ef1100 fix: fix checkpoint test (cwher)
c26f751 fix: fix requirements (cwher)
82b61da fix: fix missing import and wrong arg (cwher)
ab6bc55 fix: temporarily skip gemini test in stage 3 (cwher)
3434309 style: apply pre-commit (cwher)
1f3d7f1 fix: temporarily skip gemini test in stage 1&2 (cwher)