🐛 Describe the bug
`generate_kwargs` is not properly passed to `PPOTrainer`, so the generated sentences might be incorrect (at least in my case).

    experience_maker = NaiveExperienceMaker(actor, critic, reward_model, initial_model, kl_coef)
    replay_buffer = NaiveReplayBuffer(train_batch_size, buffer_limit, buffer_cpu_offload)
    super().__init__(strategy, experience_maker, replay_buffer, experience_batch_size, max_epochs, tokenizer,
                     sample_replay_buffer, dataloader_pin_memory, callbacks, **generate_kwargs)
    self.actor = actor
    self.critic = critic

    self.actor_loss_fn = PolicyLoss(eps_clip)
    self.critic_loss_fn = ValueLoss(value_clip)

    self.actor_optim = actor_optim
    self.critic_optim = critic_optim
    self._set_default_generate_kwargs(generate_kwargs, actor)

As with the Hugging Face GPT-2 model, `prepare_inputs_fn` and `update_model_kwargs_fn` should be passed to generate. In the code above, however, these functions are not applied because `_set_default_generate_kwargs()` is called after `super().__init__()`.
So I think the order of the two calls should be changed.
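To illustrate why the call order matters, here is a minimal, self-contained sketch. It is not the ColossalAI code: `BaseTrainer`, `BuggyTrainer`, `FixedTrainer`, and `fake_prepare_inputs` are hypothetical stand-ins, and it assumes the parent constructor simply stores the keyword arguments it receives. Because `**generate_kwargs` is unpacked into a new dict in the parent, defaults added to the caller's dict afterwards never reach the stored copy.

```python
def fake_prepare_inputs(*args, **kwargs):
    """Hypothetical placeholder for a model's input-preparation hook."""
    return {}


class BaseTrainer:
    """Hypothetical stand-in for the parent trainer class."""

    def __init__(self, **generate_kwargs):
        # **generate_kwargs collects the keyword arguments into a NEW dict,
        # so later changes to the caller's dict are not visible here.
        self.generate_kwargs = generate_kwargs


class DefaultsMixin:
    def _set_default_generate_kwargs(self, generate_kwargs):
        # Add the default only if the caller did not supply one.
        generate_kwargs.setdefault("prepare_inputs_fn", fake_prepare_inputs)


class BuggyTrainer(DefaultsMixin, BaseTrainer):
    def __init__(self, **generate_kwargs):
        # Defaults are set AFTER the parent has already copied the kwargs.
        super().__init__(**generate_kwargs)
        self._set_default_generate_kwargs(generate_kwargs)


class FixedTrainer(DefaultsMixin, BaseTrainer):
    def __init__(self, **generate_kwargs):
        # Defaults are set BEFORE the kwargs are handed to the parent.
        self._set_default_generate_kwargs(generate_kwargs)
        super().__init__(**generate_kwargs)


buggy = BuggyTrainer(max_length=128)
fixed = FixedTrainer(max_length=128)
print("prepare_inputs_fn" in buggy.generate_kwargs)
print("prepare_inputs_fn" in fixed.generate_kwargs)
```

In the real class, the helper may rely on attributes that the parent constructor sets; if so, a reordering would need those values passed in explicitly rather than read from `self`.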
Environment
No response
Code reference: ColossalAI/applications/ChatGPT/chatgpt/trainer/ppo.py, lines 64 to 76 in 5d5f475