[chat] fix train_prompts.py gemini strategy bug#3666

Merged
ver217 merged 4 commits intohpcaitech:mainfrom
zhang-yi-chi:fix/chat-train-prompts-gemini
May 6, 2023
Conversation

@zhang-yi-chi
Contributor

📌 Checklist before creating the PR

  • I have created an issue for this PR for traceability
  • The title follows the standard format: [doc/gemini/tensor/...]: A concise description
  • I have added relevant tags if possible for us to better distinguish different PRs

🚨 Issue number

📝 What does this PR do?

Initial model and reward model are not wrapped by the ZeroDDP wrapper, so they cannot accept a ColoTensor as model input. Here we use .data as the model input.
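As a rough illustration of the tensor semantics behind this workaround (plain PyTorch, no ColossalAI required): `.data` yields a shallow view of the same storage, detached from autograd and from any wrapper bookkeeping, which is why passing it to an unwrapped model sidesteps the problem.

```python
import torch

# Minimal sketch: `.data` returns a shallow, autograd-detached view
# of the same underlying storage.
x = torch.randn(3, requires_grad=True)
y = x.data

assert y.requires_grad is False       # detached from autograd
assert y.data_ptr() == x.data_ptr()   # same underlying storage
```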

💥 Checklist before requesting a review

  • I have linked my PR to an issue (instruction)
  • My issue clearly describes the problem/feature/proposal, with diagrams/charts/table/code if possible
  • I have performed a self-review of my code
  • I have added thorough tests.
  • I have added docstrings for all the functions/methods I implemented

⭐️ Do you enjoy contributing to Colossal-AI?

  • 🌝 Yes, I do.
  • 🌚 No, I don't.

Tell us more if you don't enjoy contributing to Colossal-AI.

@ver217
Contributor

ver217 commented Apr 28, 2023

What is the problem? I test https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/benchmarks/benchmark_opt_lora_dummy.py with gemini strategy and no error occurs.

A naive torch module should be able to receive ColoTensor as well.

@zhang-yi-chi
Contributor Author

> What is the problem? I test https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/benchmarks/benchmark_opt_lora_dummy.py with gemini strategy and no error occurs.
>
> A naive torch module should be able to receive ColoTensor as well.

Running train_prompts.sh with the colossalai_gemini strategy causes the following error:

File "ColossalAI/applications/Chat/coati/experience_maker/naive.py", line 25, in make_experience
    base_action_log_probs = self.initial_model(sequences, num_actions, attention_mask)
...
File "/lib/python3.8/site-packages/colossalai/nn/_ops/embedding.py", line 111, in colo_embedding
    assert isinstance(weight, ColoTensor)

initial_model and reward_model are not ZeroDDP modules, so their weights are not ColoTensors.
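A hypothetical sketch of this failure mode (FakeColoTensor stands in for ColoTensor; none of this is ColossalAI code): a Tensor subclass intercepts torch functions via `__torch_function__`, so even a plain `nn.Embedding` dispatches into the subclass handler, which then rejects the module's plain weight, mirroring the assert in colossalai/nn/_ops/embedding.py.

```python
import torch
import torch.nn.functional as F

# FakeColoTensor is an illustrative stand-in for ColoTensor.
class FakeColoTensor(torch.Tensor):
    @classmethod
    def __torch_function__(cls, func, types, args=(), kwargs=None):
        if func is F.embedding:
            weight = args[1]
            # analogue of: assert isinstance(weight, ColoTensor)
            assert isinstance(weight, FakeColoTensor), "plain weight rejected"
        return super().__torch_function__(func, types, args, kwargs or {})

emb = torch.nn.Embedding(10, 4)                      # unwrapped module, plain weight
ids = torch.tensor([1, 2]).as_subclass(FakeColoTensor)

failed = False
try:
    emb(ids)   # the subclass input routes F.embedding to the handler above
except AssertionError:
    failed = True
```

The unwrapped module's weight is an ordinary `nn.Parameter`, so the handler's isinstance check fails, just as `colo_embedding` does when `initial_model` is built outside the gemini context.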

@ver217
Contributor

ver217 commented Apr 28, 2023

> > What is the problem? I test https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/benchmarks/benchmark_opt_lora_dummy.py with gemini strategy and no error occurs.
> > A naive torch module should be able to receive ColoTensor as well.
>
> Running train_prompts.sh with the colossalai_gemini strategy causes the following error:
>
>     File "ColossalAI/applications/Chat/coati/experience_maker/naive.py", line 25, in make_experience
>         base_action_log_probs = self.initial_model(sequences, num_actions, attention_mask)
>     ...
>     File "/lib/python3.8/site-packages/colossalai/nn/_ops/embedding.py", line 111, in colo_embedding
>         assert isinstance(weight, ColoTensor)
>
> initial_model and reward_model are not ZeroDDP modules, so their weights are not ColoTensors.

How do you run this script?

@zhang-yi-chi
Contributor Author

> > > What is the problem? I test https://github.com/hpcaitech/ColossalAI/blob/main/applications/Chat/benchmarks/benchmark_opt_lora_dummy.py with gemini strategy and no error occurs.
> > > A naive torch module should be able to receive ColoTensor as well.
> >
> > Running train_prompts.sh with the colossalai_gemini strategy causes the following error:
> >
> >     File "ColossalAI/applications/Chat/coati/experience_maker/naive.py", line 25, in make_experience
> >         base_action_log_probs = self.initial_model(sequences, num_actions, attention_mask)
> >     ...
> >     File "/lib/python3.8/site-packages/colossalai/nn/_ops/embedding.py", line 111, in colo_embedding
> >         assert isinstance(weight, ColoTensor)
> >
> > initial_model and reward_model are not ZeroDDP modules, so their weights are not ColoTensors.
>
> How do you run this script?

I installed ColossalAI with CUDA_EXT=1 pip3 install -v .

Then I ran something like this under the applications/Chat folder:

cp examples/train_prompts.py .
torchrun --standalone --nproc_per_node=1 train_prompts.py \
   --strategy colossalai_gemini \
   --pretrain_dataset /projects/llm/data/coati/instinwild_en.json \
   --prompt_dataset /projects/llm/data/coati/instinwild_en.json \
   --model 'bloom' \
   --pretrain /projects/llm/Coati-BLOOM-560M \
   --rm_model 'bloom' \
   --rm_pretrain /projects/llm/hf/bloom-560m \
   --rm_path /projects/llm/Coati-RM/hh-rlhf.pt \
   --save_path /projects/llm/Coati-PROMPTS/ppo.pt

@ver217
Contributor

ver217 commented Apr 28, 2023

This issue can be simply resolved by moving with strategy.model_init_context(): to here
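A hypothetical sketch of that placement (ToyStrategy and build() are illustrative stand-ins, not ColossalAI APIs): only the models the strategy will wrap (actor, critic) are constructed inside model_init_context(), while initial_model and reward_model are constructed outside it, so their weights stay plain torch parameters rather than becoming ColoTensors.

```python
from contextlib import contextmanager

# ToyStrategy is an illustrative stand-in for the colossalai_gemini strategy.
class ToyStrategy:
    def __init__(self):
        self.in_init_context = False

    @contextmanager
    def model_init_context(self):
        # In ColossalAI's gemini strategy this context makes newly created
        # parameters ColoTensors; here we only record that it is active.
        self.in_init_context = True
        try:
            yield
        finally:
            self.in_init_context = False

strategy = ToyStrategy()
built_inside = {}

def build(name):
    # Records whether a model was created inside the init context.
    built_inside[name] = strategy.in_init_context
    return object()  # placeholder for a real nn.Module

# Accepted fix: only the models the strategy wraps go inside the context.
with strategy.model_init_context():
    actor = build("actor")
    critic = build("critic")

initial_model = build("initial_model")  # never wrapped -> plain weights
reward_model = build("reward_model")
```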

@ver217
Contributor

ver217 commented Apr 28, 2023

ColoTensor will be removed in the future, so we'd better reduce the dependency on it.

@zhang-yi-chi
Contributor Author

> This issue can be simply resolved by moving with strategy.model_init_context(): to here

That's a better solution. I submitted what you proposed.

@ver217 ver217 merged commit 2da5d81 into hpcaitech:main May 6, 2023
