📚 The doc issue
As several users have raised, the current guide to Stage 3 training is not yet clear about the required datasets; one user even composed their own guide that still confuses the datasets. This section needs clarification on how to download the datasets and how to set `prompt_path` and `pretrain_dataset`.
Below is an example user query I received on Slack.
```shell
torchrun --standalone --nproc_per_node=4 train_prompts.py \
    --pretrain "bigscience/bloom-560m" \
    --model 'bloom' \
    --strategy colossalai_zero2 \
    --prompt_path /data/chenhao/train/ColossalAI/prompt_dataset/data.json \      <------- Where is this data.json?
    --pretrain_dataset /data/chenhao/train/ColossalAI/pretrain_dataset/data.json \  <------- Where is this data.json?
    --rm_pretrain /data/chenhao/train/ColossalAI/Coati-7B \
    --rm_path /data/chenhao/train/ColossalAI/rmstatic.pt \
    --train_batch_size 4 \
    --experience_batch_size 4 \
    --max_epochs 1 \
    --num_episodes 1
```
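To make the question concrete, here is a minimal sketch of creating placeholder files at the paths the command expects. The JSON schema used below (a list of instruction records) is purely an assumption for illustration; the actual required schema is exactly what this issue asks the docs to specify.

```python
import json
from pathlib import Path

def write_placeholder_dataset(path, records):
    """Write a JSON list of records to `path`, creating parent dirs.

    Hypothetical helper: the real schema expected by train_prompts.py
    is undocumented, which is the point of this issue.
    """
    p = Path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(json.dumps(records, indent=2))
    return p

# Assumed record shapes -- not confirmed by the Coati docs.
prompt_records = [{"instruction": "Explain what RLHF is."}]
pretrain_records = [{
    "instruction": "Summarize this text.",
    "input": "ColossalAI accelerates large-model training.",
    "output": "ColossalAI speeds up training of large models.",
}]

write_placeholder_dataset("prompt_dataset/data.json", prompt_records)
write_placeholder_dataset("pretrain_dataset/data.json", pretrain_records)
```

The docs should state where to download the real `data.json` files (or how to generate them) and which fields each record must contain.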