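I'm trying to reproduce ColossalChat. The only changes I made were to the --pretrain and --model arguments (to train Wenzhong-GPT2-110M), but training fails before the first step completes with TypeError: forward() got an unexpected keyword argument 'labels'.
train_sft.sh: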
torchrun --standalone --nproc_per_node=8 train_sft.py \
--pretrain 'IDEA-CCNL/Wenzhong-GPT2-110M' \
--model 'gpt2' \
--strategy colossalai_zero2 \
--log_interval 10 \
--save_path ./models/Coati \
--dataset ./data/instinwild_ch.json \
--batch_size 1 \
--accimulation_steps 8 \
--lr 2e-5 \
--max_datasets_size 512 \
--max_epochs 1
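As a sanity check on the progress bar in the log (0/64), the step count can be estimated from the arguments above. This is a minimal sketch that assumes a DistributedSampler-style even split of the 512 samples across the 8 ranks, and a bar that counts per-rank dataloader batches rather than optimizer steps (so --accimulation_steps does not divide it). The helper `steps_per_epoch` is hypothetical and not part of coati.

```python
import math


def steps_per_epoch(dataset_size: int, world_size: int, batch_size: int) -> int:
    """Hypothetical estimate of dataloader batches per rank for one epoch.

    Assumes each rank receives ceil(dataset_size / world_size) samples
    (as a DistributedSampler does, padding the last shard) and that the
    dataloader keeps the final partial batch (drop_last=False).
    """
    samples_per_rank = math.ceil(dataset_size / world_size)
    return math.ceil(samples_per_rank / batch_size)


# --max_datasets_size 512, --nproc_per_node=8, --batch_size 1
print(steps_per_epoch(512, 8, 1))
```

With --max_epochs 1 this gives 512 / (8 × 1) = 64, consistent with the 0/64 shown in the log.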
Train log:
steps: 0%| | 0/64 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/data/ColossalAI/applications/Chat/train_sft.py", line 158, in <module>
train(args)
File "/data/ColossalAI/applications/Chat/train_sft.py", line 129, in train
trainer.fit(logger=logger, log_interval=args.log_interval)
File "/data/ColossalAI/applications/Chat/coati/trainer/sft.py", line 94, in fit
outputs = self.model(prompt_ids, attention_mask=p_mask, labels=labels)
File "/datafile/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'labels'
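The TypeError itself is ordinary Python keyword-argument checking. torch.nn.Module.__call__ passes *input, **kwargs straight to forward() (module.py line 1130 in the traceback), so if the model's forward() has neither a labels parameter nor **kwargs, the call at sft.py line 94 fails. Below is a minimal pure-Python reproduction (no torch needed). `HypotheticalLMWrapper` and `try_sft_call` are made-up names. The wrapper only mimics the shape of the failing call and is not the actual coati model class.

```python
class HypotheticalLMWrapper:
    """Hypothetical stand-in for a model whose forward() lacks a `labels` parameter."""

    def forward(self, input_ids, attention_mask=None):
        return {"input_ids": input_ids, "attention_mask": attention_mask}

    def __call__(self, *args, **kwargs):
        # Like torch.nn.Module.__call__, pass everything straight through to forward().
        return self.forward(*args, **kwargs)


def try_sft_call(model):
    """Call `model` with the same keywords as sft.py line 94; return the TypeError text, if any."""
    try:
        model([101, 102], attention_mask=[1, 1], labels=[101, 102])
    except TypeError as err:
        return str(err)
    return None


print(try_sft_call(HypotheticalLMWrapper()))
```

The printed message ends in "got an unexpected keyword argument 'labels'". The exact prefix depends on the Python version.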
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /data/ColossalAI/applications/Chat/train_sft.py:158 in <module> │
│ │
│ 155 │ parser.add_argument('--lr', type=float, default=5e-6) │
│ 156 │ parser.add_argument('--accimulation_steps', type=int, default=8) │
│ 157 │ args = parser.parse_args() │
│ ❱ 158 │ train(args) │
│ 159 │
│ │
│ /data/ColossalAI/applications/Chat/train_sft.py:129 in train │
│ │
│ 126 │ │ │ │ │ │ max_epochs=args.max_epochs, │
│ 127 │ │ │ │ │ │ accimulation_steps=args.accimulation_steps) │
│ 128 │ │
│ ❱ 129 │ trainer.fit(logger=logger, log_interval=args.log_interval) │
│ 130 │ │
│ 131 │ # save model checkpoint after fitting on only rank0 │
│ 132 │ trainer.save_model(path=args.save_path, only_rank0=True, tokenizer=tokenizer) │
│ │
│ /data/ColossalAI/applications/Chat/coati/trainer/sft.py:94 in fit │
│ │
│ 91 │ │ │ │ # p_mask = p_mask.squeeze(1).cuda() │
│ 92 │ │ │ │ # prompt_logits = self.model(prompt_ids, attention_mask=p_mask, labels=l │
│ 93 │ │ │ │ │
│ ❱ 94 │ │ │ │ outputs = self.model(prompt_ids, attention_mask=p_mask, labels=labels) │
│ 95 │ │ │ │ │
│ 96 │ │ │ │ loss = outputs.loss │
│ 97 │ │ │ │ prompt_logits = outputs.logits │
│ │
│ /datafile/anaconda3/lib/python3.9/site-packages/torch/nn/modules/module.py:1130 in │
│ _call_impl │
│ │
│ 1127 │ │ # this function, and just call forward. │
│ 1128 │ │ if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks o │
│ 1129 │ │ │ │ or _global_forward_hooks or _global_forward_pre_hooks): │
│ ❱ 1130 │ │ │ return forward_call(*input, **kwargs) │
│ 1131 │ │ # Do not call functions when jit is used │
│ 1132 │ │ full_backward_hooks, non_full_backward_hooks = [], [] │
│ 1133 │ │ if self._backward_hooks or _global_backward_hooks: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: forward() got an unexpected keyword argument 'labels'
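To check whether the model object handed to the trainer can accept labels at all, one option is to inspect its forward() signature before training starts. This is a sketch using only the standard-library inspect module. `forward_accepts`, `WithoutLabels`, and `WithLabels` are hypothetical names, and the two toy classes only illustrate the two signature shapes.

```python
import inspect


def forward_accepts(model, name):
    """Return True if model.forward() can receive keyword argument `name`.

    True when `name` is a regular or keyword-only parameter, or when
    forward() takes **kwargs (which accepts any keyword).
    """
    for param in inspect.signature(model.forward).parameters.values():
        if param.kind == inspect.Parameter.VAR_KEYWORD:
            return True
        if param.name == name and param.kind in (
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            inspect.Parameter.KEYWORD_ONLY,
        ):
            return True
    return False


class WithoutLabels:
    def forward(self, input_ids, attention_mask=None):
        return input_ids


class WithLabels:
    def forward(self, input_ids, attention_mask=None, labels=None, **kwargs):
        return input_ids


print(forward_accepts(WithoutLabels(), "labels"))  # False
print(forward_accepts(WithLabels(), "labels"))     # True
```

One limitation applies. A wrapper declared as forward(*args, **kwargs) reports True even if the model it delegates to rejects the keyword. The check is most informative on the model built in train_sft.py before the strategy wraps it.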
Environment:
$ pip list | grep torch
pytorch-lightning 1.6.3
torch 1.12.1
$ pip list | grep transformers
transformers 4.28.0.dev0
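The environment listing above covers only torch and transformers. If it helps, the relevant versions can be collected in one place with the standard-library importlib.metadata (Python 3.8+). This is a sketch. The helper name `package_versions` is hypothetical, and the package list, including colossalai, is an assumption about what is relevant here.

```python
from importlib import metadata


def package_versions(names):
    """Map each distribution name to its installed version, or None if it is not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions(["torch", "transformers", "colossalai"]).items():
        print(f"{name:<14} {version or 'not installed'}")
```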