
[BUG]: UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value #5930

@zhurunhua

Description


Is there an existing issue for this bug?

  • I have searched the existing issues

🐛 Describe the bug

I got an error when running applications/Colossal-LLaMA/prepare_sft_dataset.py.


The script is:

```
python /mnt/data/tool/ColossalAI-0.4.0/applications/Colossal-LLaMA/prepare_sft_dataset.py \
    --data_input_dirs "/mnt/data/dataset/llama3/prepare/original/2000items" \
    --tokenizer_dir "/mnt/data/model/modelscope/Meta-Llama-3-8B-Instruct" \
    --data_output_dirs "/mnt/data/dataset/llama3/prepare/2000items-llama3" \
    --max_length 1024 \
    --num_spliced_dataset_bins 10 \
    --llama_version 3
```

The error is:

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
[07/19/24 16:52:06] INFO colossalai - colossalai - INFO: /mnt/data/tool/ColossalAI-0.4.0/applications/Colossal-LLaMA/prepare_sft_dataset.py:102 main
                    INFO colossalai - colossalai - INFO: Start to process part-0/10 of all original datasets.
Traceback (most recent call last):
  File "/mnt/data/tool/ColossalAI-0.4.0/applications/Colossal-LLaMA/prepare_sft_dataset.py", line 147, in <module>
    main()
  File "/mnt/data/tool/ColossalAI-0.4.0/applications/Colossal-LLaMA/prepare_sft_dataset.py", line 106, in main
    "tokenizer": tokenizer,
    ^^^^^^^^^
UnboundLocalError: cannot access local variable 'default_conversation' where it is not associated with a value
```
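For context, this class of `UnboundLocalError` typically happens when a local variable is assigned only inside version-specific branches and then read unconditionally. The sketch below is a hypothetical minimal reproduction (function and value names are placeholders, not the actual code in prepare_sft_dataset.py): no branch matches `llama_version="3"`, so `default_conversation` is never bound before it is used.

```python
# Hypothetical minimal reproduction of the failure pattern.
# `default_conversation` is only assigned in the "2" branch, and there is no
# else/raise, so any other version reaches the read while the name is unbound.

def build_dataset_kwargs(llama_version: str, tokenizer: object) -> dict:
    if llama_version == "2":
        default_conversation = "llama2_template"  # placeholder value
    # no branch for "3" and no fallback here
    return {
        "tokenizer": tokenizer,
        "conversation_template": default_conversation,  # raises UnboundLocalError
    }

try:
    build_dataset_kwargs("3", tokenizer=None)
except UnboundLocalError as e:
    print(e)
```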

I've fixed this bug and will submit a PR soon.
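One possible fix, sketched below (the actual PR may differ, and the template names are placeholders for whatever the Colossal-LLaMA conversation module provides): bind a template for every supported version and fail fast with a clear message on unsupported ones, instead of leaving the variable unbound.

```python
# Hedged sketch of a fix: a lookup table guarantees the variable is always
# bound for supported versions, and unknown versions raise a clear error.

_TEMPLATES = {
    "2": "llama2_template",  # placeholder template names
    "3": "llama3_template",
}

def select_conversation_template(llama_version: str) -> str:
    try:
        return _TEMPLATES[llama_version]
    except KeyError:
        raise ValueError(f"Unsupported llama_version: {llama_version!r}")

print(select_conversation_template("3"))  # llama3_template
```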

Environment

● ubuntu22.04
● CPU:96c;
● RAM:736 GiB;
● GPU:8 * NVIDIA V100 (32GB)
● Python 3.11.5;
● ColossalAI 0.4.0;
● cuda_11.8;
● pytorch 2.1.0+cu118

Labels: bug
