
Loss does not drop when using Liger Kernel with Qwen2.5 #257

@Se-Hun

Description

🐛 Describe the bug

I am trying to instruction-tune Qwen2.5-14B-Instruct with Liger Kernel.

I understand that Liger Kernel is supported in the dev version of Hugging Face Transformers. However, when I train the Qwen2.5 model with Liger Kernel enabled, the loss does not decrease. Is Qwen2.5 not supported yet?

Reproduce

Python code example:

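from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer

model_name = "Qwen/Qwen2.5-14B-Instruct"

# torch_dtype="auto" loads the weights in the dtype stored in the checkpoint config
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Dataset loading and construction of `training_args` / `train_dataset`
# were omitted from the report.
...

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()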
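One way to make "the loss does not drop" measurable is to summarize the per-step losses that `Trainer` records in `trainer.state.log_history` (with `--logging_steps=1`, roughly one entry per optimizer step). The helper below, `loss_trend`, is a hypothetical sketch, not part of Transformers or Liger Kernel: it compares the mean loss over the first and last `window` logged training steps and ignores entries without a `"loss"` key (such as eval entries or the final `train_loss` summary).

```python
from statistics import mean


def loss_trend(log_history, window=10):
    """Compare mean training loss over the first and last `window` logged steps.

    `log_history` is assumed to be a list of dicts shaped like
    Trainer's `trainer.state.log_history`; only entries with a "loss"
    key (per-step training logs) are used.
    """
    losses = [entry["loss"] for entry in log_history if "loss" in entry]
    if len(losses) < 2 * window:
        raise ValueError(f"need at least {2 * window} logged losses, got {len(losses)}")
    first = mean(losses[:window])
    last = mean(losses[-window:])
    # Positive relative_drop means the loss went down; near zero means it stayed flat.
    return {"first_mean": first, "last_mean": last, "relative_drop": (first - last) / first}


# Usage after training (hypothetical):
# print(loss_trend(trainer.state.log_history))
```

Running the same script once with `--use_liger_kernel=True` and once without it, then comparing `loss_trend(...)` for both runs, would help isolate whether the kernel patching is what stops the loss from decreasing.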

Run example:

deepspeed --include localhost:0,1 --master_port 61000 train.py \
    --learning_rate=1e-5 \
    --lr_scheduler_type=cosine \
    --max_length=8192 \
    --per_device_train_batch_size=4 \
    --gradient_accumulation_steps=1 \
    --evaluation_strategy=no \
    --num_train_epochs=3 \
    --save_strategy=epoch \
    --logging_strategy=steps \
    --logging_steps=1 \
    --save_total_limit=1 \
    --remove_unused_columns=False \
    --dataloader_num_workers=16 \
    --warmup_ratio=0.03 \
    --gradient_checkpointing=True \
    --torch_compile=True \
    --optim=adafactor \
    --bf16 \
    --deepspeed=./config/zero3.json \
    --use_liger_kernel=True
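As a sanity check on what the optimizer actually sees with these flags, the sketch below approximates the number of optimizer steps and the learning rate at a given step. It assumes 2 GPUs (from `--include localhost:0,1`), a hypothetical 8,000 training examples (the report does not state the dataset size), a DistributedSampler-style split without `drop_last`, and the linear-warmup then half-cosine-decay shape of Transformers' `get_cosine_schedule_with_warmup`. `schedule_summary` and `cosine_lr` are illustrative helpers, not library functions, and the step count only approximates Trainer's own bookkeeping.

```python
import math


def schedule_summary(num_examples, num_gpus=2, per_device_batch=4,
                     grad_accum=1, epochs=3, warmup_ratio=0.03):
    """Approximate optimizer-step counts for the flags in the run example.

    Assumes each rank gets ceil(num_examples / num_gpus) examples
    (DistributedSampler-style padding) and no drop_last.
    """
    per_rank = math.ceil(num_examples / num_gpus)
    batches_per_epoch = math.ceil(per_rank / per_device_batch)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    max_steps = math.ceil(epochs * updates_per_epoch)
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    return {
        "effective_batch": num_gpus * per_device_batch * grad_accum,
        "max_steps": max_steps,
        "warmup_steps": warmup_steps,
    }


def cosine_lr(step, max_steps, warmup_steps, peak_lr=1e-5):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, max_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Hypothetical dataset size; the issue does not say how many examples were used.
s = schedule_summary(num_examples=8000)
print(s)  # {'effective_batch': 8, 'max_steps': 3000, 'warmup_steps': 90}
print(cosine_lr(s["warmup_steps"], s["max_steps"], s["warmup_steps"]))  # 1e-05
```

Under these assumptions the first 90 steps are still in warmup, so with `--logging_steps=1` it is worth judging a flat loss curve over a window well past the warmup period.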

Versions

Environment Report:

Operating System: Linux-5.15.0-1047-oracle-x86_64-with-glibc2.35
Python version: 3.10.14
PyTorch version: 2.4.0+cu121
CUDA version: 12.1
Triton version: 3.0.0
Transformers version: 4.45.0.dev0
