Hi,
Great work!
However, when I tried to reproduce the results reported in the paper, I got the following:
Evaluation loss and perplexity at step 11001 (60M model):
Loss: 6.293065547943115
Perplexity: 540.8086654237502
which are quite different from those reported in the paper.
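For what it's worth, the two numbers above are at least internally consistent: perplexity is the exponential of the mean cross-entropy loss, so the gap from the paper comes from training itself rather than a broken eval step. A quick sanity check (assuming the repo computes perplexity this way):

```python
import math

# Perplexity is exp(mean cross-entropy loss), so the reported pair
# should satisfy perplexity == exp(loss).
loss = 6.293065547943115
perplexity = math.exp(loss)
print(perplexity)  # ≈ 540.8086654237502, matching the logged value
```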

Here are the details of my script:
```shell
torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_60m.json \
    --lr 0.003 \
    --peft_model sltrain \
    --optimizer adamw \
    --rank 128 \
    --sp_ratio 0.03 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 11000 \
    --warmup_steps 1100 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --save_dir path/to/save \
    --lora_alpha 32
```
Are there any specific hyperparameters I should pay special attention to (e.g., the learning rate)?
Thank you for your help!
Guinan