Cannot reproduce results in paper #5

@Guinan-Su

Description

Hi,
Great work!

However, when I tried to reproduce the results reported in the paper, I got the following:
Evaluation loss and perplexity at step 11001 (60m model):

Loss: 6.293065547943115
Perplexity: 540.8086654237502
which are quite different from those reported in the paper.
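For reference, the reported perplexity is consistent with being exp(loss) of the evaluation loss above, so the two numbers describe the same measurement (assuming perplexity is computed the standard way from mean cross-entropy loss):

```python
import math

# Evaluation numbers from the run above
loss = 6.293065547943115
perplexity = math.exp(loss)  # standard definition: ppl = exp(mean cross-entropy)

print(perplexity)  # ~540.81, matching the reported value
```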

Here are the details of my script:

```
torchrun --standalone --nproc_per_node 1 torchrun_main.py \
    --model_config configs/llama_60m.json \
    --lr 0.003 \
    --peft_model sltrain \
    --optimizer adamw \
    --rank 128 \
    --sp_ratio 0.03 \
    --batch_size 256 \
    --total_batch_size 512 \
    --num_training_steps 11000 \
    --warmup_steps 1100 \
    --weight_decay 0 \
    --dtype bfloat16 \
    --eval_every 1000 \
    --save_dir path/to/save \
    --lora_alpha 32
```
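A minimal sanity check on the batch settings above, assuming the usual convention that `--total_batch_size` is reached via gradient accumulation over `--batch_size` micro-batches (I'm not certain this is how `torchrun_main.py` interprets these flags, so please correct me if not):

```python
# Assumed relationship: total_batch_size = batch_size * accumulation_steps * world_size
batch_size = 256        # per-step micro-batch (--batch_size)
total_batch_size = 512  # effective batch per optimizer step (--total_batch_size)
world_size = 1          # --nproc_per_node 1

accumulation_steps = total_batch_size // (batch_size * world_size)
print(accumulation_steps)  # 2 gradient-accumulation steps per update
```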

Are there any specific hyperparameters I should pay special attention to? (like lr?)
Thank you for your help!

Guinan
