Fix perplexity computation, MQA/GQA models & models requiring position_ids #129

Open
fxmarty wants to merge 3 commits into main from fix-perplexity

Conversation


@fxmarty fxmarty commented Apr 10, 2024

As per title


fxmarty commented Apr 10, 2024

@Giuseppe5, with this change I get better eval results than Brevitas' `model_eval`, running `CUDA_VISIBLE_DEVICES=3 python quantize_llm.py --fuse-sequences` (for Brevitas, simply using `ppl = model_eval(model, validation_dataset, args.seqlen)`):

Computing perplexity...: 100%|████████| 128/128 [00:02<00:00, 52.39it/s]
Perplexity (original model): 68.6707534790039
100%|████████| 128/128 [00:01<00:00, 72.13it/s]
brevitas ppl: tensor(80.3409, device='cuda:0')

This makes sense, as optimum-amd enforces a minimum context length during evaluation.
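For context, a minimal sketch of the sequence-fusing idea behind `--fuse-sequences`: concatenate all tokenized samples and split the result into fixed-length windows, so no evaluated window is shorter than the target context length, then compute perplexity as the exponential of the token-weighted mean negative log-likelihood. Function names here are hypothetical illustrations, not the actual optimum-amd or Brevitas implementation.

```python
import math

def fuse_and_chunk(token_ids_per_sample, seqlen):
    """Concatenate tokenized samples and split into fixed-length chunks.

    Every returned chunk has exactly `seqlen` tokens, so short samples
    never yield short evaluation windows; the trailing remainder is dropped.
    (Hypothetical helper, for illustration only.)
    """
    fused = [tok for sample in token_ids_per_sample for tok in sample]
    n_chunks = len(fused) // seqlen
    return [fused[i * seqlen:(i + 1) * seqlen] for i in range(n_chunks)]

def perplexity(mean_nlls, token_counts):
    """Perplexity as exp of the token-weighted mean negative log-likelihood.

    `mean_nlls[i]` is the mean NLL over chunk i, `token_counts[i]` the
    number of scored tokens in that chunk.
    """
    total_nll = sum(nll * count for nll, count in zip(mean_nlls, token_counts))
    return math.exp(total_nll / sum(token_counts))
```

Without fusing, a sample shorter than `seqlen` would be scored with less context, inflating its NLL and hence the reported perplexity, which matches the gap observed above.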

@fxmarty fxmarty requested a review from Giuseppe5 April 10, 2024 12:31
@fxmarty fxmarty changed the title Fix perplexity computation Fix perplexity computation, MQA/GQA models & models requiring position_ids Apr 10, 2024
