llama : default pooling last for qwen3 by ngxson · Pull Request #14028 · ggml-org/llama.cpp

ngxson · 2025-06-05T11:48:05Z

It may be too late to fix the upstream model (one of the GGUFs is already set to public), so I think we can do a quick patch in llama.cpp.

CC @yuhao318, do you have any visibility into why only one GGUF is public?

CISC · 2025-06-05T11:57:18Z

                ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);
+                hparams.pooling_type = LLAMA_POOLING_TYPE_LAST; // for embeddings model


Suggested change

-                ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);
-                hparams.pooling_type = LLAMA_POOLING_TYPE_LAST; // for embeddings model
+                hparams.pooling_type = LLAMA_POOLING_TYPE_LAST; // for embeddings model
+                ml.get_key(LLM_KV_POOLING_TYPE, hparams.pooling_type, false);
+                ml.get_key(LLM_KV_ATTENTION_LAYERNORM_RMS_EPS, hparams.f_norm_rms_eps);

Perhaps use metadata if it exists, ~~and add the pooling check to conversion~~ (do they use 1_Pooling)? Ah, the check is already there, guess they don't.
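
The "default first, then optional metadata override" ordering in the suggestion can be illustrated with a minimal standalone sketch. Everything here is a hypothetical stand-in: `PoolingType`, `MockLoader`, the `"pooling_type"` key string, and `load_pooling` are invented for illustration and are not the llama.cpp types or API. The only assumption carried over from the suggestion is that a non-required `get_key` leaves the output untouched when the key is missing.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for illustration only; NOT the llama.cpp definitions.
enum class PoolingType { None, Mean, Cls, Last, Rank };

struct MockLoader {
    std::map<std::string, PoolingType> kv; // pretend model metadata

    // Writes `out` and returns true if the key exists.
    // If the key is missing and not required, `out` is left untouched.
    bool get_key(const std::string & key, PoolingType & out, bool required = true) const {
        const auto it = kv.find(key);
        if (it == kv.end()) {
            if (required) {
                throw std::runtime_error("key not found: " + key);
            }
            return false;
        }
        out = it->second;
        return true;
    }
};

// Set the architecture default first, then let metadata override it if present.
PoolingType load_pooling(const MockLoader & ml) {
    PoolingType pooling = PoolingType::Last;                 // default for embeddings models
    ml.get_key("pooling_type", pooling, /*required=*/false); // optional override
    return pooling;
}
```

With this ordering, a model file without the key keeps the "last" default, while a file that carries the key (for example a reranker setting "rank") wins over the default.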

ngxson · 2025-06-05

For the rerank model, there will be a KV metadata key to switch the pooling to "rank".

So yes, it's better to do as you suggested.
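
For readers unfamiliar with the pooling modes being discussed, here is a rough sketch of what "last" pooling means compared with mean pooling: the sequence embedding is taken from the final token's vector instead of an average over all tokens. The helper names `pool_last` and `pool_mean` and the row-major `[n_tokens x n_embd]` layout are assumptions made for this example, not the llama.cpp implementation.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helpers for illustration; token_embd is assumed to be a
// row-major [n_tokens x n_embd] buffer of per-token output embeddings.

// "Last" pooling: the sequence embedding is the last token's vector.
std::vector<float> pool_last(const std::vector<float> & token_embd,
                             std::size_t n_tokens, std::size_t n_embd) {
    if (n_tokens == 0 || token_embd.size() != n_tokens * n_embd) {
        throw std::invalid_argument("unexpected embedding buffer shape");
    }
    const auto first = token_embd.begin() +
        static_cast<std::ptrdiff_t>((n_tokens - 1) * n_embd);
    return std::vector<float>(first, first + static_cast<std::ptrdiff_t>(n_embd));
}

// Mean pooling, shown only for contrast: average each dimension over tokens.
std::vector<float> pool_mean(const std::vector<float> & token_embd,
                             std::size_t n_tokens, std::size_t n_embd) {
    if (n_tokens == 0 || token_embd.size() != n_tokens * n_embd) {
        throw std::invalid_argument("unexpected embedding buffer shape");
    }
    std::vector<float> out(n_embd, 0.0f);
    for (std::size_t t = 0; t < n_tokens; ++t) {
        for (std::size_t d = 0; d < n_embd; ++d) {
            out[d] += token_embd[t * n_embd + d];
        }
    }
    for (float & v : out) {
        v /= static_cast<float>(n_tokens);
    }
    return out;
}
```

The two modes generally give different vectors for the same input, which is why a wrong default pooling type yields embeddings that do not match what the model was trained to produce.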

ngxson · 2025-06-05T12:53:57Z

superseded by the other PR

llama : default pooling last for qwen3

23c5b57

ngxson requested a review from ggerganov June 5, 2025 11:48

CISC reviewed Jun 5, 2025


ngxson mentioned this pull request Jun 5, 2025

llama : support qwen3 rerank and embeddings #14029

Closed

ngxson closed this Jun 5, 2025

ngxson wants to merge 1 commit into ggml-org:master from ngxson:xsn/qwen_embd_pooling
