Add support for Qwen3-Reranker #15824
Conversation
Great - will take a look tomorrow. Would be useful to add a basic usage example in the OP of this PR.
Yup, will add a usage example up above. Actually encountering some numerical differences comparing the output here to the …
Force-pushed 9d73260 to 22dd428
Ok, finally fixed it! Now we have numerical parity with the HF implementation. It turned out to be a small difference in the chat template. Should be ready for review @ggerganov.
```diff
-bool last = cparams.pooling_type == LLAMA_POOLING_TYPE_LAST;
+const bool last = (
+    cparams.pooling_type == LLAMA_POOLING_TYPE_LAST ||
+    (cparams.pooling_type == LLAMA_POOLING_TYPE_RANK && arch == LLM_ARCH_QWEN3) // qwen3 reranking & embedding models use last token
+);
```
I am wondering if it makes sense to remove pooling type RANK altogether from libllama? Do you have any thoughts about whether having a separate pooling class RANK is really necessary?
I think you could get really close to merging RANK with LAST. The main differentiator is in `llm_graph_context::build_pooling`, where you apply `cls_out` to map from the last token of the last hidden state to the classification output (usually yes/no). Unlike the other pooling types, you actually need knowledge of the model weights to do the calculation.
Backport upstream commit b5bd037 ("llama : add support for qwen3 reranker ggml-org#15824") to b6440, the last version before the Metal async backend bug (b6441+) that crashes embedding/reranker models on Apple Silicon.

Changes:
- Add cls.output tensor to qwen3 arch definition
- Load cls_out classification head in qwen3 model loader
- Support RANK pooling with only cls_out (no cls required)
- Use last-token pooling for qwen3 RANK mode
- Add softmax output for qwen3 reranker
- Use rerank chat template when available (skip SEP token requirement)
- Add one-click deployment scripts for Embedding and Reranker models

Tested on Apple M1 Pro with:
- Qwen3-Embedding-0.6B-Q8_0 (embedding, port 8080)
- Qwen3-Reranker-4B-Q4_K_M (reranker, port 8082)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add support for Qwen3 reranking models. This is largely based on #14029 by @ngxson, with a few tweaks to reflect changes to the codebase in the interim.
This hardcodes the chat template provided in the README.md, which I'm assuming is the intended usage. If folks want to be able to change that, then we'd need a new CLI option. The template uses string substitution rather than jinja, as it seems like jinja is only used for chat messages.

Edit: Here's an example usage similar to that used in the official Qwen repo. Note that `\t` separates queries from documents and `\n` separates different prompts.

```sh
build/bin/llama-embedding -m qwen3-reranker-0.6b-f32.gguf --embd-normalize -1 -p "What is the capital of China?\tThe capital of China is Beijing.\nExplain gravity\tGravity is a force that attracts two bodies towards each other."
```

Notice that we need to pass `--embd-normalize -1` to disable normalization (the default is L2 norm).