server: add Qwen3-Reranker instruction support by schwebke · Pull Request #20009 · ggml-org/llama.cpp

schwebke · 2026-03-01T14:15:41Z

This PR extends #15824 and adds optional support for the custom reranking task instruction
supported by Qwen3-Reranker. From the model card:

... Tip: We recommend that developers customize the instruct according to their specific scenarios, tasks, and languages.

Besides the rerank template, which contains the generic web-search instruction,
convert_hf_to_gguf.py adds a second template, rerank_instruct, following
the logic of the original model card:

def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(instruction=instruction,query=query, doc=doc)
    return output

If an instruction string property is provided when calling llama-server and the model contains this template,
the custom instruction is used for reranking.
If either is missing, the static rerank template is used, preserving the existing behaviour.
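The selection rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the llama-server implementation (which is C++). The template strings, the {query}, {document} and {instruction} placeholder names, and the helper names fill and format_rerank_prompt are hypothetical. The sketch also assumes that an empty instruction counts as missing.

```python
import re
from typing import Dict, Optional

# Hypothetical stand-ins for the two templates stored in the GGUF; the real
# templates also wrap this text in the model's chat markup.
RERANK_TEMPLATE = (
    "<Instruct>: Given a web search query, retrieve relevant passages "
    "that answer the query\n<Query>: {query}\n<Document>: {document}"
)
RERANK_INSTRUCT_TEMPLATE = (
    "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"
)


def fill(template: str, values: Dict[str, str]) -> str:
    # Single-pass substitution of {name} placeholders, so text inserted for one
    # placeholder is never re-scanned; unknown names are left untouched.
    return re.sub(r"\{(\w+)\}",
                  lambda m: values.get(m.group(1), m.group(0)),
                  template)


def format_rerank_prompt(query: str, document: str,
                         instruction: Optional[str] = None,
                         instruct_template: Optional[str] = RERANK_INSTRUCT_TEMPLATE,
                         default_template: str = RERANK_TEMPLATE) -> str:
    values = {"query": query, "document": document}
    # Use the custom instruction only when the request provides one AND the
    # model ships the instruction-aware template (empty string = missing here).
    if instruction and instruct_template is not None:
        values["instruction"] = instruction
        return fill(instruct_template, values)
    # Otherwise fall back to the static template (existing behaviour).
    return fill(default_template, values)
```

Passing instruct_template=None simulates an older GGUF converted without the second template; the function then returns the static prompt even if an instruction is supplied.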

Usage examples:

curl http://127.0.0.1:8033/reranking \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Rank the document based on its material and shape",
    "query": "frying pan",
    "documents": [
      "ice cream bowl",
      "ribeye steak"
    ]
  }'

# Response
{"model":"Qwen3-Reranker-4.0B-BF16.gguf","object":"list","usage":{"prompt_tokens":146,"total_tokens":146},
"results":[{"index":0,"relevance_score":0.02189893089234829},{"index":1,"relevance_score":0.009148189797997475}]}


curl http://127.0.0.1:8033/reranking \
  -H "Content-Type: application/json" \
  -d '{
    "instruction": "Rank the document based on its likelihood of being used together in a single workflow or culinary task",
    "query": "frying pan",
    "documents": [
      "ice cream bowl",
      "ribeye steak"
    ]
  }'

# Response
{"model":"Qwen3-Reranker-4.0B-BF16.gguf","object":"list","usage":{"prompt_tokens":164,"total_tokens":164},
"results":[{"index":1,"relevance_score":0.16672350466251373},{"index":0,"relevance_score":0.024293947964906693}]}

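A client can send the same requests programmatically. The sketch below assumes only what the examples show: a POST to /reranking with instruction, query and documents fields, and a response whose results entries carry index and relevance_score. The helper names build_rerank_request and rank_documents are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional, Tuple


def build_rerank_request(url: str, query: str, documents: List[str],
                         instruction: Optional[str] = None) -> urllib.request.Request:
    payload = {"query": query, "documents": documents}
    if instruction:
        # Omitting the field keeps the server on the static rerank template.
        payload["instruction"] = instruction
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rank_documents(documents: List[str], response: dict) -> List[Tuple[str, float]]:
    # Each result points back into the request's documents list via "index";
    # sort by score defensively rather than relying on the server's order.
    pairs = [(documents[r["index"]], r["relevance_score"]) for r in response["results"]]
    return sorted(pairs, key=lambda p: p[1], reverse=True)
```

With a server running, passing the result of build_rerank_request("http://127.0.0.1:8033/reranking", "frying pan", docs, instruction=...) to urllib.request.urlopen returns the JSON body, which json.load can parse and rank_documents can pair with the original strings.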
CISC · 2026-03-01T21:16:07Z

May I suggest that this does not in fact require a second template? It would suffice for the template itself to check whether the instruction variable exists and has content (simply if instruction) and fall back to the default otherwise.

schwebke · 2026-03-02T02:03:57Z

@CISC I considered that approach, but was undecided on the best location for the default value — specifically whether it warrants a dedicated model property.

And we would turn the template silently bad for all implementations unaware of the instruction substitution, including older versions of llama-server itself.

What do you suggest?

CISC · 2026-03-02T07:51:40Z

> @CISC I considered that approach, but was undecided on the best location for the default value — specifically whether it warrants a dedicated model property.
>
> And we would turn the template silently bad for all implementations unaware of the instruction substitution, including older versions of llama-server itself.

I don't follow? If you just add an instruction if instruction else "blabla" to the template it will work regardless.

schwebke · 2026-03-02T09:17:31Z

@CISC

> I don't follow? If you just add an instruction if instruction else "blabla" to the template it will work regardless.

It would in case of jinja templating. Quote from prev. #15824 however:

> The template uses string substitution rather than jinja, as it seems like jinja is only used for chat messages.

Current tools/server/server-common.cpp format_prompt_rerank() just uses string_replace_all() from common/common.cpp.
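The constraint can be illustrated with a Python analogue of plain replace-all substitution (an approximation of the behaviour described for string_replace_all(), not its actual code). The placeholder names, DEFAULT_INSTRUCTION, SINGLE_TEMPLATE and the two format functions are hypothetical. With substitution only, the template cannot express a fallback, so the default has to be chosen in code; and an implementation that only knows {query} and {document} leaves a literal {instruction} in the prompt, which is the silent breakage raised earlier.

```python
# Hypothetical single template carrying an {instruction} placeholder.
SINGLE_TEMPLATE = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}"
DEFAULT_INSTRUCTION = (
    "Given a web search query, retrieve relevant passages that answer the query"
)


def string_replace_all(s: str, search: str, replace: str) -> str:
    # Replace every occurrence of `search`; nothing inside the template is
    # evaluated, so a conditional such as "x if x else y" stays literal text.
    if not search:
        return s
    return s.replace(search, replace)


def legacy_format(template: str, query: str, document: str) -> str:
    # An implementation unaware of {instruction} fills only the two old placeholders.
    s = string_replace_all(template, "{query}", query)
    return string_replace_all(s, "{document}", document)


def instruction_aware_format(template: str, query: str, document: str,
                             instruction: str = "") -> str:
    # The fallback must be decided here, in code, before substitution.
    s = string_replace_all(template, "{instruction}", instruction or DEFAULT_INSTRUCTION)
    return legacy_format(s, query, document)
```

Calling legacy_format(SINGLE_TEMPLATE, ...) yields a prompt that still contains {instruction}, which the model would then see verbatim.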

CISC · 2026-03-02T09:31:25Z

> It would in case of jinja templating. Quote from prev. #15824 however:
>
> > The template uses string substitution rather than jinja, as it seems like jinja is only used for chat messages.
>
> Current tools/server/server-common.cpp format_prompt_rerank() just uses string_replace_all() from common/common.cpp.

Ahhh, I see, didn't notice the weird formatting even. :)

Well, that does indeed change things.

server: add Qwen3-Reranker instruction support

b47e0cf

schwebke requested review from CISC, ggerganov and ngxson as code owners March 1, 2026 14:15

github-actions Bot added examples python python script changes server labels Mar 1, 2026

loci-dev mentioned this pull request Mar 2, 2026

UPSTREAM PR #20009: server: add Qwen3-Reranker instruction support auroralabs-loci/llama.cpp#1217

Open

Merge branch 'ggml-org:master' into qwen3-rerank-instruct

f2e77e9

schwebke requested a review from a team as a code owner April 13, 2026 02:38

schwebke wants to merge 2 commits into ggml-org:master from schwebke:qwen3-rerank-instruct


2 participants
