server: add Qwen3-Reranker instruction support #20009

Open
schwebke wants to merge 2 commits into ggml-org:master from schwebke:qwen3-rerank-instruct

schwebke commented Mar 1, 2026

This PR extends #15824 and adds optional support for the reranking task instruction
provided by Qwen3 Reranker:

... Tip: We recommend that developers customize the instruct according to their specific scenarios, tasks, and languages.

Besides the generic web-query instruction in the rerank template,
convert_hf_to_gguf.py adds a second template rerank_instruct following
the logic from the original model card:

def format_instruction(instruction, query, doc):
    if instruction is None:
        instruction = 'Given a web search query, retrieve relevant passages that answer the query'
    output = "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {doc}".format(instruction=instruction,query=query, doc=doc)
    return output

If an `instruction` string property is provided in the request to llama-server and the `rerank_instruct` template exists in the model,
the custom instruction is used for reranking.
If either is missing, the static `rerank` template is used, preserving existing behaviour.
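The selection rule above can be sketched in a few lines of Python. This is only an illustration, not the server code: the function name, the `templates` dictionary, the `{instruction}`, `{query}` and `{document}` placeholder spellings, and the simplified template bodies (without the chat markup a real template would carry) are all assumptions made for the example.

```python
# Default instruction taken from the model card snippet quoted above.
DEFAULT_INSTRUCTION = "Given a web search query, retrieve relevant passages that answer the query"

# Hypothetical, simplified stand-ins for the two templates stored in the model.
TEMPLATES = {
    "rerank": "<Instruct>: " + DEFAULT_INSTRUCTION + "\n<Query>: {query}\n<Document>: {document}",
    "rerank_instruct": "<Instruct>: {instruction}\n<Query>: {query}\n<Document>: {document}",
}


def select_rerank_prompt(templates, query, document, instruction=None):
    """Pick a template and fill it by plain string substitution.

    The instruct template is used only if the request carries a non-empty
    instruction AND the model ships that template; otherwise the static
    template is used unchanged.
    """
    instruct_template = templates.get("rerank_instruct")
    if instruction and instruct_template is not None:
        prompt = instruct_template.replace("{instruction}", instruction)
    else:
        prompt = templates["rerank"]
    return prompt.replace("{query}", query).replace("{document}", document)


print(select_rerank_prompt(TEMPLATES, "frying pan", "ribeye steak",
                           "Rank the document based on its material and shape"))
```

Because a model without `rerank_instruct` simply takes the `else` branch, older GGUF files and requests without an instruction keep producing the same prompt as before.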

Usage examples:

curl http://127.0.0.1:8033/reranking \
   -H "Content-Type: application/json" \
   -d '{
	   "instruction": "Rank the document based on its material and shape",
	   "query": "frying pan",
	   "documents": [
		"ice cream bowl",
		"ribeye steak"
	   ]
       }'

# Response
{"model":"Qwen3-Reranker-4.0B-BF16.gguf","object":"list","usage":{"prompt_tokens":146,"total_tokens":146},
"results":[{"index":0,"relevance_score":0.02189893089234829},{"index":1,"relevance_score":0.009148189797997475}]}


curl http://127.0.0.1:8033/reranking \
   -H "Content-Type: application/json" \
   -d '{
	   "instruction": "Rank the document based on its likelihood of being used together in a single workflow or culinary task",
	   "query": "frying pan",
	   "documents": [
		"ice cream bowl",
		"ribeye steak"
	   ]
       }'

# Response
{"model":"Qwen3-Reranker-4.0B-BF16.gguf","object":"list","usage":{"prompt_tokens":164,"total_tokens":164},
"results":[{"index":1,"relevance_score":0.16672350466251373},{"index":0,"relevance_score":0.024293947964906693}]}
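For reading such a response on the client side, a minimal sketch could map each `index` back to the submitted document and order by `relevance_score`. The helper name `ranked_documents` is made up for this example; the response text is the second response shown above, joined onto one line.

```python
import json


def ranked_documents(response_text, documents):
    """Return (document, score) pairs, highest relevance first."""
    results = json.loads(response_text)["results"]
    ordered = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ordered]


documents = ["ice cream bowl", "ribeye steak"]
response_text = (
    '{"model":"Qwen3-Reranker-4.0B-BF16.gguf","object":"list",'
    '"usage":{"prompt_tokens":164,"total_tokens":164},'
    '"results":[{"index":1,"relevance_score":0.16672350466251373},'
    '{"index":0,"relevance_score":0.024293947964906693}]}'
)

ranking = ranked_documents(response_text, documents)
for document, score in ranking:
    print(f"{score:.4f}  {document}")
```

Sorting explicitly avoids relying on the order in which the server happens to return the results.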

github-actions bot added the examples, python, and server labels Mar 1, 2026
Member

CISC commented Mar 1, 2026

May I suggest that this does not in fact require a second template? It would be enough to check in the template itself that the variable `instruction` exists and has content (simply `if instruction`), and to substitute its contents for the default.

Author

schwebke commented Mar 2, 2026

@CISC I considered that approach, but was undecided on the best location for the default value, specifically whether it warrants a dedicated model property.

It would also silently break the template for every implementation that is unaware of the instruction substitution, including older versions of llama-server itself.

What do you suggest?

Member

CISC commented Mar 2, 2026

> @CISC I considered that approach, but was undecided on the best location for the default value, specifically whether it warrants a dedicated model property.
>
> It would also silently break the template for every implementation that is unaware of the instruction substitution, including older versions of llama-server itself.

I don't follow? If you just add an `instruction if instruction else "blabla"` expression to the template, it will work regardless.

Author

schwebke commented Mar 2, 2026

@CISC

> I don't follow? If you just add an `instruction if instruction else "blabla"` expression to the template, it will work regardless.

It would with Jinja templating. However, to quote the earlier PR #15824:

> The template uses string substitution rather than jinja, as it seems like jinja is only used for chat messages.

Current tools/server/server-common.cpp format_prompt_rerank() just uses string_replace_all() from common/common.cpp.
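To illustrate the point, here is a small Python sketch of what plain placeholder replacement does to a Jinja-style conditional. The `fill` helper is a hypothetical stand-in for a `string_replace_all()`-based formatter, and the `{query}` placeholder spelling and the template text are assumptions for the example, not the actual server code or model template.

```python
def fill(template, values):
    """Replace each {key} placeholder literally; no expressions are evaluated."""
    for key, value in values.items():
        template = template.replace("{" + key + "}", value)
    return template


# A template that relies on a Jinja-style inline conditional for the default.
jinja_style = (
    '<Instruct>: {{ instruction if instruction else "default instruction" }}\n'
    "<Query>: {query}"
)

out = fill(jinja_style, {"query": "frying pan", "instruction": "custom instruction"})
print(out)
```

The query is substituted, but the conditional is passed through verbatim and neither the custom nor the default instruction ends up in the prompt, which is the silent breakage described earlier in this thread.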

Member

CISC commented Mar 2, 2026

> It would with Jinja templating. However, to quote the earlier PR #15824:
>
> > The template uses string substitution rather than jinja, as it seems like jinja is only used for chat messages.
>
> Current tools/server/server-common.cpp format_prompt_rerank() just uses string_replace_all() from common/common.cpp.

Ahhh, I see, didn't notice the weird formatting even. :)

Well, that does indeed change things.

@schwebke schwebke requested a review from a team as a code owner April 13, 2026 02:38