Raise 400 on model mismatch when transformers serve is pinned #45443

Merged
SunMarc merged 3 commits into main from error-model-mismatch on Apr 20, 2026

Conversation

@qgallouedec
Member

When transformers serve is launched with a positional model argument, the server silently overwrites the "model" field in every incoming request with the pinned model id. This is surprising: a client that asks for model B receives a response generated by model A, with no indication that the requested model was ignored.

This PR changes _resolve_model in src/transformers/cli/serving/utils.py to return 400 Bad Request when the client-supplied model disagrees with the pinned one. Requests that omit model or send an empty value still fall back to the pinned model, so existing well-behaved clients are unaffected.
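The resolution logic described above amounts to something like the following minimal sketch. The helper name, signature, and exception type here are illustrative stand-ins, not the actual implementation in src/transformers/cli/serving/utils.py:

```python
# Hypothetical sketch of the model-resolution check described in this PR.
# Names and the exception type are illustrative; the real helper is
# _resolve_model in src/transformers/cli/serving/utils.py.

PINNED_MODEL = "meta-llama/Llama-3.2-1B-Instruct"  # example pinned id


class BadRequest(Exception):
    """Stand-in for the server's HTTP 400 Bad Request response."""


def resolve_model(requested, pinned=PINNED_MODEL):
    # No pinned model: honor whatever the client asked for.
    if pinned is None:
        return requested
    # Missing or empty "model" field: fall back to the pinned model,
    # so existing well-behaved clients are unaffected.
    if not requested:
        return pinned
    # Explicit mismatch: refuse with 400 instead of silently
    # substituting the pinned model for the requested one.
    if requested != pinned:
        raise BadRequest(
            f"Server is pinned to '{pinned}'; requested '{requested}'."
        )
    return pinned
```

The key design choice is that only an explicit, conflicting "model" value is rejected; absent or empty values keep the old fallback behavior.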

Reproducer

Start the server pinned to model A:

transformers serve meta-llama/Llama-3.2-1B-Instruct

Then query with a different model, either via curl:

curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer x" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [{"role": "user", "content": "Hi!"}]
  }'

or via the OpenAI Python client:

# repro.py
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

print(client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "Hi!"}],
))

Before:

ChatCompletion(id='5b793e5a-189b-49ac-a022-7f74d60c563a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='How can I assist you today?', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1776194005, model='meta-llama/Llama-3.2-1B-Instruct@main', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=8, prompt_tokens=37, total_tokens=45, completion_tokens_details=None, prompt_tokens_details=None))

After:

Traceback (most recent call last):
  File "/fsx/qgallouedec/transformers/repro.py", line 5, in <module>
    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-0.5B-Instruct",
        messages=[{"role": "user", "content": "Hi!"}],
    )
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/resources/chat/completions/completions.py", line 1204, in create
    return self._post(
           ~~~~~~~~~~^
        "/chat/completions",
        ^^^^^^^^^^^^^^^^^^^^
    ...<47 lines>...
        stream_cls=Stream[ChatCompletionChunk],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/_base_client.py", line 1297, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/_base_client.py", line 1070, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'detail': "Server is pinned to 'meta-llama/Llama-3.2-1B-Instruct'; requested 'Qwen/Qwen2.5-0.5B-Instruct'."}

Notes

  • AI assistance was used to draft this change

@qgallouedec qgallouedec requested review from SunMarc and vasqu April 14, 2026 19:15
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc requested a review from LysandreJik April 15, 2026 11:40
@SunMarc SunMarc (Member) left a comment

Fine for me! cc @LysandreJik for confirmation

@SunMarc SunMarc added this pull request to the merge queue Apr 20, 2026
Merged via the queue into main with commit 243f2d7 Apr 20, 2026
18 checks passed
@SunMarc SunMarc deleted the error-model-mismatch branch April 20, 2026 15:42
lvliang-intel pushed a commit to lvliang-intel/transformers that referenced this pull request Apr 21, 2026
…ingface#45443)

* Raise 400 on model mismatch when `transformers serve` is pinned

* test

* style
artem-spector pushed a commit to artem-spector/transformers that referenced this pull request Apr 21, 2026
…ingface#45443)

* Raise 400 on model mismatch when `transformers serve` is pinned

* test

* style