Raise 400 on model mismatch when transformers serve is pinned #45443

Merged
SunMarc merged 3 commits into main from error-model-mismatch on Apr 20, 2026

Conversation

@qgallouedec
Member

When transformers serve is launched with a positional model argument, the server silently overwrites the "model" field in every incoming request with the pinned model id. This is surprising: a client that asks for model B receives a response generated by model A, with no indication that the requested model was ignored.

This PR changes _resolve_model in src/transformers/cli/serving/utils.py to return 400 Bad Request when the client-supplied model disagrees with the pinned one. Requests that omit model or send an empty value still fall back to the pinned model, so existing well-behaved clients are unaffected.
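The resolution logic described above amounts to something like the following minimal sketch. The helper name, signature, and exception type here are illustrative stand-ins, not the actual implementation in src/transformers/cli/serving/utils.py:

```python
# Hypothetical sketch of the model-resolution check described in this PR.
# Names and the exception type are illustrative; the real helper is
# _resolve_model in src/transformers/cli/serving/utils.py.

PINNED_MODEL = "meta-llama/Llama-3.2-1B-Instruct"  # example pinned id


class BadRequest(Exception):
    """Stand-in for the server's HTTP 400 Bad Request response."""


def resolve_model(requested, pinned=PINNED_MODEL):
    # No pinned model: honor whatever the client asked for.
    if pinned is None:
        return requested
    # Missing or empty "model" field: fall back to the pinned model,
    # so existing well-behaved clients are unaffected.
    if not requested:
        return pinned
    # Explicit mismatch: refuse with 400 instead of silently
    # substituting the pinned model for the requested one.
    if requested != pinned:
        raise BadRequest(
            f"Server is pinned to '{pinned}'; requested '{requested}'."
        )
    return pinned
```

The key design choice is that only an explicit, conflicting "model" value is rejected; absent or empty values keep the old fallback behavior.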

Reproducer

Start the server pinned to model A:

transformers serve meta-llama/Llama-3.2-1B-Instruct

Then query with a different model, either via curl:

curl http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer x" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-0.5B-Instruct",
    "messages": [{"role": "user", "content": "Hi!"}]
  }'

or via the OpenAI Python client:

# repro.py
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="x")

print(client.chat.completions.create(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    messages=[{"role": "user", "content": "Hi!"}],
))

Before:

ChatCompletion(id='5b793e5a-189b-49ac-a022-7f74d60c563a', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='How can I assist you today?', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None))], created=1776194005, model='meta-llama/Llama-3.2-1B-Instruct@main', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=8, prompt_tokens=37, total_tokens=45, completion_tokens_details=None, prompt_tokens_details=None))

After:

Traceback (most recent call last):
  File "/fsx/qgallouedec/transformers/repro.py", line 5, in <module>
    resp = client.chat.completions.create(
        model="Qwen/Qwen2.5-0.5B-Instruct",
        messages=[{"role": "user", "content": "Hi!"}],
    )
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/_utils/_utils.py", line 286, in wrapper
    return func(*args, **kwargs)
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/resources/chat/completions/completions.py", line 1204, in create
    return self._post(
           ~~~~~~~~~~^
        "/chat/completions",
        ^^^^^^^^^^^^^^^^^^^^
    ...<47 lines>...
        stream_cls=Stream[ChatCompletionChunk],
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/_base_client.py", line 1297, in post
    return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
                           ~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/fsx/qgallouedec/miniconda3/envs/trl/lib/python3.13/site-packages/openai/_base_client.py", line 1070, in request
    raise self._make_status_error_from_response(err.response) from None
openai.BadRequestError: Error code: 400 - {'detail': "Server is pinned to 'meta-llama/Llama-3.2-1B-Instruct'; requested 'Qwen/Qwen2.5-0.5B-Instruct'."}

Notes

  • AI assistance was used to draft this change

@qgallouedec qgallouedec requested review from SunMarc and vasqu April 14, 2026 19:15
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@SunMarc SunMarc requested a review from LysandreJik April 15, 2026 11:40
@SunMarc SunMarc (Member) left a comment

Fine for me! cc @LysandreJik for confirmation

@SunMarc SunMarc added this pull request to the merge queue Apr 20, 2026
Merged via the queue into main with commit 243f2d7 Apr 20, 2026
18 checks passed
@SunMarc SunMarc deleted the error-model-mismatch branch April 20, 2026 15:42
lvliang-intel pushed a commit to lvliang-intel/transformers that referenced this pull request Apr 21, 2026
…ingface#45443)

* Raise 400 on model mismatch when `transformers serve` is pinned

* test

* style
artem-spector pushed a commit to artem-spector/transformers that referenced this pull request Apr 21, 2026
…ingface#45443)

* Raise 400 on model mismatch when `transformers serve` is pinned

* test

* style