Conversation

Member (Author): @bot /style

qgallouedec approved these changes on Apr 28, 2026

Member:
Thanks. For context, the motivation for this PR is that I occasionally get this kind of error, and the change should help with debugging:
[...]
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52036 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52036 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52036 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52036 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52036 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
2026-04-28 02:46:28,290 - ContinuousBatchingLogger - ERROR - Error in generation loop: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/site-packages/transformers/generation/continuous_batching/continuous_api.py", line 1101, in _run_generation_loop
self._inner_generation_loop(batch_processor)
File "/opt/conda/lib/python3.11/site-packages/transformers/generation/continuous_batching/continuous_api.py", line 1124, in _inner_generation_loop
self._generation_step()
File "/opt/conda/lib/python3.11/site-packages/transformers/generation/continuous_batching/continuous_api.py", line 1040, in _generation_step
self.batch_processor._generation_step(self.model)
File "/opt/conda/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/generation/continuous_batching/continuous_api.py", line 538, in _generation_step
self.capture_graph(forward_fn, compute_stream, *args)
File "/opt/conda/lib/python3.11/site-packages/transformers/generation/continuous_batching/continuous_api.py", line 546, in capture_graph
forward_fn(*args)
File "/opt/conda/lib/python3.11/site-packages/transformers/generation/continuous_batching/continuous_api.py", line 569, in _forward_process_and_sample
logits = self._model_forward(model, batch_data).float() # convert to fp32 to match generate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/generation/continuous_batching/continuous_api.py", line 575, in _model_forward
return model(**batch_data).logits
^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/utils/generic.py", line 887, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 462, in forward
outputs: BaseModelOutputWithPast = self.model(
^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/utils/generic.py", line 963, in wrapper
output = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/utils/output_capturing.py", line 248, in wrapper
outputs = func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 399, in forward
hidden_states = decoder_layer(
^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/modeling_layers.py", line 93, in __call__
return super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 293, in forward
hidden_states, _ = self.self_attn(
^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 222, in forward
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1784, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/kernels/layer/func.py", line 297, in forward
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/models/qwen2/modeling_qwen2.py", line 145, in apply_rotary_pos_emb
k_embed = (k * cos) + (rotate_half(k) * sin)
~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~
torch.AcceleratorError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
INFO: ::1:52036 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: ::1:52056 - "POST /v1/chat/completions HTTP/1.1" 200 OK
INFO: ::1:52052 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
ERROR: Exception in ASGI application
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/site-packages/uvicorn/protocols/http/h11_impl.py", line 415, in run_asgi
result = await app( # type: ignore[func-returns-value]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/uvicorn/middleware/proxy_headers.py", line 56, in __call__
return await self.app(scope, receive, send)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/fastapi/applications.py", line 1159, in __call__
await super().__call__(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/starlette/applications.py", line 90, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 191, in __call__
with recv_stream, send_stream, collapse_excgroups():
File "/opt/conda/lib/python3.11/contextlib.py", line 158, in __exit__
self.gen.throw(typ, value, traceback)
File "/opt/conda/lib/python3.11/site-packages/starlette/_utils.py", line 87, in collapse_excgroups
raise exc
File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 193, in __call__
response = await self.dispatch_func(request, call_next)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/cli/serving/server.py", line 85, in request_id_middleware
response = await call_next(request)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 168, in call_next
raise app_exc from app_exc.__cause__ or app_exc.__context__
File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/base.py", line 144, in coro
await self.app(scope, receive_or_disconnect, send_no_error)
File "/opt/conda/lib/python3.11/site-packages/starlette/middleware/exceptions.py", line 63, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.11/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 660, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 680, in app
await route.handle(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/starlette/routing.py", line 276, in handle
await self.app(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/fastapi/routing.py", line 134, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
raise exc
File "/opt/conda/lib/python3.11/site-packages/starlette/_exception_handler.py", line 42, in wrapped_app
await app(scope, receive, sender)
File "/opt/conda/lib/python3.11/site-packages/fastapi/routing.py", line 120, in app
response = await f(request)
^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/fastapi/routing.py", line 674, in app
raw_response = await run_endpoint_function(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/fastapi/routing.py", line 328, in run_endpoint_function
return await dependant.call(**values)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/cli/serving/server.py", line 93, in chat_completions
return await chat_handler.handle_request(body, request.state.request_id)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/cli/serving/chat_completion.py", line 158, in handle_request
return await self._non_streaming(
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/cli/serving/chat_completion.py", line 299, in _non_streaming
parsed = parse_tool_calls(processor, generated_ids, tool_config["schema"])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/cli/serving/utils.py", line 150, in parse_tool_calls
parsed = processor.parse_response(generated_ids, schema)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/conda/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 3328, in parse_response
(isinstance(response, list) and not isinstance(response[0], int))
~~~~~~~~^^^
IndexError: list index out of range
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
[transformers] [Request received] Model: Qwen/Qwen2.5-7B-Instruct@main, CB: True
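Note that the final `IndexError` in the log above is a secondary symptom: after the generation loop crashed, the handler received an empty token list and `parse_response` indexed `response[0]` on it. A minimal sketch of the kind of guard this PR motivates (the function name and `parse_fn` parameter are hypothetical, not the actual transformers API):

```python
def parse_response_safely(response, parse_fn):
    """Parse generated token ids, raising a clear error for an empty
    response instead of an opaque IndexError. `parse_fn` stands in for
    the real parser (e.g. processor.parse_response)."""
    if not response:
        # An empty generation usually means the background generation
        # worker died; tell the user plainly instead of crashing deeper.
        raise RuntimeError(
            "Generation returned no tokens; the generation worker likely "
            "crashed. Please restart the server."
        )
    return parse_fn(response)
```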
Member (Author): @bot /style
Contributor: Style fix ran successfully without modifying any files.
What does this PR do?
This PR adds better handling for continuous batching (CB) when the CB worker thread dies due to an unexpected error: we now clearly display to the user that they need to restart the server.
cc @qgallouedec
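The idea above can be sketched as follows. This is a minimal illustration of the pattern, not the actual transformers implementation; all class and method names here are hypothetical:

```python
import threading

class GenerationWorker:
    """Sketch: record a background worker's fatal error so request
    handlers can report it clearly instead of failing opaquely."""

    def __init__(self):
        self._error = None
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        try:
            while True:
                self._step()  # one generation step; may raise (e.g. a CUDA error)
        except Exception as exc:
            # Keep the original error instead of dying silently.
            self._error = exc

    def _step(self):
        # Simulated fatal failure for this sketch.
        raise RuntimeError("CUDA error: an illegal memory access was encountered")

    def start(self):
        self._thread.start()

    def check_alive(self):
        """Called by the request handler before queuing new work."""
        if not self._thread.is_alive():
            raise RuntimeError(
                "The generation worker thread has crashed and the server "
                f"must be restarted. Original error: {self._error}"
            )

worker = GenerationWorker()
worker.start()
worker._thread.join()  # in this sketch the loop dies immediately
try:
    worker.check_alive()
except RuntimeError as e:
    print(e)
```

With a check like this at the start of each request, clients get an actionable 5xx message ("restart the server") rather than a confusing downstream error such as the `IndexError` in the log above.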