Fix CUDA errors in sharded generation with Qwen3#41734

Open
SrijanUpadhyay wants to merge 1 commit into huggingface:main from
SrijanUpadhyay:fix-sharded-generation-nans
Conversation

@SrijanUpadhyay
Contributor

Issue #41720: CUDA asserts during multi-GPU generation with Qwen3 models due to NaN/Inf in hidden states.

Changes:

  • Enhanced InfNanRemoveLogitsProcessor to handle hidden-state stabilization
  • Added automatic remove_invalid_values=True for sharded models
  • Removed direct NaN handling from the Qwen3 model for a cleaner architecture

Fixes #41720
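For context, a minimal pure-Python sketch of the sanitization that InfNanRemoveLogitsProcessor applies to each row of logits (this mirrors the documented behavior, NaN becomes 0.0 and +Inf becomes the dtype maximum, but is not the transformers implementation itself, which operates on torch tensors):

```python
import math

# torch.finfo(torch.float32).max, hard-coded here to stay dependency-free
FLOAT32_MAX = 3.4028235e38

def remove_invalid_values(logits):
    """Replace NaN logits with 0.0 and +Inf logits with the finite
    float32 maximum, so downstream softmax/multinomial never sees
    invalid values. Pure-Python sketch of the processor's behavior."""
    cleaned = []
    for x in logits:
        if math.isnan(x):
            cleaned.append(0.0)          # NaN -> 0.0
        elif x == math.inf:
            cleaned.append(FLOAT32_MAX)  # +Inf -> finite max
        else:
            cleaned.append(x)
    return cleaned

print(remove_invalid_values([1.5, float("nan"), float("inf"), -2.0]))
```

Note that this only cleans the final logits; if the NaN/Inf originates earlier in the hidden states (as the linked issue describes), sanitizing logits alone may not be sufficient.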

@SrijanUpadhyay
Contributor Author

Hey @vasqu, I have made these changes. Please take a look and give me feedback on this PR.

@Bobchenyx

Hi there, thanks for this potential fix. I'm truly grateful that you're taking the time to look into this issue. I pulled and built your branch locally, but I'm still running into a similar problem.
I've attached the logs and error messages I'm seeing below, in case that helps with debugging.

(moe-pwe) user1@nnmc67:~/workspace/bobchenyx/MoE-PWE$ CUDA_VISIBLE_DEVICES=0,1 python qwen3-generate.py 
Loading model from: ../Qwen/Qwen3-30B-A3B-Instruct-2507
Using device: cuda
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 16/16 [00:17<00:00,  1.11s/it]
Model loaded successfully!

Prompt: Explain the concept of Mixture of Experts(MoE) in a few sentences.
/pytorch/aten/src/ATen/native/cuda/TensorCompare.cu:112: _assert_async_cuda_kernel: block: [0,0,0], thread: [0,0,0] Assertion `probability tensor contains either `inf`, `nan` or element < 0` failed.
Traceback (most recent call last):
  File "/home/user1/workspace/bobchenyx/MoE-PWE/qwen3-generate.py", line 78, in <module>
    main()
  File "/home/user1/workspace/bobchenyx/MoE-PWE/qwen3-generate.py", line 42, in main
    outputs = model.generate(
  File "/home/user1/miniconda3/envs/moe-pwe/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
  File "/home/user1/miniconda3/envs/moe-pwe/lib/python3.10/site-packages/transformers/generation/utils.py", line 2695, in generate
    result = decoding_method(
  File "/home/user1/miniconda3/envs/moe-pwe/lib/python3.10/site-packages/transformers/generation/utils.py", line 2903, in _sample
    while self._has_unfinished_sequences(this_peer_finished, synced_gpus, device=input_ids.device):
  File "/home/user1/miniconda3/envs/moe-pwe/lib/python3.10/site-packages/transformers/generation/utils.py", line 2721, in _has_unfinished_sequences
    elif this_peer_finished:
torch.AcceleratorError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

(moe-pwe) user1@nnmc67:~/workspace/bobchenyx/MoE-PWE$ pip show transformers
Name: transformers
Version: 5.0.0.dev0
Summary: Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /home/user1/miniconda3/envs/moe-pwe/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm, typer-slim
Required-by: 
(moe-pwe) user1@nnmc67:~/workspace/bobchenyx/MoE-PWE$ 
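The assert in the log above (`probability tensor contains either inf, nan or element < 0`) fires inside `torch.multinomial` during sampling. A minimal sketch of the validity condition it enforces (an illustration of the check, not PyTorch's actual kernel code):

```python
import math

def probs_are_valid(probs):
    """A probability tensor passed to torch.multinomial must contain
    only finite, non-negative values; otherwise the CUDA kernel
    raises a device-side assert like the one in the log above."""
    return all(math.isfinite(p) and p >= 0.0 for p in probs)

print(probs_are_valid([0.2, 0.8]))           # valid distribution
print(probs_are_valid([0.5, float("nan")]))  # triggers the assert
```

Because the assert is raised asynchronously on the device, the Python traceback points at an unrelated line (`_has_unfinished_sequences`); rerunning with `CUDA_LAUNCH_BLOCKING=1`, as the error message suggests, gives a more accurate stack.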


Development

Successfully merging this pull request may close these issues.

Qwen3 with auto device mapping fails due to cudaErrorAssert on A800