fix(trtllm): install pip into runtime venv for NVRTC JIT include discovery #8296

Merged: nv-yna merged 1 commit into ai-dynamo:main from nv-yna:yna/dyn-2715 on Apr 17, 2026

nv-yna commented Apr 17, 2026

Summary

Install pip into the TRT-LLM runtime venv so TRT-LLM's NVRTC JIT path can discover its own install location at runtime.

Root cause

On Blackwell (sm_100a), TRT-LLM 1.3.0rc11 must JIT-compile fmhaSm100aKernel_* via NVRTC because FMHA cubins are only pre-compiled for sm_90 (Hopper). TRT-LLM's JIT wrapper (cpp/include/tensorrt_llm/deep_gemm/compiler.cuh:151, getJitIncludeDirs()) discovers its install path by shelling out to pip show tensorrt_llm. From there it adds <install>/tensorrt_llm/include as an NVRTC -I option — the path that ships cuda.h and the kernel sources.
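A minimal Python sketch of the discovery step described above, for illustration only. The real logic lives in C++ in compiler.cuh; the assumption here is that the install path is read from the `Location:` field of `pip show` output. The helper names and the sample site-packages path are hypothetical.

```python
import posixpath
from typing import List, Optional


def parse_location(pip_show_output: str) -> Optional[str]:
    """Return the value of the 'Location:' field from `pip show` output, if present."""
    for line in pip_show_output.splitlines():
        if line.startswith("Location:"):
            return line.split(":", 1)[1].strip()
    return None


def jit_include_flags(pip_show_output: str, package: str = "tensorrt_llm") -> List[str]:
    """Build the NVRTC -I option list; empty when the package was not found."""
    location = parse_location(pip_show_output)
    if location is None:
        # Mirrors the failure mode: no install path, so zero -I options.
        return []
    return ["-I" + posixpath.join(location, package, "include")]


# Hypothetical `pip show` output; the site-packages path is an assumption.
FOUND = (
    "Name: tensorrt_llm\n"
    "Version: 1.3.0rc11\n"
    "Location: /opt/dynamo/venv/lib/python3.12/site-packages\n"
)
NOT_FOUND = "WARNING: Package(s) not found: tensorrt_llm\n"

found_flags = jit_include_flags(FOUND)
missing_flags = jit_include_flags(NOT_FOUND)
```

With the "not found" output the list is empty, which corresponds to NVRTC being invoked with no include directories.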

Dynamo's runtime venv is built with uv pip install, which does not install pip inside the venv. $PATH still has $VIRTUAL_ENV/bin first, but there is no pip binary there, so the subprocess falls through to /usr/bin/pip (system Python's pip), which cannot see uv-managed site-packages. pip show tensorrt_llm returns "Package(s) not found", getJitIncludeDirs() returns empty, and TRT-LLM calls nvrtcCompileProgram with zero -I options:

[E] [CudaRunner.cpp:458]: Failed to preprocess kernel fmhaSm100aKernel_Qkv...PersistentSwapsAbForGen:
    Compilation failed: NVRTC_ERROR_COMPILATION
    fmhaSm100aKernel_...(4): catastrophic error: could not open source file "cuda.h"
    (no directories in search list)
    #include <cuda.h>

Hopper is unaffected because the pre-compiled sm_90 cubins skip the JIT path entirely.
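The PATH fall-through can be reproduced in isolation. The sketch below assumes a POSIX system and uses temporary directories as stand-ins for $VIRTUAL_ENV/bin and /usr/bin; the fake `pip` scripts and helper names are hypothetical and only model how command lookup behaves.

```python
import os
import shutil
import stat
import tempfile


def make_fake_executable(directory: str, name: str) -> str:
    """Create a trivial executable shell script and return its path."""
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\nexit 0\n")
    os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)
    return path


def resolve_pip(venv_bin: str, system_bin: str) -> str:
    """Resolve `pip` the way a subprocess would, with the venv first on PATH."""
    search_path = os.pathsep.join([venv_bin, system_bin])
    return shutil.which("pip", path=search_path)


with tempfile.TemporaryDirectory() as root:
    venv_bin = os.path.join(root, "venv", "bin")    # stand-in for $VIRTUAL_ENV/bin
    system_bin = os.path.join(root, "usr", "bin")   # stand-in for /usr/bin
    os.makedirs(venv_bin)
    os.makedirs(system_bin)

    system_pip = make_fake_executable(system_bin, "pip")

    # Before the fix: the venv has no pip, so lookup falls through to the system one.
    fell_through = resolve_pip(venv_bin, system_bin) == system_pip

    # After the fix: pip exists in the venv and wins because its directory comes first.
    venv_pip = make_fake_executable(venv_bin, "pip")
    fixed = resolve_pip(venv_bin, system_bin) == venv_pip
```

Having the venv first on $PATH is not sufficient on its own; the binary must actually exist there.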

Fix

Add pip to the uv pip install line in both the runtime and dev/local-dev branches of container/templates/trtllm_runtime.Dockerfile. pip show tensorrt_llm now resolves to /opt/dynamo/venv/bin/pip (installed in the venv), which can see the dist-info — TRT-LLM gets the correct include path and NVRTC JIT succeeds.

Alternatives considered and rejected

  • CPATH=/usr/local/cuda/include — NVRTC does not honor GCC's CPATH.
  • ln -s /usr/local/cuda/include/cuda.h /usr/include/cuda.h — NVRTC does not search /usr/include; it only uses -I options.
  • apt install cuda-cudart-dev-13-1 — redundant; headers are already present. The bug is discovery, not file absence.
  • Patching getJitIncludeDirs() upstream — correct long-term fix but requires a TRT-LLM release; the pip-in-venv workaround unblocks Dynamo 1.1.0 immediately.
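For context on the last alternative, here is one way a pip-free lookup could look when done from Python. This is a sketch under assumptions, not what an upstream patch would necessarily do: the real code is C++, and `package_include_dir` is a hypothetical helper. It asks the interpreter's import system where a package lives instead of shelling out to `pip show`.

```python
import importlib.util
import os
from typing import Optional


def package_include_dir(package: str, subdir: str = "include") -> Optional[str]:
    """Locate <package>/<subdir> via the import system, without invoking pip."""
    # find_spec does not import a top-level package; it only locates it.
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_dir = list(spec.submodule_search_locations)[0]
    return os.path.join(package_dir, subdir)


# Demonstrated with a stdlib package, since tensorrt_llm may not be installed here.
json_dir = package_include_dir("json")
missing = package_include_dir("no_such_package_xyz_12345")
```

A lookup like this depends only on the running interpreter's sys.path, so it would not be affected by which `pip` binary happens to be first on $PATH.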

Test plan

Reproduced and fixed on GB200 (gb200nvl4, NVIDIA GB200 sm_100a, ARM64) using the pre-built CI arm64 image gitlab-master.nvidia.com:5005/dl/ai-dynamo/dynamo-ci:652d692a...-48651348-trtllm-arm64:

  • Baseline: trtllm-serve serve Qwen/Qwen3-0.6B --backend pytorch fails with the NVRTC error above.
  • Fix validated inline: docker exec ... uv pip install pip → pip show tensorrt_llm resolves → trtllm-serve starts, /v1/models returns Qwen/Qwen3-0.6B, /v1/chat/completions returns a valid completion, nvidia-smi shows 171 GB used by the worker.
  • Template change renders without diff drift (verified via python3 container/render.py --framework=trtllm --target=runtime --cuda-version=13.1 --platform=arm64).
  • Fresh docker build from this branch on GB200 produces dyn-2715-main-fix:latest (28.5 GB arm64 image). trtllm-serve serve Qwen/Qwen3-0.6B --backend pytorch starts clean, /v1/models returns the model, /v1/chat/completions returns a valid completion, nvidia-smi shows 167 GB of GPU memory used by the worker process. Zero NVRTC errors in /tmp/serve.log.
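The two pass criteria used above (the model appears in /v1/models, and the serve log has no NVRTC errors) can be checked mechanically. The sketch below is illustrative: it assumes /v1/models returns an OpenAI-style list body with a `data` array of objects carrying an `id`, and both helper names are hypothetical. It works on already-fetched text so it runs without a server.

```python
import json
from typing import List


def served_model_ids(models_payload: str) -> List[str]:
    """Extract model ids from a /v1/models response body (OpenAI-style shape assumed)."""
    body = json.loads(models_payload)
    return [entry["id"] for entry in body.get("data", [])]


def nvrtc_error_lines(log_text: str) -> List[str]:
    """Return every log line that mentions an NVRTC error code."""
    return [line for line in log_text.splitlines() if "NVRTC_ERROR" in line]


# Sample inputs: the payload shape is an assumption, the log line is from the error above.
SAMPLE_MODELS = '{"object": "list", "data": [{"id": "Qwen/Qwen3-0.6B", "object": "model"}]}'
BAD_LOG = "[E] [CudaRunner.cpp:458]: Failed to preprocess kernel\n    Compilation failed: NVRTC_ERROR_COMPILATION\n"
CLEAN_LOG = "server started\n"

model_ok = "Qwen/Qwen3-0.6B" in served_model_ids(SAMPLE_MODELS)
bad_hits = nvrtc_error_lines(BAD_LOG)
clean_hits = nvrtc_error_lines(CLEAN_LOG)
```

In practice the payload would come from the running server and the log text from /tmp/serve.log.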

Fixes DYN-2715.

Summary by CodeRabbit

  • Chores
    • Improved runtime environment configuration to ensure proper package discovery and dependency resolution.

…nstall

TRT-LLM's NVRTC JIT path (FMHA kernel compilation on Blackwell sm_100a)
discovers its install location at runtime by shelling out to
`pip show tensorrt_llm`. The runtime venv is built with `uv pip install`,
which does not place `pip` inside the venv, so the subprocess resolves
to the system `/usr/bin/pip` and cannot see uv-managed site-packages.
`pip show` then returns "Package(s) not found" and TRT-LLM passes zero
`-I` options to NVRTC, failing the FMHA JIT with:

  NVRTC_ERROR_COMPILATION ... could not open source file "cuda.h"
  (no directories in search list)

The failure only surfaces on Blackwell because sm_90 (Hopper) ships
pre-compiled cubins and never invokes NVRTC.

Fixes DYN-2715.

Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
@nv-yna nv-yna requested review from a team as code owners April 17, 2026 04:22
@github-actions github-actions Bot added the fix label Apr 17, 2026
@github-actions

👋 Hi nv-yna! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai Bot commented Apr 17, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between b0f7d8a and 4aede83.

📒 Files selected for processing (1)
  • container/templates/trtllm_runtime.Dockerfile

Walkthrough

Updates the Dockerfile to explicitly install the pip package into the virtual environment during uv pip install commands for both dev and non-dev targets. Adds inline comments explaining that pip is required at runtime for tensorrt_llm NVRTC JIT discovery via pip show command.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Infrastructure/Docker Configuration: container/templates/trtllm_runtime.Dockerfile | Added explicit pip package installation to venv during wheel installation for both dev and non-dev targets. Added clarifying comments that pip is required at runtime for tensorrt_llm NVRTC JIT discovery affecting FMHA kernel JIT on sm_100a. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: installing pip into the runtime venv to fix NVRTC JIT discovery for TRT-LLM. |
| Description check | ✅ Passed | The description fully addresses the template requirements with comprehensive sections: clear summary, detailed root cause analysis, concrete fix explanation, alternatives considered, and thorough test plan with verification steps. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |

@nv-yna nv-yna requested review from dillon-cullinan, ranrubin and tanmayv25 and removed request for ranrubin April 17, 2026 06:56
@nv-yna nv-yna merged commit 94ee2aa into ai-dynamo:main Apr 17, 2026
70 checks passed
richardhuo-nv pushed a commit that referenced this pull request Apr 17, 2026
…overy (#8296)

Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
indrajit96 pushed a commit that referenced this pull request Apr 20, 2026
…overy (#8296)

Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Labels

container, external-contribution, fix, size/S
