fix(trtllm): install pip into runtime venv for NVRTC JIT include discovery#8296
Conversation
…nstall TRT-LLM's NVRTC JIT path (FMHA kernel compilation on Blackwell sm_100a) discovers its install location at runtime by shelling out to `pip show tensorrt_llm`. The runtime venv is built with `uv pip install`, which does not place `pip` inside the venv, so the subprocess resolves to the system `/usr/bin/pip` and cannot see uv-managed site-packages. `pip show` then returns "Package(s) not found" and TRT-LLM passes zero `-I` options to NVRTC, failing the FMHA JIT with: NVRTC_ERROR_COMPILATION ... could not open source file "cuda.h" (no directories in search list) The failure only surfaces on Blackwell because sm_90 (Hopper) ships pre-compiled cubins and never invokes NVRTC. Fixes DYN-2715. Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com>
|
👋 Hi nv-yna! Thank you for contributing to ai-dynamo/dynamo. Just a reminder: The 🚀 |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughUpdates the Dockerfile to explicitly install the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
…overy (#8296) Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com> Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com>
…overy (#8296) Signed-off-by: Yuewei Na <nv-yna@users.noreply.github.com> Co-authored-by: Yuewei Na <nv-yna@users.noreply.github.com> Signed-off-by: Indrajit Bhosale <iamindrajitb@gmail.com>
Summary
Install
pipinto the TRT-LLM runtime venv so TRT-LLM's NVRTC JIT path can discover its own install location at runtime.Root cause
On Blackwell (sm_100a), TRT-LLM 1.3.0rc11 must JIT-compile
fmhaSm100aKernel_*via NVRTC because FMHA cubins are only pre-compiled for sm_90 (Hopper). TRT-LLM's JIT wrapper (cpp/include/tensorrt_llm/deep_gemm/compiler.cuh:151,getJitIncludeDirs()) discovers its install path by shelling out topip show tensorrt_llm. From there it adds<install>/tensorrt_llm/includeas an NVRTC-Ioption — the path that shipscuda.hand the kernel sources.Dynamo's runtime venv is built with
uv pip install, which does not installpipinside the venv.$PATHstill has$VIRTUAL_ENV/binfirst, but there is nopipbinary there, so the subprocess falls through to/usr/bin/pip(system Python's pip), which cannot see uv-managed site-packages.pip show tensorrt_llmreturns "Package(s) not found",getJitIncludeDirs()returns empty, and TRT-LLM callsnvrtcCompileProgramwith zero-Ioptions:Hopper is unaffected because the pre-compiled sm_90 cubins skip the JIT path entirely.
Fix
Add
pipto theuv pip installline in both the runtime and dev/local-dev branches ofcontainer/templates/trtllm_runtime.Dockerfile.pip show tensorrt_llmnow resolves to/opt/dynamo/venv/bin/pip(installed in the venv), which can see the dist-info — TRT-LLM gets the correct include path and NVRTC JIT succeeds.Alternatives considered and rejected
CPATH=/usr/local/cuda/include— NVRTC does not honor GCC'sCPATH.ln -s /usr/local/cuda/include/cuda.h /usr/include/cuda.h— NVRTC does not search/usr/include; it only uses-Ioptions.apt install cuda-cudart-dev-13-1— redundant; headers are already present. The bug is discovery, not file absence.getJitIncludeDirs()upstream — correct long-term fix but requires a TRT-LLM release; the pip-in-venv workaround unblocks Dynamo 1.1.0 immediately.Test plan
Reproduced and fixed on GB200 (gb200nvl4, NVIDIA GB200 sm_100a, ARM64) using the pre-built CI arm64 image
gitlab-master.nvidia.com:5005/dl/ai-dynamo/dynamo-ci:652d692a...-48651348-trtllm-arm64:trtllm-serve serve Qwen/Qwen3-0.6B --backend pytorchfails with the NVRTC error above.docker exec ... uv pip install pip→pip show tensorrt_llmresolves →trtllm-servestarts,/v1/modelsreturnsQwen/Qwen3-0.6B,/v1/chat/completionsreturns a valid completion,nvidia-smishows 171 GB used by the worker.python3 container/render.py --framework=trtllm --target=runtime --cuda-version=13.1 --platform=arm64).docker buildfrom this branch on GB200 producesdyn-2715-main-fix:latest(28.5 GB arm64 image).trtllm-serve serve Qwen/Qwen3-0.6B --backend pytorchstarts clean,/v1/modelsreturns the model,/v1/chat/completionsreturns a valid completion,nvidia-smishows 167 GB of GPU memory used by the worker process. Zero NVRTC errors in/tmp/serve.log.Fixes DYN-2715.
Summary by CodeRabbit