System Info
x86, H100, Ubuntu
Who can help?
No response
Information
Tasks
Reproduction
On a system running CUDA 12.3 and H100, I installed the dependencies by running scripts referred to by Dockerfile.multi:
TensorRT-LLM/docker/Dockerfile.multi, lines 8 to 51 at 0ab9d17:

```dockerfile
# https://www.gnu.org/software/bash/manual/html_node/Bash-Startup-Files.html
# The default values come from `nvcr.io/nvidia/pytorch`
ENV BASH_ENV=${BASH_ENV:-/etc/bash.bashrc}
ENV ENV=${ENV:-/etc/shinit_v2}
SHELL ["/bin/bash", "-c"]

FROM base as devel

COPY docker/common/install_base.sh install_base.sh
RUN bash ./install_base.sh && rm install_base.sh

COPY docker/common/install_cmake.sh install_cmake.sh
RUN bash ./install_cmake.sh && rm install_cmake.sh

COPY docker/common/install_ccache.sh install_ccache.sh
RUN bash ./install_ccache.sh && rm install_ccache.sh

# Download & install internal TRT release
ARG TRT_VER
ARG CUDA_VER
ARG CUDNN_VER
ARG NCCL_VER
ARG CUBLAS_VER
COPY docker/common/install_tensorrt.sh install_tensorrt.sh
RUN bash ./install_tensorrt.sh \
    --TRT_VER=${TRT_VER} \
    --CUDA_VER=${CUDA_VER} \
    --CUDNN_VER=${CUDNN_VER} \
    --NCCL_VER=${NCCL_VER} \
    --CUBLAS_VER=${CUBLAS_VER} && \
    rm install_tensorrt.sh

# Install latest Polygraphy
COPY docker/common/install_polygraphy.sh install_polygraphy.sh
RUN bash ./install_polygraphy.sh && rm install_polygraphy.sh

# Install mpi4py
COPY docker/common/install_mpi4py.sh install_mpi4py.sh
RUN bash ./install_mpi4py.sh && rm install_mpi4py.sh

# Install PyTorch
ARG TORCH_INSTALL_TYPE="skip"
COPY docker/common/install_pytorch.sh install_pytorch.sh
RUN bash ./install_pytorch.sh $TORCH_INSTALL_TYPE && rm install_pytorch.sh
```
setting `ENV` to `~/.bashrc` (instead of the container default `/etc/shinit_v2`). This allowed me to run the following command to build TensorRT-LLM from source:

```shell
$ pip install -e . --extra-index-url https://pypi.nvidia.com
```
The build finishes very quickly, which does not look right: build_wheel.py usually takes about 40 minutes to build everything.
After the build, `pip list` shows that tensorrt-llm is installed:
$ pip list | grep tensorrt
tensorrt 9.2.0.post12.dev5
tensorrt-llm 0.9.0.dev2024020600 /root/TensorRT-LLM
However, importing it fails:
$ python3 -c 'import tensorrt_llm'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/root/TensorRT-LLM/tensorrt_llm/__init__.py", line 44, in <module>
from .hlapi.llm import LLM, ModelConfig
File "/root/TensorRT-LLM/tensorrt_llm/hlapi/__init__.py", line 1, in <module>
from .llm import LLM, ModelConfig
File "/root/TensorRT-LLM/tensorrt_llm/hlapi/llm.py", line 17, in <module>
from ..executor import (GenerationExecutor, GenerationResult,
File "/root/TensorRT-LLM/tensorrt_llm/executor.py", line 11, in <module>
import tensorrt_llm.bindings as tllm
ModuleNotFoundError: No module named 'tensorrt_llm.bindings'
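This is consistent with the editable install skipping the compiled C++ extension: `tensorrt_llm.bindings` is a built artifact rather than a pure-Python file, so a quick way to tell a source-only install from a full build is to probe for the submodule without triggering the package's full import chain. A minimal sketch (the helper name is mine; only the `tensorrt_llm`/`bindings` names come from the traceback):

```python
import importlib.util


def has_submodule(package: str, submodule: str) -> bool:
    """Return True if `package.submodule` can be located without importing it."""
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ModuleNotFoundError:
        # The parent package itself is not importable.
        return False


# Probe for the compiled bindings before importing the package:
# has_submodule("tensorrt_llm", "bindings") -> False on my install
```

On my machine this returns `False` after the `pip install -e .` above, matching the `ModuleNotFoundError`.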
Expected behavior
My project requires me to build the main branch of TensorRT-LLM. It would be great if pip install could work, so I could declare TensorRT-LLM as a dependency in my project's pyproject.toml file.
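For reference, the end goal is to declare something like the following in `pyproject.toml` (a hypothetical sketch of a direct git reference; this is precisely what does not work today, since the source build skips the compiled bindings):

```toml
[project]
name = "my-project"
version = "0.1.0"
dependencies = [
    # Hypothetical: assumes `pip install` from source produced a working build.
    "tensorrt-llm @ git+https://github.com/NVIDIA/TensorRT-LLM.git@main",
]
```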
actual behavior
I had to build TensorRT-LLM by invoking build_wheel.py as in https://github.com/NVIDIA/TensorRT-LLM/blob/main/docs/source/build_from_source.md#build-tensorrt-llm
additional notes
I was able to build vLLM with its CUDA kernels using `pip install -e .`. Perhaps their build setup could serve as a reference.