Check for required environment before attempting installation#5609
- Add _detect_system_cuda() helper to detect nvcc version via subprocess
- Validate PyTorch CUDA major version matches system CUDA major version
- Error if major versions don't match (e.g., PyTorch CUDA 13 + system CUDA 12)
- Warn if minor versions don't match (e.g., PyTorch 12.1 + system 12.5)
- Error if nvcc not found with CUDA toolkit install instructions
- Update success message to show both PyTorch and system CUDA versions

Tested scenarios:
- PyTorch CUDA 12.1 vs system 12.5: warning (passes)
- PyTorch CUDA 12.6 vs system 12.5: warning (passes)
- PyTorch CUDA 13.0 vs system 12.5: error (major mismatch)

This completes the PyTorch validation to properly check CUDA compatibility between PyTorch and the system CUDA toolkit used for building.
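The major/minor rule described in this commit can be sketched roughly as follows. This is not the PR's code: `parse_nvcc_release`, `detect_system_cuda`, and `compare_cuda` are hypothetical names, and the sketch assumes the PyTorch side is a `major.minor` string such as the one `torch.version.cuda` reports.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def detect_system_cuda() -> Optional[Tuple[int, int]]:
    """Hypothetical stand-in for the PR's helper: query nvcc if it is on PATH."""
    if shutil.which("nvcc") is None:
        return None  # caller would report "nvcc not found" with install hints
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


def compare_cuda(torch_cuda: str, system_cuda: Tuple[int, int]) -> str:
    """Return "error" on major mismatch, "warning" on minor mismatch, else "ok"."""
    torch_major, torch_minor = (int(part) for part in torch_cuda.split(".")[:2])
    if torch_major != system_cuda[0]:
        return "error"
    if torch_minor != system_cuda[1]:
        return "warning"
    return "ok"
```

Keeping the parsing separate from the subprocess call lets the three tested scenarios above be checked without a CUDA toolkit installed.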
… -r requirements.txt
Review updated until commit e1a390b

Description

| Relevant files | |
|---|---|
| Enhancement | 18 files |
| Documentation | 1 file |
PR Reviewer Guide
Here are some key observations to aid the review process:
- 🧪 PR contains tests
- ⚡ Recommended focus areas for review: Missing Error Handling
Test failures
- (Low, 1) Minor numerical mismatch in Thunder vs Torch instance_norm nvFuser CUDA tests on float32 (dlcluster_h100).

| Test Name | H100 | Source |
|---|---|---|
| thunder.tests.test_ops.test_core_vs_torch_consistency_instance_norm_nvfuser_cuda_thunder.dtypes.float32 | ❌ | |
… for distributed builds
…and earlier have missing Float8_e8m0fnu type causing build errors.
Single source of truth for all version constants, reducing scattered version mentions to one location.
…rerequisite-validation
!build

!build
Additional Comments (1)
- setup.py, lines 5-63, style: Missing `NVFUSER_BUILD_SKIP_VALIDATION` documentation in header comments. Should add after line 63:

      # NVFUSER_BUILD_SKIP_VALIDATION=1 # Skip prerequisite validation (for CI/custom setups)
20 files reviewed, 2 comments
    pip_note = ""
    pip_inc, pip_lib = _get_pip_nccl_paths()
    if pip_inc is None:
logic: Logic gap: the condition checks only `pip_inc`, but it should check both the header and the library from pip, since both are required for a successful build.
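One way to address this comment, sketched under assumptions: the snippet suggests `_get_pip_nccl_paths()` returns an `(include, lib)` pair where either element may be `None`. The helper name `missing_pip_nccl_parts` is hypothetical and not part of the PR.

```python
from typing import List, Optional


def missing_pip_nccl_parts(pip_inc: Optional[str], pip_lib: Optional[str]) -> List[str]:
    """List which pip-provided NCCL pieces are absent (empty list means usable)."""
    missing = []
    if pip_inc is None:
        missing.append("header")
    if pip_lib is None:
        missing.append("library")
    return missing
```

The caller could then branch on `if missing_pip_nccl_parts(pip_inc, pip_lib):` instead of `if pip_inc is None:`, and name the missing piece in the message.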
!test
    except ImportError as e:
        # Prerequisite validation not available (shouldn't happen in dev)
        print(f"WARNING: Could not import prerequisite validation: {e}")
logic: ImportError swallowed with warning defeats PR goal. Per the PR description, this feature exists to "replace cryptic errors with actionable guidance." If the import fails and validation is skipped, users get exactly the cryptic CMake/linker errors this PR aims to prevent.
The ImportError should only occur if tools.prereqs package is missing, which shouldn't happen in a proper git clone. If it does, that's a critical environment issue that should fail the build, not continue silently.
Is this handling intended for pip-installed tarballs where the validation module might be stripped? If so, document why ImportError is acceptable and under what conditions.
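A fail-fast alternative could look roughly like the sketch below. It is an illustration only: `should_skip_validation` and `run_validation` are hypothetical names, the importer is passed in as a callable so the behavior can be exercised without the real `tools.prereqs` package, and it assumes only the value `1` enables the skip.

```python
import os
from typing import Callable, Mapping, Optional


def should_skip_validation(env: Optional[Mapping[str, str]] = None) -> bool:
    """Assumption: only NVFUSER_BUILD_SKIP_VALIDATION=1 disables the checks."""
    env = os.environ if env is None else env
    return env.get("NVFUSER_BUILD_SKIP_VALIDATION") == "1"


def run_validation(
    importer: Callable[[], Callable[[], None]],
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Run validation, failing the build if the validator cannot be imported."""
    if should_skip_validation(env):
        return "skipped"
    try:
        validate = importer()  # e.g. a function that imports and returns the validator
    except ImportError as exc:
        raise RuntimeError(
            "Prerequisite validation is unavailable; set "
            "NVFUSER_BUILD_SKIP_VALIDATION=1 to build without it."
        ) from exc
    validate()
    return "validated"
```

With this shape, a missing validation module stops the build with an actionable message, while the documented opt-out still lets CI or custom setups proceed.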
I started seeing this build error:

Reverting this PR seems to make the error go away.

@naoyam What system are you on and where is your llvm version installed?

Also, you can disable the checks with: NVFUSER_BUILD_SKIP_VALIDATION=1

@rdspring1 could you please add how to disable the check to all the error messages?

@wujingyue @mdavis36 @xwang233 I'll let you take over this PR since you know the build systems better than I do.
# Summary

This PR aims to aid new users in setting up and installing nvFuser successfully. This is done by providing the user with a comprehensive Python report (based on #5609) as early as possible in the build process:

- Clear nvFuser library dependencies & constraints
- Minimum version (when applicable)
- Optional vs Required dependencies
- Actionable output to the user on why a requirement is enforced and how to rectify it.

## Differences (This PR vs #5609)

The outcome of the report is **determined by CMake's evaluation of the constraints** we place on requirements. The report has **no effect** on the ability to build nvFuser. All failure logic is defined by the CMake system. The report scripts are used to aid in formatting and printing pertinent information to the user. This is done by directly referencing CMake variables in Python and allowing Python to handle complicated string manipulation and formatting (which CMake is really bad at...).

The contents of the help messages largely remain the same as in #5609, giving users guidance based on their build platform.

## CMake Changes

- `cmake/DependencyRequirements.cmake` is the single source of truth for version requirements, components and the state of `OPTIONAL` for each dependency.
- Option `NVFUSER_ENABLE_DEPENDENCY_REPORT` is `ON` by default. If this is set `OFF` then dependencies will be evaluated as "normal" in CMake and the build configuration will exit on the **first failure**.
- Each requirement's logic is defined in its own `cmake/deps/handle_<name>.cmake` file for some organization/clarity.

### Success Case

- CMake dependency evaluation happens silently and is written to buffer.
- Python report is generated as early as possible.
  - **On first run**: CMake will always look for compilers for the `LANGUAGES` the project is built for first - this can't be skipped AFAIK.
  - **On subsequent runs**: the Python report is displayed immediately (compiler information is cached).
- CMake output is dumped to the user for detailed reporting (this is the same as when running with `NVFUSER_ENABLE_DEPENDENCY_REPORT=Off`)

<img width="869" height="1562" alt="image" src="https://github.com/user-attachments/assets/7c4fddc5-2409-473d-bab9-0203e66fa11c" />

### Failure Case (example: pybind11 version too low)

Report fails with installation instructions for users.

- Does not `FATAL_ERROR` when pybind11 mismatches.
- CMake still dumps the output evaluating **ALL** dependencies
- CMake exits after reporting detailed output.

<img width="869" height="1440" alt="image" src="https://github.com/user-attachments/assets/9d1c5134-31d8-4050-9c0d-5ae2ad71dc71" />

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: Jingyue Wu <wujingyue@gmail.com>
Summary
Validates build prerequisites before CMake runs, replacing cryptic errors with actionable guidance.
Impact: Users get clear installation instructions instead of confusing CMake/linker errors.
What's Validated
Platform
Python 3.8+
CMake 3.18+
Ninja
pybind11[global]>=2.0
PyTorch 2.0+ with CUDA 12.8+
System CUDA toolkit (major version match)
Git submodules initialized
GCC 13+
LLVM 18.1+
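The minimum-version entries in this list amount to comparing a detected version against a floor. A minimal sketch of that comparison, assuming versions arrive as dotted strings; `MINIMUMS`, `parse_version`, and `meets_minimum` are hypothetical names, and the table only copies the floors listed above rather than the PR's actual constants.

```python
import re
from typing import Tuple

# Floors copied from the list above; the names and structure are illustrative.
MINIMUMS = {"Python": (3, 8), "CMake": (3, 18), "GCC": (13,), "LLVM": (18, 1)}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn the leading dotted number of a version string into an int tuple."""
    match = re.match(r"\d+(?:\.\d+)*", text.strip())
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(name: str, found: str) -> bool:
    """Compare numerically, so that 3.10 is correctly newer than 3.8."""
    return parse_version(found) >= MINIMUMS[name]
```

Comparing integer tuples rather than strings avoids the classic mistake of treating `3.10` as older than `3.8`.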
Skip option: `NVFUSER_BUILD_SKIP_VALIDATION=1` (for CI/custom setups)

Changes by File (Review Order)
Integration & Orchestration
- `python/setup.py` (+20 lines): `validate_prerequisites()` before build, `NVFUSER_BUILD_SKIP_VALIDATION` docs
- `python/tools/prereqs/validate.py` (new, 126 lines)

Dependency Management
- `requirements.txt` (+3 lines): `pip install -r requirements.txt`
- `python/pyproject.toml` (+1 line)

Validation Modules (9 new files, ~1300 lines)
- `python/tools/prereqs/__init__.py` (new, 56 lines)
- `python/tools/prereqs/exceptions.py` (new, 23 lines): `PrerequisiteMissingError` exception type
- `python/tools/prereqs/platform.py` (new, 116 lines)
- `python/tools/prereqs/python_version.py` (new, 115 lines)
- `python/tools/prereqs/build_tools.py` (new, 149 lines)
- `python/tools/prereqs/python_packages.py` (new, 334 lines): `nvcc --version`, enforces major version match
- `python/tools/prereqs/git.py` (new, 142 lines)
- `python/tools/prereqs/gcc.py` (new, 165 lines): `#include <format>` (not just version check)
- `python/tools/prereqs/llvm.py` (new, 249 lines): `.llvm/`, `CMakeLists.txt`, `.llvm/install`
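The `#include <format>` entry for `gcc.py` suggests a compile probe rather than a version comparison alone. A rough sketch of such a probe, not taken from the PR: `supports_std_format` and `PROBE_SOURCE` are hypothetical names, the `-std=c++20 -fsyntax-only` flags are an assumption about how one might probe, and the `run` parameter exists only so the logic can be exercised without a compiler.

```python
import os
import subprocess
import tempfile

# Tiny translation unit that only compiles if the <format> header is available.
PROBE_SOURCE = (
    "#include <format>\n"
    'int main() { return std::format("{}", 1).size() == 1 ? 0 : 1; }\n'
)


def supports_std_format(compiler: str, run=subprocess.run) -> bool:
    """Return True if `compiler` accepts a source file that uses std::format."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.cpp")
        with open(src, "w") as handle:
            handle.write(PROBE_SOURCE)
        try:
            result = run(
                [compiler, "-std=c++20", "-fsyntax-only", src],
                capture_output=True,
                text=True,
            )
        except FileNotFoundError:
            return False  # compiler binary is not on PATH
        return result.returncode == 0
```

A probe like this catches toolchains whose reported version looks new enough but whose standard library lacks the header, which a pure version check would miss.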