# [Cherry-pick][CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel (#17)
…ep dim (apache#18583)

## Issue 1: Without Dim

### Summary
In the `_sum` function (`BaseFXGraphImporter`), after `retrieve_args`, `args[1] = []` is still passed into `relax.op.sum`, so the result is incorrect.

### Steps to Reproduce
- Module:

```
class SumWithoutDim(nn.Module):
    def forward(self, x):
        return torch.sum(x)
```

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((2, 3), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((2, 3), dtype="float32") = R.sum(x, axis=[], keepdims=False)
            gv: R.Tuple(R.Tensor((2, 3), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result:
  - Input: `tensor([[1., 1., 1.], [1., 1., 1.]])`
  - Torch output: `tensor(6.)`, shape `torch.Size([])`
  - TVM output: `[[1. 1. 1.] [1. 1. 1.]]`, shape `(2, 3)`

### Expected

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((), dtype="float32") = R.sum(x, axis=None, keepdims=False)
            gv: R.Tuple(R.Tensor((), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result: TVM output: `6.0`; TVM output shape: `()`

## Issue 2: Keep Dim

### Summary
In the `_sum` function (`BaseFXGraphImporter`), the `keepdim` value was previously read only from `node.kwargs` and was never passed into `relax.op.sum`. It is now also read from `args[2]` and passed through.

### Steps to Reproduce
- Module:

```
class SumKeepDim(nn.Module):
    def forward(self, x):
        return torch.sum(x, dim=1, keepdim=True)
```

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((2,), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((2,), dtype="float32") = R.sum(x, axis=[1], keepdims=False)
            gv: R.Tuple(R.Tensor((2,), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result:
  - Input: `tensor([[1., 1., 1.], [1., 1., 1.]])`
  - Torch output: `tensor([[3.], [3.]])`, shape `torch.Size([2, 1])`
  - TVM VM output: `[3. 3.]`, shape `(2,)`

### Expected

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((2, 1), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((2, 1), dtype="float32") = R.sum(x, axis=[1], keepdims=True)
            gv: R.Tuple(R.Tensor((2, 1), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result: TVM output: `[[3.] [3.]]`; TVM output shape: `(2, 1)`
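For context, a minimal sketch of a corrected `_sum` converter covering both fixes. The method and helper names (`retrieve_args`, `block_builder`) follow `BaseFXGraphImporter`, but the snippet is illustrative, not the committed patch:

```python
from tvm import relax

# Hypothetical sketch of a fixed _sum converter in BaseFXGraphImporter;
# names mirror the importer, but the body is illustrative only.
def _sum(self, node):
    args = self.retrieve_args(node)
    x = args[0]
    # Issue 1: torch.sum(x) with no dim arrives as args[1] == [];
    # map the empty list to axis=None so Relax reduces over all axes.
    axis = args[1] if len(args) > 1 and args[1] else None
    # Issue 2: keepdim may arrive positionally (args[2]) or as a kwarg;
    # read both and forward the value to relax.op.sum.
    keepdims = args[2] if len(args) > 2 else node.kwargs.get("keepdim", False)
    return self.block_builder.emit(relax.op.sum(x, axis, keepdims=keepdims))
```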
…empty vector (apache#18586)

As per title.
The ACOS operator was producing incorrect results for boundary values due to poor precision of ASIN's Taylor series expansion near x = ±1.0.

Root cause:
- ASIN used a 6-term Taylor series that converges slowly near the boundaries
- ACOS was implemented as acos(x) = π/2 - asin(x), inheriting ASIN's errors
- At x = 1.0, an ASIN error of 0.354874 (22.6%) caused ACOS to output 0.354874 instead of 0.0

Solution:
- Modified ASIN to use the system library function (`asinf`) for |x| >= 0.9
- Modified ACOS to use the system library function (`acosf`) for |x| >= 0.9
- For |x| < 0.9, continue using the Taylor series (accurate in this range)

This ensures high precision for boundary values while maintaining the existing behavior for values in the middle range.

Fixes apache#18580
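To make the strategy concrete, here is a small Python model of the hybrid evaluation (the actual fix lives in the operator kernel and calls `asinf`/`acosf`; the function names here are hypothetical):

```python
import math

# Python model of the hybrid strategy described above. Coefficients of
# the 6-term asin Taylor series:
#   asin(x) = sum_n (2n)! / (4^n (n!)^2 (2n+1)) * x^(2n+1)
ASIN_COEFFS = [
    math.factorial(2 * n) / (4**n * math.factorial(n) ** 2 * (2 * n + 1))
    for n in range(6)
]

def hybrid_asin(x: float) -> float:
    if abs(x) >= 0.9:  # near x = ±1 the truncated series converges too slowly
        return math.asin(x)
    return sum(c * x ** (2 * n + 1) for n, c in enumerate(ASIN_COEFFS))

def hybrid_acos(x: float) -> float:
    if abs(x) >= 0.9:  # defer to the library function at the boundaries
        return math.acos(x)
    return math.pi / 2 - hybrid_asin(x)  # acos(x) = pi/2 - asin(x)

# At x = 1.0 the library path returns exactly acos(1.0) = 0.0, instead of
# the 0.354874 error the truncated series used to produce.
assert hybrid_acos(1.0) == 0.0
```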
…avoid undefined symbol on non-QCOM runtimes (apache#18589)

This PR is a re-open of apache#18581. The previous PR was created while Jenkins CI was experiencing a disk-space issue, and the CI job did not trigger.

## PR Description
A recent OpenCL-Headers update (KhronosGroup/OpenCL-Headers#277) added QCOM perf-hint definitions (`CL_CONTEXT_PERF_HINT_QCOM`, `clSetPerfHintQCOM`) to `cl_ext.h`. These macros are now defined even on platforms whose OpenCL runtimes (e.g., PoCL, ICD loaders) do not implement the QCOM extension. TVM previously enabled the perf-hint code path solely based on the presence of `CL_CONTEXT_PERF_HINT_QCOM`, causing link errors such as:

```
undefined symbol: clSetPerfHintQCOM
```

This PR guards the QCOM perf-hint logic behind `USE_OPENCL_EXTN_QCOM`, matching the behavior of other QCOM-specific OpenCL paths (e.g., `SetNativePtr`).

## Effects
- Prevents accidental linking against unsupported QCOM symbols on non-QCOM runtimes.
- Keeps QCOM builds fully functional when `USE_OPENCL_EXTN_QCOM` is explicitly enabled.
- Aligns TVM's extension handling across OpenCL code paths.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
## How
Implemented an `InferLayoutRepeat` function that:
- Preserves the layout when `axis` is specified (with axis transformation)
- Returns a 1D layout when `axis` is not specified (flatten mode)
- Transforms the `axis` parameter based on layout changes (e.g., NCHW axis=1 → NHWC axis=3); see the sketch below
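A minimal Python model of the axis transformation in the last bullet (the real `InferLayoutRepeat` is implemented in C++ inside TVM; this helper is hypothetical):

```python
# Hypothetical helper modeling the axis remapping: an axis index in the
# source layout maps to the position of the same dimension letter in the
# destination layout.
def remap_axis(axis: int, src_layout: str, dst_layout: str) -> int:
    """Map an axis index in src_layout to its position in dst_layout."""
    return dst_layout.index(src_layout[axis])

# The channel axis moves from position 1 in NCHW to position 3 in NHWC.
assert remap_axis(1, "NCHW", "NHWC") == 3
```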
## Why
- `LOG(WARNING)` is the standard and correct approach throughout the TVM codebase
- The existing pattern is used consistently in all relax ops (see `test_op_manipulate.py`, `index.cc`, etc.)
- Added test coverage for previously untested scenarios
…pache#18591)

Hi @mshr-h @tlopex,

This PR fixes the issue: `DeprecationWarning: invalid escape sequence '\'`. Any suggestions would be appreciated if you are available.

### Root Cause
The backslashes (`\`) inside the docstring trigger the warning.

### Solution
Use a raw docstring (`r"""`).

Co-authored-by: cchung100m <cchung100m@users.noreply.github.com>
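A minimal, hypothetical illustration of the warning and the fix (the actual docstring lives in the TVM source):

```python
def with_warning():
    """Matches a literal \d digit."""  # DeprecationWarning: invalid escape sequence '\d'

def without_warning():
    r"""Matches a literal \d digit."""  # raw docstring keeps the backslash verbatim
```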
…ult'] (apache#18574)

## Summary
An error occurs when creating a module from an exported program that contains `torch.mean` without `dim`.

## Reproduce
- Module:

```
class MeanModule(nn.Module):
    def forward(self, x):
        return torch.mean(x)

...

# Export → Relax
ep = torch_export(m, (x,))
mod = from_exported_program(ep)
```

- Error log:

```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 13
     11 # Export → Relax
     12 ep = torch_export(m, (x,))
---> 13 mod = from_exported_program(ep)
     15 mod.show()
     17 target = "llvm"

File ~/Programming/tvm/python/tvm/relax/frontend/torch/exported_program_translator.py:1783, in from_exported_program(exported_program, keep_params_as_input, unwrap_unit_return_tuple, no_bind_return_tuple, run_ep_decomposition)
   1780 if run_ep_decomposition:
   1781     exported_program = exported_program.run_decompositions()
-> 1783 return ExportedProgramImporter().from_exported_program(
   1784     exported_program,
   1785     keep_params_as_input,
   1786     unwrap_unit_return_tuple,
   1787     no_bind_return_tuple,
   1788 )

File ~/Programming/tvm/python/tvm/relax/frontend/torch/exported_program_translator.py:1642, in ExportedProgramImporter.from_exported_program(self, exported_program, keep_params_as_input, unwrap_unit_return_tuple, no_bind_return_tuple)
   1639 nodes: List[fx.Node] = exported_program.graph.nodes
   1641 # Find all the missing function types
-> 1642 self._check_unsupported_func_type(nodes)
   1644 with self.block_builder.function(
   1645     name=func_name, params=list(inputs_vars.values()).copy(), attrs=func_attrs
   1646 ):
   1647     output = None

File ~/Programming/tvm/python/tvm/relax/frontend/torch/base_fx_graph_translator.py:182, in BaseFXGraphImporter._check_unsupported_func_type(self, nodes)
    174 def _check_unsupported_func_type(self, nodes: List[fx.Node]):
    175     missing_func_types = list(
    176         {
    177             node.target.__name__
    (...)
    180         }
    181     )
--> 182     assert not missing_func_types, f"Unsupported function types {missing_func_types}"

AssertionError: Unsupported function types ['mean.default']
```

## Resolve
Add `"mean.default"` to `create_convert_map` in the `ExportedProgramImporter` class.
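For illustration, a hypothetical excerpt of the one-line resolution; the real `create_convert_map` in `exported_program_translator.py` contains many more operator entries, and the `_mean` handler name is an assumption:

```python
# Hypothetical excerpt of ExportedProgramImporter.create_convert_map;
# the actual map contains many more operator entries.
def create_convert_map(self):
    return {
        # ... existing entries ...
        "mean.dim": self._mean,      # torch.mean(x, dim=...) (already supported)
        "mean.default": self._mean,  # torch.mean(x) with no dim: the new entry
    }
```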
…) in TVM CUDA FFI
Cherry-pick of apache#18604.