# [Cherry-pick][CUDA][FFI] Extend kernel launch config to support Programmatic Dependent Launch and cuLaunchCooperativeKernel (#17)
…ep dim (apache#18583)

## Issue 1: Without Dim

### Summary
In the `_sum` function (`BaseFXGraphImporter`), after `retrieve_args`, `args[1] = []` is still passed into `relax.op.sum`, so the result is incorrect.

### Steps to Reproduce
- Module:

```
class SumWithoutDim(nn.Module):
    def forward(self, x):
        return torch.sum(x)
```

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((2, 3), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((2, 3), dtype="float32") = R.sum(x, axis=[], keepdims=False)
            gv: R.Tuple(R.Tensor((2, 3), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result:
  - Input: `tensor([[1., 1., 1.], [1., 1., 1.]])`
  - Torch output: `tensor(6.)`, shape `torch.Size([])`
  - TVM output: `[[1. 1. 1.] [1. 1. 1.]]`, shape `(2, 3)`

### Expected

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((), dtype="float32") = R.sum(x, axis=None, keepdims=False)
            gv: R.Tuple(R.Tensor((), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result: TVM output: `6.0`; TVM output shape: `()`

## Issue 2: Keep Dim

### Summary
In the `_sum` function (`BaseFXGraphImporter`), the `keepdim` value was previously read only from `node.kwargs` and was never passed into `relax.op.sum`. It is now also read from `args[2]` and passed through.

### Steps to Reproduce
- Module:

```
class SumKeepDim(nn.Module):
    def forward(self, x):
        return torch.sum(x, dim=1, keepdim=True)
```

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((2,), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((2,), dtype="float32") = R.sum(x, axis=[1], keepdims=False)
            gv: R.Tuple(R.Tensor((2,), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result:
  - Input: `tensor([[1., 1., 1.], [1., 1., 1.]])`
  - Torch output: `tensor([[3.], [3.]])`, shape `torch.Size([2, 1])`
  - TVM VM output: `[3. 3.]`, shape `(2,)`

### Expected

```
class Module:
    def main(x: R.Tensor((2, 3), dtype="float32")) -> R.Tuple(R.Tensor((2, 1), dtype="float32")):
        with R.dataflow():
            lv: R.Tensor((2, 1), dtype="float32") = R.sum(x, axis=[1], keepdims=True)
            gv: R.Tuple(R.Tensor((2, 1), dtype="float32")) = (lv,)
            R.output(gv)
        return gv
```

- Result: TVM output: `[[3.] [3.]]`; TVM output shape: `(2, 1)`
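For context, a minimal sketch of a corrected `_sum` converter covering both fixes. The method and helper names (`retrieve_args`, `block_builder`) follow `BaseFXGraphImporter`, but the snippet is illustrative, not the committed patch:

```python
from tvm import relax

# Hypothetical sketch of a fixed _sum converter in BaseFXGraphImporter;
# names mirror the importer, but the body is illustrative only.
def _sum(self, node):
    args = self.retrieve_args(node)
    x = args[0]
    # Issue 1: torch.sum(x) with no dim arrives as args[1] == [];
    # map the empty list to axis=None so Relax reduces over all axes.
    axis = args[1] if len(args) > 1 and args[1] else None
    # Issue 2: keepdim may arrive positionally (args[2]) or as a kwarg;
    # read both and forward the value to relax.op.sum.
    keepdims = args[2] if len(args) > 2 else node.kwargs.get("keepdim", False)
    return self.block_builder.emit(relax.op.sum(x, axis, keepdims=keepdims))
```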
…empty vector (apache#18586)

As per title.
The ACOS operator was producing incorrect results for boundary values due to poor precision of ASIN's Taylor series expansion near x = ±1.0.

Root cause:
- ASIN used a 6-term Taylor series that converges slowly near the boundaries
- ACOS was implemented as acos(x) = π/2 - asin(x), inheriting ASIN's errors
- At x = 1.0, an ASIN error of 0.354874 (22.6%) caused ACOS to output 0.354874 instead of 0.0

Solution:
- Modified ASIN to use the system library function (`asinf`) for |x| >= 0.9
- Modified ACOS to use the system library function (`acosf`) for |x| >= 0.9
- For |x| < 0.9, continue using the Taylor series (accurate in this range)

This ensures high precision for boundary values while maintaining the existing behavior for values in the middle range.

Fixes apache#18580
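To make the strategy concrete, here is a small Python model of the hybrid evaluation (the actual fix lives in the operator kernel and calls `asinf`/`acosf`; the function names here are hypothetical):

```python
import math

# Python model of the hybrid strategy described above. Coefficients of
# the 6-term asin Taylor series:
#   asin(x) = sum_n (2n)! / (4^n (n!)^2 (2n+1)) * x^(2n+1)
ASIN_COEFFS = [
    math.factorial(2 * n) / (4**n * math.factorial(n) ** 2 * (2 * n + 1))
    for n in range(6)
]

def hybrid_asin(x: float) -> float:
    if abs(x) >= 0.9:  # near x = ±1 the truncated series converges too slowly
        return math.asin(x)
    return sum(c * x ** (2 * n + 1) for n, c in enumerate(ASIN_COEFFS))

def hybrid_acos(x: float) -> float:
    if abs(x) >= 0.9:  # defer to the library function at the boundaries
        return math.acos(x)
    return math.pi / 2 - hybrid_asin(x)  # acos(x) = pi/2 - asin(x)

# At x = 1.0 the library path returns exactly acos(1.0) = 0.0, instead of
# the 0.354874 error the truncated series used to produce.
assert hybrid_acos(1.0) == 0.0
```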
…avoid undefined symbol on non-QCOM runtimes (apache#18589)

This PR is a re-open of apache#18581. The previous PR was created while Jenkins CI was experiencing a disk-space issue, and the CI job did not trigger.

## PR Description
A recent OpenCL-Headers update (KhronosGroup/OpenCL-Headers#277) added QCOM perf-hint definitions (`CL_CONTEXT_PERF_HINT_QCOM`, `clSetPerfHintQCOM`) to `cl_ext.h`. These macros are now defined even on platforms whose OpenCL runtimes (e.g., PoCL, ICD loaders) do not implement the QCOM extension. TVM previously enabled the perf-hint code path solely based on the presence of `CL_CONTEXT_PERF_HINT_QCOM`, causing link errors such as:

```
undefined symbol: clSetPerfHintQCOM
```

This PR guards the QCOM perf-hint logic behind `USE_OPENCL_EXTN_QCOM`, matching the behavior of other QCOM-specific OpenCL paths (e.g., `SetNativePtr`).

## Effects
- Prevents accidental linking against unsupported QCOM symbols on non-QCOM runtimes.
- Keeps QCOM builds fully functional when `USE_OPENCL_EXTN_QCOM` is explicitly enabled.
- Aligns TVM's extension handling across OpenCL code paths.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
## How
Implemented an `InferLayoutRepeat` function that:
- Preserves the layout when `axis` is specified (with axis transformation)
- Returns a 1D layout when `axis` is not specified (flatten mode)
- Transforms the `axis` parameter based on layout changes (e.g., NCHW axis=1 → NHWC axis=3); see the sketch below
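A minimal Python model of the axis transformation in the last bullet (the real `InferLayoutRepeat` is implemented in C++ inside TVM; this helper is hypothetical):

```python
# Hypothetical helper modeling the axis remapping: an axis index in the
# source layout maps to the position of the same dimension letter in the
# destination layout.
def remap_axis(axis: int, src_layout: str, dst_layout: str) -> int:
    """Map an axis index in src_layout to its position in dst_layout."""
    return dst_layout.index(src_layout[axis])

# The channel axis moves from position 1 in NCHW to position 3 in NHWC.
assert remap_axis(1, "NCHW", "NHWC") == 3
```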
## Why
- `LOG(WARNING)` is the standard and correct approach throughout the TVM codebase
- The existing pattern is used consistently in all relax ops (see `test_op_manipulate.py`, `index.cc`, etc.)
- Added test coverage for previously untested scenarios
…pache#18591)

Hi @mshr-h @tlopex,

This PR fixes the issue: `DeprecationWarning: invalid escape sequence '\'`. Any suggestions would be appreciated if you are available.

### Root Cause
The backslashes (`\`) inside the docstring trigger the warning.

### Solution
Use a raw docstring (`r"""`).

Co-authored-by: cchung100m <cchung100m@users.noreply.github.com>
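A minimal, hypothetical illustration of the warning and the fix (the actual docstring lives in the TVM source):

```python
def with_warning():
    """Matches a literal \d digit."""  # DeprecationWarning: invalid escape sequence '\d'

def without_warning():
    r"""Matches a literal \d digit."""  # raw docstring keeps the backslash verbatim
```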
…ult'] (apache#18574)

## Summary
An error occurs when creating a module from an exported program that contains `torch.mean` without `dim`.

## Reproduce
- Module:

```
class MeanModule(nn.Module):
    def forward(self, x):
        return torch.mean(x)

...

# Export → Relax
ep = torch_export(m, (x,))
mod = from_exported_program(ep)
```

- Error log:

```
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[2], line 13
     11 # Export → Relax
     12 ep = torch_export(m, (x,))
---> 13 mod = from_exported_program(ep)
     15 mod.show()
     17 target = "llvm"

File ~/Programming/tvm/python/tvm/relax/frontend/torch/exported_program_translator.py:1783, in from_exported_program(exported_program, keep_params_as_input, unwrap_unit_return_tuple, no_bind_return_tuple, run_ep_decomposition)
   1780 if run_ep_decomposition:
   1781     exported_program = exported_program.run_decompositions()
-> 1783 return ExportedProgramImporter().from_exported_program(
   1784     exported_program,
   1785     keep_params_as_input,
   1786     unwrap_unit_return_tuple,
   1787     no_bind_return_tuple,
   1788 )

File ~/Programming/tvm/python/tvm/relax/frontend/torch/exported_program_translator.py:1642, in ExportedProgramImporter.from_exported_program(self, exported_program, keep_params_as_input, unwrap_unit_return_tuple, no_bind_return_tuple)
   1639 nodes: List[fx.Node] = exported_program.graph.nodes
   1641 # Find all the missing function types
-> 1642 self._check_unsupported_func_type(nodes)
   1644 with self.block_builder.function(
   1645     name=func_name, params=list(inputs_vars.values()).copy(), attrs=func_attrs
   1646 ):
   1647     output = None

File ~/Programming/tvm/python/tvm/relax/frontend/torch/base_fx_graph_translator.py:182, in BaseFXGraphImporter._check_unsupported_func_type(self, nodes)
    174 def _check_unsupported_func_type(self, nodes: List[fx.Node]):
    175     missing_func_types = list(
    176         {
    177             node.target.__name__
    (...)
    180         }
    181     )
--> 182     assert not missing_func_types, f"Unsupported function types {missing_func_types}"

AssertionError: Unsupported function types ['mean.default']
```

## Resolve
Add `"mean.default"` to `create_convert_map` in the `ExportedProgramImporter` class.
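For illustration, a hypothetical excerpt of the one-line resolution; the real `create_convert_map` in `exported_program_translator.py` contains many more operator entries, and the `_mean` handler name is an assumption:

```python
# Hypothetical excerpt of ExportedProgramImporter.create_convert_map;
# the actual map contains many more operator entries.
def create_convert_map(self):
    return {
        # ... existing entries ...
        "mean.dim": self._mean,      # torch.mean(x, dim=...) (already supported)
        "mean.default": self._mean,  # torch.mean(x) with no dim: the new entry
    }
```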
…) in TVM CUDA FFI
Cherry-pick of apache#18604.