[Unity][Pass] BindSymVar pass #15246
Closed

Conversation
Collaborator

Thanks for contributing to TVM! Please refer to the contributing guidelines https://tvm.apache.org/docs/contribute/ for useful information and tips. Please request code reviews from Reviewers by @-ing them in a comment.

Generated by tvm-bot
Contributor (Author)

@tvm-bot rerun
Contributor

Failed to re-run CI in https://github.com/apache/tvm/actions/runs/5543328554
This PR introduces the root-mean-square normalization operator, `rms_norm`, into TOPI and Relax, together with its legalization transform.
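The original operator code is not shown on this page. As a rough, self-contained sketch of what RMS normalization computes (plain NumPy, not the actual TOPI/Relax implementation; the `eps` name and default are assumptions):

```python
import numpy as np

def rms_norm(x, weight, axis=-1, eps=1e-5):
    # RMS(x) = sqrt(mean(x^2) + eps) along the normalized axis;
    # the output is x scaled by 1/RMS(x) and a learnable weight.
    rms = np.sqrt(np.mean(np.square(x), axis=axis, keepdims=True) + eps)
    return x / rms * weight
```

Unlike layer norm, this skips mean-centering and the bias term, which is why it is cheaper to legalize into elementwise and reduction ops.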
* allow capturing input parameters in a cuda graph
* remove unnecessary cudaGraphLaunch
* support cuda graph for cutlass
* add test
* add test for cutlass
* revert LiftTransformParams change
* comment
* update test
* update builtin
* update
* delete exec properly
* run cuda graph twice in the test to make sure cached launch works
This PR adds these features to the gradient system:
- Checkpointing for the gradient pass
- `tvm.relax.testing.nn.checkpoint`
- `tvm.relax.op.grad.start_checkpoint` and `tvm.relax.op.grad.end_checkpoint`
- Support in the Gradient pass
- Fixes for several minor problems in op_gradient
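The checkpointing code itself is not shown here. As a toy, self-contained sketch of the idea behind gradient checkpointing (plain Python, not the TVM API): instead of keeping every intermediate activation for the backward pass, a checkpointed region drops them and recomputes them from its inputs when the gradient is needed, trading compute for memory.

```python
def forward(x, save):
    # f(x) = (x^2)^2 = x^4, with one intermediate activation a = x^2.
    a = x * x
    y = a * a
    if save:
        return y, a      # normal mode: keep the activation
    return y, None       # checkpointed mode: drop it

def backward(x, a, dy):
    if a is None:
        a = x * x        # recompute the dropped activation
    da = dy * 2 * a      # dy/da for y = a^2
    dx = da * 2 * x      # da/dx for a = x^2
    return dx
```

Both modes produce the same gradient (d/dx x^4 = 4x^3); the checkpointed path simply pays one extra forward computation inside the region.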
…casting (apache#15330) This PR fixes a bug in the previous decode-GeMV dlight scheduling. Previously, when the inner dimension of the largest tensor was spatial, the fused epilogue block ended up not bound to any thread axis, which generated GPU code with incorrect numerical results. The cause is that after doing reverse-compute-at of the epilogue block, at least one spatial axis remains, and that axis is supposed to be bound to threadIdx. This PR fixes the issue and adds three test cases covering both the reduction-inner and spatial-inner cases, with and without broadcasting.
Force-pushed 039bfd0 to 1b70941
Contributor (Author)

Closed, as it is merged with another PR: #15509
This PR introduces a utility pass that binds symbolic variables to user-provided integer values.
For example, say we have the following IRModule.
We can conveniently bind the symbolic variables by applying
`After = relax.transform.BindSymVars("main", {"m": 10, "n": 10})(Before)`. This is useful when specializing shapes with compile-time shape information (e.g., model parameters or batch sizes), since it eliminates the need to rewrite the model.
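The Before/After IRModules from the description are not reproduced on this page. As a rough, self-contained sketch of what such a binding pass does conceptually (plain Python over a toy shape dictionary, not the actual TVM data structures; the `bind_sym_vars` helper is hypothetical), every symbolic dimension name appearing in a function's shapes is replaced by the user-provided integer:

```python
def bind_sym_vars(shapes, bindings):
    # shapes: tensor name -> shape tuple, where a str entry is a
    # symbolic dimension (e.g. "m") and an int is already concrete.
    # bindings: symbolic name -> concrete integer value.
    return {
        name: tuple(bindings.get(dim, dim) if isinstance(dim, str) else dim
                    for dim in shape)
        for name, shape in shapes.items()
    }

before = {"x": ("m", "n"), "w": ("n", 4)}
after = bind_sym_vars(before, {"m": 10, "n": 10})
# every occurrence of "m" and "n" is now the concrete value 10
```

In the real pass, the same substitution happens inside the IRModule's struct info and shape expressions, so downstream passes see fully static shapes.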
cc. @tqchen @psrivas2