Subchannel Block quantized GEMM #1545
@ptrendx here is a mirror of the review with only the GEMM-related changes in scope: kwyss-nvidia#1
/te-ci
/te-ci
/te-ci L1
Looking into diagnosing the CI test failures:
/te-ci pytorch
torch.testing.assert_close(y, y_ref, atol=atol, rtol=rtol)

def cublas_gemm_test_constraint_enforced(
What is the reason for this test? Maybe I'm reading this wrong but it seems to enforce that cuBLAS does not support some parameters - is this to raise awareness once cuBLAS actually starts supporting them?
If we haven't verified the results of a branch, it seems better for that branch to return a descriptive error than to silently succeed, possibly with bad data. This test checks that the GEMM API returns an error for the cases it should not be called with.
(inputA->scaling_mode == NVTE_BLOCK_SCALING_2D)) {
NVTE_CHECK((epilogue == CUBLASLT_EPILOGUE_DEFAULT || epilogue == CUBLASLT_EPILOGUE_BIAS ||
            epilogue == CUBLASLT_EPILOGUE_DGELU),
           "Epilogue requested outside of the available and tested cuBLAS functionality for "
Is there any available but untested functionality :-)?
Not as far as I know (yet). ;)
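The two review threads above describe a guard that rejects untested epilogues and a test that confirms the rejection happens. A minimal sketch of that pattern in plain Python follows. It is not the PR's code: `Epilogue`, `SUPPORTED_BLOCKWISE_EPILOGUES`, `check_blockwise_epilogue`, and `constraint_enforced` are hypothetical names, and the allow-list is assumed from the three epilogues named in the quoted `NVTE_CHECK`.

```python
from enum import Enum, auto


class Epilogue(Enum):
    """Hypothetical stand-in for the cuBLASLt epilogue enum values."""

    DEFAULT = auto()
    BIAS = auto()
    DGELU = auto()
    BGRADB = auto()


# Assumption: only the epilogues listed in the quoted NVTE_CHECK are allowed
# for block-scaled inputs.
SUPPORTED_BLOCKWISE_EPILOGUES = frozenset(
    {Epilogue.DEFAULT, Epilogue.BIAS, Epilogue.DGELU}
)


def check_blockwise_epilogue(epilogue: Epilogue) -> None:
    """Raise a descriptive error instead of running an unverified branch."""
    if epilogue not in SUPPORTED_BLOCKWISE_EPILOGUES:
        raise ValueError(
            f"Epilogue {epilogue.name} is outside the tested functionality "
            "for block-scaled GEMM"
        )


def constraint_enforced(epilogue: Epilogue) -> bool:
    """Return True if the guard rejects the epilogue, False if it accepts it."""
    try:
        check_blockwise_epilogue(epilogue)
    except ValueError:
        return True
    return False
```

With this sketch, `constraint_enforced(Epilogue.BGRADB)` is true while the three allow-listed epilogues pass the guard. When an epilogue becomes supported, the allow-list and the test change together, which keeps the supported surface explicit.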
* Add GEMM logic for blockwise quantized tensors. GEMM test cases included in pytorch integration. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update NVTE_BLOCK_SCALING for GEMM. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Gate feature on CUDA 12.9 Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Gemm typo. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Remove unecessary type converter change. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Reflect epilogue availability and test supported epilogues. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* GEMM simplifications from recipe branch. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Format py code. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update GEMM DGelu tests to match support depending on output dtype. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Force pow2Scales in GEMM Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Add GEMM test to pytorch test suite. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Add copyright to GEMM test. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update import for GEMM test. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Add license. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Update test gemm supported predicate. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Use sgemm like interfaces and naming. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Rewrite GEMM comment. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* MR Feedback. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
* Refactor GEMM param canonicalization Configure A and B matrices separately. Have separate code path for each scaling mode. Signed-off-by: Tim Moon <tmoon@nvidia.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci
* Prune number of tests. Signed-off-by: Keith Wyss <kwyss@nvidia.com>
---------
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Peter Dykas <wdykas@nvidia.com>
Description
Integrates GEMM scaling modes for subchannel/block quantization.
Type of change
Changes
Please list the changes introduced in this PR:
Previous bias tests were flaky due to a known issue in cuBLAS upstream. Tested with zero tolerance against a recent build.
Would like to enable BGRADB.
Depends on quantization changes in related MR: #1513
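To illustrate what a zero-tolerance bias check means, here is a small plain-Python sketch. It is not the PR's test: `gemm_bias_reference` and `assert_exact` are hypothetical helpers, and the PyTorch analogue would presumably be `torch.testing.assert_close(y, y_ref, atol=0, rtol=0)`. The inputs are chosen so that every product and sum is exactly representable in floating point.

```python
def gemm_bias_reference(a, b, bias):
    """Naive (M x K) @ (K x N) + bias on nested lists, used as a reference."""
    k_dim = len(b)
    n_dim = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(k_dim)) + bias[j] for j in range(n_dim)]
        for row in a
    ]


def assert_exact(actual, expected):
    """Zero-tolerance comparison: every element must compare equal."""
    if len(actual) != len(expected):
        raise AssertionError("row count mismatch")
    for i, (row_a, row_e) in enumerate(zip(actual, expected)):
        if len(row_a) != len(row_e):
            raise AssertionError(f"column count mismatch in row {i}")
        for j, (x, y) in enumerate(zip(row_a, row_e)):
            if x != y:
                raise AssertionError(f"mismatch at ({i}, {j}): {x!r} != {y!r}")


# Small values built from powers of two keep the arithmetic exact.
a = [[1.0, 0.5], [2.0, 0.25]]
b = [[4.0, 1.0], [2.0, 8.0]]
bias = [0.5, -1.0]

y = gemm_bias_reference(a, b, bias)
assert_exact(y, [[5.5, 4.0], [9.0, 3.0]])
```

A check like this fails on any single-element deviation, which is why it only makes sense once the upstream flakiness is resolved.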
Checklist: