Skip to content

Subchannel Block quantized GEMM#1545

Merged
timmoon10 merged 23 commits intoNVIDIA:mainfrom
kwyss-nvidia:kwyss/cublas_gemm_github_mr
Apr 7, 2025
Merged

Subchannel Block quantized GEMM#1545
timmoon10 merged 23 commits intoNVIDIA:mainfrom
kwyss-nvidia:kwyss/cublas_gemm_github_mr

Conversation

@kwyss-nvidia
Copy link
Collaborator

@kwyss-nvidia kwyss-nvidia commented Mar 6, 2025

Description

Integrates GEMM scaling modes for subchannel/block quantization.

Type of change

  • Documentation change (change only to the documentation, either a fix or a new content)
  • Bug fix (non-breaking change which fixes an issue)
  • [ x] New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Infra/Build change
  • Code refactoring

Changes

Please list the changes introduced in this PR:

  • GEMM dispatch in generic_gemm for scaling modes of GEMMs.
  • Tests for GEMM numerics

Previous bias tests were flaky due to know issue in CUBLAS upstream. Tested zero tolerance against recent build.

Would like to enable BGRADB.

Depends on quantization changes in related MR: #1513

Checklist:

  • I have read and followed the contributing guidelines
  • The functionality is complete
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

@kwyss-nvidia
Copy link
Collaborator Author

@ptrendx here is a mirror of the review with only the GEMM related changes in scope. kwyss-nvidia#1

@kwyss-nvidia kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch 4 times, most recently from eee37bf to ce4ca80 Compare March 17, 2025 17:24
@kwyss-nvidia kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from ce4ca80 to 5ebc93a Compare March 19, 2025 22:42
@kwyss-nvidia kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch 6 times, most recently from cd3e414 to f1e9e62 Compare April 4, 2025 01:17
GEMM test cases included in pytorch integration.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
@kwyss-nvidia
Copy link
Collaborator Author

/te-ci

Signed-off-by: Keith Wyss <kwyss@nvidia.com>
@kwyss-nvidia
Copy link
Collaborator Author

/te-ci

@timmoon10 timmoon10 self-requested a review April 4, 2025 22:15
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
@kwyss-nvidia kwyss-nvidia force-pushed the kwyss/cublas_gemm_github_mr branch from 32799ab to 861c870 Compare April 5, 2025 00:59
@timmoon10 timmoon10 self-requested a review April 5, 2025 01:21
Signed-off-by: Keith Wyss <kwyss@nvidia.com>
@timmoon10 timmoon10 self-requested a review April 5, 2025 04:02
timmoon10 and others added 3 commits April 6, 2025 01:11
Configure A and B matrices separately. Have separate code path for each scaling mode.

Signed-off-by: Tim Moon <tmoon@nvidia.com>
@timmoon10
Copy link
Collaborator

/te-ci L1

@kwyss-nvidia
Copy link
Collaborator Author

Looking into diagnosing the CI test failures:

OperatorTest/CTDBiasDGeluTestSuite.TestCTDBiasDgelu/float32Xfloat8e5m2X256X65536 - A100 cppunittest
test_numerics.py test_comm_gemm_overlap.py - H100 pytorch distributed unittest

kwyss-nvidia and others added 2 commits April 7, 2025 13:20
@timmoon10
Copy link
Collaborator

/te-ci pytorch

Copy link
Collaborator

@timmoon10 timmoon10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@timmoon10 timmoon10 merged commit db2aaa9 into NVIDIA:main Apr 7, 2025
11 of 12 checks passed
torch.testing.assert_close(y, y_ref, atol=atol, rtol=rtol)


def cublas_gemm_test_constraint_enforced(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for this test? Maybe I'm reading this wrong but it seems to enforce that cuBLAS does not support some parameters - is this to raise awareness once cuBLAS actually starts supporting them?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we haven't verified the results of a branch, it seems better for that branch to return a descriptive error than silently succeed but possibly with bad data. This is checking that the gemm API returns an error for the cases that it shouldn't be called with.

(inputA->scaling_mode == NVTE_BLOCK_SCALING_2D)) {
NVTE_CHECK((epilogue == CUBLASLT_EPILOGUE_DEFAULT || epilogue == CUBLASLT_EPILOGUE_BIAS ||
epilogue == CUBLASLT_EPILOGUE_DGELU),
"Epilogue requested outside of the available and tested cuBLAS functionality for "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It there an available but untested functionality :-)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not as far as I know (yet). ;)

@timmoon10 timmoon10 mentioned this pull request Apr 7, 2025
13 tasks
wdykas pushed a commit to wdykas/TransformerEngine that referenced this pull request Apr 14, 2025
* Add GEMM logic for blockwise quantized tensors.

GEMM test cases included in pytorch integration.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Update NVTE_BLOCK_SCALING for GEMM.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Gate feature on CUDA 12.9

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Gemm typo.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Remove unecessary type converter change.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Reflect epilogue availability and test supported epilogues.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* GEMM simplifications from recipe branch.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Format py code.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Update GEMM DGelu tests to match support depending on output dtype.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Force pow2Scales in GEMM

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Add GEMM test to pytorch test suite.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Add copyright to GEMM test.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Update import for GEMM test.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Add license.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Update test gemm supported predicate.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Use sgemm like interfaces and naming.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Rewrite GEMM comment.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* MR Feedback.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

* Refactor GEMM param canonicalization

Configure A and B matrices separately. Have separate code path for each scaling mode.

Signed-off-by: Tim Moon <tmoon@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Prune number of tests.

Signed-off-by: Keith Wyss <kwyss@nvidia.com>

---------

Signed-off-by: Keith Wyss <kwyss@nvidia.com>
Signed-off-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: Tim Moon <tmoon@nvidia.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Tim Moon <4406448+timmoon10@users.noreply.github.com>
Signed-off-by: Peter Dykas <wdykas@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants