Switch axis we use to compute swizzled_tiles by jacobhinkle · Pull Request #4311 · NVIDIA/Fuser

jacobhinkle · 2025-04-24T19:58:29Z

#4242 turned on "grid traversal factor" which is a good thing. However, it exposed a bug in how we limit that factor to prevent overrun in case the swizzled axis has fewer tiles than the factor. This led to a regression from 58% to 35% geomean perf compared to eager on H200.

This PR swaps the axes used to compute the number of swizzled tiles and takes us from a geomean of 35% to 65% on benchmarks/python/test_matmul.py on H200.
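A minimal sketch of the limit described above, assuming the factor is simply clamped to the tile count of the swizzled axis. The names `ceil_div`, `clamp_traversal_factor`, and `swizzle_along_m` are hypothetical and do not come from nvFuser; the real logic lives in csrc/scheduler/matmul_utils.cpp and may differ.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for positive operands."""
    return -(-a // b)


def clamp_traversal_factor(
    m: int,
    n: int,
    tile_m: int,
    tile_n: int,
    factor: int,
    swizzle_along_m: bool,
) -> int:
    """Limit the grid traversal factor to the tile count of the swizzled axis.

    Hypothetical helper: which axis counts as "swizzled" is an assumption
    passed in by the caller, not something derived from nvFuser here.
    """
    m_tiles = ceil_div(m, tile_m)
    n_tiles = ceil_div(n, tile_n)
    swizzled_tiles = m_tiles if swizzle_along_m else n_tiles
    # Never return less than 1, and never more tiles than the axis holds.
    return max(1, min(factor, swizzled_tiles))


# A skinny problem: 1 tile along M, 32 tiles along N.
limit_m = clamp_traversal_factor(128, 8192, 128, 256, factor=8, swizzle_along_m=True)
limit_n = clamp_traversal_factor(128, 8192, 128, 256, factor=8, swizzle_along_m=False)
```

In this example the two axes give different limits (1 versus 8), so reading the tile count from the wrong axis leaves a factor larger than the axis it is applied to, which is the kind of overrun the limit is meant to prevent.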

jacobhinkle · 2025-04-24T19:58:38Z

!build

github-actions · 2025-04-24T19:59:16Z

Description

Switched axis for computing swizzled tiles to improve performance
Added memory check to skip large test cases in benchmarks

Changes walkthrough 📝

Relevant files

Enhancement

| File | Change | Diff |
| --- | --- | --- |
| csrc/scheduler/matmul_utils.cpp | `Switched swizzled_tiles axis logic`: changed the logic to determine the swizzled_tiles axis | +1/-1 |
| benchmarks/python/test_matmul.py | `Added memory check for large test cases`: added memory check to skip large test cases | +6/-0 |

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
⚡ Recommended focus areas for review

Swizzled Tiles Calculation
The change in the calculation of `swizzled_tiles` may have unintended consequences on performance or correctness. Ensure that this change does not introduce any edge cases or regressions.
`int64_t swizzled_tiles = Mtiles >= Ntiles ? Ntiles : Mtiles;`

Memory Check
The added memory check in the benchmark tests is a good practice to prevent OOM errors. Ensure that the threshold of 20GiB is appropriate and that no important test cases are being skipped unnecessarily.
`if (m * k + n * k + m * n) * 2 > 20 * (2**30): pytest.skip("Case takes more than 20GiB. Skipping to avoid OOM")`

jacobhinkle · 2025-04-24T19:59:28Z

benchmarks/python/test_matmul.py

+    if (m * k + n * k + m * n) * 2 > 20 * (2**30):
+        pytest.skip("Case takes more than 20GiB. Skipping to avoid OOM")
+


Limiting memory use to 20 GiB. This is conservative, but we don't expect problem sizes bigger than this for DL at this time, and it prevents OOM on most devices.
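The arithmetic behind the check can be sketched as follows. This assumes the factor of 2 in the diff is the size in bytes of one half-precision element and that the footprint is just the two operands plus the output; the helper names are hypothetical and not part of the benchmark.

```python
GIB = 2**30  # bytes in one GiB


def matmul_footprint_bytes(m: int, n: int, k: int, bytes_per_element: int = 2) -> int:
    """Bytes for an (m, k) operand, an (n, k) operand, and an (m, n) output.

    bytes_per_element=2 is an assumption (fp16/bf16); the diff only shows "* 2".
    """
    return (m * k + n * k + m * n) * bytes_per_element


def exceeds_limit(m: int, n: int, k: int, limit_bytes: int = 20 * GIB) -> bool:
    """Mirror the comparison in the diff: skip when the footprint is above the limit."""
    return matmul_footprint_bytes(m, n, k) > limit_bytes


small = matmul_footprint_bytes(8192, 8192, 8192)     # 384 MiB
large = matmul_footprint_bytes(65536, 65536, 65536)  # 24 GiB
```

Under these assumptions a cubic problem of size 8192 stays far below the threshold, while one of size 65536 needs 24 GiB and would be skipped.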

Switch axis we use to compute swizzled_tiles

05ee6c6

jacobhinkle requested a review from rdspring1 April 24, 2025 19:58

rdspring1 approved these changes Apr 24, 2025

jacobhinkle merged commit fadfde5 into main Apr 25, 2025
16 checks passed

jacobhinkle deleted the jh/transpose_grid_traversal_limit branch April 25, 2025 01:57
