CTA Swizzles #90

Merged
mmigdal-nv merged 1 commit into main from MM/grid_swizzle on Mar 29, 2023

@mmigdal-nv
Collaborator

Cherry-pick of the changes made in #87 into main.

Adds a CTA swizzle that changes the order in which the tiles of the
output matrix are processed.
The swizzle increases reuse of A and B data when iterating over
gridDim.x. In practice, CTAs are launched by iterating over gridDim.x
first (the order is unspecified, it just happens to behave this way).
With the swizzle, the CTAs in the current wave together cover a region
of C that is closer to square instead of a long strip, so they share
more of A and B and the L2 hit rate increases.

The best factor seems to be 4. Choosing it will be part of the
heuristics. Setting the factor to 1 disables the swizzle.
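
To make the reuse argument concrete, here is a minimal sketch of one possible index remap and a count of how many distinct tile rows and tile columns of C a single wave touches. All names (`Tile`, `tileForLaunchIndex`, `operandPanelsTouched`) are hypothetical and not part of the scheduler. It also assumes that CTAs launch in linear order, that the factor divides the number of tile rows, and that each tile row or column of C corresponds to one panel of A or B.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper types and names for illustration only.
struct Tile {
  int m; // tile row in C
  int n; // tile column in C
};

// Maps a position in the (assumed) CTA launch order to a tile of C.
// With factor == 1 this is a plain row-major walk over the tile grid.
// With factor > 1, consecutive CTAs first walk down `factor` tile rows
// before moving on to the next tile column.
Tile tileForLaunchIndex(int idx, int tiles_m, int tiles_n, int factor) {
  assert(factor >= 1 && tiles_m % factor == 0); // simplifying assumption
  assert(idx >= 0 && idx < tiles_m * tiles_n);
  const int strip_size = factor * tiles_n; // tiles in a band of `factor` rows
  const int strip = idx / strip_size;
  const int within = idx % strip_size;
  return Tile{strip * factor + within % factor, within / factor};
}

// Counts the distinct tile rows plus tile columns needed by the first
// `wave_size` CTAs; a smaller count means more operand sharing in L2.
int operandPanelsTouched(int wave_size, int tiles_m, int tiles_n, int factor) {
  std::set<int> rows, cols;
  for (int idx = 0; idx < wave_size; ++idx) {
    const Tile t = tileForLaunchIndex(idx, tiles_m, tiles_n, factor);
    rows.insert(t.m);
    cols.insert(t.n);
  }
  return static_cast<int>(rows.size() + cols.size());
}
```

For a 64x64 tile grid and a wave of 108 CTAs (assuming one CTA per A100 SM and 128x128 CTA tiles on the 8192 problem, neither of which is stated above), `operandPanelsTouched(108, 64, 64, 1)` gives 66, while a factor of 4 gives 31.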

On an 8192x8192x8192 matmul with the default config, the speedup is
about 20%.

An extreme example is the following case: `MNK = 6144 6144 6144,
layout=NT, stages=0, cta_tile = 32 32 128, warp_tile = 16 16 128,
instruction_tile = 16 16 16`, where the runtime drops from 12.4 ms to
7.28 ms!
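
The runtimes above can be turned into a speedup factor and an achieved throughput with a little arithmetic. The helpers below are a hypothetical sketch (the names are not from the codebase) and assume the usual convention of counting 2*M*N*K floating-point operations for a matmul.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only.

// Speedup factor: how many times faster the new runtime is.
double speedup(double before_ms, double after_ms) {
  assert(before_ms > 0.0 && after_ms > 0.0);
  return before_ms / after_ms;
}

// Achieved throughput in TFLOP/s, counting 2*M*N*K operations.
double matmulTflops(double m, double n, double k, double runtime_ms) {
  assert(runtime_ms > 0.0);
  const double flops = 2.0 * m * n * k;
  return flops / (runtime_ms * 1e-3) / 1e12;
}
```

With the numbers quoted above, `speedup(12.4, 7.28)` is about 1.70, and `matmulTflops(6144, 6144, 6144, t)` rises from roughly 37.4 at 12.4 ms to roughly 63.7 at 7.28 ms.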

Thank you @zasdfgbnm for the help.
Values were measured on an NVIDIA A100 SXM4 80 GB.

---------

Co-authored-by: Gao, Xiang <qasdfgtyuiop@gmail.com>
@mmigdal-nv changed the title from "CTA Swizzles (#87)" to "CTA Swizzles" on Mar 29, 2023

// Applies swizzle factor on C
if (params.grid_swizzle_factor != 1) {
  int factor = std::max(1, params.grid_swizzle_factor); // must be >=1

Collaborator

We don't consider this type of tiling to be a swizzle. We consider swizzles to be non-affine transformations. Can we use a different word for this optimization? "Tile shuffle" or something similar might be a good name.

Collaborator

We are already using the word swizzle for both affine and non-affine transformations. See note [WarpMmaSwizzler].

Collaborator Author

I went with CUTLASS's naming. What about "grid scaling"? I'm also fine with "tile shuffle". Should I update the name, @zasdfgbnm @csarofeen?

Collaborator

No, it's fine then. We should probably rename things to distinguish non-affine from affine swizzles, but it doesn't seem to make a difference right now.

Collaborator

Let's merge this PR as is so it is consistent with tracking-matmul.

Collaborator

@csarofeen left a comment

This seems good to me. We should just be able to merge it in? CC @zasdfgbnm

@drzejan2
Contributor

@mmigdal-nv thanks for the heads-up. When this is promoted to main, I will update #23 to incorporate this functionality into the matmul scheduler in the segmenter.

@mmigdal-nv merged commit 07092a5 into main on Mar 29, 2023
@mmigdal-nv deleted the MM/grid_swizzle branch on March 29, 2023 18:01
zasdfgbnm added a commit that referenced this pull request on Mar 29, 2023
z-shape swizzle was not used in tracking-matmul, and now we have a
better swizzle implemented in #90

4 participants