
Matmul default scheduling #1775

Merged
Priya2698 merged 33 commits into main from pm/mma_default
Mar 1, 2024

Conversation

@Priya2698
Collaborator

@Priya2698 Priya2698 commented Feb 16, 2024

PR #1743 was reverted due to the following issues:

  1. Matmul scheduler does not support all architectures: This caused some tests to fail on V100. This is fixed by returning early in getMatmulHeuristics and only setting the required parameters.
  2. Errors in GpuLower::analysis: validateMma and PredicateElimination are modified to skip expressions and outputs marked for the expression evaluator.

@Priya2698
Collaborator Author

!build

[&fusion](Val* out) {
  return (fusion->getOutputAlias(out).type != AllocationType::Evaluate);
});
traverseTo(outs_requiring_codegen);
Collaborator Author


This removes outputs marked for EE from predicate elimination to avoid errors in lowering analysis.


void MatmulScheduler::schedule(Fusion* fusion) {
  FUSER_PERF_SCOPE("Schedule Matmul Fusion");
  // Skip scheduling if Matmul will be expression evaluated.
Collaborator Author


Moving this from scheduleMatmul to schedule.

auto params = std::make_shared<MatmulParams>();

// Set kernel index mode
params->cparams.index_type = runtime_info.getIndexType();
Collaborator Author


index_type is needed in compileFusion and needs to be set before returning. This avoids errors on architectures not supported by the matmul scheduler (see the getMmaOp function).

@Priya2698
Collaborator Author

!build

csrc/options.h Outdated
MemoryPromotion, //! Enable promotion of memory types for non-pointwise ops
StaticFusionCount, //! Enable using single static count in kernel name
WarnRegisterSpill, //! Enable warnings of register spill
MatmulExprEval, //! Enable ATen evaluation for Matmul
Collaborator


I think we need to make a louder statement here regarding the impact of this flag.

Enabling MatmulExprEval means that we are running the entire fusion containing a matmul with expression evaluation, not just the matmul portion.

@protonu protonu requested review from protonu and removed request for protonu February 22, 2024 20:39
@Priya2698
Collaborator Author

!build

1 similar comment
@Priya2698
Collaborator Author

!build

@Priya2698 Priya2698 marked this pull request as ready for review February 23, 2024 23:36
@Priya2698
Collaborator Author

!build

@Priya2698 Priya2698 changed the title from "[WIP] Matmul default scheduling" to "Matmul default scheduling" Feb 23, 2024
@@ -90,10 +90,14 @@ enum class AllocationType : int {
// For example, the tensor storing BatchNorm's running mean. The output EMA is
// updated in place.
InplaceUpdate,
Collaborator Author


Renaming NoAlias->New and InplaceUpdate->ReuseBuffer will be done in a separate PR to avoid extraneous changes in this PR.

@Priya2698
Collaborator Author

!build

Collaborator

@wujingyue wujingyue left a comment


LGTM with comments

Comment on lines +1017 to +1018
default:
  NVF_ERROR(false, "Unrecognized AllocationType.");
Collaborator


I doubt this is needed. Compilers today should be smart enough to figure out that the above cases cover all possible options.

Collaborator Author


Added this to avoid the compiler error `error: control reaches end of non-void function`, which fires even though the switch-case is completely handled.

@Priya2698
Collaborator Author

!build

@Priya2698 Priya2698 merged commit 5cdbfc5 into main Mar 1, 2024
@Priya2698 Priya2698 deleted the pm/mma_default branch March 1, 2024 05:53
@Priya2698 Priya2698 mentioned this pull request Mar 1, 2024
@Priya2698 Priya2698 mentioned this pull request Mar 25, 2024
Priya2698 added a commit that referenced this pull request Apr 2, 2024
Adds ATen evaluation for Matmul and Matmul + Bias. Based on PR #1921,
when evaluating a `castOp`, we _look back_ to see if there is a
preceding MmaOp and evaluate them together.

Issue #1775.
Priya2698 added a commit that referenced this pull request Apr 26, 2024
This PR resolves Issue #1812.
1. Removes the workarounds added to support ATen evaluation for matmuls
   in PR #1775 (in `predicate_elimination.cpp` and `validation.cpp`; see
   comments in the PR).
2. `FusionExecutor::fusion_` is initialized only when compilation is
   skipped, so only one of `lowered_` or `fusion_` is non-null.
3. `FusionKernelRuntime/FusionExecutorCache/FusionExecutor::isCompiled()`
   indicates whether compilation was attempted (either marked for EE or
   via nvFuser).

---------
Co-authored-by: Ryan Spring <rspring@nvidia.com>
