Conversation

Force-pushed from 70fcc57 to 76c8b67
!build
```cpp
[&fusion](Val* out) {
  return (fusion->getOutputAlias(out).type != AllocationType::Evaluate);
});
traverseTo(outs_requiring_codegen);
```
This removes outputs marked for expression evaluation (EE) from predicate elimination, to avoid errors in lowering analysis.
```cpp
void MatmulScheduler::schedule(Fusion* fusion) {
  FUSER_PERF_SCOPE("Schedule Matmul Fusion");
  // Skip scheduling if Matmul will be expression evaluated.
```
Moving this check from scheduleMatmul to schedule.
```cpp
auto params = std::make_shared<MatmulParams>();

// Set kernel index mode
params->cparams.index_type = runtime_info.getIndexType();
```
`index_type` is needed in `compileFusion` and must be set before returning. This avoids errors on architectures not supported by the matmul scheduler (see the `getMmaOp` function).
!build |
csrc/options.h (Outdated)
```cpp
MemoryPromotion,   //! Enable promotion of memory types for non-pointwise ops
StaticFusionCount, //! Enable using single static count in kernel name
WarnRegisterSpill, //! Enable warnings of register spill
MatmulExprEval,    //! Enable ATen evaluation for Matmul
```
I think we need to make a louder statement here about the impact of this flag. Enabling MatmulExprEval means the entire fusion containing a matmul is run with expression evaluation, not just the matmul portion.
!build
```diff
@@ -90,10 +90,14 @@ enum class AllocationType : int {
   // For example, the tensor storing BatchNorm's running mean. The output EMA is
   // updated in place.
   InplaceUpdate,
```
Renaming `NoAlias` -> `New` and `InplaceUpdate` -> `ReuseBuffer` will be done in a separate PR, to avoid extraneous changes in this one.
!build |
```cpp
default:
  NVF_ERROR(false, "Unrecognized AllocationType.");
```
I doubt this is needed. Compilers today should be smart enough to figure out that the cases above cover all possible options.
Added this to avoid `error: control reaches end of non-void function`, which is emitted even though the switch-case handles every enum value.
!build |
This PR resolves Issue #1812.

1. Removes the workarounds added to support ATen evaluation for matmuls in PR #1775 (in `predicate_elimination.cpp` and `validation.cpp`; see comments in the PR).
2. `FusionExecutor::fusion_` is initialized only when compilation is skipped, so only one of `lowered_` or `fusion_` is non-null.
3. `FusionKernelRuntime`/`FusionExecutorCache`/`FusionExecutor::isCompiled()` indicates whether compilation was attempted (either marked for EE or via nvFuser).

---------

Co-authored-by: Ryan Spring <rspring@nvidia.com>
PR #1743 was reverted due to the following issues:

- `getMatmulHeuristics` and only setting the required parameters.
- `GpuLower::analysis`: `validateMma` and `PredicateElimination` are modified to skip expressions and outputs marked for the expression evaluator.