ATen scheduler for the new Matmul/LinearOp IR nodes#2209

Merged
Priya2698 merged 35 commits into main from pm/aten_scheduler on May 14, 2024
Conversation

Collaborator

@Priya2698 Priya2698 commented May 7, 2024

  1. Adds a new scheduler, ExprEvalScheduler, that accepts the MatmulOp and LinearOp (next PR) for ATen evaluation.
  2. Modifies the matmul input generator to test all cases supported by Thunder.
  3. Renames the eagerMatmul API to replace the existing matmul API. fd.ops.matmul now creates a MatmulOp (except in a few special cases, such as the scalar dot product, e.g. [K] x [K]).

Issue #2149, #2092.
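Point 3 above can be sketched as a shape predicate. This is a minimal illustration, not nvFuser code; the helper name is invented:

```python
def lowers_to_matmul_op(a_shape, b_shape):
    # Hypothetical predicate mirroring the PR description: fd.ops.matmul
    # creates a MatmulOp except for the scalar dot-product case [K] x [K],
    # which is handled specially and does not produce a MatmulOp node.
    return not (len(a_shape) == 1 and len(b_shape) == 1)

print(lowers_to_matmul_op((32,), (32,)))      # scalar dot product: special case
print(lowers_to_matmul_op((16, 32), (32, 8)))  # regular 2-D matmul: MatmulOp
```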

Collaborator

@jacobhinkle jacobhinkle left a comment


Looking great. Sprinkle in a few tests once you merge #2175 and we'll be on our way.

@Priya2698 Priya2698 force-pushed the pm/aten_scheduler branch 2 times, most recently from 2a2786e to 314ab22 on May 9, 2024 22:30
@Priya2698 Priya2698 force-pushed the pm/aten_scheduler branch from 314ab22 to 1ad32ea on May 9, 2024 22:37
@Priya2698 Priya2698 marked this pull request as ready for review May 13, 2024 23:57
@Priya2698
Collaborator Author

!build

Collaborator

@jacobhinkle jacobhinkle left a comment


Looks good overall. I am just slightly concerned about removing stuff from PairwiseRootDomainMap::map.

 //! Define a schedule table to loop over all the heuristics in priority order.
-constexpr std::array<ScheduleHeuristic, 8> all_heuristics_in_priority_order = {
+constexpr std::array<ScheduleHeuristic, 9> all_heuristics_in_priority_order = {
+    ScheduleHeuristic::ExprEval,
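Why this ordering matters: the segmenter walks the table and the first scheduler whose canSchedule check accepts the fusion claims it. A toy model of that first-match dispatch (scheduler names and predicates are stand-ins, not nvFuser code):

```python
def pick_heuristic(fusion, schedulers):
    # First-match dispatch over a priority-ordered table: each entry is
    # (name, can_schedule predicate); the first accepting scheduler wins.
    for name, can_schedule in schedulers:
        if can_schedule(fusion):
            return name
    return None

# A matmul fusion whose output dims are all broadcast-like would be
# claimed by whichever of ExprEval/NoOp appears first in the table.
schedulers = [
    ("ExprEval", lambda f: f.get("has_matmul", False)),
    ("NoOp", lambda f: f.get("all_output_dims_broadcast", False)),
]
fusion = {"has_matmul": True, "all_output_dims_broadcast": True}
print(pick_heuristic(fusion, schedulers))
```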
Collaborator


Should NoOp come before ExprEval?

Collaborator Author


Some cases get accepted by NoOp scheduler, which is why I prioritized ExprEval scheduler.

We may need to change the heuristics of NoOp if we want to switch the order.

Collaborator

@jacobhinkle jacobhinkle May 14, 2024


Oh! Thanks for mentioning that. Does NoOp scheduler accept the cases where you have a single scalar output? Because it seems to me that it would do so based on this code:

Fuser/csrc/fusion.cpp

Lines 341 to 359 in 8c18701

bool Fusion::isNoOp() {
  if (exprs().empty()) {
    return true;
  }
  for (auto out_tv : ir_utils::filterByType<TensorView>(outputs())) {
    const std::vector<IterDomain*>& root_dom =
        TensorDomain::noReductions(out_tv->getMaybeRFactorDomain());
    const bool size_zero =
        std::any_of(root_dom.begin(), root_dom.end(), [](IterDomain* id) {
          return id->extent()->isConstScalar() && id->extent()->evaluate() == 0;
        });
    if (!size_zero) {
      return false;
    }
  }
  return true;
}
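The root_dom.empty() observation can be checked with a toy model of that loop (extents as plain ints; simplified, not the real class):

```python
def size_zero(root_dom):
    # Mirrors the std::any_of call above: any() over an empty sequence is
    # False, so a 0-dim output (empty root domain) is never size-zero.
    return any(extent == 0 for extent in root_dom)

def is_no_op(output_root_domains):
    # Simplified model of Fusion::isNoOp: every output must have a
    # size-zero dimension (the empty-exprs early return is omitted).
    return all(size_zero(rd) for rd in output_root_domains)

print(is_no_op([[]]))      # zero-dimensional output: not a no-op here
print(is_no_op([[0, 4]]))  # output with a size-0 dimension: no-op
```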

We should add a special case for zero-dimensional outputs there. On second look, it seems like size_zero would be false in the case that root_dom.empty(). However, the code below might not properly handle zero-dimensional outputs:
// Check that all outputs are either broadcast or ignored reduction.
for (auto out_tv : ir_utils::filterByType<TensorView>(fusion->outputs())) {
  auto concrete_dimension = TensorDomain::noReductions(
      TensorDomain::noBroadcasts(out_tv->getLeafDomain()));
  if (!concrete_dimension.empty()) {
    scheduler_debug_utils::canScheduleRejectReason(
        heuristicType(), "output has a concrete dimension");
    return false;
  }
}
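A toy version of that output check (leaf-domain kinds modeled as strings; names invented for illustration) shows why a zero-dimensional or all-broadcast output slips through:

```python
def noop_accepts_outputs(outputs):
    # Each output is a list of leaf-domain kinds. Only "broadcast" and
    # "reduction" domains are filtered out before the emptiness test, so
    # an empty (0-dim) or all-broadcast output leaves nothing concrete.
    for leaf_domain in outputs:
        concrete = [d for d in leaf_domain
                    if d not in ("broadcast", "reduction")]
        if concrete:
            return False  # "output has a concrete dimension"
    return True

print(noop_accepts_outputs([[]]))                         # 0-dim output
print(noop_accepts_outputs([["broadcast", "broadcast"]]))  # e.g. [1, 1]
print(noop_accepts_outputs([["iter", "broadcast"]]))       # e.g. [M, 1]
```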

Collaborator Author


[  FAILED  ] 6 tests, listed below:
[  FAILED  ] ATenNodesParametrizedTest.MatmulNodeConcrete/2, where GetParam() = ({ 32 }, { 32, 1 })
[  FAILED  ] ATenNodesParametrizedTest.MatmulNodeConcrete/8, where GetParam() = ({ 1, 32 }, { 32 })
[  FAILED  ] ATenNodesParametrizedTest.MatmulNodeConcrete/10, where GetParam() = ({ 1, 32 }, { 32, 1 })
[  FAILED  ] ATenNodesParametrizedTest.MatmulNodeSymbolic/2, where GetParam() = ({ 32 }, { 32, 1 })
[  FAILED  ] ATenNodesParametrizedTest.MatmulNodeSymbolic/8, where GetParam() = ({ 1, 32 }, { 32 })
[  FAILED  ] ATenNodesParametrizedTest.MatmulNodeSymbolic/10, where GetParam() = ({ 1, 32 }, { 32, 1 })

It is likely because no reductions are identified since we use ATen, and all the dimensions in the output are broadcast dimensions. So the cases where M/N = 1 get picked up by the NoOp scheduler.
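The failing parameters fit that explanation: each produces an output whose every dimension has extent 1. A quick shape check, following torch.matmul semantics for the 1-D/2-D cases (helper names invented):

```python
def matmul_output_shape(a, b):
    # 1-D/2-D matmul shape rules, per torch.matmul semantics.
    if len(a) == 1 and len(b) == 1:
        return ()               # [K] x [K] -> scalar dot product
    if len(a) == 1:
        return b[1:]            # [K] x [K, N] -> [N]
    if len(b) == 1:
        return a[:-1]           # [M, K] x [K] -> [M]
    return a[:-1] + b[1:]       # [M, K] x [K, N] -> [M, N]

def all_broadcast(shape):
    # With ATen evaluation no reduction domain remains, so a size-1
    # output dimension is indistinguishable from a broadcast.
    return all(s == 1 for s in shape)

for a, b in [((32,), (32, 1)), ((1, 32), (32,)), ((1, 32), (32, 1))]:
    print(a, b, "->", matmul_output_shape(a, b))  # all extent-1 outputs
```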

Collaborator

@jacobhinkle jacobhinkle left a comment


LGTM after tests pass and you add some broadcasts in test.

@Priya2698
Collaborator Author

!build

@Priya2698 Priya2698 merged commit dfba77a into main May 14, 2024
@Priya2698 Priya2698 deleted the pm/aten_scheduler branch May 14, 2024 02:42
@jjsjann123
Collaborator

🚀

3 participants