Add scheduling support for SDPAOp #2361

Merged
Priya2698 merged 22 commits into main from pm/sdpa_schedule
Jun 11, 2024

Conversation

@Priya2698
Collaborator

@Priya2698 Priya2698 commented Jun 6, 2024

Stacked on #2294.

  1. Adds the producer-consumer mapping to root domain map.
  2. Adds SDPAOp to ExprEvalScheduler.
  3. Modifies ExprEvalSched::canSchedule to skip the computeAt checks and use only the compile-time check, since the expression evaluator scheduler accepts only segments with a single expression of type MatmulOp / LinearOp / SdpaOp.
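
As a rough illustration of item 3, the acceptance rule can be sketched as a single compile-time predicate. This is a minimal sketch, not the nvFuser implementation: `OpKind`, `Segment`, and `canScheduleCompileTime` are hypothetical stand-ins introduced only for this example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT nvFuser types.
enum class OpKind { Matmul, Linear, Sdpa, Pointwise };

struct Segment {
  std::vector<OpKind> exprs;  // expressions contained in the segment
};

// Compile-time-only acceptance rule as described in the PR text:
// accept a segment only if it holds exactly one expression and that
// expression is a matmul, linear, or SDPA op.
bool canScheduleCompileTime(const Segment& segment) {
  if (segment.exprs.size() != 1) {
    return false;
  }
  switch (segment.exprs.front()) {
    case OpKind::Matmul:
    case OpKind::Linear:
    case OpKind::Sdpa:
      return true;
    default:
      return false;
  }
}
```

With this rule, a segment holding a single SDPA op is accepted, while an empty segment, a segment with two expressions, or a segment with any other op kind is rejected without any further checks.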

Issue #2278

@Priya2698 Priya2698 changed the title [DO NOT MERGE] Add scheduling support for SDPAOp Add scheduling support for SDPAOp Jun 10, 2024
@Priya2698 Priya2698 requested review from jacobhinkle and naoyam June 10, 2024 21:28
@Priya2698 Priya2698 marked this pull request as ready for review June 10, 2024 21:28
@Priya2698
Collaborator Author

!build

Collaborator

@naoyam naoyam left a comment

Is the root map change required for the scheduling support?

@Priya2698
Collaborator Author

Is the root map change required for the scheduling support?

Yes. Otherwise, as we noticed with the earlier MatmulOp and LinearOp, the exactMappedExtentSubstitution presegmentation pass runs into an error since the producers and consumers are not trivially aligned. I will verify that the error persists for SDPA as well if this change is removed.

@jacobhinkle
Collaborator

Is the root map change required for the scheduling support?

I suspect there might be problems with ExpressionEvaluator::propagateBoundValuesThroughExactMaps if we did not map any IterDomains for this node (or MatmulOp or LinearOp), but maybe it would be OK since we always re-use the input extents directly, so no mapping is needed?

@naoyam
Collaborator

naoyam commented Jun 10, 2024

Thanks. Looks good to me. I'll let @jacobhinkle give a stamp.

@Priya2698
Collaborator Author

Is the root map change required for the scheduling support?

I suspect there might be problems with ExpressionEvaluator::propagateBoundValuesThroughExactMaps if we did not map any IterDomains for this node (or MatmulOp or LinearOp), but maybe it would be OK since we always re-use the input extents directly, so no mapping is needed?

It would still not work, since the third dimension (L/S) maps only from query to output.

C++ exception with description "known_size == this_size INTERNAL ASSERT FAILED at "/opt/pytorch/nvfuser/csrc/expr_evaluator.cpp":302, please report a bug with repro script to NVFuser at https://github.com/NVIDIA/Fuser/issues. Conflicting sizes: 128, 64
Exception raised from propagateBoundValuesThroughExactMaps at /opt/pytorch/nvfuser/csrc/expr_evaluator.cpp:302 (most recent call first):
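
The mapping rule discussed in this thread can be sketched as follows. This is a hypothetical model, not nvFuser code: `Input`, `Shape`, `mapsToOutput`, and `mappedExtentsAgree` are names invented for the example. It assumes the standard SDPA layout (query `[N, H, L, E]`, key `[N, H, S, E]`, value `[N, H, S, Ev]`, attention output `[N, H, L, Ev]`); the rule for the last dimension follows from that layout and is an assumption rather than something stated in the PR.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model for illustration; NOT nvFuser code.
enum class Input { Query, Key, Value };

using Shape = std::array<std::int64_t, 4>;

// Whether dimension `idx` of `producer` maps to dimension `idx` of the
// attention output, assuming the standard SDPA layout:
//   query [N, H, L, E], key [N, H, S, E], value [N, H, S, Ev],
//   output [N, H, L, Ev].
bool mapsToOutput(Input producer, std::size_t idx) {
  assert(idx < 4);
  if (idx < 2) {
    return true;  // N and H map from any input
  }
  if (idx == 2) {
    return producer == Input::Query;  // L comes only from query
  }
  return producer == Input::Value;  // Ev comes only from value (assumption)
}

// True if every mapped dimension has the same extent in producer and output.
bool mappedExtentsAgree(
    Input producer,
    const Shape& producer_shape,
    const Shape& output_shape) {
  for (std::size_t idx = 0; idx < 4; ++idx) {
    if (mapsToOutput(producer, idx) &&
        producer_shape[idx] != output_shape[idx]) {
      return false;
    }
  }
  return true;
}
```

With illustrative extents L = 128 and S = 64 (chosen to echo the sizes in the log; which one is L and which is S there is an assumption), the key's third dimension does not agree with the output's, so mapping it would produce exactly the kind of size conflict reported above. Restricting the third dimension to query avoids that.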

@Priya2698
Collaborator Author

!build

Collaborator

@jacobhinkle jacobhinkle left a comment

Test looks good. My comments are mostly minor.


// Map N, H from any input (query/key/value)
for (auto idx : c10::irange(consumer_root.size())) {
  if (idx < 2) {
Collaborator

For idx == 0 and consumer_tv_->sameAs(op->output(2)) || consumer_tv_->sameAs(op->output(3)), these should not map since the extents differ by 1. I would just put a check before this that the consumer is the output or logsumexp. Btw, you might want to add accessors like op->logSumExp() to make this new condition more readable.

Collaborator Author

Even if I use resize?
I saw an error that there was no mapped iterdomain from producer.

Collaborator

Oh ok, so you are mapping the consumer root, and then you have an rfactor domain for that consumer which uses a Resize?

Collaborator

That is right then.
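
The readability suggestion from this thread (named accessors instead of raw `op->output(i)` indices) could look roughly like the sketch below. Everything here is hypothetical: `TensorStub`, `SdpaOpStub`, `attnOut`, and `isOutputOrLogSumExp` are illustration-only names, and the output indices assigned to the attention output and logsumexp (0 and 1) are assumptions, since the thread only names `output(2)` and `output(3)` as the tensors whose extents differ.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; NOT the nvFuser classes.
struct TensorStub {
  std::string name;
};

class SdpaOpStub {
 public:
  explicit SdpaOpStub(std::vector<TensorStub> outputs)
      : outputs_(std::move(outputs)) {}

  // Index-based access, as used in the review comment.
  const TensorStub& output(std::size_t i) const {
    return outputs_.at(i);
  }

  // Named accessors. ASSUMPTION: indices 0 and 1 are chosen for this
  // sketch only; the real output ordering is not stated in the thread.
  const TensorStub& attnOut() const {
    return output(0);
  }
  const TensorStub& logSumExp() const {
    return output(1);
  }

 private:
  std::vector<TensorStub> outputs_;
};

// The check suggested in review: is the consumer the attention output or
// logsumexp? Identity is compared by address within the op's outputs.
bool isOutputOrLogSumExp(const SdpaOpStub& op, const TensorStub& consumer) {
  return &consumer == &op.attnOut() || &consumer == &op.logSumExp();
}
```

A condition written as `isOutputOrLogSumExp(op, consumer)` reads more directly than a chain of `sameAs(op->output(i))` comparisons, which is the point of the accessor suggestion. Whether the check itself is needed is a separate question; the thread concludes that the existing mapping is correct given the Resize on the consumer.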

Priya2698 and others added 3 commits June 10, 2024 17:25
Co-authored-by: Jacob Hinkle <1454944+jacobhinkle@users.noreply.github.com>
Co-authored-by: Jacob Hinkle <1454944+jacobhinkle@users.noreply.github.com>
Co-authored-by: Jacob Hinkle <1454944+jacobhinkle@users.noreply.github.com>
@Priya2698
Collaborator Author

!build

@Priya2698 Priya2698 merged commit aa1f4eb into main Jun 11, 2024
@Priya2698 Priya2698 deleted the pm/sdpa_schedule branch June 11, 2024 03:25