
Handling allocation domain of the input TensorViews in the matmul scheduler #2309

Merged
protonu merged 28 commits into main from pbasu_experiment_alloc_domai
Jun 4, 2024

Conversation

@protonu
Collaborator

@protonu protonu commented May 28, 2024

In this PR we extend the matmul scheduler to support inputs with allocation domains.

We add two LoadStoreOps to each of the fusion inputs (tv_a and tv_b). The first op loads the input to shared memory and propagates the input's allocation domain. The second op reads from shared memory into registers and does not propagate the allocation domain, since the scheduler is responsible for setting the allocation domain of the register buffers. Depending on the difference between the (maybe) allocation domains of the producer and consumer of this second LoadStoreOp, we may perform a transposed load when reading into registers.

![image](https://github.com/NVIDIA/Fuser/assets/10635897/89395990-9b85-4ce1-8e7d-006e43a86b85)

See also #2315.
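The caching structure described above can be sketched with a toy model (plain Python, not the nvFuser API; `ToyTensor`, `cache_after`, and the axis names are all illustrative). The shared-memory copy inherits the input's allocation domain, while the register copy falls back to the logical (root) order:

```python
from dataclasses import dataclass

# Toy stand-in for a TensorView: just a name and an axis order in memory.
@dataclass
class ToyTensor:
    name: str
    alloc_domain: tuple  # e.g. ("K", "M") for an operand stored K-major

ROOT_ORDER = ("M", "K")  # logical (root) axis order of the operand

def cache_after(tv, propagate_allocation):
    """Create a cached copy of tv; optionally inherit its allocation domain."""
    alloc = tv.alloc_domain if propagate_allocation else ROOT_ORDER
    return ToyTensor(name=tv.name + "_cache", alloc_domain=alloc)

tv_a = ToyTensor("tv_a", alloc_domain=("K", "M"))        # input stored transposed
a_smem = cache_after(tv_a, propagate_allocation=True)    # smem load keeps the layout
a_reg = cache_after(a_smem, propagate_allocation=False)  # register read: scheduler resets it
```

With these toy rules, `a_smem` keeps the `("K", "M")` layout while `a_reg` resets to `("M", "K")`; the mismatch between the two is what later triggers a transposed smem-to-register load.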

Collaborator

@jacobhinkle jacobhinkle left a comment


It seems there are a few changes here:

  • Don't propagate allocation domain when using cacheAfter on smem buffers, since we need the loaded register buffers to have allocation domains matching their root domains.
  • Change scheduleLdMatrix to check consumer/producer innermost allocation ID to see whether transpose is needed. Previously this was signalled by the LoadStoreOpType on that op.
  • Use allocation domain instead of root domain for orderTiledConcreteIdAsRoot. This is called only on shared memory TVs, and we now have possibly non-trivial allocation domains on those tensors.

I think this generally seems fine. I do have a question that we can address in the future: how should we handle cases where a transposed operand has a prologue that comes before the transpose? It seems we still rely on the smem->register load for the transpose, but in such a case that load will come after the prologue.
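The check in the second bullet can be sketched as follows (a hypothetical helper, not the real scheduleLdMatrix code; allocation domains are modeled as tuples of axis names):

```python
def needs_transposed_load(producer_alloc, consumer_alloc):
    """True when the innermost allocation axes of producer and consumer differ,
    i.e. the smem->register load has to transpose."""
    return producer_alloc[-1] != consumer_alloc[-1]

# Operand K-major in smem but read M-major into registers: transpose needed.
assert needs_transposed_load(("M", "K"), ("K", "M"))
# Innermost axes match: plain load suffices.
assert not needs_transposed_load(("M", "K"), ("N", "K"))
```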

```cpp
    {consumer->getMaybeAllocationDomain().back()});

auto ids = ir_utils::filterByType<IterDomain>(vals);
auto idsOnPath = std::vector<IterDomain*>(ids.begin(), ids.end());
```
Collaborator


I don't think this line is needed, is it? Just use ids instead of idsOnPath. Also a nit: const on all these variables.

Collaborator Author


I sort of based it on this:

Fuser/csrc/ir/utils.cpp

Lines 738 to 748 in 5e0c89b

```cpp
std::vector<IterDomain*> allIDsOf(const TensorView* tv) {
  const auto& root_domain = tv->getRootDomain();
  const auto& domain = tv->getLeafDomain();
  // Grab all values in the history of the tensor view's domain
  auto all_vals = DependencyCheck::getAllValsBetween(
      {root_domain.begin(), root_domain.end()}, {domain.begin(), domain.end()});
  // Filter so we only have iteration domains (ignore Ints used in split)
  auto all_ids = ir_utils::filterByType<IterDomain>(all_vals);
  return std::vector<IterDomain*>(all_ids.begin(), all_ids.end());
}
```


```cpp
// Get all the IDs from the innermost ID of the allocation domain of
// the consumer to the root domain of the consumer.
auto vals = DependencyCheck::getAllValsBetween(
    {consumer->getRootDomain().begin(), consumer->getRootDomain().end()},
    {consumer->getMaybeAllocationDomain().back()});
```
Collaborator


Do you need to filter out broadcast and reduction domains?
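The filtering the reviewer suggests could look like this sketch (illustrative Python, not nvFuser IR; the real code would presumably query IterDomain's broadcast/reduction predicates on the collected vals):

```python
from dataclasses import dataclass

# Toy stand-in for an IterDomain with its broadcast/reduction flags.
@dataclass
class ToyIterDomain:
    name: str
    is_broadcast: bool = False
    is_reduction: bool = False

def concrete_ids(ids):
    """Keep only concrete iteration domains, dropping broadcast and reduction IDs."""
    return [i for i in ids if not (i.is_broadcast or i.is_reduction)]

ids = [
    ToyIterDomain("iM"),
    ToyIterDomain("bN", is_broadcast=True),
    ToyIterDomain("rK", is_reduction=True),
]
assert [i.name for i in concrete_ids(ids)] == ["iM"]
```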

@protonu protonu force-pushed the pbasu_experiment_alloc_domai branch from 5e0c89b to 32f15f7 on May 31, 2024 22:29
@zasdfgbnm
Collaborator

This PR needs a rebase so that the changes in #2315 are excluded from the diff of this PR.

@protonu protonu force-pushed the pbasu_experiment_alloc_domai branch from 32f15f7 to 7b0fd07 on June 3, 2024 18:05
@protonu
Collaborator Author

protonu commented Jun 3, 2024

!build

@protonu
Collaborator Author

protonu commented Jun 3, 2024

!build

@protonu
Collaborator Author

protonu commented Jun 3, 2024

!build

@protonu protonu changed the title from "[WIP] Handling allocation domain of the input TensorViews in the matmul scheduler" to "Handling allocation domain of the input TensorViews in the matmul scheduler" Jun 3, 2024
@protonu protonu requested review from jacobhinkle and zasdfgbnm June 3, 2024 19:50
@protonu protonu marked this pull request as ready for review June 3, 2024 19:51
@protonu protonu merged commit 4b427fb into main Jun 4, 2024
@protonu protonu deleted the pbasu_experiment_alloc_domai branch June 4, 2024 05:22
zasdfgbnm pushed a commit that referenced this pull request Jun 5, 2024
…eduler (#2309)
