demo how to schedule allocation domain and get domains to be allocated by liqiangxl · Pull Request #4791 · NVIDIA/Fuser

liqiangxl · 2025-07-17T13:33:20Z

Follow-up to #4792.
This PR adds a test that manually schedules the allocation domain and uses IdModel to detect the mapping between the scheduled allocation domain and the loop domain.
Automatic scheduling is added in a follow-up PR, #4795.

github-actions · 2025-07-17T13:34:02Z

Review updated until commit 8763303

Description

Added test for scheduling allocation domain with CpAsyncBulk1d
Enhanced getAllocationDomainsAndContiguity to use IdModel for mapping
Included necessary headers and cleaned up code

Changes walkthrough 📝

Relevant files

Enhancement

allocation.cpp: `Use IdModel for ID mapping in allocation domain setup` (csrc/device_lower/pass/allocation.cpp, +10/-0). Added logic to use IdModel for mapping excluded IDs in `getAllocationDomainsAndContiguity`.
test_allocation_domain.cpp: `Add CpAsyncBulk1d test and clean up` (tests/cpp/test_allocation_domain.cpp, +77/-0). Added new test case `CpAsyncBulk1d` for scheduling allocation domain; included necessary header for inlining; cleaned up code and comments.

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests

⚡ Recommended focus areas for review

IdModel Usage

Ensure that the use of IdModel is appropriate and that it correctly identifies mappings between scheduled allocation domains and loop domains.

// Fallback: use IdModel to check if any excluded ID is mapped
if (GpuLower::current()->hasIdModel()) {
  const auto& exact_graph =
      GpuLower::current()->idModel().idGraph(IdMappingMode::EXACT);
  for (auto exclude_id : exclude_ca_ids) {
    if (exact_graph.disjointValSets().strictAreMapped(exclude_id, id)) {
      return exclude_id;
    }
  }
}
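
The fallback above can be illustrated outside nvFuser with a toy equivalence-class structure. The sketch below is only a model of the idea (identity comparison first, then an exact-mapping lookup); `ToyExactGraph` and `findMatchingExcludedId` are hypothetical names invented for illustration and are not part of nvFuser's IdModel API.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Toy stand-in for an exact-mapping graph: a union-find over ID names.
// Hypothetical; this is NOT nvFuser's IdModel or ValGraph.
class ToyExactGraph {
 public:
  // Record that two IDs belong to the same equivalence class.
  void mapIds(const std::string& a, const std::string& b) {
    const std::string root_a = find(a);
    const std::string root_b = find(b);
    parent_[root_a] = root_b;
  }

  // True if both IDs are in the same equivalence class.
  bool areMapped(const std::string& a, const std::string& b) {
    return find(a) == find(b);
  }

 private:
  // Return the class representative, registering unseen IDs on the fly.
  std::string find(const std::string& id) {
    auto it = parent_.find(id);
    if (it == parent_.end()) {
      parent_.emplace(id, id);
      return id;
    }
    const std::string parent = it->second;
    if (parent == id) {
      return id;
    }
    const std::string root = find(parent);
    parent_[id] = root;  // path compression
    return root;
  }

  std::unordered_map<std::string, std::string> parent_;
};

// Return the excluded ID that corresponds to `id`, or "" if there is none.
std::string findMatchingExcludedId(
    const std::string& id,
    const std::vector<std::string>& exclude_ids,
    ToyExactGraph& graph) {
  // Fast path: the allocation ID is literally one of the excluded IDs.
  for (const auto& exclude_id : exclude_ids) {
    if (exclude_id == id) {
      return exclude_id;
    }
  }
  // Fallback: the allocation ID is a different object that is exactly
  // mapped to an excluded ID.
  for (const auto& exclude_id : exclude_ids) {
    if (graph.areMapped(exclude_id, id)) {
      return exclude_id;
    }
  }
  return "";
}
```

With `iS11` mapped to `iS15` in the toy graph and `{iS6, iS11}` as the excluded IDs, a query for `iS15` resolves to `iS11` through the fallback, while `iS16` matches nothing and stays in the allocation.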

Test Coverage

Verify that the test case covers all necessary scenarios and edge cases for the allocation domain scheduling.

TEST_F(AllocationDomainTest, CpAsyncBulk1d) {
  NVFUSER_TEST_CUDA_ARCH_GUARD(9, 0);
  auto fusion = std::make_unique<Fusion>();
  FusionGuard fg(fusion.get());
  int64_t x = 2L, y = 12L, z = 16L;
  auto tv0 = makeContigConcreteTensor({x, y, z});
  fusion->addInput(tv0);
  std::vector<IterDomain*> tv0_dom = {tv0->axis(1), tv0->axis(0), tv0->axis(2)};
  tv0->setAllocationDomain(tv0_dom, true);
  auto tv2 = add(tv0, tv0);
  fusion->addOutput(tv2);

  auto tv1 = tv0->cacheAfter(LoadStoreOpType::CpAsyncBulk);
  tv1->setMemoryType(MemoryType::Shared);
  tv1->axis(-1)->parallelize(ParallelType::Bulk);

  for (auto tv : fusion->allTvs()) {
    // [2, 3, 4, 16]
    tv->split(1, 4);
  }

  inlineSelectedAt({tv1}, tv1, /*reference_pos=*/2);

  // Before fix, we have:

  // T2_s_float[iS6{2}, iS11{3}, iS12{4}, iB8{16}] ca_pos( 2 )
  // logical domain : (iS6{2}, iS7{12}, iB8{16})
  // allocation domain : (iS7{12}, iS6{2}, iB8{16})
  // contiguity: t t t
  //  Split: iS7{12} by factor 4 -> iS11{3}, iS12{4}
  // loop domain : (iS6{2}, iS11{3}, iS12{4}, iB8{16})

  // T2 is computed at pos 2, so we don't need to allocate domains iS6{2} and
  // iS11{3}. nvFuser tries to exclude these two domains from the allocation
  // domain. However, iS11{3} doesn't exist in the allocation domain, so it is
  // not excluded, and this is considered a failed case.

  // To fix this, we can replay the transforms on the allocation domain.
  // How to split the allocation domain?
  // 1. Create an AbstractTensor from the current allocation domain.
  // 2. Apply the same split transformation to the allocation domain.
  // 3. Update the allocation domain.
  AbstractTensor alloc_tensor(tv1->getAllocationDomain());
  alloc_tensor.split(0, 4);
  tv1->setAllocationDomain(alloc_tensor.as<IterDomain*>(), true);
  // after this change to allocation domain, we have:
  // T2_s_float[iS6{2}, iS11{3}, iS12{4}, iB8{16}] ca_pos( 2 )
  // logical domain : (iS6{2}, iS7{12}, iB8{16})
  // allocation domain : (iS15{3}, iS16{4}, iS6{2}, iB8{16})
  // contiguity: t t t t
  //  Split: iS7{12} by factor 4 -> iS15{3}, iS16{4}
  //  Split: iS7{12} by factor 4 -> iS11{3}, iS12{4}
  // loop domain : (iS6{2}, iS11{3}, iS12{4}, iB8{16})

  // Based on the loop domain and compute position, we don't need to allocate
  // iS6{2} and iS11{3}. However, the allocation-domain counterpart of iS11{3}
  // is iS15{3}. How do we map them in getAllocationDomainsAndContiguity()?
  // Use IdModel if pointer comparison fails. IdModel maintains disjointValSets:
  // id_sets: disjoint sets{
  //   { iS3{2}; iS6{2}; iS0{2} }
  //   { iS4{12}; iS7{12}; iS1{12} }
  //   { iS13{3}; iS11{3}; iS15{3}; iS9{3} }
  //   { iS14{4}; iS12{4}; iS16{4}; iS10{4} }
  //   { iS5{16}; iB8{16}; iS2{16} }
  // }
  // where iS11{3} and iS15{3} are in the same set.

  auto options = at::TensorOptions().dtype(at::kFloat).device(at::kCUDA);
  // shape: (x, y, z), alloc: (y, x, z), stride: (z, x * z, 1)
  auto t0 = at::randn({x, y, z}, options).as_strided({x, y, z}, {z, x * z, 1});
  KernelExecutor ke;
  ke.compile(fusion.get(), {t0});
  auto outputs = ke.run({t0});
  testValidate(fusion.get(), outputs, {t0}, __LINE__, __FILE__);
}
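
The stride comment in the test (shape `(x, y, z)`, allocation order `(y, x, z)`, strides `(z, x * z, 1)`) can be checked with plain arithmetic. The helper below is a standalone sketch; `stridesForAllocOrder` is a hypothetical name, and the function only assumes a fully contiguous buffer laid out in the given allocation order.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Compute the stride of each logical dimension for a contiguous buffer whose
// dimensions are stored in `alloc_order` (outermost first). Each entry of
// `alloc_order` is an index into `shape`.
std::vector<int64_t> stridesForAllocOrder(
    const std::vector<int64_t>& shape,
    const std::vector<int>& alloc_order) {
  std::vector<int64_t> strides(shape.size(), 0);
  int64_t running = 1;
  // Walk from the innermost allocated dimension outwards, accumulating the
  // number of elements spanned by everything further inside.
  for (auto it = alloc_order.rbegin(); it != alloc_order.rend(); ++it) {
    strides[*it] = running;
    running *= shape[*it];
  }
  return strides;
}
```

For the shape `{2, 12, 16}` and allocation order `{1, 0, 2}` used in the test, this gives strides `{16, 32, 1}`, which is `(z, x * z, 1)` with `x = 2` and `z = 16`.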

liqiangxl · 2025-07-17T14:05:42Z

!test

liqiangxl · 2025-07-17T14:53:58Z

!test

liqiangxl · 2025-07-22T15:51:18Z

!test

jjsjann123

LGTM

jjsjann123 · 2025-07-23T21:05:11Z

tests/cpp/test_allocation_domain.cpp

+  // }
+  // where iS11{3} and iS15{3} are in the same set.
+
+  fusion->print();


nitpick: remove debug code.

jjsjann123 · 2025-07-23T21:06:43Z

tests/cpp/test_allocation_domain.cpp

+  // Update the allocation domain
+  AbstractTensor alloc_tensor(tv1->getAllocationDomain());
+  alloc_tensor.split(0, 4);
+  tv1->setAllocationDomain(alloc_tensor.as<IterDomain*>(), true);


IIUC, the replay on allocation domain needs to be done by the scheduler. So there's going to be another PR plumbing that?

Yes, it is here

Fuser/csrc/scheduler/normalization_inner_outer_tma_ws.cpp

Line 887 in e95d4a6

// replay loop domain transformations to allocation domain for shared memory
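
As a rough illustration of why the replay matters, the split and the resulting buffer size can be modeled on plain extents. This is a sketch under the reading given in the test comments (domains left of the compute-at position, and their exactly mapped counterparts, are not allocated); `splitExtent` and `allocatedElements` are hypothetical helpers, not nvFuser functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Split extents[axis] by `factor` into (ceilDiv(extent, factor), factor),
// mirroring an inner split such as 12 -> (3, 4).
std::vector<int64_t> splitExtent(
    std::vector<int64_t> extents,
    std::size_t axis,
    int64_t factor) {
  const int64_t outer = (extents.at(axis) + factor - 1) / factor;
  extents[axis] = outer;
  extents.insert(
      extents.begin() + static_cast<std::ptrdiff_t>(axis) + 1, factor);
  return extents;
}

// Multiply the extents of all domains that are not excluded from allocation.
int64_t allocatedElements(
    const std::vector<int64_t>& extents,
    const std::vector<bool>& excluded) {
  int64_t count = 1;
  for (std::size_t i = 0; i < extents.size(); ++i) {
    if (!excluded.at(i)) {
      count *= extents[i];
    }
  }
  return count;
}
```

Starting from the allocation extents `{12, 2, 16}`, `splitExtent(..., 0, 4)` yields `{3, 4, 2, 16}`. Excluding the domains of extent 3 and 2 (the counterparts of iS11{3} and iS6{2}) leaves `4 * 16 = 64` elements. Without the replay, the extent-12 domain has no counterpart among the excluded loop domains, which is the failed case described in the test.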

jjsjann123 · 2025-07-23T21:07:32Z

csrc/device_lower/pass/allocation.cpp

+          return exclude_id;
+        }
+      }
+    }


Looks like the existing comment for this function already contains this piece, well planned sir 😆

liqiangxl · 2025-07-24T00:47:48Z

!build

…loop domain (NVIDIA#4791) (1) added a test to manually schedule allocation domain. (2) use IdModel to detect mapping between scheduled allocation domain and loop domain.

liqiangxl changed the base branch from main to llu/refactor_getAllocationDomainsAndContiguity July 17, 2025 14:02

liqiangxl force-pushed the llu/get_domains_should_be_allocated branch from de0c7fb to 1338ae0 Compare July 17, 2025 14:53

refactor getAllocationDomainsAndContiguity

3a06741

liqiangxl force-pushed the llu/refactor_getAllocationDomainsAndContiguity branch from 040e349 to 3a06741 Compare July 17, 2025 18:14

use idmodel, schedule allocation domain

5e7592e

liqiangxl force-pushed the llu/get_domains_should_be_allocated branch from 1338ae0 to 5e7592e Compare July 17, 2025 19:18

liqiangxl mentioned this pull request Jul 17, 2025

replay loop domain transforms to allocation domain #4795

Merged

liqiangxl marked this pull request as ready for review July 22, 2025 01:09

liqiangxl mentioned this pull request Jul 22, 2025

refactor getAllocationDomainsAndContiguity #4792

Merged

Base automatically changed from llu/refactor_getAllocationDomainsAndContiguity to main July 22, 2025 12:15

Merge branch 'main' into llu/get_domains_should_be_allocated

b1f54a5

liqiangxl requested a review from jjsjann123 July 22, 2025 15:52

jjsjann123 approved these changes Jul 23, 2025

View reviewed changes

liqiangxl added 2 commits July 23, 2025 17:37

Merge branch 'main' into llu/get_domains_should_be_allocated

953fed9

clean

8763303

liqiangxl merged commit b7afdc0 into main Jul 24, 2025
17 checks passed

liqiangxl deleted the llu/get_domains_should_be_allocated branch July 24, 2025 01:29

liqiangxl merged 5 commits into main from llu/get_domains_should_be_allocated
