Skip to content

refactor usePresetAllocationDomain()#4843

Open
liqiangxl wants to merge 38 commits intomainfrom
llu/refactor_usePresetAllocationDomain
Open

refactor usePresetAllocationDomain()#4843
liqiangxl wants to merge 38 commits intomainfrom
llu/refactor_usePresetAllocationDomain

Conversation

@liqiangxl
Copy link
Collaborator

Optimize allocation domain exclusion using group-based lookups

Changes Made

  • Inlined getExcludedAllocationDomain() helper function directly into usePresetAllocationDomain() processing loop

  • Expensive disjointValSets().strictAreMapped() is replaced with efficient group-based lookups

🛠️ Implementation Details

Old approach:

// Expensive nested loop with graph operations
for (auto exclude_id : exclude_ca_ids) {
  if (exact_graph.disjointValSets().strictAreMapped(exclude_id, id)) {
    excluded_id = exclude_id;
    break;
  }
}

New approach:

// Efficient group-based lookup
const auto excluded_ca_groups = exact_graph.toGroups(exclude_ca_ids);
std::unordered_map<ValGroup, IterDomain*> group_to_exclude_id;

auto id_group = exact_graph.toGroup(id);
if (excluded_ca_groups.has(id_group)) {
  exclude_ca_ids.erase(group_to_exclude_id[id_group]);
}

liqiangxl and others added 30 commits July 17, 2025 06:32
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@liqiangxl liqiangxl marked this pull request as ready for review July 24, 2025 22:11
@github-actions
Copy link

Description

  • Refactored usePresetAllocationDomain() by inlining getExcludedAllocationDomain()

  • Replaced expensive disjointValSets().strictAreMapped() with group-based lookups

  • Improved allocation domain exclusion efficiency using val groups


Changes walkthrough 📝

Relevant files
Enhancement
allocation.cpp
Refactor and optimize allocation domain exclusion               

csrc/device_lower/pass/allocation.cpp

  • Removed getExcludedAllocationDomain() helper function
  • Inlined its logic into usePresetAllocationDomain()
  • Replaced disjointValSets().strictAreMapped() with group-based lookups
  • Added val group conversion and mapping for efficient exclusion checks
  • +30/-44 

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The exclude_ca_ids.erase(group_to_exclude_id[id_group]); line might cause undefined behavior if group_to_exclude_id[id_group] is not found in exclude_ca_ids. Ensure that group_to_exclude_id[id_group] is always a valid key in exclude_ca_ids.

    exclude_ca_ids.erase(group_to_exclude_id[id_group]);
    Performance Concern

    The code assumes that GpuLower::current()->hasIdModel() is always true, which could lead to runtime errors if this condition is not met. Consider adding a more robust check or handling the case where hasIdModel() returns false.

    NVF_ERROR(GpuLower::current()->hasIdModel());
    Code Clarity

    The new approach introduces additional complexity with group-based lookups. Ensure that the comments and documentation clearly explain the rationale and benefits of this change to maintain code readability and maintainability.

    // Convert excluded CA IDs to groups for efficient lookup and create mapping
    NVF_ERROR(GpuLower::current()->hasIdModel());
    const auto& exact_graph =
        GpuLower::current()->idModel().idGraph(IdMappingMode::EXACT);
    const auto excluded_ca_groups = exact_graph.toGroups(exclude_ca_ids);
    std::unordered_map<ValGroup, IterDomain*> group_to_exclude_id;
    for (auto exclude_id : exclude_ca_ids) {
      group_to_exclude_id[exact_graph.toGroup(exclude_id)] = exclude_id;
    }
    
    for (auto [idx, id] : enumerate(tv->getAllocationDomain())) {
      // Check if an allocation domain should be excluded based on allocation
      // position. An allocation domain should be excluded if it is not required
      // to be allocated. For example: T2_s_float[iS6{2}, iS11{3}, iS12{4},
      // iB8{16}] ca_pos( 2 ) logical domain : (iS6{2}, iS7{12}, iB8{16})
      // allocation domain : (iS15{3}, iS16{4}, iS6{2}, iB8{16})
      // contiguity: t t t t
      //  Split: iS7{12} by factor 4 -> iS15{3}, iS16{4}
      //  Split: iS7{12} by factor 4 -> iS11{3}, iS12{4}
      // loop domain : (iS6{2}, iS11{3}, iS12{4}, iB8{16})
      // Based on loop domain and compute pos, we don't need to allocate iS6{2}
      // and iS11{3}. Then the corresponding allocation domains iS6{2} and
      // iS15{3} should be excluded.
      auto id_group = exact_graph.toGroup(id);
      if (excluded_ca_groups.has(id_group)) {
        exclude_ca_ids.erase(group_to_exclude_id[id_group]);
        continue;
      }
    
      // Excluded based on memory partitioning
      if (ir_utils::isMemoryPartitionedAcross(
              tv->getMemoryType(), id->getParallelType())) {
        continue;
      }
    
      // Need to allocate the id
      allocation_domains.push_back(id);
      contiguity.push_back(tv->domain()->contiguity()[idx]);
    }

    Base automatically changed from llu/selfReplayLoopToAllocation to main August 8, 2025 13:33
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

    Labels

    None yet

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    1 participant