Skip direct-dominating live-in optimization for loop headers #281
…ment - findBestPlacement now tries rectangular shapes first, then non-rectangular connected shapes, then falls back to k-1 CGRAs (down to 1). - Removed outdated TODO comment about MapTaskOnCgraPass not supporting multi-CGRA placement. - Added assert for empty tile_shape offsets. - Cleaned up USER COMMENT annotations.
…p comments - findBestPlacement tries rect then non-rect shapes for requested cgra_count. - If placement fails, caller falls back to cgra_count-1 (reject extra CGRA). - Normalize /// to // comment style throughout MapTaskOnCgraPass. - Remove outdated TODO comments.
- SRAM centroid now includes ALL CGRA positions of multi-CGRA tasks, not just placement[0]. - SSA proximity scoring uses min distance between two multi-CGRA placements (minDistToPlacement) instead of only comparing to the other task's primary position.
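The two scoring changes described in the commit message above can be sketched as follows. This is only an illustration: the `Pos` type, the use of Manhattan distance, and the function names `minDistBetween` and `centroid` are assumptions, not the pass's actual `minDistToPlacement` helper or its signature.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <cstdlib>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CGRA grid coordinate; not the pass's real type.
struct Pos {
  int row;
  int col;
};

// Manhattan distance is an assumption; the pass may use a different metric.
int manhattan(Pos a, Pos b) {
  return std::abs(a.row - b.row) + std::abs(a.col - b.col);
}

// Smallest distance between any CGRA of placement `a` and any CGRA of
// placement `b`, instead of comparing only the two primary positions.
int minDistBetween(const std::vector<Pos>& a, const std::vector<Pos>& b) {
  assert(!a.empty() && !b.empty() && "both placements need at least one CGRA");
  int best = INT_MAX;
  for (Pos pa : a)
    for (Pos pb : b)
      best = std::min(best, manhattan(pa, pb));
  return best;
}

// Centroid (row, col) over every CGRA position of every task,
// not just the first position of each placement.
std::pair<double, double> centroid(
    const std::vector<std::vector<Pos>>& placements) {
  double row_sum = 0.0, col_sum = 0.0;
  int count = 0;
  for (const auto& placement : placements) {
    for (Pos p : placement) {
      row_sum += p.row;
      col_sum += p.col;
      ++count;
    }
  }
  assert(count > 0 && "centroid of zero positions is undefined");
  return {row_sum / count, col_sum / count};
}
```

With a two-CGRA task at (0,0),(0,1) and another at (0,3),(1,3), comparing only primary positions would report a distance of 3, while the minimum over all pairs is 2.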
Commits from the previous unmerged PR are shown here. This PR should start from 1f60123 (the first failed commit).
Pull request overview
This PR fixes an SSA live-in canonicalization bug impacting nested loops by preventing the “direct-dominating live-in” optimization from firing on loop headers, enabling correct promotion to block arguments and subsequent inner-rate PHI_START creation during ctrl-to-data-flow lowering. It also renames/rewires mapping and task placement passes and updates affected tests/docs accordingly.
Changes:
- Update `CanonicalizeLiveInPass` to skip the direct-dominating-live-in optimization for loop headers (fix for #270).
- Rename `MapToAcceleratorPass` → `MapOperationOnTilePass` and update pass registration, build files, docs, and tests.
- Replace `MapTaskOnCgraPass` with `AllocateCgraToTaskPass` and invoke global placement from `ResourceAwareTaskOptimizationPass`, updating taskflow tests.
Reviewed changes
Copilot reviewed 43 out of 46 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| test/neura/steer_ctrl/loop_with_return_value.mlir | Updates disabled mapping pass invocation name in test pipeline comments. |
| test/neura/fusion/test.mlir | Updates mapping pass name and adjusts FileCheck IDs after mapping changes. |
| test/neura/for_loop/relu_test.mlir | Updates mapping pass flag name in RUN line. |
| test/neura/ctrl/branch_for.mlir | Updates mapping pass flag name in RUN lines. |
| test/multi-cgra/taskflow/resource-heavy/resource-heavy.mlir | Splits long RESOPT attribute checks for readability/stability. |
| test/multi-cgra/taskflow/resnet/simple_resnet_tosa.mlir | Splits long RESOPT attribute checks for readability/stability. |
| test/multi-cgra/taskflow/parallel-nested/parallel-nested.mlir | Switches to --allocate-cgra-to-task and updates RESOPT/placement checks. |
| test/multi-cgra/taskflow/multi-nested/multi-nested.mlir | Switches to --allocate-cgra-to-task and updates placement + RESOPT expectations. |
| test/multi-cgra/taskflow/irregular-loop/irregular-loop.mlir | Switches to --allocate-cgra-to-task and splits RESOPT checks. |
| test/multi-cgra/kernel_mapping/relu/relu.mlir | Updates mapping pass flag name in RUN line. |
| test/multi-cgra/kernel_mapping/loop-in-kernel/loop-in-kernel.mlir | Updates mapping pass flag name in RUN line. |
| test/multi-cgra/kernel_mapping/fir/fir.mlir | Updates mapping pass flag name in RUN line. |
| test/mapping_quality/tiny_loop.mlir | Updates mapping pass flag name in RUN lines. |
| test/mapping_quality/branch_for.mlir | Updates mapping pass flag name in RUN lines. |
| test/honor_arch/fir_removed_tiles_test.mlir | Updates mapping pass flag name in RUN line. |
| test/e2e/relu/relu_kernel.mlir | Updates mapping pass flag name in RUN line. |
| test/e2e/histogram/histogram_kernel.mlir | Updates mapping pass flag name in RUN line. |
| test/e2e/gemv/gemv_kernel.mlir | Updates mapping pass flag name and refreshes extensive mapping/YAML/ASM FileCheck expectations. |
| test/e2e/fir/fir_kernel_vec.mlir | Updates mapping pass flag name in RUN line. |
| test/e2e/fir/fir_kernel.mlir | Updates mapping pass flag name in RUN line. |
| test/e2e/axpy/axpy_kernel.mlir | Updates mapping pass flag name in RUN line. |
| test/controflow_fuse/simple_loop_reduction/simple_loop_reduction.mlir | Updates mapping pass flag name in RUN line. |
| test/controflow_fuse/simple_loop/simple_loop.mlir | Updates mapping pass flag name in RUN line. |
| test/controflow_fuse/perfect_nested/perfect_nested.mlir | Updates mapping pass flag name in RUN line. |
| test/code_gen/test_code_generate.mlir | Updates mapping pass flag name in RUN line. |
| test/c2llvm2mlir/simple_loop/test.mlir | Updates mapping pass flag name in RUN line. |
| test/c2llvm2mlir/nested_loop/test.mlir | Updates mapping pass flag name in RUN line. |
| test/arch_spec/README.md | Updates documentation to use --map-operation-on-tile. |
| lib/TaskflowDialect/Transforms/Optimizations/ResourceAwareTaskOptimizationPass.cpp | Renames mapper references and invokes AllocateCgraToTask-based placement at convergence. |
| lib/TaskflowDialect/Transforms/CMakeLists.txt | Swaps in AllocateCgraToTaskPass.cpp and adds Optimizations subdirectory. |
| lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp | Implements new task+SRAM placement pass with multi-CGRA placement search. |
| lib/TaskflowDialect/CMakeLists.txt | Adjusts subdirectory structure so Optimizations is included via Transforms. |
| lib/NeuraDialect/Transforms/MapOperationOnTilePass.cpp | Renames mapper pass (class/argument/logs) and updates factory function. |
| lib/NeuraDialect/Transforms/CanonicalizeLiveInPass.cpp | Skips direct-dominating-live-in optimization when the using block is a loop header. |
| lib/NeuraDialect/Transforms/CMakeLists.txt | Replaces MapToAcceleratorPass.cpp with MapOperationOnTilePass.cpp. |
| lib/NeuraDialect/NeuraPasses.cpp | Updates conversion pipeline to run createMapOperationOnTilePass(). |
| include/TaskflowDialect/TaskflowPasses.td | Renames pass definition to allocate-cgra-to-task. |
| include/TaskflowDialect/TaskflowPasses.h | Updates pass factory name and adds runAllocateCgraToTask() helper declaration. |
| include/NeuraDialect/NeuraPasses.td | Renames pass definition to map-operation-on-tile and updates constructor name. |
| include/NeuraDialect/NeuraPasses.h | Renames pass factory API to createMapOperationOnTilePass. |
| include/NeuraDialect/Architecture/Architecture.h | Updates reference in comment to new mapper pass name. |
Comments suppressed due to low confidence (4)
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:285
`placement.primary()` returns {-1,-1} when `placement.cgra_positions` is empty, but the code still pushes it into `task_node->placement`. That can result in invalid `task_mapping_info` (row/col = -1) and can skew the SRAM centroid computation in later iterations. Handle the no-placement case explicitly (e.g., signal pass failure / emit an error, or keep the task unannotated and skip SRAM assignment) before committing any placement.
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:281
When a placement can’t be found for `cgra_count`, the pass falls back to `cgra_count-1` but does not update the task’s `cgra_count` (or `tile_shape`) attribute accordingly. That can leave IR in an inconsistent state where `cgra_count` disagrees with the number of entries in `task_mapping_info.cgra_positions`, which downstream passes may rely on. Consider either failing the pass, or updating the attributes to match the chosen fallback placement (and possibly iterating fallback down to 1).
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:432
`parseTileShapeOffsets()` is introduced but never used, and the placement search currently ignores each task’s `tile_shape` attribute (even though other code paths/comments imply placement should respect it). If `tile_shape` is intended to constrain the physical shape, wire this helper into `findBestPlacement()` (read the task’s `tile_shape` StringAttr, validate it matches `cgra_count`, then call `tryPlaceShape()` with those offsets). Otherwise, remove the dead helper / update comments to reflect that the allocator chooses shapes autonomously.
lib/TaskflowDialect/Transforms/AllocateCgraToTaskPass.cpp:553
The non-rectangular shape search uses a 64-bit bitmask (`1ULL << (nr * grid_cols_ + nc)`). If `grid_rows * grid_cols` exceeds 64, the shift amount can reach 64 or more, which is undefined behavior, and the mask cannot represent all cells. Add a guard/assert limiting the supported grid size (e.g., 4x4) or switch to a wider/dynamic bitset representation for visited states.
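A guard of the kind suggested in the last comment might look like the sketch below. The function names `cellBit` and `cellBitChecked` and their parameters are hypothetical and not taken from `AllocateCgraToTaskPass.cpp`. The largest cell index is `grid_rows * grid_cols - 1`, so a grid of exactly 64 cells still fits in a `uint64_t`; only larger grids are rejected here.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the mask bit for cell (row, col), or std::nullopt when the cell is
// out of range or the grid has more cells than a 64-bit mask can hold.
// Hypothetical helper, not part of the pass.
std::optional<uint64_t> cellBit(int row, int col, int grid_rows,
                                int grid_cols) {
  if (grid_rows <= 0 || grid_cols <= 0) return std::nullopt;
  // Widen before multiplying so a huge grid cannot overflow int.
  if (static_cast<int64_t>(grid_rows) * grid_cols > 64) return std::nullopt;
  if (row < 0 || row >= grid_rows || col < 0 || col >= grid_cols)
    return std::nullopt;
  // The index is at most 63 here, so the shift is well defined.
  return uint64_t{1} << (row * grid_cols + col);
}

// Assert-style variant for call sites that treat an oversized grid as a bug.
uint64_t cellBitChecked(int row, int col, int grid_rows, int grid_cols) {
  std::optional<uint64_t> bit = cellBit(row, col, grid_rows, grid_cols);
  assert(bit.has_value() && "grid too large for a 64-bit mask or cell out of range");
  return *bit;
}
```

Returning `std::optional` lets the caller decide between failing the pass and falling back to a different search. A dynamic bitset would remove the size limit entirely at the cost of a heavier visited-state key.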
Values crossing from outer blocks to inner loop headers were marked as direct-dominating live-ins, preventing block-arg promotion. This caused missing inner-rate PHI_STARTs in the dataflow IR, starving inner-loop operations of valid data each cycle. Added back-edge check: if the using block has a predecessor it dominates (i.e. it is a loop header), the value is promoted to a block argument.
1f60123 to fa5fb36
Summary
This PR addresses issue #270.
Problem
In nested loop kernels, values defined in outer loop blocks and used in inner loop bodies were incorrectly classified as "direct dominating live-ins" by `CanonicalizeLiveInPass`. This prevented them from being promoted to block arguments, which in turn caused `TransformCtrlToDataFlowPass` to not create inner-rate `PHI_START` operations for them.
Fix
Before applying the direct-dominating optimization, check whether the using block is a loop header. If so, skip the optimization and let the value be promoted to a block argument.
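The loop-header test can be illustrated on a toy control-flow graph: a block is a loop header if it dominates one of its own predecessors, because that predecessor edge is a back edge. The sketch below is self-contained and does not use MLIR. The `Cfg` and `DomSets` types and the simple iterative dominator computation are assumptions standing in for whatever dominance query `CanonicalizeLiveInPass` actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy CFG: preds[b] lists the predecessor indices of block b.
// Block 0 is the entry block.
using Cfg = std::vector<std::vector<int>>;
// dom[b][d] is true when block d dominates block b.
using DomSets = std::vector<std::vector<bool>>;

// Classic iterative dominator computation:
// dom(b) = {b} union (intersection of dom(p) over all predecessors p of b).
DomSets computeDominators(const Cfg& preds) {
  const std::size_t n = preds.size();
  assert(n > 0 && "CFG needs an entry block");
  DomSets dom(n, std::vector<bool>(n, true));
  dom[0].assign(n, false);
  dom[0][0] = true;
  bool changed = true;
  while (changed) {
    changed = false;
    for (std::size_t b = 1; b < n; ++b) {
      assert(!preds[b].empty() && "non-entry blocks must have a predecessor");
      std::vector<bool> next(n, true);
      for (int p : preds[b])
        for (std::size_t d = 0; d < n; ++d)
          next[d] = next[d] && dom[p][d];
      next[b] = true;
      if (next != dom[b]) {
        dom[b] = next;
        changed = true;
      }
    }
  }
  return dom;
}

// A block is a loop header if it dominates at least one of its predecessors.
bool isLoopHeader(const Cfg& preds, const DomSets& dom, int block) {
  for (int p : preds[block])
    if (dom[p][block]) return true;  // Edge p -> block is a back edge.
  return false;
}
```

For a two-level loop nest with blocks 0 (entry), 1 (outer header), 2 (inner header), 3 (inner body), 4 (outer latch) and 5 (exit), `Cfg preds = {{}, {0, 4}, {1, 3}, {2}, {2}, {1}}` reports blocks 1 and 2 as loop headers, so a live-in used in either of them would be promoted to a block argument rather than treated as a direct-dominating live-in.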
Result
After the fix, the GEP pointer is correctly promoted to a block argument in `^bb4`, and `TransformCtrlToDataFlowPass` creates an inner-rate `PHI_START` for it.
TODO
Not all failing checks have been updated yet; I will update them once the changes are reviewed and approved. `gemv` has been updated.