Conversation
|
!test --diff |
|
Review updated until commit cb755ef Description
|
| Relevant files | |||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Bug_fix |
|
PR Reviewer Guide
Here are some key observations to aid the review process:
| 🧪 No relevant tests |
| ⚡ Recommended focus areas for review |
Parameter Usage Consistency
ld_st_matrix parameter is added to lowerSrcIndex and lowerDstIndex functions but is only used in specific code paths (lines 2127, 2189, 2227, 2240). Verify that all call sites properly handle this parameter and that there are no missing usages where ldmatrix/stmatrix operations should be detected. |
| // stmatrix. Note that the explicit bool indicator of the expr is | ||
| // required to correctly determine it is a ldmatrix/stmatrix op since | ||
| // there can be an initialization op using the same output tensor | ||
| // after the allocation lowering pass. | ||
| bool shouldUseAlternateLoopDomain( |
There was a problem hiding this comment.
This is a bit ugly fix, but at the time of index lowering, the expr parameter doesn't necessarily mean it's the actual expression for the lowered operation. The state of the fusion program is not well defined here as we are still building the Kernel IR program. After the allocation lowering, a TensorView can have multiple defining expressions due to, e.g., initializations of buffers, and thus it's no longer SSA. What tv->definition() returns is the original expr, but we may be using it even when lowering the initialization.
That could cause a problem here since even though expr is a ldmatrix or stmatrix, it may not correspond to the actual op such as initializations. To find if the actual op is indeed ldmatrix or stmatrix, that information needs to be passed down from the indexing pass itself.
|
!test --diff |
|
!test --diff |
Greptile OverviewGreptile SummaryThis PR fixes a bug in the detection of ldmatrix/stmatrix operations for indexing. The issue was that the previous code would infer whether to use alternate loop domains based on the expression type, but this approach failed when a tensor defined by stmatrix was also initialized (e.g., for predicate elimination). The initialization operation would incorrectly use the alternate loop domain intended only for the actual ld/stmatrix operation. Key Changes:
This ensures that initialization ops on tensors that are also used by ld/stmatrix operations don't incorrectly use the alternate loop domain for indexing. Confidence Score: 5/5
Important Files ChangedFile Analysis
Sequence DiagramsequenceDiagram
participant IL as IndexLowering::handle(LoadStoreOp)
participant Lower as lowerSrcIndex/lowerDstIndex
participant Index as Index::getProducerIndex/getConsumerIndex
participant TI as TensorIndexer::getLinearIndex
participant Contig as getContigIndexFor
participant Check as shouldUseAlternateLoopDomain
IL->>IL: Determine ld_st_matrix = isLdMatrixOp || isStMatrixOp
IL->>Lower: Pass ld_st_matrix flag
Lower->>Index: Forward ld_st_matrix flag
Index->>TI: Forward ld_st_matrix flag
TI->>Contig: Forward ld_st_matrix flag
Contig->>Check: shouldUseAlternateLoopDomain(tv, expr, ld_st_matrix)
Check-->>Check: if !ld_st_matrix return false
Check-->>Check: NVF_ERROR validates expr matches flag
Check-->>Contig: Return whether to use alternate domain
Contig-->>TI: Return computed indices
TI-->>Index: Return linear index
Index-->>Lower: Return TensorIndex
Lower-->>IL: Return indexed value
|
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
rdspring1
left a comment
There was a problem hiding this comment.
LGTM.
Greptile flagged changing variable name from is_st_matrix to ls_st_matrix in csrc/index_compute.cpp
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
!build |
This was found while I was experimenting TensorIndexer with the matmul tests (#5574). Ldmatrix and stmatrix use a special domain as an alternative loop domain for indexing. IIUC, we should not use the alternate domains when initializing tensors. This happens, for example, a tensor is defined by an stmatrix op but is also initialized to zero for predicate elimination. Looks like the initialization should not be done at all, but I think that's a separate issue.
Please see https://github.com/NVIDIA/Fuser/pull/5645/files#r2600720284. The other changes are just due to this change.