Skip to content

Make sure unswitch predicates propagate maximum (or minimum) indices through merge-inner or split-outer domains#2689

Merged
naoyam merged 11 commits intomainfrom
idmodel_indexing_unswitch_fix
Jul 27, 2024
Merged

Make sure unswitch predicates propagate maximum (or minimum) indices through merge-inner or split-outer domains#2689
naoyam merged 11 commits intomainfrom
idmodel_indexing_unswitch_fix

Conversation

@naoyam
Copy link
Collaborator

@naoyam naoyam commented Jul 25, 2024

(Stacked on top of #2677)

The original issue is #681. It was addressed in #687.

This PR is NOT as comprehensive as #687, but my gut feeling is that this should be good enough, in particular since contig indexing would avoid backward traversals through merge in many cases.

I'll do final more comprehensive comparison with the legacy indexing once contig indexing is done.

Since the original PR and issue were reviewed by @zasdfgbnm, could you please review this too?

@naoyam naoyam requested a review from zasdfgbnm July 25, 2024 22:35
@naoyam naoyam added the idmodel label Jul 25, 2024
EXPECT_TRUE(tv->axis(0)->isBroadcast());
}

// Repro of unswitch predicate issue #681
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just moved to test_indexing.cpp

@naoyam naoyam force-pushed the idmodel_indexing_unswitch_fix branch from 5424eb8 to 40f8097 Compare July 26, 2024 04:06
naoyam added 3 commits July 25, 2024 21:44
Not as comprehensive as it's originally done but likely good enough
@naoyam naoyam force-pushed the idmodel_indexing_unswitch_fix branch from 7943d93 to 4852041 Compare July 26, 2024 05:05
@naoyam
Copy link
Collaborator Author

naoyam commented Jul 26, 2024

!build

Base automatically changed from idmodel_indexing_circular_buffer_predicate to main July 27, 2024 01:21
@naoyam naoyam merged commit 4004238 into main Jul 27, 2024
@naoyam naoyam deleted the idmodel_indexing_unswitch_fix branch July 27, 2024 03:19
return loop_promotion_map_it->second;
}

// Check if unswitching a given for-loop actually matters. For example,
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zasdfgbnm Forgot to add this code. I already merged this but let me know if you have any concern.

naoyam added a commit that referenced this pull request Nov 21, 2025
Enabling TensorIndexer for the transpose tests. Diff check revealed some
mismatches, but I believe they are benign. As far as I can see, all of
them are caused by some slight difference for issue #681.
TensorIndexer's workaround is simpler but not as precise as the legacy
indexer, as indicated in #2689.

For example, there is an index for a predicate of
`TransposeTest.FusionScheduleTransposeSimple`:

```
 ( ( ( blockIdx.x % ( ( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[2] ), 32) ) * ( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 32) ) ) ) / ( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 32) ) ) * 32 ) + (threadIdx.x * 4) % 32)
```

The above is generated by the legacy indexer. With TensorIndexer, it's
generated as:

```
( ( ( ( blockIdx.x % ( ( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[2] ), 32) ) * ( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 32) ) ) ) / ( ceilDiv(( (( (( getMetaData(T0) )).logical_size ))[1] ), 32) ) ) * 32 ) + 0 )
```

The difference is `threadIdx.x 4 % 32` vs. `0`. Note that this is a
predicate for the condition at the start position, i.e., they are
predicated with `>= 0`, and as such the index of TensorIndexer is more
conservative, so the legacy indexer may have a higher chance of using
the fast path created by the unswitch predicate. However, in practice,
both predicates should be always true, so it shouldn't matter. I don't
see any actual difference with stop predicates.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants