JIT: Always compute loop iteration estimate in loop inversion if we have PGO data#116104
Conversation
Pull Request Overview
This PR updates the loop inversion logic to skip inverting loops that are expected to iterate only a few times, based on profile weight data.
- Simplifies the iteration count estimation by using the likely weight of the test block and the called count.
- Removes the previous, more complex handling of profile weights and loop entry estimation.
cc @dotnet/jit-contrib, @AndyAyersMS PTAL. Diffs show large size decreases. It's worth noting that I'm not cutting off any loops when we don't have PGO data. Since inversion currently runs before
I think this is a tricky one to get right.
In this case, wouldn't we compute a high iteration count for the nested loop too (assuming the parent loop doesn't conditionally execute the child loop)? I agree that this approach isn't sensitive to the other cases you mentioned. The loop inversion diffs that inspired this change didn't necessarily involve loops with low iteration counts; rather, they were loops that are more likely to fall through than to loop, or that otherwise weren't likely to run more than once per method call. It feels a bit crude, but I could take a safer approach and skip loops that don't iterate at least twice on average -- in other words, a loop has to behave like a loop on average to be inverted.
Ah, I should have looked more closely. You are computing a method-entry relative count, not a loop-entry relative count... I just assumed "iteration count" meant the latter. So yes, what you are doing would handle the nested case OK. I'd like to see what a size-based heuristic looks like. I think that is perhaps less prone to mis-estimating importance or potential benefit from inversion (?).
I was thinking of reusing the size heuristic you added for loop cloning: if a loop is too big to likely benefit from cloning, then it's probably not tight enough to benefit from inversion. Does that seem like a reasonable starting point? I don't think we can easily separate out the size heuristic change from #116017, since we need the loop data structures computed to easily compute the loop size. I can push a change to that PR with the size restriction and see how the diffs change.
Sure, using the same size threshold seems reasonable.
Based on my trial and error with different size limits for loop inversion (comment), I think we're unlikely to pursue a loop iteration heuristic for now. I'm going to remove the heuristic portion and just make this into a refactor of the loop iteration computation, so that we're at least always doing it. |
…manasifkhalid/runtime into loop-inversion-iteration-count
@AndyAyersMS I thought I'd revive this to cut down on my PR backlog. The only material change is that we now always try to estimate the loop iteration count when we have PGO data, even if the weights into the loop are inconsistent. Because we run profile repair right before loop inversion, we don't encounter inconsistency all that often. From what I've seen, most of the cases where block weights are still inconsistent are under OSR, which is known to trip up profile repair. Under OSR, we can assume the loop is very hot, so even if the loop iteration count loses some precision from the lack of profile consistency, I suspect the computed value is always more realistic than BB_LOOP_WEIGHT_SCALE.

The diffs are small, and seem to be inflated by duplicate method contexts, according to the disasm summaries. Ex:
AndyAyersMS
left a comment
Yes, let's take this one.
Ensure loop inversion always comes up with a loop iteration estimate better than BB_LOOP_WEIGHT_SCALE if we have PGO data.