Loop alignment issue in Array2 benchmark

A recent change, #51901, led to a regression in the Benchstone.BenchI.Array2 benchmark on Ubuntu (but not on Windows): #52316.

The core of the benchmark is the `Bench` function inner loop:
```
for (; loop != 0; loop--) {
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            for (int k = 0; k < 10; k++) {
                d[i][j][k] = s[i][j][k];
            }
        }
    }
}
```

The code for this loop is nearly identical, modulo register allocation, before and after #51901. The difference is loop alignment: before #51901, the loop fits in two 32-byte chunks; after, it spans three. On Ubuntu, this leads to a performance regression of about 50%. Simply setting `COMPlus_JitAlignLoopAdaptive=0` changes the alignment so that the inner loop again fits in two 32-byte chunks, recovering the performance.
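To make the chunk arithmetic concrete, here is a minimal sketch of how the number of 32-byte chunks touched by a loop depends on where its code starts. The `LoopAlignment` class and its methods are hypothetical helpers written for illustration. They are not part of the JIT, and the issue does not give the loop's actual offsets or size.

```
using System;

// Hypothetical helper for illustration only; not JIT code.
static class LoopAlignment
{
    // Number of aligned chunks touched by a code range that starts at
    // byte offset `start` (assumed non-negative) and is `size` bytes long.
    public static int ChunksSpanned(int start, int size, int chunkSize = 32)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        int firstChunk = start / chunkSize;
        int lastChunk = (start + size - 1) / chunkSize;
        return lastChunk - firstChunk + 1;
    }

    // Padding bytes needed to move `start` up to the next chunk boundary
    // (0 if it is already on a boundary).
    public static int PaddingToBoundary(int start, int chunkSize = 32)
    {
        return (chunkSize - start % chunkSize) % chunkSize;
    }
}
```

With made-up numbers, a 60-byte loop starting at offset 90 gives `LoopAlignment.ChunksSpanned(90, 60) == 3`. `LoopAlignment.PaddingToBoundary(90)` is 6, and after inserting those 6 bytes the loop starts at offset 96 and `LoopAlignment.ChunksSpanned(96, 60) == 2`.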

This is a high-weight basic block; perhaps the alignment heuristics should "try harder" and be willing to insert more alignment padding when it might be profitable?

Loop alignment issue in Array2 benchmark #54072
