Loop alignment issue in Array2 benchmark #54072

@BruceForstall

Description

A recent change, #51901, led to a regression in the Benchstone.BenchI.Array2 benchmark on Ubuntu (but not Windows): #52316.

The core of the benchmark is the loop nest in the Bench function:

for (; loop != 0; loop--) {
    for (int i = 0; i < 10; i++) {
        for (int j = 0; j < 10; j++) {
            for (int k = 0; k < 10; k++) {
                d[i][j][k] = s[i][j][k];
            }
        }
    }
}

The code for this loop is almost equivalent, modulo register allocation, before and after #51901. The difference is loop alignment: before #51901, the inner loop fits in two 32-byte chunks; after, it spans three 32-byte chunks. On Ubuntu, this leads to about a 50% performance regression. Simply setting COMPlus_JitAlignLoopAdaptive=0 changes the alignment such that the inner loop again fits in two 32-byte chunks, recovering the performance.
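To make the "2 chunks versus 3 chunks" distinction concrete, here is a minimal C++ sketch that counts how many 32-byte chunks a code region touches. The function name, the offsets, and the 40-byte loop size used in the note below are assumptions for illustration only; the issue does not state the actual addresses or the loop's encoded size.

```cpp
#include <cstdint>

// Size of the alignment chunk discussed in the issue.
constexpr std::uint64_t kChunkSize = 32;

// Number of 32-byte chunks touched by a code region that starts at byte
// offset `start` and is `size` bytes long (size must be > 0).
// Hypothetical helper, not part of the JIT.
inline std::uint64_t chunksSpanned(std::uint64_t start, std::uint64_t size)
{
    std::uint64_t firstChunk = start / kChunkSize;
    std::uint64_t lastChunk  = (start + size - 1) / kChunkSize;
    return lastChunk - firstChunk + 1;
}
```

For example, a hypothetical 40-byte loop body starting at offset 0x40 touches two chunks, while the same body starting at offset 0x3A touches three, even though the instructions are identical.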

This is a high-weight basic block; perhaps the alignment heuristics should "try harder" and be willing to insert more alignment padding when it might be profitable?
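One way to picture what "try harder" could mean is a search for the smallest padding that brings the loop down to the fewest 32-byte chunks it can occupy, bounded by a padding budget. The C++ sketch below is purely illustrative: the function names, the budget parameter, and the example numbers are assumptions, and it does not reproduce the JIT's actual adaptive alignment heuristic.

```cpp
#include <cstdint>

// 32-byte boundary used for loop alignment in this discussion.
constexpr std::uint64_t kBoundary = 32;

// Chunks touched by `size` bytes of code starting at offset `start` (size > 0).
inline std::uint64_t chunksTouched(std::uint64_t start, std::uint64_t size)
{
    return (start + size - 1) / kBoundary - start / kBoundary + 1;
}

// Smallest padding (in bytes, 0..maxPadding) that places the loop in the
// minimum possible number of chunks, or -1 if the budget is too small.
// Hypothetical helper: a real heuristic would also weigh the cost of the
// padding against the block's weight.
inline int paddingForMinChunks(std::uint64_t start, std::uint64_t size, int maxPadding)
{
    // Fewest chunks any placement of `size` bytes can occupy.
    std::uint64_t best = (size + kBoundary - 1) / kBoundary;
    for (int pad = 0; pad <= maxPadding; ++pad)
    {
        if (chunksTouched(start + static_cast<std::uint64_t>(pad), size) == best)
        {
            return pad;
        }
    }
    return -1;
}
```

With a hypothetical 40-byte loop at offset 0x3A, a budget of 4 bytes finds no placement in two chunks, while a budget of 15 bytes finds one with 6 bytes of padding. A larger budget for heavily weighted blocks is the kind of trade-off the question above is raising.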

Labels: area-CodeGen-coreclr, untriaged
