Fix PerfScore inconsistencies #56812

pentp · 2021-08-04T00:34:41Z

Hide write latency for memAccessKind == PERFSCORE_MEMORY_READ_WRITE, as is already done for PERFSCORE_MEMORY_WRITE.
Fix memory access latencies for many instructions that previously either didn't add the instruction latency to the memory access latency or overwrote the memory latency with the register access latency.
Adjust some instruction latencies for YMM register size.
Fix latencies for many instructions by using more precise uops.info data.

kunalspathak · 2021-09-07T14:46:49Z

@dotnet/jit-contrib , @briansull

kunalspathak · 2021-09-07T16:10:37Z

                            result.insLatency = PERFSCORE_LATENCY_3C;

Need += here?

Refers to: src/coreclr/jit/emitxarch.cpp:14921 in 51c7ec6.

kunalspathak · 2021-09-07T16:11:17Z

            result.insLatency = PERFSCORE_LATENCY_23C;

+= ?

Refers to: src/coreclr/jit/emitxarch.cpp:15266 in 51c7ec6.

kunalspathak

Thank you for fixing this. I have added a few questions.

kunalspathak · 2021-09-07T14:52:47Z

src/coreclr/jit/emit.cpp

    assert(latency >= 0.0);

-    if (memAccessKind == PERFSCORE_MEMORY_WRITE)
+    if (memAccessKind == PERFSCORE_MEMORY_WRITE || memAccessKind == PERFSCORE_MEMORY_READ_WRITE)


Why are we hiding latency for PERFSCORE_MEMORY_READ_WRITE as well? Does the assumption in the comment below hold true even for PERFSCORE_MEMORY_READ_WRITE?

We're only hiding the write part of R/W here, so that add [mem], x doesn't end up with a higher perfscore than mov reg, [mem]; add reg, x; mov [mem], reg.
The assumption below is now taken to hold for both kinds of write (plain writes and the write half of a read/write), and I've adjusted some of the instruction latencies to reflect that.
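The reasoning in this reply can be illustrated with a small standalone sketch. It is not the JIT's code: the cycle counts, the `MemAccess` enum, and the helpers `chargedLatency`, `fusedAdd`, and `splitAdd` are all hypothetical, and it compares only summed latencies, which is a simplification of how a perfscore is actually computed.

```cpp
#include <algorithm>

// Illustrative cycle counts only; these are NOT the JIT's PERFSCORE_* values.
constexpr float kReadLatency  = 4.0f;
constexpr float kWriteLatency = 2.0f;
constexpr float kAluLatency   = 1.0f;

// Hypothetical stand-in for the memAccessKind values discussed in the review.
enum class MemAccess { None, Read, Write, ReadWrite };

// Latency charged for one instruction. The write portion is always hidden for
// Write; it is hidden for ReadWrite only when hideWriteForReadWrite is true.
float chargedLatency(float latency, MemAccess kind, bool hideWriteForReadWrite)
{
    const bool hide = (kind == MemAccess::Write) ||
                      (hideWriteForReadWrite && kind == MemAccess::ReadWrite);
    return hide ? std::max(0.0f, latency - kWriteLatency) : latency;
}

// add [mem], x modeled as one read-modify-write instruction.
float fusedAdd(bool hideWriteForReadWrite)
{
    return chargedLatency(kReadLatency + kAluLatency + kWriteLatency,
                          MemAccess::ReadWrite, hideWriteForReadWrite);
}

// mov reg, [mem]; add reg, x; mov [mem], reg modeled as three instructions.
float splitAdd(bool hideWriteForReadWrite)
{
    return chargedLatency(kReadLatency, MemAccess::Read, hideWriteForReadWrite) +
           chargedLatency(kAluLatency, MemAccess::None, hideWriteForReadWrite) +
           chargedLatency(kWriteLatency, MemAccess::Write, hideWriteForReadWrite);
}
```

Under these assumed numbers, `fusedAdd(false)` is larger than `splitAdd(false)`, so the single instruction looks worse than the three-instruction sequence. With the write part hidden for read/write as well, `fusedAdd(true)` and `splitAdd(true)` come out equal.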

kunalspathak · 2021-09-07T14:56:25Z

src/coreclr/jit/emitxarch.cpp

        case INS_setle:
        case INS_setg:
-            result.insLatency = PERFSCORE_LATENCY_1C;
+            result.insLatency += PERFSCORE_LATENCY_1C;


Good catch.

Similarly, we also need to do this for xchg, call, and fstp.

I'm not sure how much memory latency affects call [mem]; uops.info doesn't have any data on this. It's probably going to be less than the 3C from throughput, so it's not worth changing here.

I'm also not sure what the correct value is for fld and fstp (again, there is no data at uops.info).

kunalspathak · 2021-09-07T16:12:27Z

src/coreclr/jit/emitxarch.cpp

            {
                // ins   reg, mem
                result.insThroughput = PERFSCORE_THROUGHPUT_2X;
-                // insLatency is set above (see -  Model the memory latency)


Do we need to update "Model the memory latency" so that this isn't counted twice?

This isn't counted twice. The 2/3C latency is in addition to the modeled memory access base latency (i.e., the mov latency). To put it another way, mov rax, [mem] has lower latency than movdqa ymm, [mem].
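As a rough illustration of why this is additive rather than double counting, the sketch below starts from a modeled base memory latency and adds an instruction-specific extra on top. The names `LoadKind` and `loadLatency` and every cycle value are assumptions made for illustration; they do not come from emitxarch.cpp.

```cpp
// Illustrative values only; not the JIT's PERFSCORE_* constants.
constexpr float kBaseLoadLatency   = 3.0f; // assumed latency of a plain mov reg, [mem]
constexpr float kVectorExtraCycles = 2.0f; // assumed extra cycles for a wide vector load

// Hypothetical classification of the two loads mentioned in the reply.
enum class LoadKind { ScalarMov, VectorYmm };

// Start from the modeled memory latency, then ADD the instruction-specific
// part so that the base latency is preserved instead of overwritten.
float loadLatency(LoadKind kind)
{
    float insLatency = kBaseLoadLatency; // modeled memory access latency
    if (kind == LoadKind::VectorYmm)
    {
        insLatency += kVectorExtraCycles; // in addition to the base latency
    }
    return insLatency;
}
```

With these assumed numbers, `loadLatency(LoadKind::ScalarMov)` stays at the base value while `loadLatency(LoadKind::VectorYmm)` is higher by exactly the extra cycles, which matches the ordering described for mov rax, [mem] versus movdqa ymm, [mem].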

pentp · 2021-09-07T20:15:46Z

                            result.insLatency = PERFSCORE_LATENCY_3C;
Need += here?

Refers to: src/coreclr/jit/emitxarch.cpp:14921 in 51c7ec6.

This is for lea, which should ignore (overwrite) the memory access latency.
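The distinction between `=` and `+=` can be sketched as follows. This is a hypothetical model, not the code under review: `Ins`, `latencyFor`, and the cycle values are assumptions, chosen only to contrast an instruction that really accesses memory with lea, which uses an address-mode operand without touching memory.

```cpp
// Hypothetical instruction kinds for this sketch.
enum class Ins { SetccMem, Lea };

// modeledMemLatency is whatever the generic memory model produced for an
// operand of the form [mem]. All cycle counts here are illustrative.
float latencyFor(Ins ins, float modeledMemLatency)
{
    float insLatency = modeledMemLatency;
    switch (ins)
    {
        case Ins::SetccMem:
            insLatency += 1.0f; // accesses memory: keep the memory latency and add to it
            break;
        case Ins::Lea:
            insLatency = 3.0f; // only computes an address: overwrite the memory latency
            break;
    }
    return insLatency;
}
```

In this sketch the result for `Ins::Lea` is the same whatever memory latency is passed in, while the result for `Ins::SetccMem` grows with it.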

            result.insLatency = PERFSCORE_LATENCY_23C;
+= ?

Refers to: src/coreclr/jit/emitxarch.cpp:15266 in 51c7ec6.

The latency of xchg [mem], reg is complicated (and bad): it can be anywhere between 10 and 45 depending on the exact instruction form, the instruction sequence, and the CPU. The current value is probably just an average, and the instruction should be avoided if possible.

kunalspathak

LGTM. Thanks for your responses.

ghost added the area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) and community-contribution (indicates that the PR has been added by a community member) labels Aug 4, 2021

JulieLeeMSFT added this to the 7.0.0 milestone Aug 16, 2021

JulieLeeMSFT assigned pentp Aug 16, 2021

Fix PerfScore inconsistencies

51c7ec6

pentp force-pushed the perfscore-fixes branch from 286c5a4 to 51c7ec6 August 22, 2021 22:15

kunalspathak reviewed Sep 7, 2021

kunalspathak approved these changes Sep 7, 2021

kunalspathak merged commit 404a89f into dotnet:main Sep 7, 2021

pentp deleted the perfscore-fixes branch September 7, 2021 20:41

ghost locked as resolved and limited conversation to collaborators Oct 7, 2021
