-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Fix PerfScore inconcistencies #56812
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@dotnet/jit-contrib , @briansull |
kunalspathak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for fixing. I have added few questions.
| assert(latency >= 0.0); | ||
|
|
||
| if (memAccessKind == PERFSCORE_MEMORY_WRITE) | ||
| if (memAccessKind == PERFSCORE_MEMORY_WRITE || memAccessKind == PERFSCORE_MEMORY_READ_WRITE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we are hiding latency for PERFSCORE_MEMORY_READ_WRITE as well? Is the assumption in the comment below holds true even for PERFSCORE_MEMORY_READ_WRITE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're only hiding the write part of R/W here so that add [mem], x doesn't end up with a higher perfscore than mov reg, [mem]; add reg, x; mov [mem], reg.
The assumption below is now assumed to be true for both writes and I've adjusted some of the instruction latencies to reflect that.
| case INS_setle: | ||
| case INS_setg: | ||
| result.insLatency = PERFSCORE_LATENCY_1C; | ||
| result.insLatency += PERFSCORE_LATENCY_1C; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly, need to also do it for xchg, call, fstp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how much memory latency affects call [mem], uops.info doesn't have any info on this. It's probably going to be less than the 3C from throughput, so not worth changing it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fld and fstp I'm also not sure about what's the correct value (again no data at uops.info).
| { | ||
| // ins reg, mem | ||
| result.insThroughput = PERFSCORE_THROUGHPUT_2X; | ||
| // insLatency is set above (see - Model the memory latency) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to update Model the memory latency to not count this twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't counted twice, this 2/3C latency is in addition to the modeled memory access base latency (i.e., mov latency). To put it another way, mov rax, [mem] has lower latency than movdqa ymm, [mem].
This is for
|
kunalspathak
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks for your responses.
memAccessKind == PERFSCORE_MEMORY_READ_WRITElike forPERFSCORE_MEMORY_WRITE.YMMregister size.Fixes #49647