Monitor.TryEnter asm timeout != 0 before spin #6951
There is also a portable C++ implementation in vm/jithelpers.cpp. Does it have the bug as well? I doubt that the hand-written assembly version has a significant perf advantage. It may be best to switch to using the portable C++ implementation everywhere. Defining cc @rahku
From what I'm understanding So its advantage is for the uncontended acquire where its calling semantics would be like an inlined intrinsic CAS for the caller; so with very little overhead. I thought the simplest route would be to introduce a new register like
Actually... If I understand it correctly, I can do that.
6306684 to d9a2a72
Works with a twist..
@jkotas updated description as this now fixes the issue for both parts.
mov [rsp + 10h], rdx
push_nonvol_reg r12 ; rcx now at [rsp + 10h]
I do not think you need the extra register. The timeout is stored at [rsp + 10h] already, so the two places that need to check it can just do "cmp dword ptr [rsp + 10h], 0"
It's stored in one of two places, [rsp + 10h] or [rsp + MON_ENTER_STACK_SIZE_INLINEGETTHREAD + 10h], so the code would be messier and use more ifdefs?
Also, it would be a stack compare on every loop spin rather than a register compare? (Not sure if that would make a huge difference.)
I do not see a problem with adding an extra ifdef in a few more places.
The shortest codepath through this helper is the non-contended case that acquires the lock successfully. We should not be making this path longer. Once there is contention, that will always burn a lot of cycles underneath, so accessing the stack instead of a register is not a problem.
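To illustrate the shape being discussed, here is a minimal C++ sketch, not the CoreCLR helper itself. `SketchLock`, `sketch_try_enter`, `spins_observed`, and `max_spins` are hypothetical names invented for the illustration. It assumes a single lock word acquired by one CAS. The uncontended path does nothing beyond that CAS, and the timeout is only examined once the CAS has failed, before any spinning starts.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-in for a monitor's lock word; this is NOT the
// CoreCLR AwareLock layout, just enough state to show the control flow.
struct SketchLock {
    std::atomic<int> held{0};
    int spins_observed = 0;  // instrumentation for the example only
};

// Returns true if the lock was acquired.
// timeout_ms == 0 means "try once and never spin".
bool sketch_try_enter(SketchLock& lock, int32_t timeout_ms, int max_spins = 100) {
    for (int spin = 0;; ++spin) {
        int expected = 0;
        // Fast path: a single CAS, with no timeout check in front of it.
        if (lock.held.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire)) {
            return true;
        }
        // Contended: check the timeout BEFORE spinning.
        if (timeout_ms == 0) {
            return false;
        }
        // Assumption: a real helper would fall back to a blocking slow
        // path here; the sketch simply gives up.
        if (spin >= max_spins) {
            return false;
        }
        ++lock.spins_observed;
    }
}
```

With this shape, a zero-timeout call on a held lock costs one failed CAS and one compare, and the extra compare sits only on the contended path.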
If doing the cmp directly, I don't think I need to restore rdx before exit? (It's not done in the thinlock.)
d9a2a72 to 7945608
Runs even faster without the extra register. Have updated timings. Corefx tests pass. The FreeBSD error is in "threading.WaitForSingleObject.WFSOExSemaphoreTest", which seems unusual but unrelated? The test doesn't use Monitor.
; In the Block case we've trashed RCX, restore it
; In the Block case we've trashed RCX and RDX restore them
mov rcx, [rsp + 8h]
mov rdx, [rsp + 10h]
There is "mov rdx, [rsp + 10h]" instruction right before "jmp JITutil_MonTryEnter" that will take care of restoring rdx. This change should not be necessary.
Ah, and it has already done the rsp restore, so it is in the right place.
LGTM modulo last few comments. Thanks!
7945608 to 616aca3
@dotnet-bot test Linux ARM Emulator Cross Release Build
* JIT_MonTryEnter_Slow timeout != 0 before spin
* JIT_MonTryEnter_InlineGetThread timeout != 0 before spin
Commit migrated from dotnet/coreclr@4f60a81
With this change Monitor.TryEnter(o, 0) moves from the slowest opportunistic locking to the fastest (when using the inline asm path)
corefx tests pass; not sure how to trigger the TRACK_SYNC flag?
1M iterations:
Usage from apisof.net
Resolves https://github.com/dotnet/coreclr/issues/6950
Resolves aspnet/KestrelHttpServer#1068