Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Monitor.TryEnter asm timeout != 0 before spin#6951

Merged
jkotas merged 2 commits into
dotnet:masterfrom
benaadams:monitor-tryenter
Aug 30, 2016
Merged

Monitor.TryEnter asm timeout != 0 before spin#6951
jkotas merged 2 commits into
dotnet:masterfrom
benaadams:monitor-tryenter

Conversation

@benaadams
Copy link
Copy Markdown
Member

@benaadams benaadams commented Aug 27, 2016

With this change Monitor.TryEnter(o, 0) moves from the slowest opportunistic locking to the fastest (when using the inline asm path)

corefx tests pass, not sure how to trigger the TRACK_SYNC flag?

1M iterations:

method pre (ms) post (ms) improvement
Monitor.TryEnter(o, 0) 221,739.76 5.49 x 40389.8

Usage from apisof.net

method API port nuget
Monitor.TryEnter(object) 1.5% 0.8%
Monitor.TryEnter(object, bool) 1.2% 0.3%
Monitor.TryEnter(object, int) 2.1% 1.0%
Monitor.TryEnter(object, int, bool) 0.2% 0.1%
Monitor.TryEnter(object, timespan) 0.4% 0.3%
Monitor.TryEnter(object, timespan, bool) 0.1% 0.1%

Resolves https://github.com/dotnet/coreclr/issues/6950
Resolves aspnet/KestrelHttpServer#1068

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Aug 27, 2016

There is also portable C++ implementation in vm/jithelpers.cpp. Does it have the bug as well?

I doubt that the hand-written assembly version has significant perf advantage. It may be best to switch to use the portable C++ implementation everywhere. Defining FEATURE_IMPLICIT_TLS in clrdefinitions.cmake should do the trick. And then do some perf measurements to understand the impact of switching to C++ implementation.

cc @rahku

@benaadams
Copy link
Copy Markdown
Member Author

benaadams commented Aug 28, 2016

JIT_MonTryEnter_Portable doesn't have the bug, and is fairly straightforward; after checking the timeout (and failing fast) it calls into ObjHeader::EnterObjMonitorHelperSpin in syncbl.inl which does the spinning.

From what I'm understanding JIT_MonTryEnter_InlineGetThread is an assembly tail call for the managed code, which then falls back to a call into AwareLock::TryEnter in syncblk.cpp via a chain of calls.

So its advantage is for the uncontended acquire where its calling semantics would be like an inlined intrinsic CAS for the caller; so with very little overhead.

I thought the simplest route would be to introduce a new register like rsi as JIT_MonTryEnter_Slow does but its prologue is a little more complicated so restoring its value would be also (i.e. it doesn't use rsp until exit)

@benaadams
Copy link
Copy Markdown
Member Author

benaadams commented Aug 28, 2016

Actually... If I understand it correctly, I can do that.

@benaadams benaadams force-pushed the monitor-tryenter branch 2 times, most recently from 6306684 to d9a2a72 Compare August 28, 2016 09:49
@benaadams
Copy link
Copy Markdown
Member Author

Works with a twist..

@benaadams benaadams changed the title JIT_MonTryEnter_Slow timeout != 0 before spin Monitor.TryEnter asm timeout != 0 before spin Aug 28, 2016
@benaadams
Copy link
Copy Markdown
Member Author

@jkotas updated description as this now fixes the issue for both parts.

mov [rsp + 10h], rdx

push_nonvol_reg r12 ; rcx now at [rsp + 10h]

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not think you need the extra register. The timeout is stored at [rsp + 10h] already, so the two places that need to check it can just do "cmp dword ptr [rsp + 10h], 0"

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its stored at one of two places [rsp + 10h] or [rsp + MON_ENTER_STACK_SIZE_INLINEGETTHREAD + 10h] so code would be messier and use more ifdef?

Also it would be a stack compare for every loop spin rather than a register compare? (not sure if that would make a huge difference)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see a problem with adding extra ifdef in a few more places.

The shortest codepath through this helper is the non-contended case that acquires lock successfully. We should not be making this path longer. Once there is a contention - that will always burn a lot of cycles underneath, accessing stack instead of register is not a problem.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If doing the cmp directly, I don't think I need to restore rdx before exit? (Its not done in the thinlock)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree.

@benaadams
Copy link
Copy Markdown
Member Author

Runs even faster without the extra register. Have updated timings. Corefx tests pass.

FreeBSD error is "threading.WaitForSingleObject.WFSOExSemaphoreTest"

Expected thread to wait for 2000 ms (and get interrupted). Thread waited for 1912 ms! (Acceptable delta: 300)

Which seems unusual but unrelated? Test doesn't use Monitor

; In the Block case we've trashed RCX, restore it
; In the Block case we've trashed RCX and RDX restore them
mov rcx, [rsp + 8h]
mov rdx, [rsp + 10h]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is "mov rdx, [rsp + 10h]" instruction right before "jmp JITutil_MonTryEnter" that will take care of restoring rdx. This change should not be necessary.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, and its already done the rsp restore so it is in the right place.

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Aug 29, 2016

LGTM modulo last few comments. Thanks!

@benaadams
Copy link
Copy Markdown
Member Author

@dotnet-bot test Linux ARM Emulator Cross Release Build
@dotnet-bot test OSX x64 Checked Build and Test

@jkotas jkotas merged commit 4f60a81 into dotnet:master Aug 30, 2016
@benaadams benaadams deleted the monitor-tryenter branch September 2, 2016 11:00
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
* JIT_MonTryEnter_Slow timeout != 0 before spin
* JIT_MonTryEnter_InlineGetThread timeout != 0 before spin


Commit migrated from dotnet/coreclr@4f60a81
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants