Skip to content

Conversation

@AaronRobinsonMSFT
Copy link
Member

Switching to a reader/writer lock improves throughput for faster RCW look up by 4x. This was observed in a private repro associated with #90964.

Fixes #90964

@AaronRobinsonMSFT
Copy link
Member Author

AaronRobinsonMSFT commented Aug 25, 2023

/cc @jkotas @dotnet/interop-contrib @manodasanW

@jkotas
Copy link
Member

jkotas commented Aug 25, 2023

Would it be possible to extract a repro into a microbenchmark in https://github.com/dotnet/performance?

Also, it would be good to check how native AOT does on this microbenchmark.

@AaronRobinsonMSFT
Copy link
Member Author

Would it be possible to extract a repro into a microbenchmark in https://github.com/dotnet/performance?

I will need to construct one. The repro was using C#/WinRT and would be non-trivial to recreate. I will see what I can do.

Also, it would be good to check how native AOT does on this microbenchmark.

The data structure being used in AOT is the ConditionalWeakTable. I don't think this has the same mutex issues that we had in CoreCLR.

@manodasanW
Copy link
Contributor

Also, it would be good to check how native AOT does on this microbenchmark.

The data structure being used in AOT is the ConditionalWeakTable. I don't think this has the same mutex issues that we had in CoreCLR.

For AOT, thoughts on the lock here for rcwCache.

@AaronRobinsonMSFT
Copy link
Member Author

Also, it would be good to check how native AOT does on this microbenchmark.

The data structure being used in AOT is the ConditionalWeakTable. I don't think this has the same mutex issues that we had in CoreCLR.

For AOT, thoughts on the lock here for rcwCache.

I was misinformed. Sad. Okay, that needs to change.

@AaronRobinsonMSFT
Copy link
Member Author

Would it be possible to extract a repro into a microbenchmark in https://github.com/dotnet/performance?

Also, it would be good to check how native AOT does on this microbenchmark.

@jkotas and @manodasanW I'm genuinely shocked at this, but converting NAOT to use a reader/writer lock makes NAOT slower for a microbenchmark. I was able to see the expected speed in the microbenchmark with CoreCLR and the reader/writer lock so it is capturing the scenario, but there is nothing to do here for NAOT.

I will work on pushing the benchmark into https://github.com/dotnet/performance.

@AaronRobinsonMSFT
Copy link
Member Author

NAOT

native_baseline\Perf.exe
Parallel.ForEach: 409 ms.
native\Perf.exe
Parallel.ForEach: 520 ms.

CoreCLR

Perf_baseline.cmd
Parallel.ForEach: 2060 ms.
Perf.cmd
Parallel.ForEach: 428 ms.

@AaronRobinsonMSFT
Copy link
Member Author

Benchmark PR - dotnet/performance#3298

@AaronRobinsonMSFT AaronRobinsonMSFT merged commit b09764f into dotnet:main Aug 25, 2023
@AaronRobinsonMSFT AaronRobinsonMSFT deleted the runtime_90964 branch August 25, 2023 23:23
@AaronRobinsonMSFT
Copy link
Member Author

/backport to release/8.0

@github-actions
Copy link
Contributor

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/5981298027

@ghost ghost locked as resolved and limited conversation to collaborators Sep 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ComWrappers.GetOrCreateObjectForComInstance becomes expensive during the Get cached RCW scenario when called from multiple threads

3 participants