Skip to content

Conversation

@AustinWise
Copy link
Contributor

@AustinWise AustinWise commented May 30, 2023

This PR adds a test that on main passes for CoreCLR but segfaults for NativeAOT. It also includes fixes to make the test pass for NativeAOT.

The key problem is the conditional weak table will keep the ManagedObjectWrapperHolder alive but not prevent it from being finalized. This means that if the wrapped object is resurrected, calls to GetOrCreateComInterfaceForObject could return a pointer freed memory.

The solution this PR implements is to move the finialization of the ManagedObjectWrapper to a separate class. This class defers finialization until the wrapped object has been completely collected. The new ManagedObjectWrapperReleaser is referenced by the ManagedObjectWrapperHolder class, so it should not be finialized until the wrapped object becomes eligible for finialization.

The references between objects looks like this:

graph TD
WO[wrapped object];
MOW["ManagedObjectWrapper (unmanaged struct)"];
MOWH[ManagedObjectWrapperHolder];
Releaser[ManagedObjectWrapperReleaser];
WO -->|conditional weak table| MOWH;
MOWH --> WO;
MOWH --> MOW;
MOW -->|ref-counted handle| MOWH;
Releaser --> MOW;
MOWH --> Releaser;
Loading

Some other race conditions are also fixed by this PR:

  • Previously the ref-counted handle in the ManagedObjectWrapper was not set until after the ManagedObjectWrapperHolder was added to the CWT. Hypothetically if a holder was added to the CWT but not yet initialized with the ref-counted handle, another thread could read the object out of the CWT and observe the null handle. This is fixed by doing all the initialization of the holder in its constructor.
  • The GetOrCreateComInterfaceForObject was assuming that ConditionalWeakTable.GetValue always returns a new object and calling AddRef on the holder was not needed. However, GetValue can potentially call TryGetValue and return an existing instance. This means that if many threads called GetOrCreateComInterfaceForObject at the same time, the ref count on the ManagedObjectWrapper might not be as high as it should be. This is fixed by unconditionally calling AddRef on the holder.

@ghost ghost added community-contribution Indicates that the PR has been added by a community member area-Interop-coreclr labels May 30, 2023
@AustinWise

This comment was marked as resolved.

@AustinWise AustinWise force-pushed the austin/ComResurrection branch from 3ef9bd5 to 46508ec Compare June 10, 2023 21:40
@AustinWise

This comment was marked as resolved.

The WrappedObjectHolder will be used to keep the managed object alive when
the COM reference count is positive. And the ManagedObjectWrapper object
will be responsible for cleaning up the unmanaged MOW when the wrapped
object is truly dead.

The design is that as long as there is a postive reference count, the
WrappedObjectHolder should stay alive. Any time the ref count is increased
from the managed side from 0 to 1, the WrappedObjectHolder will be restored
if it was collected. It should not be possible for the MOW to observe a
null WrappedObjectHolder.
@AustinWise AustinWise force-pushed the austin/ComResurrection branch from 46508ec to 66245fc Compare June 11, 2023 17:40
I'm still not 100% sure that _exposed is thread safe.
@AustinWise AustinWise marked this pull request as ready for review June 12, 2023 02:57
@MichalStrehovsky
Copy link
Member

Cc @dotnet/interop-contrib

@ghost
Copy link

ghost commented Jun 13, 2023

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

This PR adds a test that on main passes for CoreCLR but fails for NativeAOT. It also includes fixes to make the test pass for NativeAOT.

The key problem is the conditional weak table will keep the ManagedObjectWrapperHolder alive but not prevent it from being finalized. This means that if the wrapped object is resurrected, calls to GetOrCreateComInterfaceForObject could return a pointer freed memory.

The solution this PR implements is to extend the lifetime of the ManagedObjectWrapperHolder and ManagedObjectWrapper as long as the wrapped object is alive. The responsibility for the keeping the wrapped object alive as long as the MOW has a non-zero refcount is separated out into a new WrappedObjectHolder class. This object is kept alive for the duration of the wrapped object's lifetime by adding a dependent handle to the to the ManagedObjectWrapperHolder. The ManagedObjectWrapperHolder also uses this dependent handle to detect whether the wrapped object is still alive and delay finalization until the wrapped object is completely dead.

The references between objects looks like this:

graph TD
WO[wrapped object];
MOW["ManagedObjectWrapper (unmanaged struct)"];
MOWH[ManagedObjectWrapperHolder];
WOH[WrappedObjectHolder];
WO -->|CWT| MOWH;
MOWH --> MOW;
MOW -->|ref-counted handle| WOH;
WOH --> WO;
WO -->|dependant handle owned by MOWH| WOH;
Loading

Some other race conditions are also fixed by this PR:

  • Previously the ref-counted handle in the ManagedObjectWrapper was not set until after the ManagedObjectWrapperHolder was added to the CWT. Hypothetically if a holder was added to the CWT but not yet initialized with the ref-counted handle, another thread could read the object out of the CWT and observe the null handle. This is fixed by doing all the initialization of the holder in its constructor.
  • The GetOrCreateComInterfaceForObject was assuming that ConditionalWeakTable.GetValue always returns a new object and calling AddRef on the holder was not needed. However, GetValue can potentially call TryGetValue and return an existing instance. This means that if many threads called GetOrCreateComInterfaceForObject at the same time, the ref count on the ManagedObjectWrapper might not be as high as it should be. This is fixed by unconditionally calling AddRef on the holder and removing the superfluous call to TryGetValue in GetOrCreateComInterfaceForObject.
Author: AustinWise
Assignees: -
Labels:

community-contribution, area-NativeAOT-coreclr

Milestone: -

@AaronRobinsonMSFT
Copy link
Member

I will need to take some time to look at this. Based on a quick look at the test I'm not sure I understand what this is validating.

@AustinWise
Copy link
Contributor Author

Based on a quick look at the test I'm not sure I understand what this is validating.

On the main branch, the test passes on CoreClr but segfaults on NativeAOT. This is because the ManagedObjectWrapperHolder gets finalized and frees the ManagedObjectWrapper too early. I have updated the description of the PR to be a little more clear about this.

@AustinWise
Copy link
Contributor Author

I updated how the responsibility is split for keeping objects alive and collecting them. I think this new design is much easier to understand. The description of the PR has been updated.

The main problem this PR does not address is the issue of extra ManagedObjectWrappers hanging around for the lifetime of a wrapped object when there is a race condition adding values to the _ccwTable.

This saves a handle. The refcounted handle behaves like a long-weak,
so this should not get cleared out until the conditional weak table
is also cleared out.
@AustinWise
Copy link
Contributor Author

I realized that I did not need to use an extra handle to monitor the lifetime of the holder object from the releaser object. Instead I could just use the ref counted handle. That saves allocating an extra GC handle.

Copy link
Member

@jkoritzinsky jkoritzinsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, but I want @AaronRobinsonMSFT to take another look.

@AaronRobinsonMSFT
Copy link
Member

The key problem is the conditional weak table will keep the ManagedObjectWrapperHolder alive but not prevent it from being finalized.

My biggest issue here is the above statement. If an object is "alive", I assume "reachable", then it will not be finalized. The GC only adds objects to the finalizer queue if they are no longer reachable so I am confused by this statement.

@AaronRobinsonMSFT
Copy link
Member

The key problem is the conditional weak table will keep the ManagedObjectWrapperHolder alive but not prevent it from being finalized.

My biggest issue here is the above statement. If an object is "alive", I assume "reachable", then it will not be finalized. The GC only adds objects to the finalizer queue if they are no longer reachable so I am confused by this statement.

Oh right. The conditional weak table doesn't keep it alive, merely a reference to acquire. Its state is indeterminant in this case and that is the issue you are attempting to mitigate. This is simply loose terminology.

@jkoritzinsky jkoritzinsky merged commit 9d7c9d0 into dotnet:main Jun 22, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jul 23, 2023
@AustinWise AustinWise deleted the austin/ComResurrection branch March 25, 2024 02:24
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants