@saurabh500
Contributor

The guid for ITransactionLocal was incorrectly coded. The change fixes the Guid and also adds a test to verify this.

The test was throwing an exception before the fix.

The GUID was verified against the oledb.h header; the problem was also reported in #32405.

Fixes #31177
Fixes #32405
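
The kind of GUID check described above could be sketched as follows. This is not the test added by the PR: `ITransactionLocalSketch` and `IidCheckSketch` are hypothetical names introduced for illustration, and the expected IID is the value carried by the pre-existing declaration quoted later in this conversation. The sketch only reads the `Guid` attribute through reflection and performs no COM calls.

```csharp
using System;
using System.Reflection;
using System.Runtime.InteropServices;

// Hypothetical stand-in for the real declaration; only the Guid attribute matters here.
[Guid("0C733A5F-2A1C-11CE-ADE5-00AA0044773D")]
internal interface ITransactionLocalSketch { }

public static class IidCheckSketch
{
    // IID of ITransactionLocal as quoted in this conversation (assumed to match oledb.h).
    public static readonly Guid ExpectedIid = new Guid("0c733a5f-2a1c-11ce-ade5-00aa0044773d");

    // Reads the GuidAttribute through reflection and compares it with the expected IID.
    public static bool HasExpectedIid(Type interfaceType, Guid expected)
    {
        GuidAttribute attribute = interfaceType.GetCustomAttribute<GuidAttribute>();
        return attribute != null && new Guid(attribute.Value) == expected;
    }

    public static void Main()
    {
        Console.WriteLine(HasExpectedIid(typeof(ITransactionLocalSketch), ExpectedIid));
    }
}
```

Comparing parsed `Guid` values rather than strings keeps the check independent of letter case, which differs between the two declarations quoted in this thread.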

@danmoseley
Member

Servicing candidate?

@jkotas
Member

jkotas commented Feb 20, 2020

System.Data.OleDb.Tests are failing with this change.

@saurabh500
Contributor Author

@jkotas I am looking into the test failure. I haven't been able to replicate the failure on my dev box. The tests are passing for me. I am testing using dotnet build /t:RebuildAndTest /p:ArchGroup=x64

This could be a test issue: my testing pattern differs from the rest of the test class, because I create a new connection instead of reusing the existing one. I will submit a change to try out that hypothesis. The failure is potentially caused by parallel connections being opened against a shared resource, namely a CSV file.

@jkotas
Member

jkotas commented Feb 20, 2020

Note that the tests are failing on Win7 and Win81 only. Perhaps that may be a factor?

@saurabh500
Contributor Author

I could get a repro while executing the tests from Visual Studio. However, I couldn't repro the same call stack from the command line, where I started seeing AVs instead (which isn't good either).

While looking into the code, I realized that there are duplicate declarations of the ITransactionLocal interface with the same GUID (after I made this change). This causes a failure to execute the IDataInitializeGetDataSource delegate declared in src\libraries\System.Data.OleDb\src\UnsafeNativeMethods.cs.
I could remove one of the declarations of ITransactionLocal and get the tests to execute fine.

The existing duplicate interface is at

[Guid("0C733A5F-2A1C-11CE-ADE5-00AA0044773D"), InterfaceType(ComInterfaceType.InterfaceIsIUnknown), ComImport, SuppressUnmanagedCodeSecurity]
internal interface ITransactionLocal
{
    [Obsolete("not used", true)] void Commit(/*deleted parameter signature*/);
    [Obsolete("not used", true)] void Abort(/*deleted parameter signature*/);
    [Obsolete("not used", true)] void GetTransactionInfo(/*deleted parameter signature*/);
    [Obsolete("not used", true)] void GetOptionsObject(/*deleted parameter signature*/);
    [PreserveSig]
    System.Data.OleDb.OleDbHResult StartTransaction(
        [In] int isoLevel,
        [In] int isoFlags,
        [In] IntPtr pOtherOptions,
        [Out] out int pulTransactionLevel);
}

Since the GUIDs of the two ITransactionLocal declarations were different, no problems surfaced during test execution. However, after fixing the GUID in the duplicate ITransactionLocal, there seems to be some problem with calling the delegate above, which surfaces either as an AV or as an E_FAIL HRESULT. At this point, I have not dug deeper into what causes the AV or the E_FAIL.

The duplicate interface is

[Guid("0c733a93-2a1c-11ce-ade5-00aa0044773d")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
[ComImport]
internal unsafe interface ITransactionLocal : ITransaction
{
    [PreserveSig]
    new int Commit
    (
        [In] bool fRetaining,
        [In] uint grfTC,
        [In] uint grfRM
    );
    [PreserveSig]
    new int Abort
    (
        [In] IntPtr pboidReason,
        [In] bool fRetaining,
        [In] bool fAsync
    );
    [PreserveSig]
    new int GetTransactionInfo
    (
        [Out] IntPtr pinfo
    );
    [PreserveSig]
    int GetOptionsObject(
        [Out, Optional] IntPtr ppOptions
    );
    [PreserveSig]
    int StartTransaction(
        [In] int isoLevel,
        [In] uint isoFlags,
        [In, Optional] IntPtr pOtherOptions,
        [Out, Optional] uint* pulTransactionLevel
    );
}

The duplicate interface was added after the port to Core to replicate what the Managed C++ part of System.Data.dll used to handle, namely interacting with the ITransactionLocal interface.

I will update the PR with the findings and continue to investigate in parallel to see if I can make sense of why the failures are happening. I have very limited knowledge of COM and OleDb APIs to quickly find a solution here. However I will update the PR with the proposed solution and see if there is any feedback.

@saurabh500 saurabh500 self-assigned this Feb 21, 2020
@danmoseley
Member

@AaronRobinsonMSFT any thoughts come to mind?

@AaronRobinsonMSFT
Member

AaronRobinsonMSFT commented Feb 21, 2020

This is a huge mess. The GUID on ITransactionLocal is really for IChapteredRowset. To further complicate things, IChapteredRowset inherits directly from IUnknown, so its interface definition is simple and has only two slots. However, the actual ITransactionLocal inherits from ITransaction, so 3 new slots must be added before the ones the interface itself defines.

I can imagine various assumptions with how these are validated. Let me take a look at some of the failures when/if they come back.
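
The slot layout described above can be sketched as plain data. This is an illustration only: `VTableLayoutSketch` is a hypothetical helper that performs no COM interop, the ITransaction and ITransactionLocal method names come from the declarations quoted earlier in this thread, and the two IChapteredRowset method names are given as recalled from oledb.h, so treat them as an assumption.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: models COM vtable slot order as plain data; it performs no COM interop.
public static class VTableLayoutSketch
{
    // Every COM interface begins with the three IUnknown slots.
    private static readonly string[] IUnknownSlots = { "QueryInterface", "AddRef", "Release" };

    // Returns the full slot list: IUnknown first, then inherited methods, then the interface's own.
    public static List<string> Slots(string[] inherited, string[] own)
    {
        var slots = new List<string>(IUnknownSlots);
        slots.AddRange(inherited);
        slots.AddRange(own);
        return slots;
    }

    // IChapteredRowset derives directly from IUnknown and declares two methods (names assumed).
    public static List<string> ChapteredRowset() =>
        Slots(new string[0], new[] { "AddRefChapter", "ReleaseChapter" });

    // ITransactionLocal derives from ITransaction, so ITransaction's three methods come first.
    public static List<string> TransactionLocal() =>
        Slots(new[] { "Commit", "Abort", "GetTransactionInfo" },
              new[] { "GetOptionsObject", "StartTransaction" });

    public static void Main()
    {
        Console.WriteLine("IChapteredRowset slot count:  " + ChapteredRowset().Count);
        Console.WriteLine("ITransactionLocal slot count: " + TransactionLocal().Count);
        Console.WriteLine("StartTransaction slot index:  " + TransactionLocal().IndexOf("StartTransaction"));
    }
}
```

Under these assumptions, a caller that expects the longer ITransactionLocal layout but receives a pointer obtained with the IChapteredRowset IID would index past the end of a shorter table, which is one way a wrong GUID can surface as an access violation or an unexpected HRESULT.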

@AaronRobinsonMSFT
Member

I am going to have to dig into this implementation tomorrow in order to help.

It would appear that the OleDB API implementation decided to circumvent the entire RCW/CCW support and manually handle all COM interop. This seems like a bad idea, but fine. What this means is that somewhere there is an offset issue or some assumption about the layout of a specific interface that is no longer valid now that the type in question has been updated.

@FreddyDgh
Contributor

Since there haven't been any updates on this PR for a week, does that mean the changes won't be merged anytime soon? It'd be really nice to have this fixed.

@AaronRobinsonMSFT
Member

I completely forgot about this issue. Fixing this is clearly the correct thing here, but understanding the failure is important - especially considering the failure is consistent. I will try to look into this today.

@AaronRobinsonMSFT
Member

@saurabh500 I am unable to build this branch locally. I am getting the following:

System\Diagnostics\ProcessStartInfo.Win32.cs(21,42): error CS8600: Converting null literal or possible null value to non-nullable type. [D:\runtime\src\libraries\System.Diagnostics.Process\src\System.Diagnostics.Process.csproj]
System\Diagnostics\ProcessStartInfo.Win32.cs(30,49): error CS8600: Converting null literal or possible null value to non-nullable type. [D:\runtime\src\libraries\System.Diagnostics.Process\src\System.Diagnostics.Process.csproj]
System\Diagnostics\PerformanceCounterLib.cs(116,37): error CS8600: Converting null literal or possible null value to non-nullable type. [D:\runtime\src\libraries\System.Diagnostics.Process\src\System.Diagnostics.Process.csproj]
System\Diagnostics\PerformanceCounterLib.cs(118,37): error CS8600: Converting null literal or possible null value to non-nullable type. [D:\runtime\src\libraries\System.Diagnostics.Process\src\System.Diagnostics.Process.csproj]
System\Diagnostics\PerformanceCounterLib.cs(240,32): error CS8600: Converting null literal or possible null value to non-nullable type. [D:\runtime\src\libraries\System.Diagnostics.Process\src\System.Diagnostics.Process.csproj]
System\Diagnostics\PerformanceCounterLib.cs(241,32): error CS8603: Possible null reference return. [D:\runtime\src\libraries\System.Diagnostics.Process\src\System.Diagnostics.Process.csproj]
D:\runtime\src\libraries\Common\src\System\Diagnostics\TraceListenerHelpers.Windows.cs(28,38): error CS0103: The name 'Process' does not exist in the current context [D:\runtime\src\libraries\System.Diagnostics.TraceSource\src\System.Diagnostics.TraceSource.csproj]
    0 Warning(s)
    7 Error(s)

@jkotas
Member

jkotas commented Feb 27, 2020

@AaronRobinsonMSFT This is a build break that sneaked through the CI. Merge current master into the branch to fix it.

@saurabh500
Contributor Author

@AaronRobinsonMSFT this is strange. I didn't touch the path where your build is failing.
I followed the Workflow and Building guide to get this done.

Are you able to build master? Perhaps rebasing the branch onto master might help?

@AaronRobinsonMSFT
Member

@jkotas I just merged in origin/master and now am getting the following error. Please tell me this isn't another known issue.

0 arguments [D:\runtime\artifacts\obj\coreclr\Windows_NT.x64.Checked\src\jit\linuxnonjit\linuxnonjit.vcxproj]
D:\runtime\src\coreclr\src\jit\emit.h(197,10): message : see declaration of 'emitLocation::Print' (compiling source file D:\runtime\src\coreclr\src\jit\codegencommon.cpp) [D:\runtime\artifacts\obj\coreclr\Windows_NT.x64.Checked\src\jit\linuxnonjit\linuxnonjit.vcxproj]
  gcdecode.cpp
  pefile.cpp
D:\runtime\src\coreclr\src\jit\codegencommon.cpp(11311,33): error C2660: 'emitLocation::Print': function does not take 0 arguments [D:\runtime\artifacts\obj\coreclr\Windows_NT.x64.Checked\src\jit\linuxnonjit\linuxnonjit.vcxproj]

@AaronRobinsonMSFT
Member

Scratch that. Nonsense happening locally.

@AaronRobinsonMSFT
Member

AaronRobinsonMSFT commented Feb 27, 2020

Actually no. This is from dotnet/runtime/master. Boo.

see #32927

@AaronRobinsonMSFT
Member

A brief update. This test appears to require Access 2016. The failure is in x64, but I can't install that on my machine because it can't be installed if Office (32-bit) is already installed. I hope it doesn't offend anyone that I am not going to uninstall my version of Office simply to attempt a local repro.

Looking through the code locally, there are some points of interest to investigate. Based on an offline conversation with @saurabh500, the function throwing the exception is propagating an HRESULT returned from:

hr = Initialize(base.handle);

  1. We should ensure the VTable is correct and that we are calling what we think we are (i.e. IDBInitialize::Initialize()).

  2. We need to step into the IDBInitialize::Initialize() call and see why the implementation is returning E_FAIL.

Since OleDB manually handles all of its COM calls and marshaling, this is entirely in the test owner's control and has nothing to do with built-in COM interop support. Since other tests seem to be working, I am going to assume that the VTable is okay, which means we need good ol' mixed-mode debugging. We should step through the Access 2016 implementation of IDBInitialize::Initialize() and figure out why E_FAIL is being returned for this test.

@FreddyDgh
Contributor

FreddyDgh commented Feb 28, 2020

I spent some time looking at this, and I couldn't pinpoint the exact problem, but my findings may be helpful to you.

On .NET Framework, when an OleDbConnection is opened, MS Access creates a record-locking information file (.laccdb) in the same directory as the database. When the OleDbConnection is closed/disposed, the .laccdb file is deleted. However, on .NET Core, if you begin a transaction on a connection, the locking info file is created on open, but does not get deleted on close. (If you don't begin a transaction, the locking info file behaves normally.)

On .NET Core, what I observed is that if you rapidly open several connections, the .laccdb file seems to grow in size each time, and the file appears to have a hard size limit (which I found to be 4 KB). In my tests, once the file reached that maximum size, any future calls to OleDbConnection.Open() would fail.

I noticed that when @saurabh500 merged the duplicate interfaces, the signature he used for ITransactionLocal.StartTransaction(...) has the pulTransactionLevel parameter as an int instead of a uint*. I don't think this is the issue, but it might be worth double-checking the proper signature.
Edit: Never mind, he has it as an 'out int', not just int, which I believe makes it okay, if I recall properly.

Furthermore, I found that if I insert an explicit call to GC.Collect() on line 337 of OleDbTransaction.cs (right after _transaction.Dispose();), that seems to fix the issue: the record-locking file gets deleted when the connection is closed, and my version of the tests passed. This doesn't seem like the RIGHT solution (although, any port in a storm?), but I'd imagine knowing this could be helpful to you. It seems to me if I put the GC.Collect() any earlier, the tests fail, which leads me to believe that the problem lies somewhere with the WrappedTransaction not having something released properly, but I'm not familiar enough to say for sure.

@FreddyDgh
Contributor

FreddyDgh commented Mar 4, 2020

I'm not sure if anyone can confirm that doing a manual GC.Collect() between test runs causes the tests to pass, but that was my experience, so I believe I have found the problem. It looks like the issue is reference counting on the COM interfaces. The documentation for Marshal.GetObjectForIUnknown(IntPtr) states:

This method wraps IUnknown in a managed object. This has the effect of incrementing the reference count of the COM component. The reference count will be decremented when the runtime performs garbage collection on the managed object that represents the COM object.

In SafeHandles.cs, there are three calls to Marshal.GetObjectForIUnknown() that create local variables that disappear into the ether and never have Marshal.ReleaseComObject() called on them (until garbage collection). In my tests, adding a Marshal.ReleaseComObject() call to each of these will fix the issue. For example:

internal static unsafe OleDbHResult ITransactionAbort(System.IntPtr ptr)
{
    OleDbHResult hr = OleDbHResult.E_UNEXPECTED;
    ITransactionLocal transactionLocal = null;
    RuntimeHelpers.PrepareConstrainedRegions();
    try
    { }
    finally
    {
        Guid IID_ITransactionLocal = typeof(ITransactionLocal).GUID;
        hr = (OleDbHResult)Marshal.QueryInterface(ptr, ref IID_ITransactionLocal, out var pTransaction);
        if (pTransaction != IntPtr.Zero)
        {
            transactionLocal = (ITransactionLocal)Marshal.GetObjectForIUnknown(pTransaction);
            hr = (OleDbHResult)transactionLocal.Abort(IntPtr.Zero, false, false);

            Marshal.ReleaseComObject(transactionLocal); // INSERT THIS LINE HERE

            Marshal.Release(pTransaction);
        }
    }
    return hr;
}

Like I said, I believe this fixes the issue (presumably, the right way) and should allow this PR to be merged. Please let me know your thoughts. It'd be great if we could get this merged soon. Thanks.

@saurabh500

@saurabh500
Contributor Author

@FreddyD-GH Thanks for the investigation and providing the potential fix.
I will look into incorporating your recommendations in this PR to see if I can fix the issue overall.

@FreddyDgh
Contributor

@saurabh500 Just to clarify the changes I'm proposing: simply adding these 3 lines to the rest of your PR allowed things to work in my test environment:
https://github.com/FreddyD-GH/runtime/commit/6ebc323812cc9f3697c89d4fb93997350ab37be4#diff-d1b2f88061cc8f2e9fa31b0d2b46012a

@saurabh500
Contributor Author

I did just that, and now I have new failures in the tests that I am running locally, so I am investigating those.

@FreddyDgh
Contributor

FreddyDgh commented Mar 5, 2020

Thanks for the update. I just took a look at my tests (including the original tests in the repo) to double-check that they all passed after the change (they did), and I realized that I had forgotten that the System.Data.OleDb library uses the text/csv format for its tests, not the accdb format. I apologize for probably sounding like a crazy person before when I was talking about .laccdb files and whatnot.

The only way I could observe bugs from the changed transaction code was to write my own tests, and to do so, I used the driver with an accdb file (since this is my intended scenario). I assume using the driver with text/csv will have a similar issue due to not being disposed properly, but it could be that there are two different bugs. Hopefully, that's not the case, but it is a possibility.

@saurabh500
Contributor Author

@FreddyD-GH Looks like the test issue was specific to my environment. The tests are passing now for me and the CI is green as well.

@maryamariyan @jkotas and @stephentoub there are changes since you folks last reviewed this. Can you take a look and let me know if the changes look good?

Member

@jkotas jkotas left a comment

LGTM

@saurabh500 saurabh500 merged commit 33b3b9a into dotnet:master Mar 8, 2020
@MaceWindu

MaceWindu commented Mar 8, 2020

Cool. Will it go into 5.0, or is there a chance of seeing it in 2.1/3.1?

@jader1313

I need this in 3.1, please!

@lauxjpn

lauxjpn commented Mar 10, 2020

Servicing candidate?

I need this in 3.1, please!

For EntityFrameworkCore.Jet's EF Core 3.1 support we will need this as part of a service release targeting .NET Core 3.1 as well.

@saurabh500 saurabh500 deleted the fixGuidForTransaction branch March 10, 2020 13:02
@saurabh500
Contributor Author

saurabh500 commented Mar 10, 2020

The PR for porting this fix into servicing is at dotnet/corefx#42878
It has been approved for 3.1.x

@jader1313 You can consume the nightly build of System.Data.OleDb to verify if your tests pass with this change. It is shipped as an out of box package and shouldn't need a runtime update.

@jader1313

@saurabh500 thank you very much, it worked on my tests.

@lauxjpn

lauxjpn commented Mar 21, 2020

@saurabh500 Works for EntityFrameworkCore.Jet as well.

@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Development

Successfully merging this pull request may close these issues.

[System.Data.OleDb] Wrong IID of ITransactionLocal interface
OleDbTransaction.Commit Exception ''SQLNCLI11' failed with no error message available
