Skip to content

Delete VirtualUnwindToFirstManagedCallFrame#123507

Open
am11 wants to merge 10 commits intodotnet:mainfrom
am11:feature/deterministic-unwinding3
Open

Delete VirtualUnwindToFirstManagedCallFrame#123507
am11 wants to merge 10 commits intodotnet:mainfrom
am11:feature/deterministic-unwinding3

Conversation

@am11
Copy link
Member

@am11 am11 commented Jan 22, 2026

Currently based on #123307 (shall rebase once that one is merged).

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jan 22, 2026
@dotnet-policy-service
Copy link
Contributor

Tagging subscribers to this area: @mangod9
See info in area-owners.md if you want to be subscribed.

@am11 am11 force-pushed the feature/deterministic-unwinding3 branch from 994c384 to 4c9de0e Compare January 22, 2026 21:51
@am11 am11 marked this pull request as ready for review January 23, 2026 02:19
@am11 am11 requested review from janvorli and jkotas January 23, 2026 02:19
@am11
Copy link
Member Author

am11 commented Jan 23, 2026

Ported TrailingEpilogueInstructionsCount from NativeAOT. The signature is a bit different, so it's a copy for now.

@jkotas
Copy link
Member

jkotas commented Jan 23, 2026

Ported TrailingEpilogueInstructionsCount from NativeAOT.

Why is that better than depending on RtlUnwind?

We have hard dependency on Windows unwind format and I do not think you are trying to "fix" that, so depending on RtlUnwind should not be a problem.

@jkotas
Copy link
Member

jkotas commented Jan 23, 2026

Also, unwinding by disassembling the instruction stream is dirty. There are newer versions of the Windows x64 unwind info format that avoid disassembling the instruction stream. I expect we will upgrade at some point.

@am11
Copy link
Member Author

am11 commented Jan 23, 2026

The goal was to match NativeAOT's approach of not relying on any OS unwinder for these paths. If we're okay with keeping RtlVirtualUnwind here, that works.

@janvorli
Copy link
Member

I would also prefer relying on the RtlVirtualUnwind. I don't want to get into the problems we've hit couple of times on x86 in the past when a compiler change has broken unwinding for some native methods.

TADDR framePtr = GetFP(pUnwindContext);
for (int i = 0; i < kMaxUnwindIterations && framePtr != 0 && !unwound; i++)
{
TADDR returnAddr = *(TADDR*)(framePtr + sizeof(void*));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be a lot less reliable compared to what we have here today. It is common to have stale left-over addresses in the stack memory that are no longer part of the current call chain.

(I do not have good ideas about what to do about it.)

Copy link
Member Author

@am11 am11 Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a best-effort fallback for stackoverflow occurred in a frameless FCall. In case of broken chain, we see Stack overflow. without the stack trace.

Alternatively, we could have two fields in thread TADDR m_pFCallCallerIP, m_pFCallCallerSP; // 16-bytes, populated on FCall entry to make it fully deterministic. That may or may not overlap with the approach we were thinking for QCall #123482?

Copy link
Member

@jkotas jkotas Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand that this is best effort. I do not think we want to take the diagnostic experience regression compared to current state. We have been getting regular feedback about poor stackoverflow diagnostic experience before @janvorli implemented what we have currently.

In case of broken chain, we see Stack overflow. without the stack trace.

Or this will crash with seg fault by trying to walk broken chain, and you won't see any error.

Alternatively, we could have two fields in thread TADDR m_pFCallCallerIP, m_pFCallCallerSP; // 16-bytes, populated on FCall entry to make it fully deterministic.

This would cause perf regressions (performance is the only reason why FCalls exist), it would not be reliable (the stackwalker needs more than just IP/SP in general case), and it would not handle PInvokes with SuppressGCTransition that have the same problem as FCalls.

Copy link
Member Author

@am11 am11 Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed based on this repro:

using System;
using System.Runtime.CompilerServices;
using System.Threading;

class Program
{
    static void Main()
    {
        Console.WriteLine("FCALL Stack Overflow Test");
        Console.WriteLine("=========================");
        Console.WriteLine();
        Console.WriteLine("This test will cause a stack overflow in an FCALL (GC.MaxGeneration)");
        Console.WriteLine("The stack trace should show the managed call chain leading to the FCALL.");
        Console.WriteLine();
        Console.WriteLine("Make sure DOTNET_TestFCallStackOverflow=1 is set to enable the test!");
        Console.WriteLine();
        
        // Use a larger stack size to allow normal operations, but still small enough
        // that the recursive FCALL will overflow quickly
        Console.WriteLine("Starting test on a new thread with 256KB stack...");
        Console.WriteLine();
        
        Thread testThread = new Thread(RunTest, maxStackSize: 256 * 1024);
        testThread.Start();
        testThread.Join();
        
        Console.WriteLine();
        Console.WriteLine("Test completed (should not reach here if SO occurred).");
    }
    
    [MethodImpl(MethodImplOptions.NoInlining)]
    static void RunTest()
    {
        Console.WriteLine("Test thread started");
        Level1();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Level1()
    {
        Console.WriteLine("Entering Level1");
        Level2();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Level2()
    {
        Console.WriteLine("Entering Level2");
        Level3();
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Level3()
    {
        Console.WriteLine("Entering Level3 - about to call GC.MaxGeneration (FCALL)");
        
        // GC.MaxGeneration is an FCALL - direct call to native with no Frame
        // When DOTNET_TestFCallStackOverflow=1, the native FCALL will overflow the stack
        // This tests whether HandleFatalStackOverflow produces proper stack traces for FCALLs
        int maxGen = GC.MaxGeneration;
        
        Console.WriteLine("GC.MaxGeneration returned: " + maxGen.ToString());
    }
}

with:

diff --git a/src/coreclr/vm/comutilnative.cpp b/src/coreclr/vm/comutilnative.cpp
index 27ea6fa75b4..69be4e717cb 100644
--- a/src/coreclr/vm/comutilnative.cpp
+++ b/src/coreclr/vm/comutilnative.cpp
@@ -872,10 +872,66 @@ extern "C" void QCALLTYPE GCInterface_WaitForPendingFinalizers()
 **Arguments: None
 **Exceptions: None
 ==============================================================================*/
+
+
+#if defined(_MSC_VER)
+  #define NOINLINE __declspec(noinline)
+#elif defined(__GNUC__) || defined(__clang__)
+  #define NOINLINE __attribute__((noinline))
+#else
+  #define NOINLINE
+#endif
+NOINLINE
+int FCallStackOverflowTestHelper(void) {
+    // ~542 MiB on the stack
+    volatile unsigned char buf[555512u * 1024u];
+
+    // Touch first/last and walk by page to ensure full probing/touch
+    size_t size = sizeof(buf);
+    buf[0] = 1;
+    buf[size - 1] = 1;
+
+    // Walk by 4096-byte pages to force page faults/page commits
+    volatile uint64_t sum = 0;
+    for (size_t i = 0; i < size; i += 4096) {
+        sum += buf[i];
+    }
+
+    return (int)(sum + buf[0] + buf[size - 1]);
+}
+
+
+void log(const char *msg) {
+    FILE *f = fopen("/tmp/foo.txt", "a");
+    if (!f) {
+        perror("fopen");
+        return;
+    }
+
+    fprintf(f, "%s\n", msg);
+    fclose(f);
+}
+
+
 FCIMPL0(int, GCInterface::GetMaxGeneration)
 {
     FCALL_CONTRACT;

+    // TEST ONLY: Check for environment variable to trigger stack overflow test
+    // Set DOTNET_TestFCallStackOverflow=1 to enable
+    static int testMode = -1;
+    if (testMode == -1)
+    {
+        const char* env = getenv("DOTNET_TestFCallStackOverflow");
+        testMode = (env != nullptr && env[0] == '1') ? 1 : 0;
+    }
+    if (testMode == 1)
+    {
+        log("here we are\n");
+        return FCallStackOverflowTestHelper();
+    }
+
+        log("not really\n");
     return(INT32)GCHeapUtilities::GetGCHeap()->GetMaxGeneration();
 }
 FCIMPLEND
> $env:DOTNET_TestFCallStackOverflow=1; .\corerun.exe tester.dll
FCALL Stack Overflow Test
=========================

This test will cause a stack overflow in an FCALL (GC.MaxGeneration)
The stack trace should show the managed call chain leading to the FCALL.

Make sure DOTNET_TestFCallStackOverflow=1 is set to enable the test!

Starting test on a new thread with 256KB stack...

Test thread started
Entering Level1
Entering Level2
Entering Level3 - about to call GC.MaxGeneration (FCALL)
Stack overflow.
   at Program.Level3()
   at Program.Level2()
   at Program.Level1()
   at Program.RunTest()
   at System.Threading.Thread.StartCallback()

(matches main behavior)

before 64cc6cf I wasn't getting anything after Stack overflow. line.

else
{
Frame* pFrame = pThread->GetFrame();
if (pFrame != FRAME_TOP && InlinedCallFrame::FrameHasActiveCall(pFrame))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This condition will often cause the interesting part of the stacktrace to be omitted. For example:

using System;
using System.Runtime.InteropServices;
using System.Threading;

unsafe
{
    delegate* unmanaged<void> pfn = &F;
    pfn();
}

[UnmanagedCallersOnly]
static void F()
{
    M().ToString();
}

static long M()
{
    Thread.SpinWait(1);
    long ret = M();
    Thread.SpinWait(1);
    return ret;
}

Before this PR:

C:\runtime\artifacts\bin\coreclr\windows.x64.Release>corerun test.exe
Stack overflow.
   at System.Threading.Thread.SpinWait(Int32)
Repeated 24115 times:
--------------------------------
   at Program.<<Main>$>g__M|0_1()
--------------------------------
   at Program.<<Main>$>g__F|0_0()
   at Program.<Main>$(System.String[])

With this PR:

Stack overflow.
   at Program.<Main>$(System.String[])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I wasn't able to repro it on win-x64, but on linux-x64 (wsl), it was reproduced. With 747051b, it matches main.

const int kMaxUnwindIterations = 64;

#if defined(TARGET_ARM)
// On ARM32, the return address is in LR, not on the stack.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems arm64 would behave the same way as arm w.r.t. this.

// GetFP() returns the appropriate register for each architecture:
// x64: RBP, ARM64: X29, ARM32: R7, LoongArch64/RISC-V64: FP
TADDR framePtr = GetFP(pUnwindContext);
for (int i = 0; i < kMaxUnwindIterations && framePtr != 0 && !unwound; i++)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is still a case when this won't work as before - when the stack overflow occurs while storing the FP in a FCALL. Then the FP would contain the value from the caller, which doesn't have to be a frame pointer on Windows. And on Unix, it would contain the FP from the managed caller, so it seems this would skip one managed frame.
Maybe we can detect and handle it here? It seems that this would only happen when the stack overflow happens at the first instruction of a FCALL. dladdr could be used for checking if the address is the first instruction in a function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that there is still a case when this won't work as before

I think there are many different cases where this won't work as before. You cannot really make assumption about the shape of the native code. Note that this path applies to P/Invokes with SuppressGCTransition as well that can be arbitrary used provided native code.

}
}

if (unwound && ExecutionManager::IsManagedCode(GetIP(pUnwindContext)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to call the ExecutionManager::IsManagedCode here? It seems that unwound is set only when we know the pUnwindContext points to managed frame.

// This implies that we should have got a personality routine returned from the call to
// RtlVirtualUnwind above.
//
// However, if the ControlPC happened to be in the prolog or epilog of a managed method,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the comment seems to be deleted by accident.

// Move to caller's frame
framePtr = *(TADDR*)framePtr;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if the issue fixed in #81770 and #81956 is still a problem. If it is, we should also check the g_hostingApiReturnAddress like we did in the UnwindToFirstManagedFrame.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area-VM-coreclr community-contribution Indicates that the PR has been added by a community member

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants