[clr-interp] Support for breakpoints and stepping#123251
[clr-interp] Support for breakpoints and stepping#123251matouskozak wants to merge 18 commits intodotnet:mainfrom
Conversation
ed5ce4c to
dce1adc
Compare
There was a problem hiding this comment.
Pull request overview
This work-in-progress pull request adds support for managed debugger breakpoints in the CoreCLR interpreter. The changes extend the existing user breakpoint support (e.g., Debugger.Break()) to support IDE breakpoints and enable setting breakpoints when the program is stopped.
Changes:
- Adds interpreter single-step thread state flag and supporting methods
- Introduces new
INTOP_SINGLESTEPopcode for step-over operations - Implements
InterpreterWalkerto analyze interpreter bytecode for debugger stepping - Modifies breakpoint execution logic to distinguish between IDE breakpoints and step-out breakpoints
- Enables JIT completion notifications for interpreter code
- Pre-inserts IL offset 0 entry in the IL-to-native map to support method entry breakpoints
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| src/coreclr/vm/threads.h | Adds TSNC_InterpreterSingleStep thread state flag and related methods |
| src/coreclr/vm/jitinterface.cpp | Removes interpreter code exclusion from JITComplete notifications |
| src/coreclr/vm/interpexec.cpp | Implements breakpoint and single-step handling with opcode replacement |
| src/coreclr/vm/codeman.h | Adds IsInterpretedCode() helper method |
| src/coreclr/interpreter/intops.h | Adds helper functions to classify interpreter opcodes |
| src/coreclr/interpreter/inc/intops.def | Defines INTOP_SINGLESTEP opcode |
| src/coreclr/interpreter/compiler.cpp | Pre-inserts IL offset 0 mapping for method entry breakpoints |
| src/coreclr/debug/ee/interpreterwalker.h | Declares InterpreterWalker class for bytecode analysis |
| src/coreclr/debug/ee/interpreterwalker.cpp | Implements bytecode walker for debugger stepping operations |
| src/coreclr/debug/ee/functioninfo.cpp | Uses GetInterpreterCodeFromInterpreterPrecodeIfPresent for code address |
| src/coreclr/debug/ee/executioncontrol.h | Defines BreakpointInfo structure and GetBreakpointInfo method |
| src/coreclr/debug/ee/executioncontrol.cpp | Implements INTOP_SINGLESTEP patch support and breakpoint info retrieval |
| src/coreclr/debug/ee/controller.h | Includes interpreterwalker.h header |
| src/coreclr/debug/ee/controller.cpp | Implements TrapStep for interpreter using InterpreterWalker |
| src/coreclr/debug/ee/CMakeLists.txt | Adds interpreterwalker source files to build |
src/coreclr/debug/ee/controller.cpp
Outdated
| { | ||
| // Indirect call (CALLVIRT, CALLI, CALLDELEGATE) - cannot determine target statically | ||
| // Use JMC backstop to catch method entry | ||
| // TODO: Could we do better? Why we can't use StubManagers to trace indirect calls? |
There was a problem hiding this comment.
You can StubManagers if there is one that recognizes the code pattern being used to make the indirect call. If the indirect call doesn't need too many instructions to reach the destination you could also enable single-stepping and get there that way.
| _ASSERTE(pExecControl != NULL); | ||
| return pExecControl->ApplyPatch(patch); | ||
| } | ||
| IExecutionControl* pExecControl = codeInfo.GetJitManager()->GetExecutionControl(); |
There was a problem hiding this comment.
I'm happy for this to be lower priority refactoring, but ideally we could demonstrate the ExecutionControl abstraction is a good one by having native code use it too. Right now we effectively have the native implementation of the abstraction inlined everywhere. As an end goal it would be nice to have a code path that looked similar to this:
bool DebuggerController::ApplyPatch(...)
{
...
// no interpretter ifdef here
IExecutionControl* pExecControl = g_pNativeCodeExecutionControl;
EECodeInfo codeInfo((PCODE)patch->address);
if(codeInfo.IsValid())
{
pExecControl = codeInfo.GetJitManager()->GetExecutionControl();
}
_ASSERTE(pExecControl != NULL);
return pExecControl->ApplyPatch(patch);
}
There was a problem hiding this comment.
I agree, that was my plan as well #120842. Do you prefer to make it in the same PR or as a separate one? (similar to #123251 (comment))
There was a problem hiding this comment.
totally fine if its separate, just wanted to make sure we agreed on the direction 👍
|
|
||
| #ifdef FEATURE_INTERPRETER | ||
| // For interpreter code, native offset 0 is within the bytecode header area and cannot | ||
| // have a breakpoint. Use the first sequence map entry's native offset instead. |
There was a problem hiding this comment.
Having a field called m_addrOfCode that doesn't point to the first instruction is unexpected and breaks the invariant all the other jitted code has. Typically in the runtime any type that is PCODE or methods/fields expressly named "Code" with a neutral type such CORDB_ADDRESS, TADDR, void* means a pointer to an opcode. When the runtime wants to point to a header in front of those opcodes we use a different explicit type like CodeHeader.
struct CodeHeader
{
PTR_RealCodeHeader pRealCodeHeader;
...
TADDR GetCodeStartAddress()
{
return dac_cast<PCODE>(dac_cast<PTR_CodeHeader>(this) + 1);
}We should strongly consider using the address of the opcodes and not the address of the header everywhere we refer to the abstraction as being 'code' or 'bytecode'. In addition to the potential for the confusion to cause runtime bugs and interpreter special casing, these pointers also get exported outside the runtime via PerfMaps, ETW events, EventPipe events, ICorProfiler APIs, and ICorDebug APIs. Its entirely possible that some diagnostic tools will start looking at the memory present at the code pointer which means the header becomes part of the de-facto diagnostics contract for interpreter code and it is difficult to change. We usually try to keep these implicit interfaces between diagnostic tools and the runtime as free of extraneous implementation details as possible.
There was a problem hiding this comment.
That's a good point. I've created an issue for it #123998 and I'll do the changes in a separate PR because there might be a lot of places which needs adjusting. It might be worth doing this change + support for breakpoints in one PR + the single-stepping logic in another.
src/coreclr/debug/ee/controller.cpp
Outdated
| { | ||
| #ifdef FEATURE_INTERPRETER | ||
| // Interpreter patches don't need DebuggerPatchSkip - the interpreter | ||
| // uses GetOriginalOpcode() to read the saved opcode from the patch table |
There was a problem hiding this comment.
How does the interpreter know when it should be generating the breakpoint behavior vs generating the original opcode behavior? I think we'd want to have that as part of an explicit contract so that the debugger can control when it happens.
There was a problem hiding this comment.
I added the thread SetInterpreterBypass to set the bypass opcode if we need to (in ActivatePatchSkip).
| CONSISTENCY_CHECK_MSGF(context != NULL, ("Can't apply ss flag to thread 0x%p b/c it's not in a safe place.\n", thread)); | ||
| _ASSERTE(context != NULL); | ||
|
|
||
| #ifdef FEATURE_INTERPRETER |
There was a problem hiding this comment.
We should probably figure out how we want to handle single-stepping first, but this seems like another area similar to breakpoints where we might have IExecutionControl APIs that handle it differently depending on the code type.
| patch->opcode = currentOpcode; // Save original opcode | ||
|
|
||
| // Check if this is a single-step patch by looking at the controller's thread's interpreter SS flag. | ||
| Thread* pThread = patch->controller->GetThread(); |
There was a problem hiding this comment.
I think of this interface as being an abstraction that mirrors how the debugger would normally interact with hardware. By default it would be ApplyPatch(address) and the native implementation would be something like:
VirtualProtect(address, ...); // make it writable
CORDbgInsertBreakpoint(address);
VirtualProtect(address, ...); // make it readonly again
Maybe we'd make it a little more sophisticated so that ApplyPatch is also responsible for saving the old opcode in an output parameter. I would not expect this method to get access to the complete DebuggerPatch object or for the implementation to look at fields like patch->controller. If the debugger needs to do single stepping I'd expect either:
- The debugger uses an explicit API different from this one to turn single stepping on. The interpreter could use special opcodes as an implementation detail if it wanted to, but it would be responsible for doing all the bookkeeping on its own (without using DebuggerPatchTable).
- The debugger emulates single-stepping behavior using an IExecutionControl API that only has breakpoints. In that case the debugger would call this API to set a INTOP_BREAKPOINT instruction. Later when the interpretter sends back the BreakpointHit callback the debugger code would be responsible for looking up the patch at that address and interpretting the breakpoint as the completion of a single step operation. Other architectures do that single step emulation by copying the instruction to a separate buffer but it could also be done inline if the interpretter is going to have a special feature that enables executing the original opcode on a per-thread basis.
There was a problem hiding this comment.
The debugger emulates single-stepping behavior using an IExecutionControl API that only has breakpoints. In that case the debugger would call this API to set a INTOP_BREAKPOINT instruction.
Makes sense, I refactored the code to only use INTOP_BREAKPOINT.
Later when the interpretter sends back the BreakpointHit callback the debugger code would be responsible for looking up the patch at that address and interpretting the breakpoint as the completion of a single step operation.
I think this should happen naturally and get handled in
runtime/src/coreclr/debug/ee/controller.cpp
Lines 7420 to 7449 in 35b1dd2
Other architectures do that single step emulation by copying the instruction to a separate buffer but it could also be done inline if the interpretter is going to have a special feature that enables executing the original opcode on a per-thread basis.
I've added a thread storage for the original opcode which gets retrieved during the interpreter execution loop and re-executed when needed.
Does this align with what you had in mind?
There was a problem hiding this comment.
a thread storage for the original opcode which gets retrieved during the interpreter execution loop
You can have multiple threads running the same interpreter code. What happens when a second thread hits a breakpoint and the original code is saved in a thread local storage of some other thread?
There was a problem hiding this comment.
My understanding is, that if non-stepping threads hits INTOP_BREAKPOINT, they would still call to debugger but they will NOT trigger MatchPatch flow which calls TriggerPatch which sends debugger notification (only the thread which is stepping should do that), instead it will jump to ActivatePatchSkip which sets the bypass opcode (reads it from the patch) and allows non-stepping threads to continue execution.
It's not the most efficient solution, we could optimize this by having some kind of book-keeping solution with {address, original opcode} on the side which would be accessible in the interpreter execution loop to fast track non-stepping threads bypass flow.
src/coreclr/vm/interpexec.cpp
Outdated
| InterpBreakpoint(ip, pFrame, stack, pInterpreterFrame); | ||
| break; | ||
| Thread* pThread = GetThread(); | ||
| bpInfo = execControl->GetBreakpointInfo(ip); |
There was a problem hiding this comment.
I think we want to get rid of this GetBreakpointInfo() call. Ideally at the interpreter layer all breakpoints will be handled uniformly. To get there:
- bpInfo.isStepOut - The interpreter shouldn't be responsible for figuring out what a breakpoint is used for. I see this flag is being used to control some IP adjustment implicitly via the FaultingExceptionFrame but I don't think we want the interpreter doing that either. Instead we can probably just make a contract that the interpreter always reports the IP as being the address where the breakpoint was set. If the debugger needs to sometimes adjust the IP and sometimes not it can be responsible to figure out which breakpoint addresses need that treatment after receiving the Breakpoint callback. It already has access to all the same info this GetBreakpointInfo() API is looking up. Fwiw, I don't yet understand why an adjustment is needed in this case or why it would be conditional, but if there is something conditional that needs to be done the debugger code seems like the right place for it.
- bpInfo.originalOpcode - We probably need a more generalized mechanism for the debugger to tell the interpreter when to execute the original opcode. For example we could have some special fields in the CONTEXT only used by interpreter that stores an address and an opcode. The meaning of those fields would be "if the opcode at address X is a breakpoint, execute opcode Y instead". This allows the debugger to enable the breakpoint bypass behavior at any time, not just after a breakpoint was previously hit. The debugger doesn't always skip breakpoints immediately after hitting them. We might need to define some explicit InterpretterCONTEXT or Get/SetXYZ() functions that define where these fields get placed in the CONTEXT data blob.
There was a problem hiding this comment.
For example we could have some special fields in the CONTEXT only used by interpreter that stores an address and an opcode. The meaning of those fields would be "if the opcode at address X is a breakpoint, execute opcode Y instead". This allows the debugger to enable the breakpoint bypass behavior at any time, not just after a breakpoint was previously hit. The debugger doesn't always skip breakpoints immediately after hitting them. We might need to define some explicit InterpretterCONTEXT or Get/SetXYZ() functions that define where these fields get placed in the CONTEXT data blob.
@noahfalk Do we need to keep multiple address + opcode pairs or is one enough? I'm thinking that we could just record the current address + opcode when we call ActivatePatchSkip but not sure if there is a scenario where this would break.
If we don't need more than one pair, I was thinking we could store it on the thread like thread->SetInterpreterBypass((const int32_t*)PC, (int32_t)patch->opcode); and then check in interpexec we would just get the opcode and execute it.
There was a problem hiding this comment.
I'm ~75% confident that storing a single address will be sufficient
| SetSP(&ctx, (DWORD64)pFrame); | ||
| SetFP(&ctx, (DWORD64)stack); | ||
| SetIP(&ctx, (DWORD64)ip); | ||
| SetIP(&ctx, (DWORD64)ip); |
There was a problem hiding this comment.
Its fine if its not yet implemented, but just wanted to give a heads up that eventually debugger scenarios are going to mutate the CONTEXT values inside the callback to do things like SetIP, changing to a different stack frame, or changing the value of a local variable. Some of these values will then need to be read back from the CONTEXT after the FirstChanceNativeException callback and have their new value honored by the interpreter when execution resumes.
| if (interpreterFrameAddr != 0) | ||
| { | ||
| Frame *pFrame = (Frame*)interpreterFrameAddr; |
There was a problem hiding this comment.
The code casts interpreterFrameAddr to a Frame pointer without verifying that the address is valid or properly aligned. If the first arg register contains garbage data or was not properly set up, this could lead to undefined behavior when calling GetFrameIdentifier. Consider adding validation such as null checks or alignment checks before dereferencing the Frame pointer.
| if (interpreterFrameAddr != 0) | |
| { | |
| Frame *pFrame = (Frame*)interpreterFrameAddr; | |
| // Validate that the interpreter frame address is non-zero and suitably aligned | |
| // before treating it as a Frame pointer. | |
| if (interpreterFrameAddr != 0 && (interpreterFrameAddr % sizeof(void*)) == 0) | |
| { | |
| Frame *pFrame = reinterpret_cast<Frame *>(interpreterFrameAddr); |
- New InterpreterWalker class decodes bytecode control flow for stepping - Update TrapStep to use InterpreterWalker for interpreted code - Add per-thread TSNC_InterpreterSingleStep flag for step tracking - ApplyPatch now uses INTOP_SINGLESTEP vs INTOP_BREAKPOINT based on flag - Handle INTOP_SINGLESTEP in interpreter execution loop
- needed for step-in support in virtual calls
e7b0222 to
bd2a935
Compare
18cd651 to
c58d2b3
Compare
| { | ||
| LOG((LF_CORDB, LL_INFO1000, "InterpreterEC::ApplyPatch Patch already applied at %p\n", | ||
| patch->address)); | ||
| return false; |
There was a problem hiding this comment.
InterpreterExecutionControl::ApplyPatch returns false when the current opcode is already INTOP_BREAKPOINT. Callers (e.g., DebuggerController::ActivatePatch) expect ApplyPatch to activate the patch or they will _ASSERTE(patch->IsActivated()). If the bytecode can already be patched (e.g., leftover patch, race, or external patching), this path can assert and/or leave the patch inactive. Consider treating this as success by marking the patch activated and retrieving the original opcode from the patch table (or otherwise ensuring patch->opcode is set consistently).
| return false; | |
| // Treat this as a successful activation; keep the existing opcode. | |
| patch->opcode = currentOpcode; | |
| patch->m_interpActivated = true; | |
| return true; |
| #ifdef FEATURE_INTERPRETER | ||
| // For interpreter code, single-stepping is emulated using breakpoints | ||
| // which are managed by the stepper, so no cleanup needed here. | ||
| if (context != NULL) | ||
| { | ||
| PCODE ip = GetIP(context); | ||
| EECodeInfo codeInfo(ip); | ||
| if (codeInfo.IsValid() && codeInfo.IsInterpretedCode()) | ||
| { | ||
| LOG((LF_CORDB,LL_INFO1000, "DC::UnapplyTraceFlag: interpreter code at IP %p - no hardware flag to clear\n", ip)); | ||
| return; | ||
| } |
There was a problem hiding this comment.
UnapplyTraceFlag returns early when the stopped context IP is interpreted code, which skips clearing the per-thread stepping state (MarkThreadForDebugStepping(thread, false)) and clearing the hardware single-step flag via UnsetSSFlag. If the trace flag was enabled before transitioning into interpreter code, this can leave the thread permanently marked as stepping and/or keep TF set, causing repeated single-step exceptions in unmanaged interpreter runtime code. Consider always clearing the stepping state (and clearing TF when a context is available) even when the current IP is interpreted code.
src/coreclr/debug/ee/controller.cpp
Outdated
| // * ------------------------------------------------------------------------- | ||
| // Emulates single-stepping for interpreter code using breakpoint patches. | ||
|
|
||
| InterpreterSingleStepEmulator::InterpreterSingleStepEmulator( |
There was a problem hiding this comment.
The abstraction is a bit different than what I was thinking of at first, but we could keep the functionality and adjust the naming/code structure.
For a single step emulator I was thinking of what FEATURE_EMULATE_SINGLESTEP does in threads.h. The debugger calls SetSSFlag() or UnsetSSFlag() the same as it would when dealing with hardware single stepping. In particular the interface to enable the emulator doesn't provide any information about call targets or what broader stepping operation is in progress. Instead the contract is just to execute the next opcode and then raise a SINGLE_STEP_EXCEPTION afterwards.
The code here looks more like a helper that implements stepping through managed code. This approach should work fine as long as there is no alternate code path that reaches DebuggerController::EnableSingleStep() or SetSSFlag() when pointing at interpreter code. I think you already have an alternate solution for PatchSkipper, you don't need to support data breakpoints, and func-eval shouldn't need to re-enable single stepping if it wasn't enabled when the func-eval started. Assuming you don't find any other code paths that lead into the single-stepping code then I'd suggest refactor this code into TrapInterpreterCodeStep() or as a helper function called by that method. I wouldn't call this code a single step emulator. When describing the functionality in docs or comments we could say something like:
"The runtime doesn't support single-stepping, nor do we emulate it in a general-purpose way. Instead we we've implemented alternate solutions for debugger scenarios that traditionally rely upon it. For breakpoint skipping the interpreter allows executing an alternate opcode on a per-thread basis. For stepping we primarily use control flow prediction + breakpoints. For indirect call and branch instructions that are unpredictable in multi-threaded scenarios we emulate their effect on the current CONTEXT."
| #ifdef FEATURE_INTERPRETER | ||
| // Interpreter opcodes can have value 0 (e.g., INTOP_RET), so we need a separate flag | ||
| // to track activation state since PRDIsEmpty() returns true for opcode 0. | ||
| // TODO: We should consider using the activated flag for all patches and stop using opcode == 0 as a special case for "not active". |
There was a problem hiding this comment.
yeah, it would be better to do that for all patches rather than treating this an interpreter special case.
There was a problem hiding this comment.
Agreed, I've created an issue for this #124499
src/coreclr/vm/threads.h
Outdated
| int32_t m_interpBypassOpcode; // Original opcode to execute instead of INTOP_BREAKPOINT | ||
|
|
||
| public: | ||
| void SetInterpreterBypass(const int32_t* address, int32_t opcode) |
There was a problem hiding this comment.
Doing this per-thread rather than per-CONTEXT means that places which modify the current CONTEXT also need to save and restore this state as side-band information. Single-step emulation on hardware architectures that need it already does the same thing. Check out:
runtime/src/coreclr/debug/ee/funceval.cpp
Lines 3863 to 3876 in 3deccf6
There was a problem hiding this comment.
I've added it to the InterpMethodContextFrame which eventually gets packaged into the CONTEXT.
Replace INTOP_SINGLESTEP with per-frame bypass, extract InterpreterStepHelper - Remove INTOP_SINGLESTEP opcode and per-thread single-step flag (TSNC_InterpreterSingleStep). Single-stepping is now fully breakpoint-based via control flow prediction. - Add per-frame breakpoint bypass mechanism (InterpMethodContextFrame:: SetBypass/HasBypass/ClearBypass). ActivatePatchSkip sets bypass on the frame via IExecutionControl::BypassPatch; interpreter checks it post-callback and executes the original opcode without re-hitting the breakpoint. - Extract InterpreterStepHelper from TrapStep inline code into a reusable class with SetupStep() that analyzes bytecode and places breakpoint patches at predicted next instructions. TrapStep now delegates to TrapInterpreterCodeStep for interpreter code. - Add m_interpActivated flag to DebuggerControllerPatch to track activation state independently of opcode value (INTOP_RET == 0 makes PRDIsEmpty unreliable for interpreter patches). - Simplify InterpBreakpoint: remove isStepOut parameter and FaultingExceptionFrame path. Remove GetBreakpointInfo from InterpreterExecutionControl. - Add bounds assert in EmitBBCode for IL-to-native map.
c58d2b3 to
66141e7
Compare
Summary
This PR adds debugger support for the CoreCLR interpreter, enabling IDE breakpoints and single-stepping functionality.
Key Changes
Breakpoint Support:
INTOP_BREAKPOINTopcode injectionApplyPatch/UnapplyPatchfor interpreter code patchesINTOP_RET)Single-Stepping Support:
InterpreterStepHelperclass to encapsulate step setup logicTrapInterpreterCodeStepfor step-in/step-over/step-outOnMethodEnterto notify debugger (needed for step-in on virtual calls)Interpreter Compiler Changes:
Stack Walking:
Testing
Notes/TODOs
PRDIsEmpty(opcode)activation check with explicitm_activatedflag #124499HardwareExecutionControlderived class, createInterpreterExecutionControlfor interpreter support