Skip to content

Conversation

@nattress
Copy link
Contributor

@nattress nattress commented Jul 10, 2020

GC stress runs with Crossgen2 binaries frequently see AVs with stack-walk failures. This occurs when a delay load thunk is being executed as a stack-walk is performed. The runtime resolves any thunk address to the code info for the last managed method in the runtime functions table.

Fix the binary search algorithm in NativeUnwindInfoLookupTable::LookupUnwindInfoForMethod which doesn't correctly return -1 if the code address being searched for is larger than any present. This old bug was not hit because Crossgen emits stubs earlier in the image than managed code, whereas the two are reversed with Crossgen2.

Fix generation of the RuntimeFunctionsTable so that the terminating sentinel node is not treated as part of the table in the R2R header entry for the table. This was causing the sentinel node to be treated as its own runtime function by the runtime and could lead to reading beyond the table bounds in NativeUnwindInfoLookupTable::LookupUnwindInfoForMethod. The bug in the binary search of the table hid this bug until now.

Add the DelayLoadMethodCallThunks header entry. DelayLoadMethodCallThunkNodeRange is a placeholder node (not an ObjectNode) to provide a symbol to relocate against from the header. It represents the range of import thunks. Add support to the object writer for defining a range between two symbols in the same section. The relocation phase can then resolve the difference between the two symbols to determine the range size.

Fixes #38482

cc @dotnet/crossgen-contrib

@nattress nattress changed the title [Crossgen2] Add DelayLoadMethodCallThunks table, fix stack walks cont… [Crossgen2] Add DelayLoadMethodCallThunks table, fix stack walks containing thunks Jul 10, 2020
@nattress nattress requested review from janvorli and trylek July 10, 2020 08:46
…aining thunks

GC stress runs with Crossgen2 binaries frequently see AVs with stack-walk failures. This occurs when a delay load thunk is being executed as a stack-walk is performed. The runtime resolves any thunk address to the code info for the last managed method in the runtime functions table.

Fix the binary search algorithm in `NativeUnwindInfoLookupTable::LookupUnwindInfoForMethod` which doesn't correctly return -1 if the code address being searched for is larger than any present. This old bug was not hit because Crossgen emits stubs earlier in the image than managed code, whereas the two are reversed with Crossgen2.

Fix generation of the RuntimeFunctionsTable so that the terminating sentinel node is not treated as part of the table in the R2R header entry for the table. This was causing the sentinel node to be treated as its own runtime function by the runtime and could lead to reading beyond the table bounds in `NativeUnwindInfoLookupTable::LookupUnwindInfoForMethod`. The bug in the binary search of the table hid this bug until now.

Add the `DelayLoadMethodCallThunks` header entry. `DelayLoadMethodCallThunkNodeRange` is a placeholder node (not an `ObjectNode`) to provide a symbol to relocate against from the header. It represents the range of import thunks. Add support to the object writer for defining a range between two symbols in the same section. The relocation phase can then resolve the difference between the two symbols to determine the range size.

Fixes dotnet#38482
@nattress nattress force-pushed the importthunk_stackwalking_noderange branch from 643da86 to c8361cc Compare July 10, 2020 09:26
Copy link
Member

@davidwrighton davidwrighton left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could add the additional check I suggested, I consider this a great change.

Copy link
Member

@trylek trylek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great modulo some nits and David's feedback, thanks so much Simon for making these fixes and improvements!

{
Debug.Assert(_tableSize >= 0);
return _tableSize;
return _tableSize - 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit - it's kind of awkward that we need to duplicate this logic between this place and the section table emitter. Perhaps we might be able to shove some virtual method like EndSentinelSize or SizeDelta into HeaderTableNode and use it for these adjustments. Having said that, I know you're running out of time so please feel free to ignore it, it's no big deal and we can clean this up later if we decide it's worth it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea - I'll add a SentinelSize property so it's more descriptive.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually it's more awkward than I thought. Somewhere along the way, we stopped treating header table component nodes as HeaderTableNodes and in fact I further generalized it to DependencyNodeCode<NodeFactory> in this commit. For now I'll replace the numbers with a const in RuntimeFunctionsTableNode. HeaderTableNode currently just provides a base to factor default values for ObjectNode required overrides. We might want to just refactor it away altogether and use an interface on the header table component nodes to get the small amount of functionality needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, thank you!

{
PTR_RUNTIME_FUNCTION pFunctionEntry = pRuntimeFunctionTable + i;
if (RelativePc >= pFunctionEntry->BeginAddress)
if (RelativePc >= pFunctionEntry->BeginAddress && RelativePc <= pFunctionEntry->EndAddress)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is EndAddress actually a valid address? I would expect this to be the RIP after the "ret" instruction i.e. in the non-reachable filler between adjacent functions that's normally getting filled with the "INT 3" instruction (on Intel). Long story short, should this rather be "<" - especially in view of the fact that it may happen in some cases that there's no filler (i.e. the ret instruction has the three lowest-order RIP bits set to 111) in which case the <= sign could cause a spurious match?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that should be < thanks!

nattress and others added 3 commits July 10, 2020 15:27
* Pass image base address to `LookupUnwindInfoForMethod` and correct the runtime function EndAddress lookup to use the standard macro. The image base is needed on non-x64 platforms where end address isn't just part of the `RuntimeFunction` struct. Fortunately, all the callsites of `LookupUnwindInfoForMethod` had the image base easily available 🙌

* Assert if the ImportThunks are not emitted in a single run

* Replace the RuntimeFunctionsTable sentinel adjustment with a const
@davidwrighton
Copy link
Member

This needs a fix for Windows X86. I've taken a look and it looks practical to use the GCInfo to provide a function size. Does that sound reasonable to everyone else? As @nattress is currently unavailable, I could put that fix together.

@mangod9
Copy link
Member

mangod9 commented Jul 14, 2020

both Simon and @trylek are out this week. If this can wait Tomas might be able make the changes then.

@davidwrighton
Copy link
Member

We need to get this in for Preview 8, as we might get some usage on that, and the bug is very severe. I'll see if I can swing it tomorrow.

@jkotas
Copy link
Member

jkotas commented Jul 14, 2020

Would it be better to handle the thunk problem by excluding the thunk range (ReadyToRunSectionType::DelayLoadMethodCallThunks) early?

Cracking the unwind info or gc info feels expensive.

@davidwrighton
Copy link
Member

I like the @jkotas approach better. That's cheap, and avoids handling things specially for only 1 function. We can put the check in JitCodeToMethodInfo.

@davidwrighton davidwrighton self-assigned this Jul 14, 2020
if (pNgenLayout->m_CodeSections[iRange].IsInRange((TADDR)addr))
{
int MethodIndex = NativeUnwindInfoLookupTable::LookupUnwindInfoForMethod(rva, pNgenLayout->m_pRuntimeFunctions[iRange], 0, pNgenLayout->m_nRuntimeFunctions[iRange] - 1);
int MethodIndex = NativeUnwindInfoLookupTable::LookupUnwindInfoForMethod(PTR_TO_TADDR(m_decoder.GetBase()), rva, pNgenLayout->m_pRuntimeFunctions[iRange], 0, pNgenLayout->m_nRuntimeFunctions[iRange] - 1);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this?

@ghost ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Test failure: readytorun\crossgen2\crossgen2smoke\crossgen2smoke.cmd

8 participants