This repository was archived by the owner on Jan 23, 2023. It is now read-only.

@myungjoo

Make the assembly-created stack frame compatible with
GDB and LIBUNWIND_ARM.

Trying to fix #3856

Being tested. Trying to explain the approach with this patch.

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>

sub sp, #((__PWTB_StackAlloc) + 4 * 4 + 9 * 4)

.ifnc \pushArgRegs, DoNotPushArgRegs
PUSH_ARGUMENT_REGISTERS

If these macros aren't used any more should they be removed also?

Author

Sure. After I get positive test results, I'll clean up the code as well.
Right now, I just wanted to share the idea.


alloc_stack __PWTB_StackAlloc
sub r7, #36
str r4, [r7]

Shouldn't we be using r12 as scratch register here to store the previous sp and then using stm instead to store r4-r11,lr, i.e. stm r12, {r4-r11,lr}?

Author

As long as we save r4-r11 onto the stack, that won't matter.
But yes, using r12 instead of r7 makes the code look cleaner.
Anyway, it appears to be the convention of ARM gnueabi compilers (for compatibility with debuggers and unwinders) to do mov r7, sp after the push and before sub sp, #N (which creates space for local variables). Right now, I'm blindly following what my compiler (clang 3.6 arm-linux-gnueabi) emits for conventional C/C++ code.

Author

Uh... I'm not supposed to use R12 in this macro because the user of the macro (ThePreStub) uses R12 as an argument, "pMethodDesc". I'll keep it as it is, except for the modifications to R7.

@manu-st
manu-st commented Mar 31, 2016

I took your file as is but it crashed very early in the initialization process:


Program received signal SIGSEGV, Segmentation fault.
0xb5c88a36 in InterlockedCompareExchangeAcquire (Destination=0xb6557ffc, Exchange=1, Comperand=0) at /home/local/Microsoft/coreclr/src/pal/inc/pal.h:5199
5199        return __sync_val_compare_and_swap(
(gdb) where
#0  0xb5c88a36 in InterlockedCompareExchangeAcquire (Destination=0xb6557ffc, Exchange=1, Comperand=0) at /home/local/Microsoft/coreclr/src/pal/inc/pal.h:5199
#1  0xb5c88bae in ObjHeader::EnterObjMonitorHelper (this=0xb6557ffc, pCurThread=0x4f1b8) at /home/local/Microsoft/coreclr/src/vm/syncblk.inl:71
#2  0xb5c87278 in Object::EnterObjMonitorHelper (this=0xb6558000, pCurThread=0x4f1b8) at /home/local/Microsoft/coreclr/src/vm/object.h:491
#3  0xb5c78a06 in JIT_MonReliableEnter_Portable (obj=0xb6558000, pbLockTaken=0xbe7fea94 "") at /home/local/Microsoft/coreclr/src/vm/jithelpers.cpp:4629
#4  0xb37b48ac in ?? ()

I'll check what could be causing this.

@myungjoo myungjoo force-pushed the testing/fix3856 branch 4 times, most recently from e6cda12 to 1527cf1 Compare March 31, 2016 02:35
@myungjoo
Author

The main issue with the current version of the patch is that the data structure created by unixasmmacrosarm.inc does not match struct TransitionBlock, which is the argument of PreStubWorker().

I'm working on it.

@myungjoo
Author

Note: the stack frame format is broken. I'm trying to become compliant with GDB/unwind by not touching R7.

@myungjoo
Author

Looks like "ThePreStubPatch" is not created as expected; I'm not seeing bx lr:


   0xb650a928 <ThePreStub+130>             add    sp, #120        ; 0x78
   0xb650a92a <ThePreStub+132>             ldmia.w        sp!, {r7, lr}
  >0xb650a92e <ThePreStub+136>             bx     r12
   0xb650a930 <ThePreStubPatch>            nop
   0xb650a932 <ThePreStubPatchLabel>       ldrlt  r4, [pc, #-1904]        ; 0x
   0xb650a936 <NDirectImportThunk+2>       vpush  {d0-d7}

@myungjoo
Author

GOT THE WORKAROUND WORKING!

A few assembly directives were missing from unixasmmacrosarm.inc.

I'll start cleaning up the patch because it has far too much debug clutter and too many unneeded fixes. I now need to find out which modifications are not required for the fix.

(gdb) bt
#0  PreStubWorker (pTransitionBlock=0xbeffed2c, pMD=0xb5059e74)
    at /usr/src/debug/coreclr-0.0.1/src/vm/prestub.cpp:958
#1  0xb650a8fc in ThePreStub () at asmhelpers.S:850
#2  0xb650a6fe in CallDescrWorkerInternal () at asmhelpers.S:498
#3  0xb63d8634 in CallDescrWorker (pCallDescrData=0xbefff398)
    at /usr/src/debug/coreclr-0.0.1/src/vm/callhelpers.cpp:144
#4  0xb63d84d8 in CallDescrWorkerWithHandler (pCallDescrData=0xbefff398, 
    fCriticalCall=0) at /usr/src/debug/coreclr-0.0.1/src/vm/callhelpers.cpp:87
#5  0xb63d943a in MethodDescCallSite::CallTargetWorker (this=0xbefff4ec, 
    pArguments=0xbefff560)
    at /usr/src/debug/coreclr-0.0.1/src/vm/callhelpers.cpp:632
#6  0xb62d5a94 in MethodDescCallSite::Call (this=0xbefff4ec, 
    pArguments=0xbefff560)
    at /usr/src/debug/coreclr-0.0.1/src/vm/callhelpers.h:420
#7  0xb6511330 in AppDomain::InitializeDomainContext (this=0x437a8, 
    allowRedirects=1, pwszPath=0x0, pwszConfig=0x0)
    at /usr/src/debug/coreclr-0.0.1/src/vm/appdomain.cpp:9944
#8  0xb6510df2 in SystemDomain::InitializeDefaultDomain (allowRedirects=1, 
    pBinder=0x0) at /usr/src/debug/coreclr-0.0.1/src/vm/appdomain.cpp:3504
#9  0xb6510898 in SystemDomain::SetupDefaultDomain ()
    at /usr/src/debug/coreclr-0.0.1/src/vm/appdomain.cpp:3375
#10 0xb6510ef8 in SystemDomain::SetupDefaultDomainNoThrow ()
    at /usr/src/debug/coreclr-0.0.1/src/vm/appdomain.cpp:3398

str r0, [sp , #((__PWTB_StackAlloc) + 9 * 4)]
str r1, [sp , #((__PWTB_StackAlloc) + 9 * 4 + 4)]
str r2, [sp , #((__PWTB_StackAlloc) + 9 * 4 + 8)]
str r3, [sp , #((__PWTB_StackAlloc) + 9 * 4 + 12)]
Author

To be changed to stm (or any instruction that does not change the sp register).
Change the others to use multiple-register instructions as well.

OR

Update PUSH_ARGUMENT_REGISTERS/... so that they do not update SP (or R7 either) and remove the directives, as long as no one else uses them.

Member

I would suggest renaming PUSH/POP_ARGUMENT_REGISTERS and PUSH/POP_CALLEE_SAVED_REGISTERS to "STORE/RESTORE", changing their implementation to match your changes here, and using them here to make the PROLOG_WITH_TRANSITION_BLOCK macro more readable.
These macros are not used anywhere else, so it should be no problem.

@benpye
benpye commented Mar 31, 2016

Great work @myungjoo, I'll be interested to know where I've gone wrong there.

add r6, sp, #(__PWTB_FloatArgumentRegisters)
vstm r6, {d0-d7}
mov r6, sp
vstm r6, {d0 - r7}

It should be d0-d7; otherwise it does not compile for me.

@manu-st
manu-st commented Apr 1, 2016

A side note: instead of using 16, 9 * 4, and 8 * 8, it would be good to use SIZEOF__ArgumentRegisters, SIZEOF__FloatArgumentRegisters, etc., defined in asmconstants.h. This requires changing the order of inclusion of that file in asmhelpers.S.
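
Editorial illustration of the suggestion above: a minimal C sketch of how the literal offsets in the prolog could map to named sizes. The `MODEL_` names and `prolog_stack_adjust` are hypothetical and were invented for this sketch. The real constants live in asmconstants.h and may be defined differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the sizes behind the literals 16, 9 * 4 and 8 * 8.
 * These names are illustrative only; they are NOT the asmconstants.h definitions. */
enum {
    MODEL_REGISTER_SIZE = 4,                                      /* 32-bit ARM core register */
    MODEL_SIZEOF_ArgumentRegisters = 4 * MODEL_REGISTER_SIZE,     /* r0-r3 */
    MODEL_SIZEOF_CalleeSavedRegisters = 9 * MODEL_REGISTER_SIZE,  /* r4-r11 and lr */
    MODEL_SIZEOF_FloatArgumentRegisters = 8 * 8                   /* d0-d7, 8 bytes each */
};

/* Compile-time check that the named sizes equal the literals used in the patch. */
static_assert(MODEL_SIZEOF_ArgumentRegisters == 16, "r0-r3 take 16 bytes");
static_assert(MODEL_SIZEOF_CalleeSavedRegisters == 36, "r4-r11, lr take 36 bytes");

/* Mirrors "sub sp, #((__PWTB_StackAlloc) + 4 * 4 + 9 * 4)" from the patch,
 * with the literals replaced by the named sizes. */
static size_t prolog_stack_adjust(size_t stack_alloc)
{
    return stack_alloc
         + MODEL_SIZEOF_ArgumentRegisters
         + MODEL_SIZEOF_CalleeSavedRegisters;
}
```

With named sizes, a change in the register set only has to be made in one place, which is presumably the motivation for the suggestion.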

@myungjoo
Author

myungjoo commented Apr 1, 2016

Actually, the inclusion relations of the ARM assembly inc/S/h files are too messy.

We need to restructure them. I'll include some minimal restructuring in the next series of patches, related to the implementation of StartUnwindingNativeFrames().

At this moment, I'm not going to reshuffle the inclusion relations. Relocating only the include statement in asmhelpers.S and leaving the others as they are may break other files that include unixasmmacros.inc.

@myungjoo myungjoo changed the title [WIP - Do not merge] RFC Fix 3856 / Linux-ARM Fix Linux-ARM Broken Stack of ThePreStub Apr 1, 2016
@myungjoo
Author

myungjoo commented Apr 1, 2016

Note that this does not fully resolve #3856, but fixes a major bug in Linux/ARM CoreCLR.

__PWTB_StackAlloc = __PWTB_TransitionBlock

// Make the stack operation compatible with GDB and LIBUNWIND
PROLOG_PUSH "{r7, lr}"
Member

The stack layout created by these macros has to match TransitionBlock in vm\callingconvention.h.

Reordering of the callee saved registers - to save r7 and lr next to each other - should be fine.

Moving argument registers around will have pretty non-trivial consequences. Would it be possible to keep them where they are?

Author

@jkotas push {r7, lr} does not break TransitionBlock because the contents of TransitionBlock are created AFTER this: LINE 152, stm r6, {r8 - r11, lr}. And sp + #__PWTB_TransitionBlock still points to the data structure of {r4 - r11, lr, r0 - r3}, which is pointed to by $r0 in asmhelpers.S.

Member

The arguments passed on the stack have to start immediately after the transition block end. Comment at https://github.com/dotnet/coreclr/blob/master/src/vm/callingconvention.h#L118 refers to this assumption.

This invariant does not hold with this change because r7 and lr now sit between the register-passed arguments and the stack-passed arguments.
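
Editorial illustration of the invariant discussed here: a minimal C sketch, assuming the ARM layout stated earlier in this thread ({r4 - r11, lr} followed by {r0 - r3}). The `Model*` types and both helper functions are hypothetical names for this sketch and are not copied from callingconvention.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model of the layout described in this thread:
 * {r4 - r11, lr} followed by {r0 - r3}. All names are hypothetical. */
typedef struct {
    uint32_t r4_r11[8];
    uint32_t lr;
} ModelCalleeSavedRegisters;

typedef struct {
    uint32_t r0_r3[4];
} ModelArgumentRegisters;

typedef struct {
    ModelCalleeSavedRegisters callee_saved;
    ModelArgumentRegisters    argument_regs;
} ModelTransitionBlock;

static_assert(offsetof(ModelTransitionBlock, argument_regs) == 36,
              "argument registers follow the 9 callee-saved words");
static_assert(sizeof(ModelTransitionBlock) == 52,
              "13 words in total");

/* Offset from the start of the block at which the invariant expects
 * the first stack-passed argument. */
static size_t model_offset_of_stack_args(void)
{
    return sizeof(ModelTransitionBlock);
}

/* Offset at which the stack-passed arguments actually begin if extra_words
 * words (for example a pushed r7 and lr) sit after the argument registers. */
static size_t model_offset_with_gap(size_t extra_words)
{
    return sizeof(ModelTransitionBlock) + extra_words * sizeof(uint32_t);
}
```

In this model, code that expects the first stack argument at offset 52 would read the saved r7 instead when two extra words are present, because the real arguments start at `model_offset_with_gap(2)`, which is 60.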

Author

Alright. My next push will comply with that assumption while being atomic for r7 update and branching.

Question: is it ok to destroy the TransitionBlock data structure at the moment of leaving at EPILOG_WITH_TRANSITION_BLOCK_* ?

@myungjoo myungjoo changed the title [WIP. update coming] Fix Linux-ARM Broken Stack of ThePreStub Fix Linux-ARM Broken Stack w/ ThePreStub Apr 1, 2016
push {r4 - r11, lr}
.save {r4 - r11, lr}

PROLOG_STACK_SAVE r7
Author

This is required to allow unwinding through this frame if this frame calls another function. The .save directives are recommended so that the unwinder knows which registers were saved.

@janvorli
Member

janvorli commented Apr 1, 2016

@myungjoo

Guaranteeing r7-pc atomicity (in case there is an exception/GC in the middle of "return" or "tailcall")
is going to be another subject.

This is not strictly necessary. After PreStubWorker returns and before we actually enter the jitted code, no hardware exception is allowed - it would lead to process termination anyway; we would not try to handle it as a hardware exception.
As for GC, if we tried to inject activation at that point (using the INJECT_ACTIVATION_SIGNAL), we would see that we are in native code and return right away doing nothing.
And once the managed code is entered, there is no trace of ThePreStub on the stack thanks to the tail call.
@jkotas I hope I am not missing something here.

@jkotas
Member

jkotas commented Apr 1, 2016

@janvorli I agree with your analysis.

@myungjoo
Author

myungjoo commented Apr 4, 2016

@janvorli @jkotas Ok. I was especially worried about the GC case (such as "stop-the-world" interrupts for GC in the middle of EPILOG_WITH_TRANSITION_BLOCK_RETURN and EPILOG_WITH_TRANSITION_BLOCK_TAILCALL; ThePreStub is only one of their users). However, if my worry is about a case that is never going to happen, I'll remove that TODO.

@manu-st
manu-st commented Apr 4, 2016

@myungjoo I was playing with the code today, and I think that r7 is supposed to store the frame pointer, that is to say the value of `sp' before pushing any argument. So in my case, I've done:

        .ifnc \pushArgRegs, DoNotPushArgRegs
                PUSH_ARGUMENT_REGISTERS
                PUSH_CALLEE_SAVED_REGISTERS
                .setfp r7, sp, #(13 * 4)
                add r7, sp, #(13 * 4)
        .else
                PUSH_CALLEE_SAVED_REGISTERS
                .setfp r7, sp, #(9 * 4)
                add r7, sp, #(9 * 4)
        .endif

Looking at asmhelpers.S, I see that we save sp into r7 after pushing some stuff on the stack, so r7 is not actually the frame pointer. It might explain the stack corruption too. With the above fix, I could run eh08_small to completion, albeit with the following assert violation:

Assert failure(PID 20871 [0x00005187], Thread: 20871 [0x5187]): Consistency check failed: FAILED: state.fFound
    File: /home/local/Microsoft/coreclr/src/vm/exceptionhandling.cpp Line: 354
    Image: /usr/local/home/ubuntu/work/SpaceEscape/corerun

so there is still something wrong.
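
Editorial illustration of the offsets in the snippet above: a minimal C sketch, assuming a full-descending stack with 4-byte registers, that checks why `13 * 4` and `9 * 4` would bring r7 back to the value sp had on entry. `model_push` and `model_frame_pointer` are hypothetical helpers for this sketch, not CoreCLR code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { MODEL_WORD = 4 };  /* bytes per 32-bit ARM core register */

static_assert(sizeof(uint32_t) == MODEL_WORD, "model assumes 4-byte words");

/* Full-descending stack: pushing reg_count registers lowers sp by that many words. */
static uint32_t model_push(uint32_t sp, unsigned reg_count)
{
    return sp - reg_count * MODEL_WORD;
}

/* Mirrors the snippet above: r7 = sp + 13 * 4 when r0-r3 are pushed as well,
 * otherwise r7 = sp + 9 * 4. */
static uint32_t model_frame_pointer(uint32_t entry_sp, bool push_arg_regs)
{
    uint32_t sp = entry_sp;
    unsigned pushed = 0;

    if (push_arg_regs) {
        sp = model_push(sp, 4);   /* r0-r3 */
        pushed += 4;
    }
    sp = model_push(sp, 9);       /* r4-r11, lr */
    pushed += 9;

    return sp + pushed * MODEL_WORD;
}
```

In both branches the added offset undoes exactly the pushes, so the modeled r7 equals the entry sp. That matches the interpretation of r7 given in this comment, though later comments in the thread reach a different conclusion about where the frame pointer should point.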

@janvorli
Member

janvorli commented Apr 4, 2016

@myungjoo Threads are never interrupted (by modifying their context right away or by hijacking) for GC while running in native code. We have barriers on the managed-to-native boundary that block a thread if it returns from native to managed code while a stop-the-world GC is pending, to prevent such threads from running managed code in parallel with the GC. So you really don't need to worry about that case from the correctness point of view.
There is a debuggability benefit, though, if you manage to make the stack walkable at any point of ThePreStub - you would get a correct stack trace in the debugger at any point.

@janvorli
Member

janvorli commented Apr 4, 2016

@manu-silicon, @myungjoo Just a hint - when I was verifying proper annotation for the AMD64 stubs in the past, I stepped through the helpers one instruction at a time and did "bt" at each instruction to check that the stack trace stayed the same.

@manu-st
manu-st commented Apr 4, 2016 via email

@myungjoo
Author

myungjoo commented Apr 5, 2016

Thanks everyone. Now I started to understand how unwind works.
(This is almost the first userspace (not kernel or bootloader) programming since graduation. :) )

@manu-st
manu-st commented Apr 5, 2016

@jkotas @janvorli I was reading the unwind information for the arm assembler and they state the following:

If you do not use a frame pointer, then you should not use the .setfp pseudo op. If you do
not use a frame pointer, then you should avoid modifying the stack pointer outside of the
function prologue. Otherwise, the run-time library will be unable to find saved registers
when it is unwinding the stack.

It is not clear, however, where the frame pointer should point. It looks like the convention is that it holds the value of sp after the registers that need saving have been pushed onto the stack. Also, most of the hand-written assembly uses r7, while the jit uses r11 to save the frame pointer. I think it does not matter, but it might be nice to be consistent.

Anyway, I looked at when the call stack gets corrupted (with the current change to save the stack pointer into r7) and found that after the call to ThePreStub everything is fine and we then execute our first jitted routine. Stepping through that jitted routine goes well until the first function call, where, once we step in, the call stack is messed up. I'm wondering if the jit is generating the proper unwinding information. Namely, we need the equivalent of .setfp and .pad. I'm currently looking at unwindarm.cpp to see how this is generated.

@jkotas
Member

jkotas commented Apr 5, 2016

I'm wondering if the jit is generating the proper unwinding information.

The JIT on CoreCLR does not generate unwind information in the native format (i.e., in a format that native debuggers understand) on Unix today.

@janvorli
Member

janvorli commented Apr 5, 2016

@manu-silicon GDB is unable to show a stack trace of managed frames. Even on x64, it shows complete garbage. There is nothing you can do about it, as @jkotas has pointed out. That's why we use our internal unwinder for unwinding managed frames and libunwind for unwinding native frames.

@myungjoo
Author

myungjoo commented Apr 5, 2016

Anyway, one thing that really puzzles me is that the clang-generated machine code of CoreCLR-ARM functions looks like this:

  >HException&, _CONTEXT*)>          stmdb        sp!, {r4, r7, r11, lr}
   HException&, _CONTEXT*)+4>        add  r7, sp, #4
   HException&, _CONTEXT*)+6>        sub.w        sp, sp, #744    ; 0x2e8

   &, _CONTEXT*)>          push   {r7, lr}
   &, _CONTEXT*)+2>        mov    r7, sp
   &, _CONTEXT*)+4>        sub.w  sp, sp, #736    ; 0x2e0
   &, _CONTEXT*)+8>        mov    r2, r1

Because most code looked like the second example, I wrote ".setfp r7, sp / mov r7, sp" before. Anyway, after looking at more examples, it appears that r7 (fp) points at the saved r7 position.

@myungjoo
Author

myungjoo commented Apr 6, 2016

Related message on the usage of FP(r7 here) of Clang ARM: https://llvm.org/bugs/show_bug.cgi?id=18505

Make the assembly-created stack frame compatible with
GDB and LIBUNWIND_ARM. This patch allows libunwind
or gdb to unwind through ThePreStub().

Partially Fix #3856

Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
@myungjoo
Author

myungjoo commented Apr 8, 2016

@dnfclas Add cla-already-signed label again please. @dotnet-bot has replaced it with another label :( .

@myungjoo
Author

ping!?

@jkotas
Member

jkotas commented Apr 11, 2016

LGTM

@jkotas jkotas merged commit e8c82e2 into dotnet:master Apr 11, 2016
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
Fix Linux-ARM Broken Stack w/ ThePreStub

Commit migrated from dotnet/coreclr@e8c82e2
Development

Successfully merging this pull request may close these issues.

ARM/Linux Coreclr breaks with Roslyn.

7 participants