-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Fix Linux-ARM Broken Stack w/ ThePreStub #3990
Conversation
| sub sp, #((__PWTB_StackAlloc) + 4 * 4 + 9 * 4) | ||
|
|
||
| .ifnc \pushArgRegs, DoNotPushArgRegs | ||
| PUSH_ARGUMENT_REGISTERS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these macros aren't used any more should they be removed also?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. after getting positive test results, I'll clean up the code as well.
Right now, I just wanted to share the idea.
ade3f40 to
5d32d11
Compare
src/pal/inc/unixasmmacrosarm.inc
Outdated
|
|
||
| alloc_stack __PWTB_StackAlloc | ||
| sub r7, #36 | ||
| str r4, [r7] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't we be using r12 as scratch register here to store the previous sp and then using stm instead to store r4-r11,lr, i.e. stm r12, {r4-r11,lr}?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As long as we save r4-r11 into the stack, that won't matter.
But, yes, using r12 instead of r7 makes the code look more clean.
Anyway, it appears that it's the custom or protocol of ARM gnueabi compilers (for making it compatible with debuggers or unwinders) to do mov r7 sp after push before sub sp #N (creating space for local variables). Right now, I'm blindly following what my compiler (clang 3.6 arm-linux-gnueabi) spits out with conventional C/CPP codes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uh.. I'm not supposed to use R12 in this macro because the user of macro (ThePreStub) uses R12 as an argument, "pMethodDesc". I'll stick to what it is except for the modifications in R7.
|
I took your file as is but it crashed very early in the initialization process: I'll check what could be causing this. |
e6cda12 to
1527cf1
Compare
|
The main issue of the current version of the patch is that the data structure created by unixasmmacrosarm.inc is not matching with struct TransitionBlock, which is the argument of PreStubWork(). I'm working on it. |
|
Note: stack frame format broken. trying to become compliant with GDB/Unwind by not touching R7. |
1527cf1 to
b6ef5e5
Compare
|
Looks like "ThePreStubPatch" is not created as expected; I'm not seeing |
|
GOT THE WORKAROUND WORKING! There had been a few missing Assembly directives in unixasmmacrosarm.inc. I'll start cleaning up the patch because it has way too much debug clutters and unneeded fixes. I now need to start finding out which modification is not required to fix this. |
b6ef5e5 to
576a63b
Compare
src/pal/inc/unixasmmacrosarm.inc
Outdated
| str r0, [sp , #((__PWTB_StackAlloc) + 9 * 4)] | ||
| str r1, [sp , #((__PWTB_StackAlloc) + 9 * 4 + 4)] | ||
| str r2, [sp , #((__PWTB_StackAlloc) + 9 * 4 + 8)] | ||
| str r3, [sp , #((__PWTB_StackAlloc) + 9 * 4 + 12)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be changed to stm. (or whatever that does not change sp register)
Change others to use multiple-register instructions as well.
OR
Update PUSH_ARGUMENT_REGISTERS/... not to update SP (or even R7 as well) and remove directives as long as no one uses it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest renaming the PUSH/POP_ARGUMENT_REGISTERS and PUSH / POP_CALLEE_SAVED_REGISTERS to "STORE/RESTORE", change its implementation to match your changes here and to use it here for a better readability of the PROLOG_WITH_TRANSITION_BLOCK macro.
These macros are not used anywhere else, so it should be no problem.
|
Great work @myungjoo , I'll be interested to know where I've gone wrong there. |
576a63b to
a66fac5
Compare
src/pal/inc/unixasmmacrosarm.inc
Outdated
| add r6, sp, #(__PWTB_FloatArgumentRegisters) | ||
| vstm r6, {d0-d7} | ||
| mov r6, sp | ||
| vstm r6, {d0 - r7} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it should be d0-d7 otherwise it does not compile for me.
|
A side note, instead of using 16, 9 * 4, 8 * 8, it would be good to use SIZEOF__ArgumentRegisters, SIZEOF__FloatArgumentRegisters.. defined in asmconstants.h. This requires changing the order of inclusion of that file in asmhelper.S. |
|
Actually, inclusion relations of ARM assembly inc/S/h files are too messy. We need to reconstruct them. I'll include some minimum reconstruction with next series of patches related with implementation of StartUnwindingNativeFrames(). At this moment, I'm not going to reshuffle inclusion relations. Only relocating include statement in asmhelper.S and leaving others as it is may break other files including unixasmmacro.inc. |
a66fac5 to
52ca485
Compare
|
Note that this does not fully resolve #3856, but fixes a major bug in Linux/ARM CoreCLR. |
src/pal/inc/unixasmmacrosarm.inc
Outdated
| __PWTB_StackAlloc = __PWTB_TransitionBlock | ||
|
|
||
| // Make the stack operation compatible with GDB and LIBUNWIND | ||
| PROLOG_PUSH "{r7, lr}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack layout created by these macros has to match TransitionBlock in vm\callingconvention.h.
Reordering of the callee saved registers - to save r7 and lr next to each other - should be fine.
Moving argument registers around will have pretty non-trivial consequences. Would it be possible to keep them where they are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas push {r7, lr} does not break TransitionBlock because the contents for TransitionBlock is created AFTER this: LINE 152, strm r6, {r8 - r11, lr}. And sp + #__PWTB_TransitionBlock still points to the data structure of {r4 - r11, lr, r0 - r3}, which is pointed by $r0 in asmhelper.S.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The arguments passed on the stack have to start immediately after the transition block end. Comment at https://github.com/dotnet/coreclr/blob/master/src/vm/callingconvention.h#L118 refers to this assumption.
This invariant does not hold with this change because of there is r7 and lr between the register passed arguments and stack passed arguments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright. My next push will comply with that assumption while being atomic for r7 update and branching.
Question: is it ok to destroy the TransitionBlock data structure at the moment of leaving at EPILOG_WITH_TRANSITION_BLOCK_* ?
src/pal/inc/unixasmmacrosarm.inc
Outdated
| push {r4 - r11, lr} | ||
| .save {r4 - r11, lr} | ||
|
|
||
| PROLOG_STACK_SAVE r7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the required action to allow unwind through this frame if this frame calls another function. .save are recommended for unwinder to further understand the register saved.
This is not strictly necessary. After the PreStubWorker returns and before we actually enter the jitted code, there is no hardware exception allowed - it would lead to the process termination anyways, we would not try to handle it as a hardware exception. |
|
@janvorli I agree with your analysis. |
|
@janvorli @jkotas Ok. I was especially worried about GC case (such as "stop-the-world" interrupts for GC in the middle of EPILOG_WITH_TRANSITION_BLOCK_RETURN and EPILOG_WITH_TRANSITION_BLOCK_TAILCALL (ThePreStub is only one of such users) However, if my worry is about a case never goig to happen, I'll remove that TODO. |
75205d5 to
71d054a
Compare
|
@myungjoo I was playing with the code today, and I think that r7 is supposed to store the frame pointer, that is to say the value of `sp' before pushing any argument. So in my case, I've done: Looking at asmhelpers.S, I see that we save sp into r7 after pushing some stuff on the stack, so r7 is not actually the frame pointer. It might explain the stack corruption too. With the above fix, I could run to completion eh08_small to completion albeit with the following assert violation: so there is still something wrong. |
|
@myungjoo Threads are never interrupted (by modifying their context right away or hijacking) for GC while running in native code. We have barriers on the managed to native boundary that blocks the thread if it returns from native to managed code and the stop the world GC is pending to prevent those threads from running managed code in parallel with the GC. So you really don't need to worry about that case from the correctness point of view. |
|
@manu-silicon, @myungjoo Just a hint - when I was verifying proper annotation for the AMD64 stubs in the past, I was stepping through the helpers one instruction a time and doing "bt" at each instruction to see that the stack trace keeps being the same. |
|
I've been doing that too. It does help and I learn something every day. We will get there no doubt about that.
|
71d054a to
1a88e44
Compare
|
Thanks everyone. Now I started to understand how unwind works. |
|
@jkotas @janvorli I was reading the unwind information for the arm assembler and they state the following: It is not clear however where the frame pointer should point to. Looks like the convention is that it is the value of sp after pushing onto the stack registers that needs saving. Also most of the assembly written by hand is using r7, while the jit is using r11 to save the frame pointer. I think it does not matter but it might be nice to be consistent. Anyway, I looked at when the callstack gets corrupted (with the current change to save the stack pointer into r7) and found that after the call to |
The JIT on CoreCLR does not generate unwind information in the native format (e.g. in format that native debuggers understand) on Unix today. |
|
@manu-silicon GDB is unable to show stack trace of managed frames. Even on x64, it shows a complete garbage. There is nothing you can do about it, as @jkotas has pointed out. That's why for the unwinding of managed frames, we use our internal unwinder and for unwinding native frames, the libunwind. |
|
Anyway, one things that really boggles me is that the clang-generated machine code of CoreCLR-ARM's function looks like this: Because most code looked like the second example, I wrote ".setft r7, sp / mov r7, sp" before. Anyway, after looking at more examples, it appears that r7(fp) is pointing at the saved r7 position. |
|
Related message on the usage of FP(r7 here) of Clang ARM: https://llvm.org/bugs/show_bug.cgi?id=18505 |
Make the assembly-created stack frame compatible with GDB and LIBUNWIND_ARM. This patch allows libunwind or gdb to unwind through ThePreStub(). Partially Fix #3856 Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
29f3716 to
f8edec4
Compare
|
@dnfclas Add cla-already-signed label again please. @dotnet-bot has replaced it with another label :( . |
|
ping!? |
|
LGTM |
Fix Linux-ARM Broken Stack w/ ThePreStub Commit migrated from dotnet/coreclr@e8c82e2
Make the assembly-created stack frame compatible with
GDB and LIBUNWIND_ARM.
Trying to fix #3856
Being tested. Trying to explain the approach with this patch.
Signed-off-by: MyungJoo Ham myungjoo.ham@samsung.com