implementing profiler ELT callbacks for AMD64 Linux#7719
Conversation
|
There's a big "TODO" in codegenxarch.cpp, genReturn(), that it looks like you haven't addressed yet: |
|
cc @dotnet/jit-contrib |
|
@BruceForstall thanks for your comment! Is it not enough to add push rax; push rdx; ... ; pop rdx; pop rax; into ProfileLeaveNaked for this TODO? |
|
The issue isn't the values in registers getting preserved, it is proper reporting of the GC information for those registers. If you look at Compiler::impFixupStructReturnType(), you can see how multi-reg returns of 8-15 byte structs are massaged: In genReturn(), I believe that under |
|
@BruceForstall I've placed the following code in the CodeGen::genReturn function but I got the following assertion Could you give me a hint to perform it by the right way? Thanks in advance |
|
@BruceForstall as I see the current version of coreclr generate the following code for a method that returns struct value Could you give some sample template of generated code for this case ? |
|
On unix structs of size > 8 and <=16 gets returned in two return registers RAX/XMM0 and RDX/XMM1. Consider a struct with one or two gc-ref type fields being returned. The gc-ref could be in RAX or RDX. Whichever register contains gc-ref that needs to be explicitly reported. In the code that you have posted above you are iterating through each of the 8-bytes of the struct and always marking REG_INTRET (which is RAX) as gcptr or non-ptr. That could be the reason why you are hitting an assert. Here is the right way to do it if method is returning multi-reg return type struct Apart from the generated code, what you need to verify is that gcinfo is correctly reported for return regs containing gc-refs around the call to profile leave callback. |
|
@sivarv many thanks for your hint |
| unsigned __int8 offset1 = 0; | ||
| var_types type0 = TYP_UNKNOWN; | ||
| var_types type1 = TYP_UNKNOWN; | ||
| compiler->GetStructTypeOffset(structDesc, &type0, &type1, &offset0, &offset1); |
There was a problem hiding this comment.
I see this code got duplicated. How about abstracting this in Compiler::GetStructTypeOffset() overload whose first argument is a CORINFO_CLASS_HANDLE and has out params to return offset and type of eightbytes of the struct?
| genEmitHelperCall(CORINFO_HELP_PROF_FCN_ENTER, 0, EA_UNKNOWN, REG_ARG_2); | ||
| #else | ||
| genEmitHelperCall(CORINFO_HELP_PROF_FCN_ENTER, 0, EA_UNKNOWN, REG_R11); | ||
| #endif |
There was a problem hiding this comment.
Is there a specific reason why we want to use REG_ARG2 on amd64 windows and REG_R11 on Unix?
Please note that without specifying a callTargetReg, genEmitHelperCall() would use REG_RAX.
| #if FEATURE_VARARG | ||
| if (compiler->info.compIsVarArgs && varTypeIsFloating(loadType)) | ||
| #if defined(UNIX_AMD64_ABI) | ||
| #ifdef FEATURE_UNIX_AMD64_STRUCT_PASSING |
There was a problem hiding this comment.
I think FEATURE_UNIX_AMD64_STRUCT_PASSING #ifdef alone is sufficient here.
| regNumber argReg = varDsc->lvArgReg; | ||
| getEmitter()->emitIns_S_R(ins_Store(storeType), emitTypeSize(storeType), argReg, varNum, 0); | ||
| #if defined(UNIX_AMD64_ABI) | ||
| #ifdef FEATURE_UNIX_AMD64_STRUCT_PASSING |
There was a problem hiding this comment.
I think UNIX_AMD64_ABI is not required here.
| genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_ARG_2); | ||
| #else | ||
| genEmitHelperCall(helper, 0, EA_UNKNOWN, REG_R11); | ||
| #endif |
There was a problem hiding this comment.
Any specific reason why we don't want to use default call target reg (rax)?
There was a problem hiding this comment.
@sivarv I guess that it would be better to use r10 & r11 for intermediate computations on Linux
| if (compiler->compCurBB == compiler->genReturnBB) | ||
| { | ||
| #if defined(_TARGET_AMD64_) | ||
| #ifdef FEATURE_UNIX_AMD64_STRUCT_PASSING |
There was a problem hiding this comment.
This #ifdef is redundant since FEATURE_UNIX_AMD64_STRUCT_PASSING is defined only on amd64.
There was a problem hiding this comment.
I think what you mean is that the first #if is redundant, as it is implied by the #ifdef
In reply to: 87924333 [](ancestors = 87924333)
|
|
||
| unsigned regCount = retTypeDesc.GetReturnRegCount(); | ||
|
|
||
| if (varTypeIsGC(compiler->info.compRetType)) |
There was a problem hiding this comment.
if (varTypeIsGC(compiler->info.compRetType)) [](start = 1, length = 55)
This condition would always be false if the return type is a struct (i.e. TYP_STRUCT) returned in two regs.
This if-check is not required and should be deleted. This comment also applies to if-check on line 1176.
There was a problem hiding this comment.
@sivarv I tried it but I have got the assertion in codegenlinear.cpp
System.RuntimeTypeHandle:GetMetadataImport(ref):struct: Regset after BB07 gcr=00000001 {rax}, byr=00000000 {}, regVars=00000000 {}
Assert failure(PID 27642 [0x00006bfa], Thread: 27642 [0x6bfa]): Assertion failed 'nonVarPtrRegs == RBM_NONE' in 'System.RuntimeTypeHandle:GetMetadataImport(ref):struct' (IL size 18)
File: /home/signatov/TEST/historical_debugging/coreclr/src/jit/codegenlinear.cpp Line: 422
Image: /home/signatov/TEST/historical_debugging/coreclr/bin/Product/Linux.x64.Debug/corerun
In my test case this assertion arises only on the forth method where the code with gcMarkRegPtrVal was inserted. Three methods with this code
System.RuntimeTypeHandle:GetIntroducedMethods(ref):struct
IntroducedMethodEnumerator:GetEnumerator():struct:this
System.RuntimeMethodHandle:GetUtf8Name(long):struct
are compiled successfully before this assertion. May be you have some ideas?
There was a problem hiding this comment.
@sivarv I inserted debug prints and see the following
=== System.RuntimeTypeHandle:GetMetadataImport(ref):struct: regCount 2
00000000 00000000
in GCInfo::gcMarkRegPtrVal type 0xe TYP_LONG 0xa TYP_REF 0xe TYP_BYREF 0xf
in GCInfo::gcMarkRegSetGCref
GC regs: 00000000 {} => 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
00000001 00000000
in GCInfo::gcMarkRegPtrVal type 0xa TYP_LONG 0xa TYP_REF 0xe TYP_BYREF 0xf
in GCInfo::gcMarkRegSetNpt
GC regs: (unchanged) 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
00000001 00000000
--------->>>>>>>>
in GCInfo::gcMarkRegSetNpt
GC regs: (unchanged) 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
in GCInfo::gcMarkRegSetNpt
GC regs: (unchanged) 00000001 {rax}
Byref regs: (unchanged) 00000000 {}
System.RuntimeTypeHandle:GetMetadataImport(ref):struct: Regset after BB07 gcr=00000001 {rax}, byr=00000000 {}, regVars=00000000 {}
Assert failure(PID 12181 [0x00002f95], Thread: 12181 [0x2f95]): Assertion failed 'nonVarPtrRegs == RBM_NONE' in 'System.RuntimeTypeHandle:GetMetadataImport(ref):struct' (IL size 18)
File: /home/signatov/TEST/historical_debugging/coreclr/src/jit/codegenlinear.cpp Line: 422
Image: /home/signatov/TEST/historical_debugging/coreclr/bin/Product/Linux.x64.Debug/corerun
In the previous successfully compiled methods arg type is TYP_LONG.
There was a problem hiding this comment.
As I see the argument of gcMarkRegSetNpt is regMaskTP not var_types, so I should replace
gcInfo.gcMarkRegSetNpt(var_types);
by
gcInfo.gcMarkRegSetNpt(genRegMask(var_types));
Am I right?
| LclVarDsc* varDsc; | ||
|
|
||
| // Since the method needs to make a profiler callback, it should have out-going arg space allocated. | ||
| noway_assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM); |
There was a problem hiding this comment.
Shouldn't this first noway_assert also be under the following #if? I believe it's only defined under FEATURE_FIXED_OUT_ARGS which is only true for amd64.
| gcInfo.gcMarkRegSetNpt(retTypeDesc.GetABIReturnReg(i)); | ||
| } | ||
| } | ||
| return; |
There was a problem hiding this comment.
return; [](start = 1, length = 18)
returns in the middle of the method are very hard to notice and could lead to accidental bugs. Instead I would suggest this to restructure this logic as follows. Note that the below logic requires no #ifdefs.
e.g.
ReturnTypeDesc retTypeDesc;
if (compiler->compMethodReturnsMultiRegRetType())
{
CORINFO_CLASS_HANDLE retClsHnd = compiler->info.compMethodInfo->args.retTypeClass;
retTypeDesc.InitializeStructReturnType(compiler, retClsHnd);
}
if (varTypeIsGC(compiler->info.compRetType))
{
gcInfo.gcMarkRegPtrVal(REG_INTRET, compiler->info.compRetType);
}
else if (compiler->compMethodReturnsMultiRegRetType())
{
for (unsigned i = 0; i < retTypeDesc.GetReturnRegCount(); ++i)
{
gcInfo.gcMarkRegPtrVal(retTypeDesc.GetABIReturnReg(i), retTypeDesc.GetReturnRegType(i));
}
}
genProfilingLeaveCallback();
if (varTypeIsGC(compiler->info.compRetType))
{
gcInfo.gcMarkRegSetNpt(REG_INTRET);
}
else if (compiler->compMethodReturnsMultiRegRetType())
{
for (unsigned i = 0; i < retTypeDesc.GetReturnRegCount(); ++i)
{
gcInfo.gcMarkRegSetNpt(retTypeDesc.GetABIReturnReg(i));
}
}
|
|
||
| #if defined(_TARGET_AMD64_) | ||
| // Since the method needs to make a profiler callback, it should have out-going arg space allocated. | ||
| noway_assert(compiler->lvaOutgoingArgSpaceVar != BAD_VAR_NUM); |
There was a problem hiding this comment.
Here too I think that this needs to be only under !UNIX_AMD64_ABI
There was a problem hiding this comment.
@CarolEidt sorry, but I didn't quite catch you. This code is for both Unix && Windows on AMD64.
| // - integer argument registers (rcx, rdx, r8, r9) | ||
| // - floating point argument registers (xmm1-3) | ||
| // - volatile integer registers (r10, r11) | ||
| // - volatile floating point registers (xmm4-5) |
There was a problem hiding this comment.
You should add here something like:
- upper halves of ymm registers on AVX (which are volatile)
| push_argument_register rcx | ||
| push_argument_register rdx | ||
| push_nonvol_reg r10 | ||
| push_nonvol_reg rax |
There was a problem hiding this comment.
How are these pushed registers represented in GC info? Or does the GC "know" about these?
There was a problem hiding this comment.
ELT profiler helpers are considered NO GC helpers. That is GC won't happen within the helper call.
|
CC : @noahfalk - to review profiler helpers. |
| // Upon entry : | ||
| // rdi = clientInfo | ||
| // rsi = profiledRsp | ||
|
|
There was a problem hiding this comment.
On windows these arguments are passed in rcx and rdx. Does Linux use a different 64bit ABI in general or is there a reason for the difference?
There was a problem hiding this comment.
Linux has different ABI for x64. It has 6 argument registers - RDI, RSI, RDX, RCX, R8 and R9
There was a problem hiding this comment.
@noahfalk According to the Unix ABI, the first 6 integer or pointer arguments to a function are passed in registers. The first is placed in rdi, the second in rsi, the third in rdx, and then rcx, r8 and r9.
| push_nonvol_reg r10 | ||
| push_nonvol_reg rax | ||
|
|
||
| lea rax, [rsp + 0x10] // caller rsp |
There was a problem hiding this comment.
I rarely deal with assembly, but it doesn't seem like this would calculate the caller's rsp correctly?
At this point in the function we have already pushed r15, r14, r13, r12, rbp, rbx, rsi, rdi, r10, and rax. Did those pushes generate code that doesn't update rsp?
There was a problem hiding this comment.
Right, this is not correct. It was this way in the Windows version of this function when only RAX was pushed on the stack.
@sergign60 what is the reason for pushing all the callee saved registers plus RSI, RDI and R10 when Windows version doesn't do that (I guess that's why the function name is ProfileEnterNaked)
|
|
||
|
|
||
| // setup ProfilePlatformSpecificData structure | ||
| xor r11, r11 // nullify r11 |
There was a problem hiding this comment.
Any reason not to use r8 so that we retain consistency with the windows implementation?
There was a problem hiding this comment.
@noahfalk I guess that it would be better to use r10 & r11 for intermediate computations on Linux
There was a problem hiding this comment.
makes sense now that I understand the linux ABI difference
| // Upon entry : | ||
| // rdi = clientInfo | ||
| // rsi = profiledRsp | ||
|
|
There was a problem hiding this comment.
Same questions as above on ProfileEnter
| // } PROFILE_PLATFORM_SPECIFIC_DATA, *PPROFILE_PLATFORM_SPECIFIC_DATA; | ||
| // | ||
| .equ SIZEOF_PROFILE_PLATFORM_SPECIFIC_DATA, 0x8*11 + 0x4*2 // includes fudge to make FP_SPILL right | ||
| .equ SIZEOF_OUTGOING_ARGUMENT_HOMES, 0x8*6 |
There was a problem hiding this comment.
This is not right. AMD64 ABI on Linux doesn't home arguments. On Windows, there are always 4 stack slots reserved for storing the 4 argument registers before the return address. So you can just remove this constant and also the OFFSETOF_PLATFORM_SPECIFIC_DATA below, since they are both zero.
|
@janvorli |
95b6b51 to
b4c7322
Compare
|
@janvorli Now I get the another set of failed tests, but the reason is not this profiling callbacks implementation. For example, fails (segmentation fault) with both with the profiler's callback calls and when I commented their calls. |
ad9fa9a to
65eb9e6
Compare
|
#10857 |
|
@janvorli I've got the results on Windows. Please look at [AMD64 Windows ELT] Some CoreCLR tests fail with ELT hooks enabled and GCStress & TailcallStress 10857 |
24e48d9 to
e180fe6
Compare
|
@dotnet-bot test Ubuntu arm Cross Release Build |
|
@janvorli please see @sergign60 note above. |
| // | ||
| void CodeGen::genProfilingLeaveCallback(unsigned helper /*= CORINFO_HELP_PROF_FCN_LEAVE*/) | ||
| { | ||
|
|
There was a problem hiding this comment.
I'd rather remove such new empty lines
| // UINT64 flt1; | ||
| // UINT64 flt2; | ||
| // UINT64 flt3; | ||
| // #if defined(UNIX_AMD64_ABI) |
There was a problem hiding this comment.
Is it source file used only for Unix? If so I think it would be good to remove this #ifdef bracket
| mov [rsp + 0x88], rdx // -- struct rdx field | ||
| mov [rsp + 0x90], rcx // -- struct rcx field | ||
| mov [rsp + 0x98], r8 // -- struct r8 field | ||
| mov [rsp + 0xa0], r9 // -- struct r9 field |
There was a problem hiding this comment.
Do we really need to save those scratch registers (rdi-r9)?
There was a problem hiding this comment.
@rartemev these registers are not scratch registers, they are argument registers.
4d3a75a to
9200f7a
Compare
This PR implements profiler ELT callbacks for AMD64 Linux. It's tested with historical debugger.
The main differences from appropriate Windows implementation:
@sivarv: I don't think this (regSet.AddMaskVars(REG_R10 | REG_R11 | REG_R12 | REG_R13);) is sufficient to have R10-R13 saved in prolog.
Note then genProfilingEnterCallback() is called after OS prolog and zero init of frame. By the time we are generating Enter callback, we have already generated the method prolog that has saved all required callee saved regs.