[Arm64] Crypto arm64 intrinsics implementation#16499
Conversation
|
@CarolEidt PTAL |
| using System.Runtime.InteropServices; | ||
| using System.Runtime.Intrinsics; | ||
| using System.Runtime.Intrinsics.Arm.Arm64; | ||
|
|
There was a problem hiding this comment.
@debayang Building these tests will fail until API is added to corefx.
If we want to merge this, it will need #if. Something like I did here https://github.com/dotnet/coreclr/blob/master/tests/src/JIT/HardwareIntrinsics/Arm64/Simd.cs#L6
|
@briansull PTAL at the emitters |
|
@debayang The Ubuntu formatting job is failing, this is due to formatting issues. If you click on the details link, you can see a patch to apply to fix the formatting issues. You can also run install and run clang-format on source files in the src jit tree. |
5c3d064 to
e31b5f0
Compare
|
@sdmaclea Corrected formatting |
|
@debayang Looks like the test you added failed on Windows x86. Probably #if is not correct yet. |
e31b5f0 to
1fa79f2
Compare
|
@sdmaclea Fixed test case. |
|
|
||
| bool is16Byte = (node->gtSIMDSize > 8); | ||
| emitAttr attr = is16Byte ? EA_16BYTE : EA_8BYTE; | ||
| insOpts opt = genGetSimdInsOpt(is16Byte, baseType); |
There was a problem hiding this comment.
I would like to see the signature for CodeGen::genGetSimdInsOpt(bool is16Byte, var_types elementType) changed to CodeGen::genGetSimdInsOpt(emitAttr size, var_types elementType) and an asset added at the start of that method : assert(isValidVectorDatasize(size);
All the callers to this method aleady use this unnecessary pattern to setup is16Byte
Also instead of attr we can use size, since we aren't going to have any GC flags for these instructions.
emitAttr size = is16Byte ? EA_16BYTE : EA_8BYTE;
I can do this for you on a follow up checkin.
There was a problem hiding this comment.
@briansull Thanks. May be better to handle in a different checkin as this appears to be used in other places as well.
| assert(isValidImmCondFlags(emitGetInsSC(id))); | ||
| break; | ||
|
|
||
| case IF_DR_2J: // DR_2J ................ ......nnnnnddddd Sd Sn |
There was a problem hiding this comment.
It is nice to add the instruction that use this encoding:
(sha1h)
|
|
||
| case IF_DV_2A: // DV_2A .Q.......X...... ......nnnnnddddd Vd Vn (fabs, fcvt - vector) | ||
| case IF_DV_2M: // DV_2M .Q......XX...... ......nnnnnddddd Vd Vn (abs, neg - vector) | ||
| case IF_DV_2P: // DV_2P ................ ......nnnnnddddd Vd Vn (aes, sha1su1) |
There was a problem hiding this comment.
I would prefer adding a * to indicate that all forms of the aes instructions use this encoding:
(aes*, sha1su1)
| case IF_DR_3E: // DR_3E X........X.mmmmm ssssssnnnnnddddd Rd Rn Rm imm(0-63) | ||
| case IF_DV_3F: // DV_3F ...........mmmmm ......nnnnnddddd Qd Sn Vm (vector) - Qd both source and dest | ||
| case IF_DV_3G: // DV_3G ...........mmmmm ......nnnnnddddd Vd Vn Vm (vector) - Vd both source and dest | ||
| case IF_DV_3H: // DV_3H ...........mmmmm ......nnnnnddddd Qd Qn Vm (vector) - Qd both source and dest |
There was a problem hiding this comment.
I think that DV_3F, DV_3G and DV_3H can all be collapsed to just one DV_3F
| case INS_sha1m: | ||
| assert(isValidVectorDatasize(size)); | ||
| assert(isVectorRegister(reg1)); | ||
| assert(isFloatReg(reg2)); |
There was a problem hiding this comment.
isFloatRegister is implemented the same as a isVectorRegister.
| emitDispReg(id->idReg2(), size, true); | ||
| emitDispVectorReg(id->idReg3(), id->idInsOpt(), false); | ||
| break; | ||
|
|
There was a problem hiding this comment.
After collapsing (DV_3F, DV_3G and DV_3H) in order to print the correct register in the Asm Dump you will need to special case using either opcode or the binary bits of 'code'. (See IF_DR_3A for opcode or IF_DV_2O for 'code')
1fa79f2 to
e1911dc
Compare
|
@briansull Made the requested changes in emitarm64.cpp. |
| SimdUnaryOp, // SIMD intrinsics which take one vector operand and return a vector | ||
| SimdBinaryOp, // SIMD intrinsics which take two vector operands and return a vector | ||
| SimdUnaryOp, // SIMD intrinsics which take one vector operand and return a vector | ||
| SimdBinaryOverlapOp, // Same as SimdBinaryOp , with destination vector same as first source vector |
There was a problem hiding this comment.
Would it make sense to describe these as RMW (read-modify-write) Ops? That's the term used for xarch operations that overwrite their first source, and to me "Overlap" generally implies a more complex kind of operation that does a partial (re)write or something.
There was a problem hiding this comment.
@CarolEidt I guess renaming them to "RMW" from "Overlap" should be OK. Will do that.
| HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_PMULL , Pmull ) | ||
| HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA1 , Sha1 ) | ||
| HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2 , Sha2 ) | ||
| HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA256 , Sha256 ) |
There was a problem hiding this comment.
I believe the same changes needs to be made to src/inc/corjitflags.h
There was a problem hiding this comment.
@CarolEidt Missed it as the build passed.
I guess "pal/src/misc/jitsupport.cpp" also needs to be corrected.
There was a problem hiding this comment.
Yes, I missed that. @BruceForstall - can you comment on whether this change needs to be made elsewhere?
There was a problem hiding this comment.
The first parameter is the flag name, the second is the class which is controlled by the flag.
You can either:
- Rename the class and leave the flag names alone (they match Linux Arm64 extension naming convention now)
- Rename the class, rename the flags in JIT and rename the flags in the VM.
git grep HAS_ARM64_SHA2
src/inc/corjitflags.h: CORJIT_FLAG_HAS_ARM64_SHA2 = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better
src/jit/hwintrinsiclistArm64.h:HARDWARE_INTRINSIC_CLASS(JIT_FLAG_HAS_ARM64_SHA2 , Sha2 )
src/jit/jitee.h: JIT_FLAG_HAS_ARM64_SHA2 = 55, // ID_AA64ISAR0_EL1.SHA2 is 1 or better
src/pal/src/misc/jitsupport.cpp: CPUCompileFlags.Set(CORJIT_FLAGS::CORJIT_FLAG_HAS_ARM64_SHA2);
I think either is acceptable. The flag names should be consistent in the JIT & VM. If you update the flag names, you probably need to update the JITEEVersionIdentifier GUID too.
There was a problem hiding this comment.
can you comment on whether this change needs to be made elsewhere?
Looks like @sdmaclea found all the spots.
leave the flag names alone (they match Linux Arm64 extension naming convention now)
Well, CORJIT_FLAG_HAS_ARM64_SHA512 already breaks the mold kind-of like what is suggested here.
you probably need to update the JITEEVersionIdentifier GUID too
Since there's no data binary change (just internal #define name change), there should be no need.
|
|
||
| switch (compiler->getHWIntrinsicInfo(intrinsicID).form) | ||
| { | ||
| case HWIntrinsicInfo::Sha1HashOp: |
There was a problem hiding this comment.
In this method, presumably above where you are already separately handling the 2 and 3 operand cases, you need to set op2 (and op3 if present) as delay free, so that they won't be the same as the target register. Otherwise, you can't copy op1 to the target without possibly trashing another operand.
This may complicate the code a bit (though it will be potentially a bit simpler after #16517.
If none of the operands can be contained (i.e. used from memory, or an immediate that can be directly encoded), then instead of calling GetOperandInfo you would call getLocationInfo on those operands, e.g.
LocationInfoListNode* op2Info = getLocationInfo(op2);
op2Info->info.isDelayFree = true;
info->hasDelayFreeSrc = true;
And similar for op3 if present. If either of these can be contained, you need to handle those cases. You could possibly use CheckAndSetDelayFree from lsraxarch.cpp, moving it to lsrabuild.cpp. I hope that's not necessary as that code is actually going away and being handled another way in #16517.
There was a problem hiding this comment.
@CarolEidt @sdmaclea getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?
case HWIntrinsicInfo::Sha1HashOp:
assert(info->srcCount == 3);
info->setInternalCandidates(this, RBM_ALLFLOAT);
info->internalFloatCount = 1;
if (!op2->isContained())
{
LocationInfoListNode* op2Info = useList.Begin()->Next();
op2Info->info.isDelayFree = true;
LocationInfoListNode* op3Info = op2Info->Next();
op3Info->info.isDelayFree = true;
info->hasDelayFreeSrc = true;
}
break;
case HWIntrinsicInfo::SimdTernaryRMWOp:
assert(info->srcCount == 3);
if (!op2->isContained())
{
LocationInfoListNode* op2Info = useList.Begin()->Next();
op2Info->info.isDelayFree = true;
LocationInfoListNode* op3Info = op2Info->Next();
op3Info->info.isDelayFree = true;
info->hasDelayFreeSrc = true;
}
break;
There was a problem hiding this comment.
getLocationInfo() causing assertion failure here . Below seems to work as intended . Does this look OK?
Yes, my mistake. I had missed the fact that the operand info is retrieved above (n the calls to GetOperandInfo()).
However, here you are assuming that there is a one-to-one match between the LocationInfoListNodes and the operands, which is true as long as the operands are not contained. You check op2, but you should add an assert(!op3->isContained()) prior to accessing its info.
|
|
||
| getEmitter()->emitIns_R_R(INS_fmov, EA_4BYTE, tmpReg, elementReg); | ||
|
|
||
| if (targetReg != op1Reg) |
There was a problem hiding this comment.
You need to assert that neither op2Reg nor op3Reg are the same as targetReg, otherwise you will overwrite them here. See also my comments on lsraarm64.cpp - you need to set those operands as delay free so that this is ensured.
There was a problem hiding this comment.
@CarolEidt @sdmaclea Will handle along with the above lsra modification which will make sure op2reg and op3reg are not same as op1Reg.
Also in this scenario - how should we ensure that tmpReg is not same as targetReg (e.g sha1p instruction below is wrong)
mov w0, #20
fmov s16, w0
mov v16.16b, v8.16b
sha1p q16, s16, v10.4s
There was a problem hiding this comment.
That is handled similarly, you need to do:
info->isInternalRegDelayFree = true;
|
@sdmaclea - I had recreated the forked branch and this got automatically closed. |
@dotnet/arm64-contrib
Arm64 intrinsics implementation for Aes, SHA1 and SHA256
API proposal:ARM64 Crypto HW Intrinsics