
Optimize jump stubs on arm64 #62302

@EgorBo

Description


On x64 we emit the following code for jump stubs:

mov rax, 123456789abcdef0h
jmp rax

as I understand from this comment in the runtime source:

// mov rax, 123456789abcdef0h 48 b8 xx xx xx xx xx xx xx xx
// jmp rax ff e0
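Just to spell out the encoding in that comment: the x64 stub is 12 bytes total (2-byte opcode + 8-byte little-endian immediate + 2-byte `jmp rax`). A quick Python sketch (illustrative only; the runtime of course emits this from C++):

```python
import struct

def x64_jump_stub(target: int) -> bytes:
    """Assemble the x64 jump stub bytes: mov rax, imm64 ; jmp rax."""
    code = b"\x48\xb8" + struct.pack("<Q", target)  # 48 B8 + imm64 (little-endian)
    code += b"\xff\xe0"                             # FF E0 = jmp rax
    return code

stub = x64_jump_stub(0x123456789ABCDEF0)
print(len(stub))  # 12
```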

while on arm64 we do a PC-relative memory load (the target address is stored in a data slot right after the code):

ldr x16, [pc, #8]
br  x16
[target address]

// +0: ldr x16, [pc, #8]
// +4: br x16
// +8: [target address]

I'm just wondering whether it would be faster to do what x64 does and materialize the constant directly, even though it takes four instructions to populate it:

mov     x8, #9044
movk    x8, #9268, lsl #16
movk    x8, #61203, lsl #32
movk    x8, #43981, lsl #48
br      x8
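For reference, the four immediates above are just the four 16-bit halves of the target address (here an example address, `0xABCDEF1324342354`, inferred from those immediates). A small sketch of the split:

```python
def movk_chunks(addr: int) -> list[int]:
    """Split a 64-bit address into the four 16-bit immediates for
    mov / movk lsl #16 / movk lsl #32 / movk lsl #48."""
    return [(addr >> shift) & 0xFFFF for shift in (0, 16, 32, 48)]

addr = 0xABCDEF1324342354  # example target, not a real runtime address
print(movk_chunks(addr))   # [9044, 9268, 61203, 43981]
```

Size-wise the two stubs are close: the mov/movk sequence is 5 instructions = 20 bytes, versus 16 bytes (two instructions plus an 8-byte literal) for the `ldr`-based stub, but it avoids the data-dependent load.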

I'm asking because, if I'm reading the TechEmpower traces (Plaintext benchmark) correctly, this could be a bottleneck:

[image: profiler trace from the TE Plaintext benchmark]

cc @dotnet/jit-contrib @jkotas
