Description
Bmi1.X64.TrailingZeroCount(0) returns 0 instead of the correct value 64 when JIT optimizations are enabled. The issue occurs only with DOTNET_TieredCompilation=0 (full JIT optimization from the start). With tiered compilation enabled (default), the result is correct.
This is a regression: .NET 10 produces the correct result in all configurations.
Reproduction Steps
```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public struct S0
{
    public ulong F0;
}

public struct S1
{
    public S1(Vector128<uint> f0, Vector256<sbyte> f1) : this()
    {
    }
}

public class Program
{
    public static Vector<ulong> s_1;
    public static S0 s_2;

    public static void Main()
    {
        S1 vr5;
        var vr7 = M4(vr5);
        var vr8 = s_2.F0; // s_2.F0 is never assigned, so it is 0
        var vr9 = (uint)Bmi2.X64.MultiplyNoFlags(0, vr8);
        long vr10 = Bmi2.MultiplyNoFlags(vr7, vr9);
        var vr11 = s_2.F0; // same field read a second time
        var vr12 = (byte)Bmi1.X64.TrailingZeroCount(vr11); // expected: 64
        M2(vr10, vr12);
    }

    public static void M2(long arg0, byte arg1)
    {
        Vector<ulong> var4 = s_1;
        Console.WriteLine(arg1);
    }

    public static uint M4(S1 argThis)
    {
        argThis = new S1(Vector128.Create(3499348842U, 4066470924U, 1, 1), Vector256.Create<sbyte>(14));
        return 3721809729U;
    }
}
```
With tiered compilation (default):
```
$ dotnet run -c Release
64
```
Without tiered compilation:
```
$ DOTNET_TieredCompilation=0 dotnet run -c Release
0
```
Expected output in both cases: 64 (TrailingZeroCount(0) should return 64 as no bits are set).
The program was found by Fuzzlyn with seed:
8846370544251159633-vectort,vector128,vector256,x86aes,x86avx,x86avx2,x86bmi1,x86bmi1x64,x86bmi2,x86bmi2x64,x86fma,x86gfni,x86gfniv256,x86lzcnt,x86lzcntx64,x86pclmulqdq,x86pclmulqdqv256,x86popcnt,x86popcntx64,x86sse,x86ssex64,x86sse2,x86sse2x64,x86sse3,x86sse41,x86sse41x64,x86sse42,x86sse42x64,x86ssse3,x86x86base
Fuzzlyn reduced the original 70.3 KiB program to the repro above.
Expected behavior
Bmi1.X64.TrailingZeroCount(0) should return 64 regardless of JIT optimization level. The TZCNT instruction on x86-64 returns the operand size (64) when the input is zero.
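The zero-input rule above can be pinned down with a software model of 64-bit TZCNT. This is a minimal sketch, not the runtime's implementation: `TzcntReference` and its methods are hypothetical names, and the cross-check assumes the documented behavior of `System.Numerics.BitOperations.TrailingZeroCount(ulong)`, which also returns 64 for a zero input.

```csharp
using System.Numerics;

// Hypothetical helper: a software model of 64-bit TZCNT semantics.
public static class TzcntReference
{
    // Counts trailing zero bits one at a time. A zero input has no set bit,
    // so the loop only stops when the count reaches the operand size (64).
    public static int SoftwareTrailingZeroCount(ulong value)
    {
        int count = 0;
        while (count < 64 && (value & (1UL << count)) == 0)
        {
            count++;
        }
        return count;
    }

    // Cross-checks the software model against BitOperations, which is
    // documented to return 64 for a zero input as well.
    public static bool AgreesWithBitOperations(ulong value) =>
        SoftwareTrailingZeroCount(value) == BitOperations.TrailingZeroCount(value);
}
```

`SoftwareTrailingZeroCount(0)` returns 64 and `SoftwareTrailingZeroCount(8)` returns 3, which is the result `Bmi1.X64.TrailingZeroCount` is expected to produce for the same inputs.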
Actual behavior
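With full JIT optimizations (`DOTNET_TieredCompilation=0`), the result is 0 instead of 64. The JIT appears to misoptimize the computation, and the `Main` disassembly under Other information suggests a register problem rather than constant folding: a `tzcnt` is still emitted. The code appears to load `s_2.F0` into `rdx` (the implicit source operand of `mulx`), then `mov edx, eax` overwrites `rdx` with the return value of `M4` (3721809729) for the second `mulx`, and the later `tzcnt rsi, rdx` reads that overwritten value. Since 3721809729 is odd, its trailing-zero count is 0, which matches the observed output.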
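The observed 0 is consistent with `tzcnt` being applied to the constant returned by `M4` instead of to `s_2.F0`, which is what the `Main` disassembly under Other information appears to show. The sketch below only restates that arithmetic with `BitOperations.TrailingZeroCount` and does not exercise the JIT bug; `ObservedValueCheck` and its members are hypothetical names.

```csharp
using System.Numerics;

// Hypothetical helper that restates the arithmetic behind the observed output.
public static class ObservedValueCheck
{
    // Constant returned by M4 in the repro.
    public const uint M4Result = 3721809729U;

    // s_2.F0 is never assigned in the repro, so it keeps its default value of 0.
    public const ulong IntendedOperand = 0UL;

    // Mirrors the repro's narrowing: (byte)TrailingZeroCount(...).
    public static byte TzcntAsByte(ulong value) => (byte)BitOperations.TrailingZeroCount(value);
}
```

`TzcntAsByte(M4Result)` is 0 because 3721809729 is odd, matching the output with `DOTNET_TieredCompilation=0`, while `TzcntAsByte(IntendedOperand)` is 64, matching the expected output.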
Regression?
Yes. .NET 10.0.5 returns 64 in both configurations. The issue reproduces on 11.0.100-preview.4.26208.106.
Known Workarounds
In this repro the issue only manifests with `DOTNET_TieredCompilation=0`; with tiered compilation enabled (the default), the result is correct. This is not a reliable workaround in general: once a method is promoted from Tier0 to Tier1 (fully optimized) code, as can happen in long-running applications, it may produce the same incorrect result.
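One mitigation that may be worth trying, though it has not been verified against this repro, is to exclude the affected method from JIT optimization with `MethodImplOptions.NoOptimization`. The sketch below assumes the bug is confined to optimized code, as the results above suggest; `TzcntMitigation` and `TrailingZeroCountOfStaticField` are hypothetical names, and the trade-off is that the whole method runs as unoptimized code.

```csharp
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics.X86;

// Hypothetical sketch of the mitigation; not verified against the reported bug.
public static class TzcntMitigation
{
    public struct S0
    {
        public ulong F0;
    }

    public static S0 s_2;

    // NoOptimization asks the JIT to compile this method without optimizations,
    // so it should not go through the FullOpts compilation seen in the disassembly.
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public static byte TrailingZeroCountOfStaticField()
    {
        ulong value = s_2.F0;
        // Fall back to BitOperations on hardware without BMI1 x64 support.
        return Bmi1.X64.IsSupported
            ? (byte)Bmi1.X64.TrailingZeroCount(value)
            : (byte)BitOperations.TrailingZeroCount(value);
    }
}
```

Because `s_2.F0` is never assigned, the method is expected to return 64 on either path.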
Configuration
Reproduces:
.NET SDK: 11.0.100-preview.4.26208.106
Runtime: Microsoft.NETCore.App 11.0.0-preview.4.26203.108
OS: Linux x64, Windows x64
Does NOT reproduce:
.NET SDK: 10.0.201
Runtime: Microsoft.NETCore.App 10.0.5
OS: Linux x64, Windows x64
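When checking other builds, the details listed above can be collected from inside the process. This is a small sketch using standard APIs (`RuntimeInformation`, `Environment.Version`, and the `IsSupported` properties of the intrinsics classes); `ConfigReport` is a hypothetical name. It echoes the `DOTNET_TieredCompilation` environment variable only, so tiering settings applied in other ways (for example through runtimeconfig.json) are not detected.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics.X86;
using System.Text;

// Hypothetical helper that gathers the configuration details reported in this issue.
public static class ConfigReport
{
    public static string Build()
    {
        var sb = new StringBuilder();
        sb.AppendLine($"Runtime: {RuntimeInformation.FrameworkDescription} ({Environment.Version})");
        sb.AppendLine($"OS: {RuntimeInformation.OSDescription} {RuntimeInformation.OSArchitecture}");
        sb.AppendLine($"Process architecture: {RuntimeInformation.ProcessArchitecture}");

        // Only reflects the environment variable, not runtimeconfig.json or MSBuild settings.
        string tieredCompilation = Environment.GetEnvironmentVariable("DOTNET_TieredCompilation") ?? "(not set)";
        sb.AppendLine($"DOTNET_TieredCompilation: {tieredCompilation}");

        // Hardware support for the intrinsics used by the repro.
        sb.AppendLine($"Bmi1.X64.IsSupported: {Bmi1.X64.IsSupported}");
        sb.AppendLine($"Bmi2.X64.IsSupported: {Bmi2.X64.IsSupported}");
        return sb.ToString();
    }
}
```

Printing `ConfigReport.Build()` at the start of `Main` records the environment alongside the program output.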
Other information
- The surrounding code (BMI2 intrinsics, Vector types, struct arguments) may be relevant to triggering the misoptimization; the bug may not reproduce with a simpler `TrailingZeroCount(0)` call alone.
- JIT disassembly for `Main` with `DOTNET_JitDisasm=Main` and `DOTNET_TieredCompilation=0`:

```asm
; Assembly listing for method Program:Main() (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX on Unix
; FullOpts code
; optimized code
; rbp based frame
; partially interruptible
; No PGO data
G_M000_IG01:                ;; offset=0x0000
       push     rbp
       mov      rbp, rsp
G_M000_IG02:                ;; offset=0x0004
       xor      edi, edi
       call     [Program:M4(S1):uint]
       mov      rdx, qword ptr [(reloc 0x74d5d363d730)]
       xor      edi, edi
       mulx     rdi, rdi, rdi
       mov      edx, eax
       mulx     edi, edi, edi
       xor      esi, esi
       tzcnt    rsi, rdx
       movzx    rsi, sil
       call     [Program:M2(long,byte)]
       nop
G_M000_IG03:                ;; offset=0x0033
       pop      rbp
       ret
; Total bytes of code 53
```
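For the first point under Other information, a control case can call `TrailingZeroCount` on the same kind of static field with none of the surrounding BMI2, vector, or struct code. This is a hypothetical sketch (`ControlCase` and `Compute` are illustrative names); whether this simpler form reproduces the bug has not been established, so the only expectation encoded is the correct result of 64.

```csharp
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics.X86;

// Hypothetical control case: the same static-field read as the repro, without the
// BMI2, Vector, and struct-argument code around it.
public static class ControlCase
{
    public struct S0
    {
        public ulong F0;
    }

    public static S0 s_2;

    // NoInlining keeps the computation in its own method so its codegen can be
    // inspected on its own, e.g. with DOTNET_JitDisasm=Compute.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static byte Compute()
    {
        if (!Bmi1.X64.IsSupported)
        {
            throw new System.PlatformNotSupportedException("BMI1 x64 is required for this check.");
        }
        return (byte)Bmi1.X64.TrailingZeroCount(s_2.F0);
    }
}
```

Running it with `DOTNET_TieredCompilation=0` and `DOTNET_JitDisasm=Compute` shows whether the simpler method receives the same kind of codegen as `Main`.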