JIT miscompilation: Bmi1.X64.TrailingZeroCount returns wrong result with optimizations enabled (.NET 11 regression) #126667

@pawlos

Description

Bmi1.X64.TrailingZeroCount(0) returns 0 instead of the correct value 64 when JIT optimizations are enabled. The issue occurs only with DOTNET_TieredCompilation=0 (full JIT optimization from the start). With tiered compilation enabled (default), the result is correct.

This is a regression - .NET 10 produces the correct result in all configurations.

Reproduction Steps

using System;
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
 
public struct S0
{
    public ulong F0;
}
 
public struct S1
{
    public S1(Vector128<uint> f0, Vector256<sbyte> f1) : this()
    {
    }
}
 
public class Program
{
    public static Vector<ulong> s_1;
    public static S0 s_2;
 
    public static void Main()
    {
        S1 vr5;
        var vr7 = M4(vr5);
        var vr8 = s_2.F0;
        var vr9 = (uint)Bmi2.X64.MultiplyNoFlags(0, vr8);
        long vr10 = Bmi2.MultiplyNoFlags(vr7, vr9);
        var vr11 = s_2.F0;
        var vr12 = (byte)Bmi1.X64.TrailingZeroCount(vr11);
        M2(vr10, vr12);
    }
 
    public static void M2(long arg0, byte arg1)
    {
        Vector<ulong> var4 = s_1;
        Console.WriteLine(arg1);
    }
 
    public static uint M4(S1 argThis)
    {
        argThis = new S1(Vector128.Create(3499348842U, 4066470924U, 1, 1), Vector256.Create<sbyte>(14));
        return 3721809729U;
    }
}

With tiered compilation (default):

$ dotnet run -c Release
64

Without tiered compilation:

$ DOTNET_TieredCompilation=0 dotnet run -c Release
0

Expected output in both cases: 64 (TrailingZeroCount(0) should return 64 as no bits are set).

The program was found by Fuzzlyn with seed:

8846370544251159633-vectort,vector128,vector256,x86aes,x86avx,x86avx2,x86bmi1,x86bmi1x64,x86bmi2,x86bmi2x64,x86fma,x86gfni,x86gfniv256,x86lzcnt,x86lzcntx64,x86pclmulqdq,x86pclmulqdqv256,x86popcnt,x86popcntx64,x86sse,x86ssex64,x86sse2,x86sse2x64,x86sse3,x86sse41,x86sse41x64,x86sse42,x86sse42x64,x86ssse3,x86x86base

Fuzzlyn reduced the original 70.3 KiB program to the repro above.

Expected behavior

Bmi1.X64.TrailingZeroCount(0) should return 64 regardless of JIT optimization level. The TZCNT instruction on x86-64 returns the operand size (64) when the input is zero.
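The expected semantics can be cross-checked against a portable reference. The sketch below is illustrative only: the class TzcntSemantics and the method SoftwareTzcnt are hypothetical names that are not part of the repro. It compares a bit-by-bit count with BitOperations.TrailingZeroCount and, where the hardware supports it, with Bmi1.X64.TrailingZeroCount. A simple call like this is not expected to trigger the miscompilation by itself.

```csharp
using System;
using System.Numerics;
using System.Runtime.Intrinsics.X86;

public static class TzcntSemantics
{
    // Hypothetical portable reference: counts the zero bits below the lowest
    // set bit and returns the operand width (64) when no bit is set.
    public static int SoftwareTzcnt(ulong value)
    {
        if (value == 0)
            return 64;

        int count = 0;
        while ((value & 1) == 0)
        {
            value >>= 1;
            count++;
        }
        return count;
    }

    public static void Main()
    {
        ulong[] inputs = { 0UL, 1UL, 8UL, 1UL << 63 };
        foreach (ulong input in inputs)
        {
            int expected = SoftwareTzcnt(input);
            int viaBitOperations = BitOperations.TrailingZeroCount(input);
            Console.WriteLine($"{input}: software={expected}, BitOperations={viaBitOperations}");

            // The intrinsic is only callable when the CPU reports BMI1 support.
            if (Bmi1.X64.IsSupported)
            {
                ulong viaIntrinsic = Bmi1.X64.TrailingZeroCount(input);
                Console.WriteLine($"    Bmi1.X64={viaIntrinsic}");
            }
        }
    }
}
```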

Actual behavior

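With full JIT optimizations (DOTNET_TieredCompilation=0), the result is 0 instead of 64. The JIT appears to misoptimize the computation. In the disassembly under Other information, tzcnt reads rdx after mov edx, eax has overwritten it, so the cause may be a clobbered operand register rather than an incorrectly folded or propagated constant.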

Regression?

Yes. .NET 10.0.5 returns 64 in both configurations. The issue reproduces on SDK 11.0.100-preview.4.26208.106.

Known Workarounds

The issue has only been observed with DOTNET_TieredCompilation=0. With tiered compilation enabled (the default), the repro prints the correct result. Note, however, that long-running applications may still hit the same incorrect result once the method is promoted from Tier0 to Tier1 (fully optimized).
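Because the wrong result is tied to optimized code, one possible per-method mitigation is to ask the JIT not to optimize the affected method. The sketch below uses MethodImplOptions.NoOptimization for that purpose. It is an untested assumption that this avoids the miscompilation on the affected preview. The class NoOptWorkaround, the field s_value (standing in for s_2.F0 from the repro), and the method TrailingZerosOfStatic are hypothetical names.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.Intrinsics.X86;

public static class NoOptWorkaround
{
    // Hypothetical stand-in for s_2.F0 in the repro; defaults to 0.
    public static ulong s_value;

    // Assumption: compiling the method that contains the affected sequence
    // without optimizations sidesteps the bug. Not verified on the preview.
    [MethodImpl(MethodImplOptions.NoOptimization)]
    public static byte TrailingZerosOfStatic()
    {
        ulong value = s_value;

        // Fall back to BitOperations on hardware without BMI1.
        ulong count = Bmi1.X64.IsSupported
            ? Bmi1.X64.TrailingZeroCount(value)
            : (ulong)BitOperations.TrailingZeroCount(value);

        return (byte)count;
    }

    public static void Main()
    {
        Console.WriteLine(TrailingZerosOfStatic());
    }
}
```

The attribute has to be on the method whose optimized code is wrong (Main in the repro), since wrapping only the TrailingZeroCount call would leave the caller's code generation unchanged.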

Configuration

Reproduces:

.NET SDK: 11.0.100-preview.4.26208.106
Runtime: Microsoft.NETCore.App 11.0.0-preview.4.26203.108
OS: Linux x64, Windows x64

Does NOT reproduce:

.NET SDK: 10.0.201
Runtime: Microsoft.NETCore.App 10.0.5
OS: Linux x64, Windows x64

Other information

  • The surrounding code (BMI2 intrinsics, Vector types, struct arguments) may be relevant to triggering the misoptimization — the bug may not reproduce with a simpler TrailingZeroCount(0) call alone.
  • JIT disassembly for Main with DOTNET_JitDisasm=Main and DOTNET_TieredCompilation=0:
; Assembly listing for method Program:Main() (FullOpts)
; Emitting BLENDED_CODE for x64 + VEX on Unix
; FullOpts code
; optimized code
; rbp based frame
; partially interruptible
; No PGO data

G_M000_IG01:                ;; offset=0x0000
       push     rbp
       mov      rbp, rsp

G_M000_IG02:                ;; offset=0x0004
       xor      edi, edi
       call     [Program:M4(S1):uint]
       mov      rdx, qword ptr [(reloc 0x74d5d363d730)]
       xor      edi, edi
       mulx     rdi, rdi, rdi
       mov      edx, eax
       mulx     edi, edi, edi
       xor      esi, esi
       tzcnt    rsi, rdx
       movzx    rsi, sil
       call     [Program:M2(long,byte)]
       nop

G_M000_IG03:                ;; offset=0x0033
       pop      rbp
       ret

; Total bytes of code 53
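
The register flow in the listing can be traced by hand. The sketch below is a reading of the disassembly, not a confirmed root cause. It assumes the load from the reloc address is s_2.F0 and that eax holds the return value of M4 after the call. The class RegisterTrace and its methods are hypothetical names. MULX reads rdx implicitly and does not modify it unless rdx is named as a destination, so the only write to rdx between the load and tzcnt is mov edx, eax.

```csharp
using System;
using System.Numerics;

public static class RegisterTrace
{
    // Mirrors the emitted sequence: tzcnt consumes whatever is in rdx.
    public static int TzcntAsEmitted(ulong staticField, uint m4Result)
    {
        ulong rdx = staticField;  // mov rdx, qword ptr [reloc] (assumed to be s_2.F0)
        // mulx rdi, rdi, rdi     reads rdx implicitly and leaves it unchanged
        rdx = m4Result;           // mov edx, eax zero-extends M4's result into rdx
        // mulx edi, edi, edi     again only reads edx
        return BitOperations.TrailingZeroCount(rdx); // tzcnt rsi, rdx
    }

    // What the source program asks for: tzcnt of the static field itself.
    public static int TzcntAsWritten(ulong staticField)
    {
        return BitOperations.TrailingZeroCount(staticField);
    }

    public static void Main()
    {
        // 3721809729 is the constant returned by M4 in the repro; it is odd,
        // so its trailing zero count is 0.
        Console.WriteLine(TzcntAsEmitted(0UL, 3721809729U)); // prints 0
        Console.WriteLine(TzcntAsWritten(0UL));              // prints 64
    }
}
```

Under these assumptions the trace gives the observed 0, which is consistent with tzcnt operating on M4's return value instead of s_2.F0.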
