
Rewrite BigInteger internals to use native-width (nuint) limbs#125799

Open
stephentoub wants to merge 27 commits into dotnet:main from stephentoub:biginteger-nuint-rewrite

Conversation

@stephentoub
Member

@stephentoub stephentoub commented Mar 19, 2026

This rewrites the internal implementation of System.Numerics.BigInteger to use native-width (nuint) limbs instead of uint limbs. On 64-bit platforms, this halves the number of limbs needed to represent a value, improving throughput of all multi-limb arithmetic. The public API surface remains unchanged.

Core Representation Change

  • Internal _bits array changed from uint[] to nuint[], with all arithmetic primitives updated accordingly.
  • On 64-bit platforms, each limb holds 64 bits instead of 32, halving iteration counts for schoolbook algorithms.
  • _sign remains int to avoid regressions on small inline values.
  • Parse/ToString updated to work with nuint limbs while maintaining exact formatting behavior.

Algorithmic Improvements

Montgomery Multiplication for ModPow

Added Montgomery multiplication with REDC for modular exponentiation when the modulus is odd. This replaces the Barrett reduction path for odd moduli, eliminating expensive division-based reduction in the inner loop.
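The REDC scheme can be sketched compactly. Below is an illustrative Python model (the PR implements this in C# over nuint limbs); `montgomery_inverse` and `redc` loosely mirror the `ComputeMontgomeryInverse` and `MontgomeryReduce` helpers named in the commits, simplified to a single-limb modulus:

```python
# Illustrative single-limb model of Montgomery ModPow; not the PR's C# code.

W = 64                    # limb width on a 64-bit platform
R = 1 << W                # Montgomery radix
MASK = R - 1

def montgomery_inverse(n: int) -> int:
    """-n^-1 mod 2^W via Newton's method; n must be odd."""
    inv = n                       # n*n == 1 (mod 8) for odd n: 3 correct bits
    for _ in range(5):            # each iteration doubles the correct bits: 3 -> 96
        inv = (inv * (2 - n * inv)) & MASK
    return (-inv) & MASK

def redc(t: int, n: int, n_prime: int) -> int:
    """Montgomery reduction: t * R^-1 mod n for t < n*R, with no division by n."""
    m = ((t & MASK) * n_prime) & MASK
    u = (t + m * n) >> W          # low limb of t + m*n is provably zero
    return u - n if u >= n else u

def modpow_montgomery(base: int, exp: int, n: int) -> int:
    assert n & 1, "REDC requires an odd modulus (the Barrett path handles even ones)"
    n_prime = montgomery_inverse(n)
    acc = R % n                   # 1 in Montgomery form
    base_m = (base * R) % n       # enter Montgomery form
    while exp:
        if exp & 1:
            acc = redc(acc * base_m, n, n_prime)
        base_m = redc(base_m * base_m, n, n_prime)
        exp >>= 1
    return redc(acc, n, n_prime)  # leave Montgomery form
```

The inner loop performs only multiplies, shifts, and masks; the single trailing conditional subtract replaces the per-step division that Barrett reduction needs.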

Sliding Window Exponentiation

ModPow now uses left-to-right sliding window exponentiation (window size chosen based on exponent bit length) instead of simple square-and-multiply, reducing the number of modular multiplications.
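The window mechanics can be modeled as follows. This is an illustrative Python sketch with a fixed window size `k=4`; the actual implementation picks the window size adaptively from the exponent bit length and runs inside the Montgomery domain:

```python
# Left-to-right sliding-window exponentiation, simplified Python model.

def modpow_sliding_window(base: int, exp: int, mod: int, k: int = 4) -> int:
    if exp == 0:
        return 1 % mod
    # Precompute the 2^(k-1) odd powers base^1, base^3, ..., base^(2^k - 1).
    base %= mod
    base_sq = base * base % mod
    odd_powers = [base]
    for _ in range(2 ** (k - 1) - 1):
        odd_powers.append(odd_powers[-1] * base_sq % mod)
    result = 1
    i = exp.bit_length() - 1
    while i >= 0:
        if not (exp >> i) & 1:
            result = result * result % mod            # zero bit: just square
            i -= 1
        else:
            # Slide a window of at most k bits that ends in a set bit.
            j = max(i - k + 1, 0)
            while not (exp >> j) & 1:
                j += 1
            window = (exp >> j) & ((1 << (i - j + 1)) - 1)   # odd by choice of j
            for _ in range(i - j + 1):
                result = result * result % mod
            result = result * odd_powers[(window - 1) // 2] % mod
            i = j - 1
    return result
```

Because each window consumes up to k exponent bits with a single multiply, the multiply count drops from roughly one per set bit to roughly one per k bits, at the cost of the small odd-powers table.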

Fused Two's Complement for Bitwise Operations

Bitwise AND, OR, and XOR on negative values now fuse the two's complement conversion with the logical operation in a single pass, avoiding separate allocation and negation steps.
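The fused approach can be sketched with a small Python model over 64-bit limbs: the two's complement limbs of a negative operand are produced on the fly while ANDing, with the `+1` threaded through as a carry, instead of materializing negated copies first. Helper names here are illustrative, not the PR's:

```python
# Single-pass fused two's complement AND over little-endian 64-bit limbs.

MASK = (1 << 64) - 1

def to_sign_magnitude(x: int):
    """Split an int into (sign, little-endian 64-bit magnitude limbs)."""
    sign, mag, limbs = (-1 if x < 0 else 1), abs(x), []
    while mag:
        limbs.append(mag & MASK)
        mag >>= 64
    return sign, limbs

def twos_complement_limb(mag, i, carry):
    """Limb i of ~magnitude + 1, with the +1 threaded through as a carry."""
    limb = mag[i] if i < len(mag) else 0
    v = ((~limb) & MASK) + carry
    return v & MASK, v >> 64

def fused_and(sign_a, mag_a, sign_b, mag_b) -> int:
    n = max(len(mag_a), len(mag_b)) + 1   # +1 limb for sign extension
    carry_a = carry_b = 1
    out = []
    for i in range(n):
        if sign_a < 0:
            la, carry_a = twos_complement_limb(mag_a, i, carry_a)
        else:
            la = mag_a[i] if i < len(mag_a) else 0
        if sign_b < 0:
            lb, carry_b = twos_complement_limb(mag_b, i, carry_b)
        else:
            lb = mag_b[i] if i < len(mag_b) else 0
        out.append(la & lb)               # one pass, no temporary buffers
    value = sum(limb << (64 * i) for i, limb in enumerate(out))
    if out[-1] >> 63:                     # top bit set: negative result
        value -= 1 << (64 * n)
    return value
```

Past the end of a negative operand's magnitude, `twos_complement_limb` naturally yields all-ones limbs, which is exactly the sign extension the three-buffer version had to write out explicitly.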

Cached Powers-of-10 Table

The PowersOf1e9 table used by divide-and-conquer ToString/Parse is now cached and reused across calls, avoiding repeated expensive computation for large number formatting.

Unrolled Single-Limb Primitives

Added Mul1, MulAdd1, and SubMul1 primitives that handle the common case of multiplying a multi-limb number by a single limb. These are used in the inner loops of schoolbook multiply and division.
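The shape of the multiply-accumulate primitive and its role in schoolbook multiply can be modeled in Python (the real code is C#, unrolled by 4, and uses UInt128 for the widening product on 64-bit; this sketch keeps only the structure):

```python
# Python model of the MulAdd1 primitive driving schoolbook multiplication.

MASK = (1 << 64) - 1

def mul_add_1(acc, offset, left, scalar):
    """acc[offset+i] += left[i] * scalar for all i; returns the carry-out limb."""
    carry = 0
    for i, limb in enumerate(left):
        # Widening multiply-accumulate fits in 128 bits, because
        # (2^64 - 1)^2 + 2*(2^64 - 1) == 2^128 - 1.
        t = limb * scalar + acc[offset + i] + carry
        acc[offset + i] = t & MASK
        carry = t >> 64
    return carry

def schoolbook_multiply(a, b):
    """Multiply little-endian limb arrays, with MulAdd1 as the inner loop."""
    out = [0] * (len(a) + len(b))
    for j, bj in enumerate(b):
        # The carry-out of row j lands one past the row, which is still zero.
        out[j + len(a)] = mul_add_1(out, j, a, bj)
    return out
```

Unrolling the inner loop in the real implementation lets several widening multiplies issue before their carries are consumed, hiding multiply latency behind the sequential carry chain.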

GCD Optimizations

  • LehmerCore rewritten to avoid Int128/UInt128 overhead, using direct ulong arithmetic.
  • Eliminated unnecessary array copy when one GCD operand is zero.

Division Tuning

  • Burnikel-Ziegler threshold retuned for 64-bit limbs.
  • DivRem helpers optimized for the wider limb size.

Hardware Intrinsics

BigMul, DivRem, and AddWithCarry primitives use BMI2 (mulx) and ADX (adcx/adox) intrinsics when available, with fallback to Math.BigMul/UInt128 arithmetic.

ModPow Small-Modulus Optimization

When the modulus fits in a uint (common for single-limb moduli on 64-bit), the inner loop uses ulong arithmetic instead of UInt128, avoiding unnecessary widening.

Bug Fixes

  • Fixed SubtractSelf callers to restore borrow == 0 postcondition, fixing incorrect results in Barrett reduction (FastReducer.SubMod) when the Barrett quotient overshoots, Toom-3/Karatsuba signed subtraction, and Montgomery reduction overflow handling.
  • Fixed SubWithBorrow sign-extension issue that would produce incorrect results on 32-bit platforms.
  • Fixed BitwiseAnd sign handling for specific negative operand combinations.
  • Fixed ToString regression for numbers near limb boundaries by processing 32-bit halves in the naive conversion loop.
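The SubWithBorrow sign-extension fix can be reproduced in a small model: on the 32-bit path, routing an unsigned limb through a signed `(int)` cast before widening sign-extends values >= 0x80000000 and flips the borrow. Python stand-ins for the two C# widening behaviors (function names are illustrative):

```python
# Model of the 32-bit SubWithBorrow bug and its fix.

MASK32 = 0xFFFFFFFF

def sub_with_borrow_buggy(a, b, borrow_in):
    def as_signed32(x):                      # models the erroneous (int) cast
        return x - (1 << 32) if x >= (1 << 31) else x
    diff = as_signed32(a) - as_signed32(b) - borrow_in   # widened "long" math
    return diff & MASK32, 1 if diff < 0 else 0

def sub_with_borrow_fixed(a, b, borrow_in):
    diff = a - b - borrow_in                 # zero-extended widening (the fix)
    return diff & MASK32, 1 if diff < 0 else 0
```

With the example from the commit message, the buggy version reports SubWithBorrow(0xFFFFFFFF, 0, 0) as borrowing (diff looks like -1 after sign extension) while the fixed version correctly reports no borrow.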

Test Coverage

  • Added ~1,000 lines of property-based validation tests covering arithmetic identity laws, bitwise consistency, and round-trip conversions.
  • Added edge-case tests for sign combinations, SIMD vector boundaries, and Barrett/FastReducer paths with even large moduli.
  • All 3,031 existing tests pass in both Debug and Release configurations.

Fixes #97780
Fixes #111708
Fixes #41495

stephentoub and others added 22 commits March 18, 2026 11:51
- Change all span/array types for limbs: uint->nuint, Span<uint>->Span<nuint>,
  ReadOnlySpan<uint>->ReadOnlySpan<nuint>, ArrayPool<uint>->ArrayPool<nuint>,
  stackalloc uint[]->stackalloc nuint[]
- PowersOf1e9: platform-dependent Indexes and LeadingPowers (32-bit/64-bit)
  using MemoryMarshal.Cast for ReadOnlySpan<nuint> from fixed-size backing
- MultiplyAdd: use UInt128 on 64-bit, ulong on 32-bit for widening multiply
- Naive base conversion: use BigIntegerCalculator.DivRem for widening divide
- OmittedLength: divide by kcbitNuint instead of hardcoded 32
- digitRatio constants: scale by 32/kcbitNuint for platform-appropriate ratios
- IBigIntegerHexOrBinaryParser: nuint blocks (DigitsPerBlock uses nint.Size)
- BigIntegerToDecChars: cast nuint base1E9 values to uint for UInt32ToDecChars
- Fix pre-existing unused variable in BigInteger.cs Log method

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change the internal limb type of BigInteger from uint (32-bit) to nuint
(native-width), so that on 64-bit platforms each limb holds a full 64-bit
value. This halves loop iterations for all arithmetic operations on 64-bit
systems while remaining identical on 32-bit (nuint == uint).

Key changes:
- Struct fields: int _sign -> nint _sign, uint[]? _bits -> nuint[]? _bits
- All BigIntegerCalculator methods: ReadOnlySpan<uint> -> ReadOnlySpan<nuint>
- Widening multiply uses Math.BigMul(ulong,ulong)->UInt128 on 64-bit
- Squaring carry propagation uses UInt128 to prevent overflow on 64-bit
- PowersOf1e9 constructor strips trailing zero limbs after squaring
- GCD 2-limb short-circuit restricted to 32-bit only (nint.Size == 4)
- explicit operator decimal: overflow check for high limb on 64-bit

Test updates for platform-dependent behavior:
- Rotation tests: ring size is now nint.Size*8 bits per limb
- GenericMath tests: GetByteCount, PopCount, LeadingZeroCount, TryWrite,
  UnsignedRightShift all reflect nuint-width inline values
- DebuggerDisplay tests: 4 nuint limbs covers larger values on 64-bit
- MyBigInt reference implementation: alignment changed to nint.Size

All 2647 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SubWithBorrow's 32-bit path cast nuint operands through (int) before
widening to long, which sign-extends values >= 0x80000000 instead of
zero-extending. For example, SubWithBorrow(0xFFFFFFFF, 0, 0) would
incorrectly return borrowOut=1 instead of 0.

Fix: cast directly to long (zero-extension for unsigned nuint/uint).

Also update FastReducer comments from '2^(32*k)' to '2^(kcbitNuint*k)'
to reflect the variable limb width.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace branching carry propagation (if/++carry) with branchless
conditional-move patterns in the division inner loop. Branch
misprediction penalties dominate for large divisions where the
carry is unpredictable. The branchless version uses unsigned
underflow comparison (original < loWithCarry) converted to 0/1.

See dotnet#41495 for the original proposal showing 2.6x
improvement for 65536-bit division.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the three-buffer approach (allocate x, y, z; copy+negate x and y;
operate into z) with a single-buffer approach that computes two's
complement limbs on-the-fly via GetTwosComplementLimb(). This eliminates
2 temporary buffer allocations and 2-3 full data passes per operation.

For positive operands (the common case), no negation is needed at all --
magnitude limbs are read directly and zero-extended. For negative operands,
the two's complement is computed inline: ~magnitude + carry with carry
propagation.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The PowersOf1e9 table (computed by repeated squaring of 10^9) is
deterministic and expensive to compute for large numbers. Cache it
in a static field so that subsequent ToString/Parse calls on
similarly-sized or smaller numbers reuse the precomputed table
instead of recomputing from scratch.

This eliminates the ArrayPool rent/return overhead at both call
sites and avoids redundant squaring work entirely on cache hits.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Raise DivideBurnikelZieglerThreshold from 32 to 64 limbs based on
empirical benchmarking with nuint (64-bit) limbs. With 64-bit limbs,
each schoolbook division step processes twice the data, making the
schoolbook algorithm competitive at larger sizes. The old threshold
caused BZ to be used at 32-63 limb divisors where it was 9-26%
slower than schoolbook due to recursive overhead.

The new threshold of 64 improves division performance across all
tested sizes (12-26% faster for balanced divisions). Even sizes
above the threshold benefit because BZ sub-problems now bottom out
into schoolbook earlier, avoiding near-threshold recursive overhead.

Multiply thresholds (Karatsuba=32, Toom3=256) were validated as
already optimal for 64-bit limbs through the same benchmarking.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add comprehensive BigIntegerPropertyTests covering algebraic invariants
at various sizes including algorithm threshold boundaries:
- Parse/ToString roundtrip (decimal and hex)
- ToByteArray roundtrip
- Division invariant (q*d+r == n)
- Multiply/divide roundtrip ((a*b)/b == a)
- Square vs multiply consistency (a*a == Pow(a,2))
- GCD divides both operands
- Add/subtract roundtrip ((a+b)-b == a)
- Shift roundtrip ((a<<n)>>n == a)
- Carry propagation with all-ones patterns
- Power-of-two boundary arithmetic
- nuint.MaxValue edge cases
- ModPow basic invariants (a^0=1, a^1=a, a^2=a*a)
- Bitwise identities ((a&b)|(a^b)==a|b, a^a==0, etc.)

Add nuint-boundary test data to existing Add/Subtract Theories:
- Values at 2^64 boundary (carry/borrow across 64-bit limb)
- Multi-limb carry propagation (all-ones + 1)

Fix bug in fused BitwiseAnd: when both operands are positive,
zLen lacked sign extension (+1), causing the two's complement
constructor to misinterpret results with high bit set as negative.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add BigIntegerEdgeCaseTests class with tests targeting:
- All 4 sign combinations (+/+, +/-, -/+, -/-) for arithmetic and
  bitwise operations, exercising two's complement paths
- Vector-aligned limb counts (2,3,4,5,7,8,9,15,16,17 limbs) to
  exercise SIMD loop tails at Vector128/256/512 boundaries
- Asymmetric operand sizes (1x65, 3x33, 16x33 limbs etc.) to
  exercise RightSmall and mixed-algorithm paths
- Toom-Cook 3 multiply with both operands >= 256 limbs
- Barrett reduction in ModPow with large modulus (65 limbs)
- GCD with specific operand size offsets (0,1,2,3+ limb difference)
- CopySign with all inline/array and positive/negative combinations
- Explicit Int32/Int64/Int128/UInt128 conversions at boundaries
- Power-of-two divisors with shift equivalence verification
- Threshold-straddling multiply pairs at Karatsuba/BZ/Toom3 edges

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement Montgomery modular exponentiation as an alternative to Barrett
reduction when the modulus is odd. This avoids expensive division in the
inner loop by working in Montgomery form (multiplying by R = 2^(k*wordBits)
mod n), using REDC for reduction, and converting back at the end.

Key components:
- PowCoreMontgomery: square-and-multiply loop in Montgomery domain
- ComputeMontgomeryInverse: Newton's method for -n0^{-1} mod 2^wordsize
- MontgomeryReduce (REDC): single-pass reduction without trial division
- PowCoreBarrett: refactored Barrett path into helper method

The Montgomery path is selected automatically when the modulus is odd,
which is the common case for cryptographic operations (RSA, etc.).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Cover previously uncovered code paths:
- PowCore(FastReducer, multi-limb power): 20% → 100%
- PowCoreBarrett pool return paths: 97.2% → 100%
- Pow pool return paths: 90% → 100%
- PowCore multi-limb dispatcher: 80% → 100%

Uses even moduli (≥ 33 and 65 limbs) with multi-limb exponents to
exercise the Barrett reduction path that Montgomery doesn't cover.
Cross-validates using the identity a^e ≡ a^e1 * a^e2 (mod m).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Montgomery ModPow now uses k-ary sliding window exponentiation instead of
basic binary square-and-multiply. The window size is chosen adaptively based
on exponent bit length (1-7), matching Java's BigInteger thresholds. For
window size k, this precomputes 2^(k-1) odd powers of the base in Montgomery
form, then processes the exponent left-to-right. This reduces the number of
Montgomery multiplications by ~20-40% for large exponents.

Squaring now uses separate thresholds (SquareKaratsubaThreshold=48,
SquareToom3Threshold=384) instead of sharing the multiply thresholds.
Squaring has fewer cross-terms than general multiplication, so schoolbook
squaring remains competitive at larger sizes. Java and Python both use
higher thresholds for squaring than multiplication.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When GCD(0, x) returns |x|, reuse the existing _bits array directly
via the (nint, nuint[]) constructor instead of copying through the
ReadOnlySpan constructor. Since BigInteger is immutable, the array
is never mutated after construction, making sharing safe.

Add 25 targeted GCD-with-zero test cases covering:
- Both operand orderings (GCD(0,x) and GCD(x,0))
- Positive and negative multi-limb values
- Various sizes: 1-limb, 2-limb, 3-limb, many-limb
- Edge case: 1-limb values that exceed nint.MaxValue (stored in _bits)
- Symmetry and sign-invariance properties

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace Int128 arithmetic in LehmerCore with Math.BigMul (compiles to
a single mul instruction on x64) plus explicit carry tracking using
long/ulong. The Lehmer coefficients are bounded at 31 bits, so each
95-bit product is computed natively.

Add fast paths to DivRem(nuint, nuint, nuint, out nuint):
- hi == 0: single native 64-bit division
- divisor <= uint.MaxValue: split into two chained 32-bit-half divisions
  (covers ToString base-10^9 conversion and small-divisor Divide paths)
- Fallback to UInt128 for large divisors (Knuth division)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The Naive base-10^9 conversion loop had a 4.3x regression for medium
numbers (200 digits) because BigIntegerCalculator.DivRem was too large
to inline on 64-bit, preventing the JIT from applying multiply-by-
reciprocal optimization for the constant divisor 10^9.

Fix: process each 64-bit limb as two 32-bit halves. Each inner division
becomes ulong/const_uint which the JIT optimizes to a single mulq+shift
(~5 cycles) instead of a div r64 (~35 cycles). The math is equivalent:
(base * 2^32 + hi) * 2^32 + lo = base * 2^64 + limb.

Results: 200-char 4.30x regression eliminated (now 1.00x); 20K-char
cases improved from 0.97-1.18x to 0.52-0.54x (2x faster than main).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
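The 32-bit-halves identity used here can be checked with a small Python model of the conversion loop (helper names are illustrative, not the PR's). Each 64-bit limb is divided by the constant 10^9 in two narrow steps via (rem * 2^32 + hi) * 2^32 + lo == rem * 2^64 + limb, which is what lets the JIT use multiply-by-reciprocal for each step:

```python
# Base-10^9 conversion of 64-bit limbs via two 32-bit-half divisions per limb.

D = 10 ** 9

def divrem_limb_by_1e9(rem, limb):
    """Divide rem * 2^64 + limb by 10^9; both steps fit comfortably in 64 bits."""
    hi, lo = limb >> 32, limb & 0xFFFFFFFF
    q_hi, rem = divmod((rem << 32) + hi, D)   # rem < 10^9, so this is < 2^62
    q_lo, rem = divmod((rem << 32) + lo, D)
    return (q_hi << 32) + q_lo, rem

def limbs_to_base1e9(limbs):
    """Convert little-endian 64-bit limbs into base-10^9 digits, LSD first."""
    digits = []
    while limbs:
        rem = 0
        for i in reversed(range(len(limbs))):
            limbs[i], rem = divrem_limb_by_1e9(rem, limbs[i])
        digits.append(rem)
        while limbs and limbs[-1] == 0:       # drop limbs as the value shrinks
            limbs.pop()
    return digits or [0]
```

Because the running remainder is always below 10^9, each partial dividend stays under 2^62, so both `divmod` steps correspond to a 64-bit division by a compile-time constant.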
- Add int-range fast path in operator* with Int128 fallback extracted
  to [NoInlining] helper to keep operator body small for JIT inlining
- Rewrite LehmerCore 64-bit path to process nuint limbs as uint halves
  via MemoryMarshal.Cast on little-endian, using the same cheap long
  arithmetic as the 32-bit path (eliminates Math.BigMul + manual carry
  overhead); big-endian falls back to Int128 widening arithmetic
- Add uint fast path in scalar Gcd(nuint, nuint) for values <= uint.Max
  (div r32 is faster than div r64)
- Replace byte-scanning loop in TryGetBytes with BitOperations.LeadingZeroCount
  for O(1) MSB detection (was O(bytesPerLimb) per call)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n inner loops

Introduce fused multiply (Mul1), multiply-accumulate (MulAdd1), and
subtract-multiply (SubMul1) span-based primitives unrolled by 4 on 64-bit.
Issuing 4 widening multiplies before consuming carries hides the 3-5 cycle
multiply latency by allowing the CPU to pipeline them while carry chains
complete sequentially.

- Mul1: replaces scalar Multiply(left, scalar, bits)
- MulAdd1: replaces inline inner loop in schoolbook multiply (Naive)
- SubMul1: replaces SubtractDivisor loop in grammar-school division
- All use Span/ReadOnlySpan parameters (no Unsafe.Add) for safety
- All three use UInt128 for clean carry tracking on 64-bit, ulong on 32-bit
- Removed now-unused MulAdd helper
- All 3031 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Convert all Unsafe.Add-based inner loops in AddSub.cs and SquMul.cs to
use direct span indexing (span[i]) with upfront bounds-proving accesses
that enable the JIT to elide bounds checks. Key changes:

- AddSub.cs: Convert Add, AddSelf, Subtract, SubtractSelf and their
  tail helpers from Unsafe.Add to span[i] with for-loops + bounds hints.
  Remove unused ref nuint resultPtr parameters from tail helpers.
  Remove using System.Runtime.InteropServices.

- SquMul.cs: Convert Naive square inner loop and SubtractCore from
  Unsafe.Add/MemoryMarshal.GetReference to span[i] with bounds hints.

The pattern uses upfront span accesses to prove cross-span length
relationships to the JIT, enabling bounds check elision in standard
for loops. This avoids do...while loops which prevent JIT range analysis.

Benchmarks confirm no regression vs the previous Unsafe.Add version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BigMul: Call Math.BigMul directly instead of going through UInt128
  multiplication. Math.BigMul uses Bmi2.X64.MultiplyNoFlags on x86-64
  and ArmBase.Arm64.MultiplyHigh on ARM64.

- DivRem: Use X86Base.X64.DivRem for 128-by-64 division with large
  divisors (>uint.MaxValue), replacing the expensive UInt128 software
  division path. Falls back to UInt128 on non-x86 platforms.

- AddWithCarry: Replace UInt128-based carry detection with native
  overflow detection pattern (sum < a), avoiding 128-bit arithmetic
  for a simple carry-out computation.

Benchmarks show: Add 64K 12% faster, Divide 64K 6% faster, Add 1K
regression eliminated (1.034 -> 0.994).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
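The native carry-detection pattern mentioned here is the classic unsigned wrap test: for limbs reduced mod 2^64, the wrapped sum is smaller than an addend exactly when the addition overflowed. A Python model (names illustrative, carry_in restricted to 0 or 1 as in the real primitive):

```python
# AddWithCarry via the (sum < addend) overflow test, no 128-bit arithmetic.

MASK = (1 << 64) - 1

def add_with_carry(a, b, carry_in):
    s = (a + b) & MASK
    carry = 1 if s < a else 0          # wrapped sum is smaller than an addend
    t = (s + carry_in) & MASK
    carry |= 1 if t < s else 0         # adding the carry-in can wrap only from MASK
    return t, carry
```

The two wrap tests cannot both fire for a single call (a + b + 1 is at most 2^65 - 1), so the combined carry-out is always 0 or 1, matching a hardware adc.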
The nuint limb conversion changed _sign from int to nint, which caused
significant regressions for small-value operations (Multiply 16-bit 1.78x,
Divide 16-bit 1.67x) due to 64-bit imul/idiv being slower than 32-bit
equivalents. Reverting _sign to int while keeping nuint[] _bits preserves
all large-number speedups while restoring small-value performance.

Key changes:
- Field declaration: nint _sign -> int _sign
- s_bnMinInt simplified (no nint.Size branching)
- All constructors updated for int _sign with nuint[] _bits
- Removed MultiplyNint helper (no longer needed)
- All operators simplified (removed nint.Size==8 branches for _sign)
- Fixed decimal constructor canonicalization: use _bits[0] <= int.MaxValue
  instead of (int)_bits[0] > 0 to avoid truncation with 64-bit limbs
- NumericsHelpers.Abs updated for int parameter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On 64-bit platforms, PowCore unconditionally used UInt128 arithmetic
for the square-and-multiply loop, even when all values fit in uint.
UInt128 multiply+modulus is significantly more expensive than ulong.

Now checks modulus <= uint.MaxValue and uses the cheaper ulong path,
eliminating the ModPow 16-bit regression (1.27x -> 1.02x) while
preserving all large-number speedups.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@dotnet-policy-service
Contributor

Tagging subscribers to this area: @dotnet/area-system-numerics
See info in area-owners.md if you want to be subscribed.

Contributor

Copilot AI left a comment


Pull request overview

This PR modernizes System.Numerics.BigInteger internals by switching the limb representation from uint to native-width nuint, and updates core arithmetic/modular algorithms (notably ModPow) to better exploit 64-bit platforms while keeping the public API unchanged.

Changes:

  • Reworked BigInteger arithmetic plumbing to operate on nuint limbs (32-bit on x86, 64-bit on x64), including shift/rotate, add/sub, division, GCD, and reduction helpers.
  • Improved modular exponentiation for odd moduli via a Montgomery + sliding-window path, plus supporting primitives (AddWithCarry, SubWithBorrow, Mul1, MulAdd1, SubMul1, etc.).
  • Updated and expanded tests to be limb-width aware and added a large new property/edge-case test suite.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
src/libraries/System.Runtime.Numerics/tests/System.Runtime.Numerics.Tests.csproj Adds the new BigInteger property/edge-case test file to the test project.
src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs Updates generic math tests to be native-limb-width aware (counts/rotates/byte counts/write-bytes).
src/libraries/System.Runtime.Numerics/tests/BigInteger/op_rightshift.cs Adjusts shift tests for native limb width and nuint-based internals.
src/libraries/System.Runtime.Numerics/tests/BigInteger/Rotate.cs Updates rotate tests for limb-width variability; refactors theory data generation.
src/libraries/System.Runtime.Numerics/tests/BigInteger/MyBigInt.cs Updates helper logic that assumed 4-byte limb alignment to use nint.Size.
src/libraries/System.Runtime.Numerics/tests/BigInteger/DebuggerDisplayTests.cs Updates debugger display tests for nuint[] internal constructor and 64-bit formatting differences.
src/libraries/System.Runtime.Numerics/tests/BigInteger/BigIntegerPropertyTests.cs Adds extensive property-based + edge-case coverage for BigInteger across sizes/thresholds/sign combos.
src/libraries/System.Runtime.Numerics/tests/BigInteger/BigInteger.SubtractTests.cs Adds new subtract cases around 64-bit carry/borrow boundaries and large power-of-two transitions.
src/libraries/System.Runtime.Numerics/tests/BigInteger/BigInteger.AddTests.cs Adds new add cases around 64-bit boundaries and multi-limb carry propagation patterns.
src/libraries/System.Runtime.Numerics/src/System/Numerics/NumericsHelpers.cs Updates complement helpers to Span<nuint> and adapts integer helpers for the new limb width.
src/libraries/System.Runtime.Numerics/src/System/Numerics/Complex.cs Minor modernizations (target-typed new, inline out vars) unrelated to BigInteger logic.
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.Utils.cs Introduces key native-limb primitives and constants (BitsPerLimb, carry/borrow, widening ops, scalar mul helpers).
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.ShiftRot.cs Ports rotate/shift routines to nuint limbs with SIMD paths.
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.PowMod.cs Reworks modular exponentiation, including Montgomery + sliding window for odd moduli and updated single-limb paths.
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.GcdInv.cs Updates GCD/Lehmer core to handle native-limb representation and avoids unnecessary overhead on some paths.
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.FastReducer.cs Ports Barrett/FastReducer logic to nuint limbs.
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.DivRem.cs Ports and retunes division (incl. Burnikel–Ziegler threshold) for nuint limbs and new helpers.
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.AddSub.cs Ports add/subtract to nuint limbs using new carry/borrow primitives and bounds-check elimination patterns.
src/libraries/System.Runtime.Numerics/src/System/Number.Polyfill.cs Removes BOM and small formatting/style adjustments.
src/libraries/Common/src/System/Number.Parsing.Common.cs Small refactor to pattern matching for readability.
src/libraries/Common/src/System/Number.NumberBuffer.cs Minor debug assert/style updates.
src/libraries/Common/src/System/Number.Formatting.Common.cs Minor debug assert/style updates (pattern matching).
Comments suppressed due to low confidence (3)

src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs:482

  • The final destination.ToArray() assertion assumes the entire 24-byte buffer has deterministic contents. On 32-bit, the last 4 bytes were never written by any successful call in this test (max write size is 20), so those bytes can contain stale stack data. Either clear destination up front / clear the tail after each write, or only assert over the bytes that are known to have been written.
            Assert.False(BinaryIntegerHelper<BigInteger>.TryWriteBigEndian(default, Span<byte>.Empty, out bytesWritten));
            Assert.Equal(0, bytesWritten);
            Assert.Equal(new byte[] { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, destination.ToArray());

src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs:493

  • destination is stackalloc'd as 24 bytes for all platforms, but on 32-bit the largest successful writes in this test only write 20 bytes. Any later assertions that compare the full buffer (rather than bytesWritten) can observe uninitialized stack data. Consider allocating destination with nint.Size == 8 ? 24 : 20, or clearing the span before asserting full-buffer contents.
            Span<byte> destination = stackalloc byte[24];
            int bytesWritten = 0;

            Assert.True(BinaryIntegerHelper<BigInteger>.TryWriteLittleEndian(Zero, destination, out bytesWritten));
            Assert.Equal(nint.Size, bytesWritten);
            Assert.Equal(new byte[nint.Size], destination.Slice(0, nint.Size).ToArray());

src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs:549

  • The final destination.ToArray() assertion in the Little Endian test compares all 24 bytes. On 32-bit, the last 4 bytes are not guaranteed to have been written by any prior successful call (max write is 20), so the expected trailing zeros can be unstable. Clear destination (or at least the tail) before the last assertion, or assert only over the bytes known to be written.
            Assert.False(BinaryIntegerHelper<BigInteger>.TryWriteLittleEndian(default, Span<byte>.Empty, out bytesWritten));
            Assert.Equal(0, bytesWritten);
            Assert.Equal(nint.Size == 8 ? new byte[] { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF } : new byte[] { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00 }, destination.ToArray());

@stephentoub
Member Author

@MihuBot benchmark BigInteger -long


namespace System.Numerics
{
internal static partial class BigIntegerCalculator
{
/// <summary>
Member


Just noting to myself that I finished going through BigInteger.cs and need to resume review here.

Also noting that I didn't post any of the repetitive comments around nint.Size or other nits, but if we do fix those, we should fix them consistently.

Copilot AI review requested due to automatic review settings March 20, 2026 03:50
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 27 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.AddSub.cs:229

  • The private Subtract helper doesn't assert that the final borrow is 0. Given other subtraction helpers (e.g., SubtractSelf) now enforce the borrow == 0 postcondition, consider adding a Debug.Assert(borrow == 0) at the end here as well so underflow bugs are caught early in Debug builds.
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        private static void Subtract(ReadOnlySpan<nuint> left, Span<nuint> bits, int startIndex, nuint initialBorrow)
        {
            // Executes the subtraction for one big and one single-limb integer.

            int i = startIndex;
            nuint borrow = initialBorrow;

            if (left.Length != 0)
            {
                _ = bits[left.Length - 1];
            }

            if (left.Length <= CopyToThreshold)
            {
                for (; i < left.Length; i++)
                {
                    nuint val = left[i];
                    nuint diff = val - borrow;
                    borrow = (diff > val) ? 1 : (nuint)0;
                    bits[i] = diff;
                }
            }
            else
            {
                for (; i < left.Length;)
                {
                    nuint val = left[i];
                    nuint diff = val - borrow;
                    borrow = (diff > val) ? 1 : (nuint)0;
                    bits[i] = diff;
                    i++;

                    // Once borrow is set to 0 it can not be 1 anymore.
                    // So the tail of the loop is just the movement of argument values to result span.
                    if (borrow == 0)
                    {
                        break;
                    }
                }

                if (i < left.Length)
                {
                    CopyTail(left, bits, i);
                }
            }
        }

stephentoub and others added 3 commits March 20, 2026 10:09
…ered positive'

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Explains why the intermediate decimal representation uses nuint[] (wasting
32 bits per element on 64-bit) rather than uint[] or base-1e19:
- Base-1e19 benchmarked as net regression (UInt128 division too slow)
- uint[] would require duplicating BigIntegerCalculator arithmetic routines

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@stephentoub
Copy link
Member Author

@EgorBot -linux_amd -osx_arm64

using System.Numerics;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    [Benchmark]
    [Arguments(3000)]
    public StringBuilder PiDigits(int n) => PiDigitsRunner.Run(n);
}

public class PiDigitsRunner
{
    BigInteger q, r, s, t, u, v, w;
    int i;
    StringBuilder strBuf = new StringBuilder(40), lastBuf = null;
    int n;

    PiDigitsRunner(int n) { this.n = n; }

    void compose_r(int bq, int br, int bs, int bt)
    {
        u = r * bs;
        r *= bq;
        v = t * br;
        r += v;
        t *= bt;
        t += u;
        s *= bt;
        u = q * bs;
        s += u;
        q *= bq;
    }

    void compose_l(int bq, int br, int bs, int bt)
    {
        r *= bt;
        u = q * br;
        r += u;
        u = t * bs;
        t *= bt;
        v = s * br;
        t += v;
        s *= bq;
        s += u;
        q *= bq;
    }

    int extract(int j)
    {
        u = q * j;
        u += r;
        v = s * j;
        v += t;
        w = u / v;
        return (int)w;
    }

    bool prdigit(int y)
    {
        strBuf.Append(y);
        if (++i % 10 == 0 || i == n)
        {
            if (i % 10 != 0)
                for (int j = 10 - (i % 10); j > 0; j--)
                    strBuf.Append(" ");
            strBuf.Append("\t:");
            strBuf.Append(i);
            lastBuf = strBuf;
            strBuf = new StringBuilder(40);
        }
        return i == n;
    }

    void RunInner()
    {
        int k = 1;
        i = 0;
        q = 1; r = 0; s = 0; t = 1;
        for (;;)
        {
            int y = extract(3);
            if (y == extract(4))
            {
                if (prdigit(y)) return;
                compose_r(10, -10 * y, 0, 1);
            }
            else
            {
                compose_l(k, 4 * k + 2, 0, 2 * k + 1);
                k++;
            }
        }
    }

    public static StringBuilder Run(int n)
    {
        var m = new PiDigitsRunner(n);
        m.RunInner();
        return m.lastBuf;
    }
}



Development

Successfully merging this pull request may close these issues.

[Perf] BigInteger formatting performance regression in .NET 9
BigInteger.ModPow asserting in SubtractSelf
BigInteger performance improvements

7 participants