Rewrite BigInteger internals to use native-width (nuint) limbs #125799
stephentoub wants to merge 27 commits into dotnet:main
Conversation
- Change all span/array types for limbs: uint -> nuint, Span&lt;uint&gt; -> Span&lt;nuint&gt;, ReadOnlySpan&lt;uint&gt; -> ReadOnlySpan&lt;nuint&gt;, ArrayPool&lt;uint&gt; -> ArrayPool&lt;nuint&gt;, stackalloc uint[] -> stackalloc nuint[]
- PowersOf1e9: platform-dependent Indexes and LeadingPowers (32-bit/64-bit) using MemoryMarshal.Cast for ReadOnlySpan&lt;nuint&gt; from fixed-size backing
- MultiplyAdd: use UInt128 on 64-bit, ulong on 32-bit for widening multiply
- Naive base conversion: use BigIntegerCalculator.DivRem for widening divide
- OmittedLength: divide by kcbitNuint instead of hardcoded 32
- digitRatio constants: scale by 32/kcbitNuint for platform-appropriate ratios
- IBigIntegerHexOrBinaryParser: nuint blocks (DigitsPerBlock uses nint.Size)
- BigIntegerToDecChars: cast nuint base1E9 values to uint for UInt32ToDecChars
- Fix pre-existing unused variable in BigInteger.cs Log method

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Change the internal limb type of BigInteger from uint (32-bit) to nuint (native-width), so that on 64-bit platforms each limb holds a full 64-bit value. This halves loop iterations for all arithmetic operations on 64-bit systems while remaining identical on 32-bit (nuint == uint).

Key changes:
- Struct fields: int _sign -> nint _sign, uint[]? _bits -> nuint[]? _bits
- All BigIntegerCalculator methods: ReadOnlySpan&lt;uint&gt; -> ReadOnlySpan&lt;nuint&gt;
- Widening multiply uses Math.BigMul(ulong,ulong) -> UInt128 on 64-bit
- Squaring carry propagation uses UInt128 to prevent overflow on 64-bit
- PowersOf1e9 constructor strips trailing zero limbs after squaring
- GCD 2-limb short-circuit restricted to 32-bit only (nint.Size == 4)
- explicit operator decimal: overflow check for high limb on 64-bit

Test updates for platform-dependent behavior:
- Rotation tests: ring size is now nint.Size*8 bits per limb
- GenericMath tests: GetByteCount, PopCount, LeadingZeroCount, TryWrite, UnsignedRightShift all reflect nuint-width inline values
- DebuggerDisplay tests: 4 nuint limbs covers larger values on 64-bit
- MyBigInt reference implementation: alignment changed to nint.Size

All 2647 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
SubWithBorrow's 32-bit path cast nuint operands through (int) before widening to long, which sign-extends values >= 0x80000000 instead of zero-extending. For example, SubWithBorrow(0xFFFFFFFF, 0, 0) would incorrectly return borrowOut=1 instead of 0. Fix: cast directly to long (zero-extension for unsigned nuint/uint). Also update FastReducer comments from '2^(32*k)' to '2^(kcbitNuint*k)' to reflect the variable limb width. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
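A minimal sketch of the pitfall (the method shape here is illustrative, not the exact runtime code): widening an unsigned 32-bit value through `(int)` sign-extends, while a direct cast to `long` zero-extends.

```csharp
using System;

// On the 32-bit path a nuint is 32 bits wide; casting it through (int)
// before widening to long sign-extends values >= 0x80000000.
uint limb = 0xFFFF_FFFF;
Console.WriteLine((long)(int)limb); // sign-extended: -1
Console.WriteLine((long)limb);      // zero-extended: 4294967295

// Corrected SubWithBorrow sketch (hypothetical shape): operands are
// zero-extended, and the borrow is read from the arithmetic high half.
static uint SubWithBorrow32(uint left, uint right, uint borrowIn, out uint borrowOut)
{
    long diff = (long)left - (long)right - borrowIn; // zero-extension
    borrowOut = (uint)(-(diff >> 32));               // 1 if diff < 0, else 0
    return (uint)diff;
}

SubWithBorrow32(0xFFFF_FFFF, 0, 0, out uint borrow);
Console.WriteLine(borrow); // 0 -- the buggy (int) cast produced 1 here
```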
Replace branching carry propagation (if/++carry) with branchless conditional-move patterns in the division inner loop. Branch misprediction penalties dominate for large divisions where the carry is unpredictable. The branchless version uses unsigned underflow comparison (original < loWithCarry) converted to 0/1. See dotnet#41495 for the original proposal showing 2.6x improvement for 65536-bit division. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
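One subtract step of that inner loop can be sketched as follows (names are illustrative; the real loop also folds in the widening multiply). The underflow test `original < loWithCarry` is converted to 0/1, which the JIT can emit as a flag-setting compare rather than an unpredictable branch.

```csharp
using System;

// Branchless borrow propagation: borrow-out comes from either the inner
// add overflowing or the subtract underflowing (at most one can happen).
static ulong SubStep(ulong original, ulong sub, ref ulong borrow)
{
    ulong loWithCarry = sub + borrow;
    ulong borrowOut = loWithCarry < sub ? 1UL : 0UL;   // add overflowed
    ulong diff = original - loWithCarry;
    borrowOut += original < loWithCarry ? 1UL : 0UL;   // subtract underflowed
    borrow = borrowOut;
    return diff;
}

ulong b = 0;
Console.WriteLine(SubStep(5, 3, ref b)); // 2
Console.WriteLine(b);                    // 0
SubStep(3, 5, ref b);
Console.WriteLine(b);                    // 1
```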
Replace the three-buffer approach (allocate x, y, z; copy+negate x and y; operate into z) with a single-buffer approach that computes two's complement limbs on-the-fly via GetTwosComplementLimb(). This eliminates 2 temporary buffer allocations and 2-3 full data passes per operation. For positive operands (the common case), no negation is needed at all -- magnitude limbs are read directly and zero-extended. For negative operands, the two's complement is computed inline: ~magnitude + carry with carry propagation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
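A sketch of the idea (the helper name and shape here are illustrative stand-ins for GetTwosComplementLimb): each limb of a negative operand's two's complement is produced on demand from the magnitude limb and a running carry, so no temporary negated copy is ever materialized.

```csharp
using System;

// Produces the next two's-complement limb of -magnitude, little-endian.
// Carry starts at 1 (the "+1" of ~m + 1) and stays 1 only while the
// magnitude limbs seen so far were all zero (i.e. ~m + 1 wrapped to 0).
static ulong TwosComplementLimb(ulong magnitude, ref ulong carry)
{
    ulong limb = ~magnitude + carry;
    carry = (magnitude == 0) ? carry : 0;
    return limb;
}

// Negate the two-limb magnitude [0, 1] (the value 2^64):
ulong c = 1;
Console.WriteLine($"{TwosComplementLimb(0, ref c):X}"); // 0
Console.WriteLine($"{TwosComplementLimb(1, ref c):X}"); // FFFFFFFFFFFFFFFF
```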
The PowersOf1e9 table (computed by repeated squaring of 10^9) is deterministic and expensive to compute for large numbers. Cache it in a static field so that subsequent ToString/Parse calls on similarly-sized or smaller numbers reuse the precomputed table instead of recomputing from scratch. This eliminates the ArrayPool rent/return overhead at both call sites and avoids redundant squaring work entirely on cache hits. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Raise DivideBurnikelZieglerThreshold from 32 to 64 limbs based on empirical benchmarking with nuint (64-bit) limbs. With 64-bit limbs, each schoolbook division step processes twice the data, making the schoolbook algorithm competitive to larger sizes. The old threshold caused BZ to be used at 32-63 limb divisors where it was 9-26% slower than schoolbook due to recursive overhead. The new threshold of 64 improves division performance across all tested sizes (12-26% faster for balanced divisions). Even sizes above the threshold benefit because BZ sub-problems now bottom out into schoolbook earlier, avoiding near-threshold recursive overhead. Multiply thresholds (Karatsuba=32, Toom3=256) were validated as already optimal for 64-bit limbs through the same benchmarking. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add comprehensive BigIntegerPropertyTests covering algebraic invariants at various sizes including algorithm threshold boundaries:
- Parse/ToString roundtrip (decimal and hex)
- ToByteArray roundtrip
- Division invariant (q*d+r == n)
- Multiply/divide roundtrip ((a*b)/b == a)
- Square vs multiply consistency (a*a == Pow(a,2))
- GCD divides both operands
- Add/subtract roundtrip ((a+b)-b == a)
- Shift roundtrip ((a&lt;&lt;n)&gt;&gt;n == a)
- Carry propagation with all-ones patterns
- Power-of-two boundary arithmetic
- nuint.MaxValue edge cases
- ModPow basic invariants (a^0=1, a^1=a, a^2=a*a)
- Bitwise identities ((a&amp;b)|(a^b)==a|b, a^a==0, etc.)

Add nuint-boundary test data to existing Add/Subtract Theories:
- Values at 2^64 boundary (carry/borrow across 64-bit limb)
- Multi-limb carry propagation (all-ones + 1)

Fix bug in fused BitwiseAnd: when both operands are positive, zLen lacked sign extension (+1), causing the two's complement constructor to misinterpret results with high bit set as negative.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add BigIntegerEdgeCaseTests class with tests targeting:
- All 4 sign combinations (+/+, +/-, -/+, -/-) for arithmetic and bitwise operations, exercising two's complement paths
- Vector-aligned limb counts (2,3,4,5,7,8,9,15,16,17 limbs) to exercise SIMD loop tails at Vector128/256/512 boundaries
- Asymmetric operand sizes (1x65, 3x33, 16x33 limbs etc.) to exercise RightSmall and mixed-algorithm paths
- Toom-Cook 3 multiply with both operands &gt;= 256 limbs
- Barrett reduction in ModPow with large modulus (65 limbs)
- GCD with specific operand size offsets (0,1,2,3+ limb difference)
- CopySign with all inline/array and positive/negative combinations
- Explicit Int32/Int64/Int128/UInt128 conversions at boundaries
- Power-of-two divisors with shift equivalence verification
- Threshold-straddling multiply pairs at Karatsuba/BZ/Toom3 edges

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Implement Montgomery modular exponentiation as an alternative to Barrett
reduction when the modulus is odd. This avoids expensive division in the
inner loop by working in Montgomery form (multiplying by R = 2^(k*wordBits)
mod n), using REDC for reduction, and converting back at the end.
Key components:
- PowCoreMontgomery: square-and-multiply loop in Montgomery domain
- ComputeMontgomeryInverse: Newton's method for -n0^{-1} mod 2^wordsize
- MontgomeryReduce (REDC): single-pass reduction without trial division
- PowCoreBarrett: refactored Barrett path into helper method
The Montgomery path is selected automatically when the modulus is odd,
which is the common case for cryptographic operations (RSA, etc.).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
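As a sketch of one of these components, the inverse behind ComputeMontgomeryInverse can be obtained with a short Newton iteration (a hedged illustration for 64-bit limbs, not the exact runtime code): for odd n0 the seed n0 is already correct modulo 8, and each step doubles the number of correct low bits.

```csharp
using System;

// Newton's method for -n0^(-1) mod 2^64, assuming n0 is odd.
static ulong MontgomeryInverse(ulong n0)
{
    ulong inv = n0;                  // odd n0 satisfies n0*n0 == 1 (mod 8)
    for (int j = 0; j < 5; j++)      // 3 -> 6 -> 12 -> 24 -> 48 -> 96 bits
        inv *= 2 - n0 * inv;         // inv <- inv * (2 - n0*inv), wrapping
    return 0UL - inv;                // REDC wants the negated inverse
}

ulong n = 0x2545F4914F6CDD1D;        // arbitrary odd modulus limb
Console.WriteLine(n * MontgomeryInverse(n) == ulong.MaxValue); // True
```

Given this constant, REDC can fold the modulus into each partial product without any division in the loop.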
Cover previously uncovered code paths:
- PowCore(FastReducer, multi-limb power): 20% → 100%
- PowCoreBarrett pool return paths: 97.2% → 100%
- Pow pool return paths: 90% → 100%
- PowCore multi-limb dispatcher: 80% → 100%

Uses even moduli (≥ 33 and 65 limbs) with multi-limb exponents to exercise the Barrett reduction path that Montgomery doesn't cover. Cross-validates using the identity a^e ≡ a^e1 * a^e2 (mod m).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Montgomery ModPow now uses k-ary sliding window exponentiation instead of basic binary square-and-multiply. The window size is chosen adaptively based on exponent bit length (1-7), matching Java's BigInteger thresholds. For window size k, this precomputes 2^(k-1) odd powers of the base in Montgomery form, then processes the exponent left-to-right. This reduces the number of Montgomery multiplications by ~20-40% for large exponents.

Squaring now uses separate thresholds (SquareKaratsubaThreshold=48, SquareToom3Threshold=384) instead of sharing the multiply thresholds. Squaring has fewer cross-terms than general multiplication, so schoolbook squaring remains competitive at larger sizes. Java and Python both use higher thresholds for squaring than multiplication.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
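The adaptive window choice can be sketched as below. The threshold values are assumptions modeled on the shape of Java's BigInteger table; the exact cutoffs used by this PR are not quoted here.

```csharp
using System;

// Illustrative k-ary window selection: larger exponents justify larger
// windows. Thresholds are assumed values, not necessarily this PR's.
static int ChooseWindowSize(int exponentBits)
{
    int[] thresholds = { 7, 25, 81, 241, 673, 1793 }; // assumed cutoffs
    int k = 1;
    foreach (int t in thresholds)
        if (exponentBits > t)
            k++;
    return k; // 1..7; precompute 1 << (k - 1) odd powers of the base
}

Console.WriteLine(ChooseWindowSize(64));   // 3
Console.WriteLine(ChooseWindowSize(2048)); // 7
```

For k = 5, for instance, 16 odd powers (base^1, base^3, …, base^31) are held in Montgomery form while the exponent is scanned left-to-right.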
When GCD(0, x) returns |x|, reuse the existing _bits array directly via the (nint, nuint[]) constructor instead of copying through the ReadOnlySpan constructor. Since BigInteger is immutable, the array is never mutated after construction, making sharing safe.

Add 25 targeted GCD-with-zero test cases covering:
- Both operand orderings (GCD(0,x) and GCD(x,0))
- Positive and negative multi-limb values
- Various sizes: 1-limb, 2-limb, 3-limb, many-limb
- Edge case: 1-limb values that exceed nint.MaxValue (stored in _bits)
- Symmetry and sign-invariance properties

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace Int128 arithmetic in LehmerCore with Math.BigMul (compiles to a single mul instruction on x64) plus explicit carry tracking using long/ulong. The Lehmer coefficients are bounded at 31 bits, so each 95-bit product is computed natively.

Add fast paths to DivRem(nuint, nuint, nuint, out nuint):
- hi == 0: single native 64-bit division
- divisor &lt;= uint.MaxValue: split into two chained 32-bit-half divisions (covers ToString base-10^9 conversion and small-divisor Divide paths)
- Fallback to UInt128 for large divisors (Knuth division)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
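The small-divisor fast path can be sketched like this (illustrative shape, fixed to 64-bit limbs): when the 64-bit divisor actually fits in 32 bits and hi &lt; divisor, a 128-by-64 division decomposes into two chained 64-by-32 divisions, each handled natively by the CPU.

```csharp
using System;

// Divide the 128-bit value (hi:lo) by a 32-bit divisor, hi < divisor.
static ulong DivRemSmall(ulong hi, ulong lo, uint divisor, out ulong remainder)
{
    ulong p1 = (hi << 32) | (lo >> 32);   // hi < divisor, so p1 < divisor * 2^32
    ulong qHi = p1 / divisor;
    ulong r1 = p1 % divisor;

    ulong p2 = (r1 << 32) | (uint)lo;     // r1 < divisor <= uint.MaxValue
    ulong qLo = p2 / divisor;
    remainder = p2 % divisor;

    return (qHi << 32) | qLo;             // both partial quotients fit in 32 bits
}

// 2^64 / 3 == 6148914691236517205 rem 1
Console.WriteLine(DivRemSmall(1, 0, 3, out ulong rem)); // 6148914691236517205
Console.WriteLine(rem);                                  // 1
```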
The Naive base-10^9 conversion loop had a 4.3x regression for medium numbers (200 digits) because BigIntegerCalculator.DivRem was too large to inline on 64-bit, preventing the JIT from applying multiply-by-reciprocal optimization for the constant divisor 10^9.

Fix: process each 64-bit limb as two 32-bit halves. Each inner division becomes ulong/const_uint, which the JIT optimizes to a single mulq+shift (~5 cycles) instead of a div r64 (~35 cycles). The math is equivalent: (base * 2^32 + hi) * 2^32 + lo = base * 2^64 + limb.

Results: the 200-char 4.30x regression is eliminated (now 1.00x); 20K-char cases improved from 0.97-1.18x to 0.52-0.54x (2x faster than main).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
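One base-1e9 digit-group extraction under this scheme can be sketched as follows (an illustration of the approach over little-endian 64-bit limbs, not the exact runtime loop). Because the divisor is a compile-time uint constant, the JIT strength-reduces each division to a multiply-by-reciprocal.

```csharp
using System;

// limbs /= 1e9 in place; returns the extracted base-1e9 digit group.
static uint DivideByBillionInPlace(ulong[] limbs)
{
    const uint Base = 1_000_000_000;
    ulong rem = 0;
    for (int i = limbs.Length - 1; i >= 0; i--)
    {
        // (rem * 2^32 + hi) * 2^32 + lo == rem * 2^64 + limb
        ulong hiPart = (rem << 32) | (limbs[i] >> 32);
        ulong qHi = hiPart / Base;
        rem = hiPart % Base;

        ulong loPart = (rem << 32) | (uint)limbs[i];
        ulong qLo = loPart / Base;
        rem = loPart % Base;

        limbs[i] = (qHi << 32) | qLo;
    }
    return (uint)rem; // the least-significant base-1e9 digit group
}

ulong[] value = { 1_234_567_890_123_456_789UL };
Console.WriteLine(DivideByBillionInPlace(value)); // 123456789
Console.WriteLine(value[0]);                      // 1234567890
```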
- Add int-range fast path in operator* with Int128 fallback extracted to [NoInlining] helper to keep operator body small for JIT inlining
- Rewrite LehmerCore 64-bit path to process nuint limbs as uint halves via MemoryMarshal.Cast on little-endian, using the same cheap long arithmetic as the 32-bit path (eliminates Math.BigMul + manual carry overhead); big-endian falls back to Int128 widening arithmetic
- Add uint fast path in scalar Gcd(nuint, nuint) for values &lt;= uint.Max (div r32 is faster than div r64)
- Replace byte-scanning loop in TryGetBytes with BitOperations.LeadingZeroCount for O(1) MSB detection (was O(bytesPerLimb) per call)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n inner loops

Introduce fused multiply (Mul1), multiply-accumulate (MulAdd1), and subtract-multiply (SubMul1) span-based primitives unrolled by 4 on 64-bit. Issuing 4 widening multiplies before consuming carries hides the 3-5 cycle multiply latency by allowing the CPU to pipeline them while carry chains complete sequentially.

- Mul1: replaces scalar Multiply(left, scalar, bits)
- MulAdd1: replaces inline inner loop in schoolbook multiply (Naive)
- SubMul1: replaces SubtractDivisor loop in grammar-school division
- All use Span/ReadOnlySpan parameters (no Unsafe.Add) for safety
- Both use UInt128 for clean carry tracking on 64-bit, ulong on 32-bit
- Removed now-unused MulAdd helper
- All 3031 tests pass

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
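A non-unrolled sketch of the MulAdd1 shape (illustrative, fixed to 64-bit ulong limbs rather than nuint): bits += left * scalar with UInt128 carry tracking. The actual primitive issues four independent widening multiplies per iteration before consuming their carries.

```csharp
using System;

// bits[0..left.Length] += left * scalar, little-endian limbs.
static void MulAdd1(ReadOnlySpan<ulong> left, ulong scalar, Span<ulong> bits)
{
    UInt128 carry = 0;
    for (int i = 0; i < left.Length; i++)
    {
        UInt128 p = (UInt128)left[i] * scalar + bits[i] + carry;
        bits[i] = (ulong)p;
        carry = p >> 64;
    }
    bits[left.Length] += (ulong)carry; // schoolbook multiply sizes bits so this can't overflow
}

ulong[] acc = new ulong[2];
MulAdd1(new ulong[] { ulong.MaxValue }, ulong.MaxValue, acc);
Console.WriteLine($"{acc[1]:X16} {acc[0]:X16}"); // FFFFFFFFFFFFFFFE 0000000000000001
```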
Convert all Unsafe.Add-based inner loops in AddSub.cs and SquMul.cs to use direct span indexing (span[i]) with upfront bounds-proving accesses that enable the JIT to elide bounds checks.

Key changes:
- AddSub.cs: Convert Add, AddSelf, Subtract, SubtractSelf and their tail helpers from Unsafe.Add to span[i] with for-loops + bounds hints. Remove unused ref nuint resultPtr parameters from tail helpers. Remove using System.Runtime.InteropServices.
- SquMul.cs: Convert Naive square inner loop and SubtractCore from Unsafe.Add/MemoryMarshal.GetReference to span[i] with bounds hints.

The pattern uses upfront span accesses to prove cross-span length relationships to the JIT, enabling bounds check elision in standard for loops. This avoids do...while loops which prevent JIT range analysis.

Benchmarks confirm no regression vs the previous Unsafe.Add version.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BigMul: Call Math.BigMul directly instead of going through UInt128 multiplication. Math.BigMul uses Bmi2.X64.MultiplyNoFlags on x86-64 and ArmBase.Arm64.MultiplyHigh on ARM64.
- DivRem: Use X86Base.X64.DivRem for 128-by-64 division with large divisors (&gt;uint.MaxValue), replacing the expensive UInt128 software division path. Falls back to UInt128 on non-x86 platforms.
- AddWithCarry: Replace UInt128-based carry detection with native overflow detection pattern (sum &lt; a), avoiding 128-bit arithmetic for a simple carry-out computation.

Benchmarks show: Add 64K 12% faster, Divide 64K 6% faster, Add 1K regression eliminated (1.034 -> 0.994).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
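The native carry-detection pattern mentioned above can be sketched for 64-bit limbs (the method shape is illustrative): an unsigned add wrapped iff the sum is smaller than an addend, so no 128-bit intermediate is needed.

```csharp
using System;

static ulong AddWithCarry(ulong left, ulong right, ref ulong carry)
{
    ulong sum = left + right;
    ulong carryOut = sum < left ? 1UL : 0UL;  // wrapped iff sum < either addend
    sum += carry;
    carryOut += sum < carry ? 1UL : 0UL;      // adding carry (0 or 1) may wrap once more
    carry = carryOut;                         // the two wraps are mutually exclusive: 0 or 1
    return sum;
}

ulong c = 0;
Console.WriteLine(AddWithCarry(ulong.MaxValue, 1, ref c)); // 0
Console.WriteLine(c);                                      // 1
```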
The nuint limb conversion changed _sign from int to nint, which caused significant regressions for small-value operations (Multiply 16-bit 1.78x, Divide 16-bit 1.67x) due to 64-bit imul/idiv being slower than 32-bit equivalents. Reverting _sign to int while keeping nuint[] _bits preserves all large-number speedups while restoring small-value performance.

Key changes:
- Field declaration: nint _sign -> int _sign
- s_bnMinInt simplified (no nint.Size branching)
- All constructors updated for int _sign with nuint[] _bits
- Removed MultiplyNint helper (no longer needed)
- All operators simplified (removed nint.Size==8 branches for _sign)
- Fixed decimal constructor canonicalization: use _bits[0] &lt;= int.MaxValue instead of (int)_bits[0] &gt; 0 to avoid truncation with 64-bit limbs
- NumericsHelpers.Abs updated for int parameter

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
On 64-bit platforms, PowCore unconditionally used UInt128 arithmetic for the square-and-multiply loop, even when all values fit in uint. UInt128 multiply+modulus is significantly more expensive than ulong. Now checks modulus <= uint.MaxValue and uses the cheaper ulong path, eliminating the ModPow 16-bit regression (1.27x -> 1.02x) while preserving all large-number speedups. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
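A sketch of that fast path (illustrative, not the exact runtime code): when the modulus fits in a uint, every residue product fits in a ulong, so the square-and-multiply loop stays in 64-bit arithmetic throughout.

```csharp
using System;

// Single-limb modular exponentiation for a modulus that fits in 32 bits.
static ulong PowModSmall(ulong value, ulong exponent, uint modulus)
{
    ulong result = 1UL % modulus;           // handles modulus == 1
    ulong b = value % modulus;
    while (exponent != 0)
    {
        if ((exponent & 1) != 0)
            result = result * b % modulus;  // both operands < 2^32: no UInt128 needed
        b = b * b % modulus;
        exponent >>= 1;
    }
    return result;
}

Console.WriteLine(PowModSmall(2, 10, 1000)); // 24
```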
Tagging subscribers to this area: @dotnet/area-system-numerics
Pull request overview
This PR modernizes System.Numerics.BigInteger internals by switching the limb representation from uint to native-width nuint, and updates core arithmetic/modular algorithms (notably ModPow) to better exploit 64-bit platforms while keeping the public API unchanged.
Changes:
- Reworked BigInteger arithmetic plumbing to operate on `nuint` limbs (32-bit on x86, 64-bit on x64), including shift/rotate, add/sub, division, GCD, and reduction helpers.
- Improved modular exponentiation for odd moduli via a Montgomery + sliding-window path, plus supporting primitives (`AddWithCarry`, `SubWithBorrow`, `Mul1`, `MulAdd1`, `SubMul1`, etc.).
- Updated and expanded tests to be limb-width aware and added a large new property/edge-case test suite.
Reviewed changes
Copilot reviewed 24 out of 25 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/libraries/System.Runtime.Numerics/tests/System.Runtime.Numerics.Tests.csproj | Adds the new BigInteger property/edge-case test file to the test project. |
| src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs | Updates generic math tests to be native-limb-width aware (counts/rotates/byte counts/write-bytes). |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/op_rightshift.cs | Adjusts shift tests for native limb width and nuint-based internals. |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/Rotate.cs | Updates rotate tests for limb-width variability; refactors theory data generation. |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/MyBigInt.cs | Updates helper logic that assumed 4-byte limb alignment to use nint.Size. |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/DebuggerDisplayTests.cs | Updates debugger display tests for nuint[] internal constructor and 64-bit formatting differences. |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/BigIntegerPropertyTests.cs | Adds extensive property-based + edge-case coverage for BigInteger across sizes/thresholds/sign combos. |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/BigInteger.SubtractTests.cs | Adds new subtract cases around 64-bit carry/borrow boundaries and large power-of-two transitions. |
| src/libraries/System.Runtime.Numerics/tests/BigInteger/BigInteger.AddTests.cs | Adds new add cases around 64-bit boundaries and multi-limb carry propagation patterns. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/NumericsHelpers.cs | Updates complement helpers to Span<nuint> and adapts integer helpers for the new limb width. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/Complex.cs | Minor modernizations (target-typed new, inline out vars) unrelated to BigInteger logic. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.Utils.cs | Introduces key native-limb primitives and constants (BitsPerLimb, carry/borrow, widening ops, scalar mul helpers). |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.ShiftRot.cs | Ports rotate/shift routines to nuint limbs with SIMD paths. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.PowMod.cs | Reworks modular exponentiation, including Montgomery + sliding window for odd moduli and updated single-limb paths. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.GcdInv.cs | Updates GCD/Lehmer core to handle native-limb representation and avoids unnecessary overhead on some paths. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.FastReducer.cs | Ports Barrett/FastReducer logic to nuint limbs. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.DivRem.cs | Ports and retunes division (incl. Burnikel–Ziegler threshold) for nuint limbs and new helpers. |
| src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.AddSub.cs | Ports add/subtract to nuint limbs using new carry/borrow primitives and bounds-check elimination patterns. |
| src/libraries/System.Runtime.Numerics/src/System/Number.Polyfill.cs | Removes BOM and small formatting/style adjustments. |
| src/libraries/Common/src/System/Number.Parsing.Common.cs | Small refactor to pattern matching for readability. |
| src/libraries/Common/src/System/Number.NumberBuffer.cs | Minor debug assert/style updates. |
| src/libraries/Common/src/System/Number.Formatting.Common.cs | Minor debug assert/style updates (pattern matching). |
Comments suppressed due to low confidence (3)
src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs:482
- The final `destination.ToArray()` assertion assumes the entire 24-byte buffer has deterministic contents. On 32-bit, the last 4 bytes were never written by any successful call in this test (max write size is 20), so those bytes can contain stale stack data. Either clear `destination` up front / clear the tail after each write, or only assert over the bytes that are known to have been written.
Assert.False(BinaryIntegerHelper<BigInteger>.TryWriteBigEndian(default, Span<byte>.Empty, out bytesWritten));
Assert.Equal(0, bytesWritten);
Assert.Equal(new byte[] { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 }, destination.ToArray());
src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs:493
- `destination` is stackalloc'd as 24 bytes for all platforms, but on 32-bit the largest successful writes in this test only write 20 bytes. Any later assertions that compare the full buffer (rather than `bytesWritten`) can observe uninitialized stack data. Consider allocating `destination` with `nint.Size == 8 ? 24 : 20`, or clearing the span before asserting full-buffer contents.
Span<byte> destination = stackalloc byte[24];
int bytesWritten = 0;
Assert.True(BinaryIntegerHelper<BigInteger>.TryWriteLittleEndian(Zero, destination, out bytesWritten));
Assert.Equal(nint.Size, bytesWritten);
Assert.Equal(new byte[nint.Size], destination.Slice(0, nint.Size).ToArray());
src/libraries/System.Runtime.Numerics/tests/BigIntegerTests.GenericMath.cs:549
- The final `destination.ToArray()` assertion in the Little Endian test compares all 24 bytes. On 32-bit, the last 4 bytes are not guaranteed to have been written by any prior successful call (max write is 20), so the expected trailing zeros can be unstable. Clear `destination` (or at least the tail) before the last assertion, or assert only over the bytes known to be written.
Assert.False(BinaryIntegerHelper<BigInteger>.TryWriteLittleEndian(default, Span<byte>.Empty, out bytesWritten));
Assert.Equal(0, bytesWritten);
Assert.Equal(nint.Size == 8 ? new byte[] { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF } : new byte[] { 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00 }, destination.ToArray());
@MihuBot benchmark BigInteger -long
namespace System.Numerics
{
    internal static partial class BigIntegerCalculator
    {
        /// <summary>
Just noting to myself that I finished going through BigInteger.cs and need to resume review here.
and noting I didn't post any of the repetitive comments around nint.Size or other nits, but if we do fix those then we should fix it consistently.
Pull request overview
Copilot reviewed 26 out of 27 changed files in this pull request and generated no new comments.
Comments suppressed due to low confidence (1)
src/libraries/System.Runtime.Numerics/src/System/Numerics/BigIntegerCalculator.AddSub.cs:229
- The private Subtract helper doesn't assert that the final `borrow` is 0. Given other subtraction helpers (e.g., `SubtractSelf`) now enforce the `borrow == 0` postcondition, consider adding a `Debug.Assert(borrow == 0)` at the end here as well so underflow bugs are caught early in Debug builds.
[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static void Subtract(ReadOnlySpan<nuint> left, Span<nuint> bits, int startIndex, nuint initialBorrow)
{
    // Executes the subtraction for one big and one single-limb integer.
    int i = startIndex;
    nuint borrow = initialBorrow;

    if (left.Length != 0)
    {
        _ = bits[left.Length - 1];
    }

    if (left.Length <= CopyToThreshold)
    {
        for (; i < left.Length; i++)
        {
            nuint val = left[i];
            nuint diff = val - borrow;
            borrow = (diff > val) ? 1 : (nuint)0;
            bits[i] = diff;
        }
    }
    else
    {
        for (; i < left.Length;)
        {
            nuint val = left[i];
            nuint diff = val - borrow;
            borrow = (diff > val) ? 1 : (nuint)0;
            bits[i] = diff;
            i++;

            // Once borrow is set to 0 it can not be 1 anymore.
            // So the tail of the loop is just the movement of argument values to result span.
            if (borrow == 0)
            {
                break;
            }
        }

        if (i < left.Length)
        {
            CopyTail(left, bits, i);
        }
    }
}
…ered positive' Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Explains why the intermediate decimal representation uses nuint[] (wasting 32 bits per element on 64-bit) rather than uint[] or base-1e19:
- Base-1e19 benchmarked as net regression (UInt128 division too slow)
- uint[] would require duplicating BigIntegerCalculator arithmetic routines

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@EgorBot -linux_amd -osx_arm64

using System.Numerics;
using System.Text;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkSwitcher.FromAssembly(typeof(Bench).Assembly).Run(args);

public class Bench
{
    [Benchmark]
    [Arguments(3000)]
    public StringBuilder PiDigits(int n) => PiDigitsRunner.Run(n);
}

public class PiDigitsRunner
{
    BigInteger q, r, s, t, u, v, w;
    int i;
    StringBuilder strBuf = new StringBuilder(40), lastBuf = null;
    int n;

    PiDigitsRunner(int n) { this.n = n; }

    void compose_r(int bq, int br, int bs, int bt)
    {
        u = r * bs;
        r *= bq;
        v = t * br;
        r += v;
        t *= bt;
        t += u;
        s *= bt;
        u = q * bs;
        s += u;
        q *= bq;
    }

    void compose_l(int bq, int br, int bs, int bt)
    {
        r *= bt;
        u = q * br;
        r += u;
        u = t * bs;
        t *= bt;
        v = s * br;
        t += v;
        s *= bq;
        s += u;
        q *= bq;
    }

    int extract(int j)
    {
        u = q * j;
        u += r;
        v = s * j;
        v += t;
        w = u / v;
        return (int)w;
    }

    bool prdigit(int y)
    {
        strBuf.Append(y);
        if (++i % 10 == 0 || i == n)
        {
            if (i % 10 != 0)
                for (int j = 10 - (i % 10); j > 0; j--)
                    strBuf.Append(" ");
            strBuf.Append("\t:");
            strBuf.Append(i);
            lastBuf = strBuf;
            strBuf = new StringBuilder(40);
        }
        return i == n;
    }

    void RunInner()
    {
        int k = 1;
        i = 0;
        q = 1; r = 0; s = 0; t = 1;
        for (;;)
        {
            int y = extract(3);
            if (y == extract(4))
            {
                if (prdigit(y)) return;
                compose_r(10, -10 * y, 0, 1);
            }
            else
            {
                compose_l(k, 4 * k + 2, 0, 2 * k + 1);
                k++;
            }
        }
    }

    public static StringBuilder Run(int n)
    {
        var m = new PiDigitsRunner(n);
        m.RunInner();
        return m.lastBuf;
    }
}
This rewrites the internal implementation of `System.Numerics.BigInteger` to use native-width (`nuint`) limbs instead of `uint` limbs. On 64-bit platforms, this halves the number of limbs needed to represent a value, improving throughput of all multi-limb arithmetic. The public API surface remains unchanged.

Core Representation Change

- `_bits` array changed from `uint[]` to `nuint[]`, with all arithmetic primitives updated accordingly.
- `_sign` remains `int` to avoid regressions on small inline values.
- Formatting and parsing operate on `nuint` limbs while maintaining exact formatting behavior.

Algorithmic Improvements
Montgomery Multiplication for ModPow
Added Montgomery multiplication with REDC for modular exponentiation when the modulus is odd. This replaces the Barrett reduction path for odd moduli, eliminating expensive division-based reduction in the inner loop.
Sliding Window Exponentiation
ModPow now uses left-to-right sliding window exponentiation (window size chosen based on exponent bit length) instead of simple square-and-multiply, reducing the number of modular multiplications.
Fused Two's Complement for Bitwise Operations
Bitwise AND, OR, and XOR on negative values now fuse the two's complement conversion with the logical operation in a single pass, avoiding separate allocation and negation steps.
Cached Powers-of-10 Table
The `PowersOf1e9` table used by divide-and-conquer ToString/Parse is now cached and reused across calls, avoiding repeated expensive computation for large number formatting.

Unrolled Single-Limb Primitives
Added `Mul1`, `MulAdd1`, and `SubMul1` primitives that handle the common case of multiplying a multi-limb number by a single limb. These are used in the inner loops of schoolbook multiply and division.

GCD Optimizations
`LehmerCore` rewritten to avoid `Int128`/`UInt128` overhead, using direct `ulong` arithmetic.

Division Tuning
`DivRem` helpers optimized for the wider limb size.

Hardware Intrinsics
`BigMul`, `DivRem`, and `AddWithCarry` primitives use BMI2 (mulx) and ADX (adcx/adox) intrinsics when available, with fallback to `Math.BigMul`/`UInt128` arithmetic.

ModPow Small-Modulus Optimization
When the modulus fits in a `uint` (common for single-limb moduli on 64-bit), the inner loop uses `ulong` arithmetic instead of `UInt128`, avoiding unnecessary widening.

Bug Fixes
- Fixed `SubtractSelf` callers to restore the `borrow == 0` postcondition, fixing incorrect results in Barrett reduction (`FastReducer.SubMod`) when the Barrett quotient overshoots, Toom-3/Karatsuba signed subtraction, and Montgomery reduction overflow handling.
- Fixed a `SubWithBorrow` sign-extension issue that would produce incorrect results on 32-bit platforms.
- Fixed `BitwiseAnd` sign handling for specific negative operand combinations.

Test Coverage
Fixes #97780
Fixes #111708
Fixes #41495