Fix IndexOutOfRangeException in FrozenHashTable when creating from huge collections#125555
Fix IndexOutOfRangeException in FrozenHashTable when creating from huge collections#125555
Conversation
When creating a FrozenSet/FrozenDictionary from a collection with more than ~1 billion items, `CalcNumBuckets` overflows int on the computation `uniqueCodesCount * 2`, producing a negative value that causes `primes[-1]` to be accessed. Fix by using `long` for `minNumBuckets` in CalcNumBuckets, and add an Array.MaxLength check in Create to throw OutOfMemoryException instead of a confusing ArgumentOutOfRangeException when the combined bucket+hashcode allocation would overflow int. Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
|
@copilot you need to add tests. Make sure they do fail without the fix |
There was a problem hiding this comment.
Pull request overview
Fixes an IndexOutOfRangeException in System.Collections.Frozen when building frozen sets/dictionaries from extremely large inputs by preventing integer overflow during bucket-count calculation and guarding against invalid pooled-array sizes.
Changes:
- Update
FrozenHashTable.CalcNumBucketsto compute the “minimum bucket count” usinglongto avoidintoverflow for very largeuniqueCodesCount. - Add a max-array-length guard in
FrozenHashTable.Createbefore renting a combinedArrayPool<int>buffer sizednumBuckets + hashCodes.Length.
These tests create collections with > int.MaxValue / 2 items, which is the minimum count that triggers the overflow in `uniqueCodesCount * 2` in CalcNumBuckets. Without the fix, both tests throw IndexOutOfRangeException; with the fix they complete successfully. Co-authored-by: danmoseley <6385855+danmoseley@users.noreply.github.com>
Added two
Both create a collection with 1,073,741,825 items (just over |
There was a problem hiding this comment.
Pull request overview
Fixes an IndexOutOfRangeException in System.Collections.Frozen when building frozen sets/dictionaries from extremely large inputs by preventing integer overflow in bucket-count calculation and guarding large temporary allocations.
Changes:
- Prevent overflow in
FrozenHashTable.CalcNumBucketsby computingminNumBucketsaslong. - Add an
Array.MaxLength-based guard before renting the combinedbucketStarts + nextsbuffer inFrozenHashTable.Create. - Add two
[OuterLoop]regression tests intended to cover the huge-collection scenario forToFrozenSet/ToFrozenDictionary.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs | Uses long for bucket sizing math; adds max-length guard before renting the combined temporary buffer. |
| src/libraries/System.Collections.Immutable/tests/Frozen/FrozenSetTests.cs | Adds an [OuterLoop] test attempting to freeze a ~1.07B element HashSet<int>. |
| src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs | Adds an [OuterLoop] test attempting to freeze a ~1.07B entry Dictionary<int,int>. |
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenSetTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Show resolved
Hide resolved
|
Checking for other places |
The previous tests allocated 1B+ elements requiring tens of GB, making them impractical for CI even as OuterLoop. Replace with 4M-element tests that exercise the CalcNumBuckets early-return path (where uniqueCodesCount * 2 exceeds the precomputed primes table) with ~100MB memory. These tests don't directly verify the integer overflow fix (which requires >1.07B items to trigger) but add coverage for a previously untested code path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Show resolved
Hide resolved
Per @EgorBo's review: instead of an inline #if NET / 0x7FFFFFC7 / #endif in FrozenHashTable.Create(), define the polyfill const in Polyfills.cs under #if !NET (as a partial struct extension) and the NET property in FrozenHashTable.cs. The usage site becomes a simple `ArrayMaxLength` reference with no preprocessor guards. Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
|
Could you request outer loop appropriately? I'm not familiar with which of the several flavors to use here |
|
The tests i added do not require anything like that memory. Sorry if that wasn't clear above. And I found in practice it used up to 40Gb! |
|
/azp list |
This comment was marked as resolved.
This comment was marked as resolved.
|
Azure Pipelines successfully started running 1 pipeline(s). |
Reverts commit 6a9eeae: the partial struct extension in Polyfills.cs and matching #if NET property in FrozenHashTable.cs are removed. FrozenHashTable.cs is restored to using the inline #if NET / #else / 0x7FFFFFC7 / #endif pattern as in commit 339f322. Co-authored-by: EgorBo <523221+EgorBo@users.noreply.github.com>
The tests are already marked with |
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Show resolved
Hide resolved
src/libraries/System.Collections.Immutable/src/System/Collections/Frozen/FrozenHashTable.cs
Show resolved
Hide resolved
- Add blank line after OOM throw block (stephentoub nit) - Early-return in CalcNumBuckets when minNumBuckets + hashCodes.Length exceeds Array.MaxLength, avoiding the expensive collision-counting loop for sizes that will fail in Create anyway Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
addressed |
src/libraries/System.Collections.Immutable/tests/Frozen/FrozenDictionaryTests.cs
Outdated
Show resolved
Hide resolved
| // exercising the CalcNumBuckets early-return path. | ||
| // The integer overflow fix for >1B items (long minNumBuckets) | ||
| // cannot be practically tested without multi-GB allocations. | ||
| const int count = 4_000_000; |
There was a problem hiding this comment.
I don't understand the test. Why is a dictionary with only 4M elements going to hit the minNumBuckets + hashCodes.Length > Array.MaxLength code path?
There was a problem hiding this comment.
Good catch -- it does not hit the Array.MaxLength path. It exercises the prime-table-exceeded early return (largest precomputed prime is 7,199,369, so 4M * 2 overflows the table). Fixed the comment in e169e7e to say that accurately.
There was a problem hiding this comment.
Copilot made these replies, I did not tell it to do so, only push. sigh.
There was a problem hiding this comment.
I still don't understand. Are you saying this test is for something unrelated to this PR?
There was a problem hiding this comment.
Correct, this is unrelated to the PR (description attempted to note this). It's testing the path when min buckets needed is larger than largest precomputed prime.
But, now I see it seems this IS covered for FrozenDictionary, so removed current two and just added a test for FrozenSet that is the same pattern.
Meanwhile as noted, this PR fix cannot be tested without using 20-40GB memory: so I tested it locally.
-- Dan
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tionary test Remove the two custom 4M-element ExceedsPrimeTable tests (FrozenDictionary and FrozenSet) which were redundant with the existing 8M ulong dictionary test. Add a matching CreateHugeSet_Success test in FrozenSet_Generic_Tests_ulong to provide parallel FrozenSet coverage of the same large-collection code path. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| [OuterLoop("Takes several seconds")] | ||
| [Theory] | ||
| [InlineData(8_000_000)] | ||
| public void CreateHugeSet_Success(int largeCount) |
There was a problem hiding this comment.
Dictionary one has AllowVeryLargeSizes. Set doesn't have that variable, but it's redundant anyway as in neither case is the fixture subclassed, and the purpose of this I guess is to limit executions of slow tests.
There was a problem hiding this comment.
these comments are by Dan.
ToFrozenSet()/ToFrozenDictionary()throwsIndexOutOfRangeExceptionfor collections with more than ~1.07 billion items due to integer overflow inFrozenHashTable.Description
In
CalcNumBuckets,int minNumBuckets = uniqueCodesCount * 2silently overflows foruniqueCodesCount > int.MaxValue / 2, producing a negative value. This causes the prime-table search loop to skip entirely (minPrimeIndexInclusivestays 0), leavingmaxPrimeIndexExclusive = 0, and thenprimes[-1]throws.IndexOutOfRangeException(crash accessingprimes[-1])OutOfMemoryException(clear, intentional)Changes
CalcNumBuckets: UselongforminNumBucketsto prevent overflow. For inputs exceeding the prime table range, the existing early-return viaHashHelpers.GetPrime(uniqueCodesCount)now fires correctly.Create: Add anArray.MaxLengthguard beforeArrayPool.Rent(numBuckets + hashCodes.Length). Without it, the sum overflowsintfor ~1B+ item collections, producing a misleadingArgumentOutOfRangeException. Uses the same#if NET / #elsefallback pattern asLengthBuckets.cs.Tests: Added two
[OuterLoop]regression tests —ToFrozenSet_LargeSet_ExceedsPrimeTableandToFrozenDictionary_LargeDictionary_ExceedsPrimeTable— with 4M elements. These exercise theCalcNumBucketsearly-return path (whereuniqueCodesCount * 2exceeds the precomputed primes table) and are practical for CI (~100MB memory). The integer overflow fix boundary (>1.07B items) was verified locally on a 64GB machine — the exception changed fromIndexOutOfRangeExceptiontoOutOfMemoryException.Original prompt
This section details on the original issue you should resolve
<issue_title>Creating a FrozenSet with from huge HashSet throws an exception</issue_title>
<issue_description>### Describe the bug
Creating a FrozenSet with from HashSet throws an exception
To Reproduce
Exceptions (if any)
Further technical details
details of dotnet --info
.NET SDK: Version: 11.0.100-preview.3.26163.101 Commit: 1b989af698 Workload version: 11.0.100-manifests.b985b9e9 MSBuild version: 18.6.0-preview-26163-101+1b989af69
Runtime Environment:
OS Name: cachyos
OS Version:
OS Platform: Linux
RID: linux-x64
Base Path: /home/cert/.dotnet/sdk/11.0.100-preview.3.26163.101/
.NET workloads installed:
There are no installed workloads to display.
Configured to use workload sets when installing new manifests.
No workload sets are installed. Run "dotnet workload restore" to install a workload set.
Host:
Version: 11.0.0-preview.3.26163.101
Architecture: x64
Commit: 1b989af698
.NET SDKs installed:
8.0.419 [/home/cert/.dotnet/sdk]
9.0.312 [/home/cert/.dotnet/sdk]
10.0.300-preview.0.26163.113 [/home/cert/.dotnet/sdk]
11.0.100-preview.3.26163.101 [/home/cert/.dotnet/sdk]
.NET runtimes installed:
Microsoft.AspNetCore.App 8.0.25 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 9.0.14 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 10.0.3 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 11.0.0-preview.3.26163.101 [/home/cert/.dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 8.0.25 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 9.0.14 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 10.0.3 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 11.0.0-preview.3.26163.101 [/home/cert/.dotnet/shared/Microsoft.NETCore.App]
Other architectures found:
None
Environment variables:
DOTNET_ROOT [/home/cert/.dotnet]
global.json file:
Not found
Tested on .NET 8, .NET 9, .NET 10 and .NET 11
full log
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.