Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Reuse HashHelpers for BinaryFormatter objectholder hashes#25509

Merged
ViktorHofer merged 6 commits into
dotnet:masterfrom
ViktorHofer:SerializationHolderHash
Mar 28, 2018
Merged

Reuse HashHelpers for BinaryFormatter objectholder hashes#25509
ViktorHofer merged 6 commits into
dotnet:masterfrom
ViktorHofer:SerializationHolderHash

Conversation

@ViktorHofer
Copy link
Copy Markdown
Member

@ViktorHofer ViktorHofer commented Nov 26, 2017

Fixes https://github.com/dotnet/corefx/issues/25533
Fixes https://github.com/dotnet/corefx/issues/24902

With #17949 we increased the MaxArrazSize of ObjectHolders. Because of that change our largest prime number 6584983 can now be smaller than the max size of an array. By @danmosemsft suggestion I'm reusing the HashHelpers utility class to utilize larger pre-computed prime numbers and remove the logic for calculating the next optimal prime number.
This also changes the start prime number size as before it was 5 and now it is 3. When I'm off plane I will add benchmarks with performance numbers. A quick test didn't show any noticeable performance regressions.

@Alois-xx I'm marking issue #24902 as fixed. As @stephentoub already mentioned we brought back BinaryFormatter for a smoother transition from .NET Framework to .NET Core and don't intend to optimize it further. If you want to, you can work by yourself on changing the data structure to remove the necessity of the pre-computed prime numbers and therefore the virtual limit. That would require extensive tests to be confident that we don't break or regress existing code.

cc @joperezr

{
if (min < 0)
throw new ArgumentException(SR.Arg_HTCapacityOverflow);
Contract.EndContractBlock();
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo below
// Allow the hashtables to grow to maximum possible size (~2G elements) before encoutering capacity overflow.

@danmoseley
Copy link
Copy Markdown
Member

There is another duplicate here:
https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Extensions/src/System/Collections/Hashtable.cs#L1655
It can also be replaced with the CoreFX one that has the larger precomputed list. It has a field SerializationInfoTable that it doesn't use itself that would be moved onto Hashtable proper.

@danmoseley
Copy link
Copy Markdown
Member

danmoseley commented Nov 26, 2017

I see that https://github.com/dotnet/corefx/blob/master/src/System.Private.DataContractSerialization/src/System/Runtime/Serialization/ObjectToIdCache.cs could potentially use HashHelpers also. Not suggesting you do this now.

Correction: no it couldn't, as ObjecttoIDCache uses a sequence of primes that are no less than 2x the previous one, whereas HashHelpers uses primes that are no less than 1.2x the previous one. I see GCC uses 2x. Wonder what the history is there.

@danmoseley
Copy link
Copy Markdown
Member

Interestingly ObjecttoIdCache.cs will go beyond the largest prime that fits in an int (0x7FEFFFFD) and just use a non prime (0x7FEFFFFF). It says that's the largest possible array size, although I would have thought that was ‭0x7FFFFFFF‬. Maybe there's some overhead. This assumes the hashtables would still work (albeit inefficiently) with a non prime and not eg loop forever.

**
===========================================================*/

using System;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can remove the comment above wrapped in ==========

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Nov 26, 2017

largest possible array size, although I would have thought that was ‭0x7FFFFFFF

The largest possible array size in both .NET Framework and .NET Core is 0X7FEFFFFF (https://docs.microsoft.com/en-us/dotnet/framework/configure-apps/file-schema/runtime/gcallowverylargeobjects-element).

@JonHanna
Copy link
Copy Markdown
Contributor

Largest array size is 0x7FFFFFC7 for 1-byte sized value-type elements and 0X7FEFFFFF otherwise. If you are looking to ObjectToIdCache.GetPrime for guidance it's worth noting that for that case growth is always incremental without deletions, and one at a time. This means that for it a policy of doubling on growth and finding the next available prime can be plotted entirely without waste. For a structure that can have deletions and/or have growth of a predetermined amount other than one at a time the best table would have more intermediate values to catch a wider range of best-fits.

@stephentoub
Copy link
Copy Markdown
Member

</resheader>
<data name="Arg_HTCapacityOverflow" xml:space="preserve">
<value>Hashtable's capacity overflowed and went negative. Check load factor, capacity and the current size of the table.</value>
</data>
Copy link
Copy Markdown
Member

@stephentoub stephentoub Nov 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is an error in HashHelpers referring to Hashtable? If that were to actually be hit, it would have no relevance to BinaryFormatter.

@ViktorHofer ViktorHofer force-pushed the SerializationHolderHash branch 2 times, most recently from 8d4979e to ce441ee Compare November 27, 2017 23:16
@@ -40,24 +30,41 @@ internal static class HashHelpers
1103, 1327, 1597, 1931, 2333, 2801, 3371, 4049, 4861, 5839, 7013, 8419, 10103, 12143, 14591,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a comment explaining why we don't just grow the table... since apparently it wasn't obvious to us

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added.

<value>System.Resources.ResXResourceWriter, System.Windows.Forms, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089</value>
</resheader>
<data name="Arg_HTCapacityOverflow" xml:space="preserve">
<value>Capacity overflowed and went negative.</value>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like "Cannot add more than {0} objects." might be more useful/user oriented? They don't care whether we use signed ints.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't address this one as we now should never hit this message unless the array entirely overflows.

@stephentoub
Copy link
Copy Markdown
Member

stephentoub commented Dec 5, 2017

Tests (in particular if this is fixing https://github.com/dotnet/corefx/issues/24902)?

@ViktorHofer
Copy link
Copy Markdown
Member Author

I'm waiting for a decision being made in https://github.com/dotnet/corefx/issues/25533. Precisely speaking, if we should always compute prime numbers and completely get rid of the pre-computed ones. I might benchmark it both ways to get the discussion going again.

@jkotas
Copy link
Copy Markdown
Member

jkotas commented Dec 5, 2017

if we should always compute prime numbers

I expect that it will be too slow to compute the small prime numbers compared to the table lookup. I think the table make sense for small numbers.

@danmoseley
Copy link
Copy Markdown
Member

Overlap with dotnet/coreclr#15453

@karelz
Copy link
Copy Markdown
Member

karelz commented Jan 18, 2018

@ViktorHofer if we're blocked, would it make sense to close the PR until it is unblocked?

@ViktorHofer
Copy link
Copy Markdown
Member Author

I don't think we are blocked here. We haven't found a consensus yet. But I agree, the discussion seems to be stall

@karelz
Copy link
Copy Markdown
Member

karelz commented Jan 18, 2018

If we can't each consensus in next few days, I would recommend to bring the discussion into an issue/email and close the PR, until we know what to do.

@karelz
Copy link
Copy Markdown
Member

karelz commented Jan 27, 2018

Closing, please reopen when consensus is reached.

@karelz karelz closed this Jan 27, 2018
@karelz karelz added this to the 2.1.0 milestone Feb 4, 2018
@ViktorHofer ViktorHofer reopened this Mar 21, 2018
This reverts commit ddf8ca0, reversing
changes made to 0a0ea7f.
@ViktorHofer ViktorHofer force-pushed the SerializationHolderHash branch from ce441ee to eed669a Compare March 21, 2018 22:20
@ViktorHofer
Copy link
Copy Markdown
Member Author

Reopening to finally fix the remaining issues that are tagged with 2.1.

@ViktorHofer ViktorHofer force-pushed the SerializationHolderHash branch from eed669a to 06daf0e Compare March 22, 2018 15:02
@joperezr
Copy link
Copy Markdown
Member

@ViktorHofer is this one ready to go now?

@ViktorHofer
Copy link
Copy Markdown
Member Author

Working on this right now.

@ViktorHofer
Copy link
Copy Markdown
Member Author

@stephentoub > Tests (in particular if this is fixing #24902)?

Just added relevant ones.

@stephentoub @jkotas PTAL

@ViktorHofer ViktorHofer force-pushed the SerializationHolderHash branch from fb7c2c9 to af85154 Compare March 28, 2018 17:29
@ViktorHofer
Copy link
Copy Markdown
Member Author

Gosh... We have the same implementation in CoreLib shared.

private Object _syncRoot;

private static ConditionalWeakTable<object, SerializationInfo> s_serializationInfoTable;
private static ConditionalWeakTable<object, SerializationInfo> SerializationInfoTable => LazyInitializer.EnsureInitialized(ref s_serializationInfoTable);
Copy link
Copy Markdown
Member Author

@ViktorHofer ViktorHofer Mar 28, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CoreLib we use Interlocked.CompareExchange instead. @jkotas any preference here?

internal static ConditionalWeakTable<object, SerializationInfo> SerializationInfoTable
        {
            get
            {
                if (s_serializationInfoTable == null)
                    Interlocked.CompareExchange(ref s_serializationInfoTable, new ConditionalWeakTable<object, SerializationInfo>(), null);

                return s_serializationInfoTable;
            }
        }

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not have preference.

LazyInitializer.EnsureInitialized is convenience helper. It looks nicer, but it results into bigger slower code.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks :) I'm fine with using LazyInitializer here as the serialization code path isn't highly perf related.

@ViktorHofer
Copy link
Copy Markdown
Member Author

This is failing on x86 Win machines with an OutOfMemoryException.

@ViktorHofer ViktorHofer merged commit b6b5982 into dotnet:master Mar 28, 2018
@ViktorHofer ViktorHofer deleted the SerializationHolderHash branch March 28, 2018 22:38
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…efx#25509)

* Reuse HashHelpers for BinaryFormatter objectholder hashes

* Revert "Merge pull request dotnet/corefx#6203 from SunnyWar/master"

This reverts commit dotnet/corefx@ddf8ca0, reversing
changes made to dotnet/corefx@0a0ea7f.

* Change resource string, make HashTable reuse existing HashHelper

* Add comment describing hash number growth

* Add hash number growth tests for BinaryFormatter & HashSet

* Disable tests on x86 because of OOMs


Commit migrated from dotnet/corefx@b6b5982
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Right sizing HashSet ctor is allocating too many elements Remove Artificial Object Graph Limit in BinaryFormatter

7 participants