Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

#2727 Removed unnecessary IsPrime function after expanding table.#6203

Merged
stephentoub merged 1 commit into
dotnet:masterfrom
SunnyWar:master
Feb 22, 2016
Merged

#2727 Removed unnecessary IsPrime function after expanding table.#6203
stephentoub merged 1 commit into
dotnet:masterfrom
SunnyWar:master

Conversation

@SunnyWar
Copy link
Copy Markdown
Contributor

The IsPrime function is only used when the table does not have enough values. This change expands the number of values up to the maximum and removes the, now unneeded, IsPrime function.

Suggest exploring alternative bucket sizes and ways to get the bucket size be deferred till another time.

return (candidate == 2);
}

1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369, 8639249, 10367101,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What algorithm/formula did you use to expand the table?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plotted the existing points in Excel. I noticed that, except for the first few points, the rest rose by a factor of 1.2 times. So I wrote a little program to compute them so they would match the same pattern.

  1. take the last prime, multiply it by 1.2
  2. find the next prime >= the number in computed in step 1.
  3. goto 1

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is going to change allocations in certain cases. For example, before your change, adding 8M items to a HashSet<int> would result in the _slots and _buckets arrays ending at 11998949 in length, and after your change they'll be 12440537 in length (~400K elements each, ~4% increase). Whether that's good or bad for the scenarios that hit this, I don't know. I do like that we can simplify the code by expanding out the table, but we should understand the full ramifications of doing so before such a change is made.

@weshaggard
Copy link
Copy Markdown
Member

cc @ellismg @vancem

@JonHanna
Copy link
Copy Markdown
Contributor

I tried these values in August, and found that the collections tests were appreciably slower to run, and consistently so, which was enough to make me suspect this wasn't a win.
What have you found in that regard?

@vancem
Copy link
Copy Markdown

vancem commented Feb 19, 2016

In general I think this is a good change (because it makes the code simpler). I don't think the issue that we might use slightly different table sizes as Stephen notes is really a problem. The effects of any particular size number is small and will average out, and we should not get too hung up about that.

@JonHanna's data is concerning (but also very suprising, I am very suspicious of some outside effect purturbig the results). Can you describe the data you collected in more detail? First I would expect NO change for any perf tests that used dictionary sizes less then 7MB, is that what you saw? Frankly I would also expect no interesting change event above this since the new algorithm is roughly like the old one and the performance of a dictionary is only very weakly affected by how the table is grown.

So we should investigate if we have true negative data, but in the absence of that, this does not seem like a scary change at all.

@SunnyWar
Copy link
Copy Markdown
Contributor Author

@JonHanna, I don't see how it's possible you got different results since these values in the table and the way they are used is EXACTLY the same for any hash tables of size 7199369 and below.

@JonHanna
Copy link
Copy Markdown
Contributor

It's perfectly possible that I was just unlucky.

@SunnyWar
Copy link
Copy Markdown
Contributor Author

@JonHanna Micro-bench tests are indeed tricky.

@JonHanna
Copy link
Copy Markdown
Contributor

Yep, and that wasn't even a micro-bench really. I'm happy if someone says "well I compared the two here, and I don't know what Jon's talking about, they turn out the same" 😄

@stephentoub
Copy link
Copy Markdown
Member

Test Innerloop CentOS7.1 Release Build and Test please
Test Innerloop CentOS7.1 Debug Build and Test please

stephentoub added a commit that referenced this pull request Feb 22, 2016
#2727 Removed unnecessary IsPrime function after expanding table.
@stephentoub stephentoub merged commit ddf8ca0 into dotnet:master Feb 22, 2016
@karelz karelz modified the milestone: 1.0.0-rtm Dec 3, 2016
ViktorHofer added a commit to ViktorHofer/corefx that referenced this pull request Nov 27, 2017
This reverts commit ddf8ca0, reversing
changes made to 0a0ea7f.
ViktorHofer added a commit to ViktorHofer/corefx that referenced this pull request Mar 21, 2018
This reverts commit ddf8ca0, reversing
changes made to 0a0ea7f.
ViktorHofer added a commit that referenced this pull request Mar 28, 2018
* Reuse HashHelpers for BinaryFormatter objectholder hashes

* Revert "Merge pull request #6203 from SunnyWar/master"

This reverts commit ddf8ca0, reversing
changes made to 0a0ea7f.

* Change resource string, make HashTable reuse existing HashHelper

* Add comment describing hash number growth

* Add hash number growth tests for BinaryFormatter & HashSet

* Disable tests on x86 because of OOMs
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
dotnet/corefx#2727 Removed unnecessary IsPrime function after expanding table.

Commit migrated from dotnet/corefx@ddf8ca0
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…efx#25509)

* Reuse HashHelpers for BinaryFormatter objectholder hashes

* Revert "Merge pull request dotnet/corefx#6203 from SunnyWar/master"

This reverts commit dotnet/corefx@ddf8ca0, reversing
changes made to dotnet/corefx@0a0ea7f.

* Change resource string, make HashTable reuse existing HashHelper

* Add comment describing hash number growth

* Add hash number growth tests for BinaryFormatter & HashSet

* Disable tests on x86 because of OOMs


Commit migrated from dotnet/corefx@b6b5982
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants