#2727 Removed unnecessary IsPrime function after expanding table. #6203
Conversation
return (candidate == 2);
}

1674319, 2009191, 2411033, 2893249, 3471899, 4166287, 4999559, 5999471, 7199369, 8639249, 10367101,
What algorithm/formula did you use to expand the table?
I plotted the existing points in Excel and noticed that, except for the first few, each value is roughly 1.2 times the previous one. So I wrote a little program to compute new values that follow the same pattern:
- take the last prime and multiply it by 1.2
- find the next prime >= the number computed in the first step
- go back to the first step
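The steps above can be sketched in a few lines. This is not the program the author actually ran (the thread does not show it); it is a minimal Python illustration, and the names `is_prime`, `next_prime_at_least`, and `extend_table` are hypothetical. It assumes that "multiply by 1.2" is rounded up to the next integer before the prime search starts.

```python
from math import isqrt


def is_prime(n):
    """Trial-division primality test; adequate for table-sized values."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def next_prime_at_least(n):
    """Return the smallest prime p with p >= n."""
    while not is_prime(n):
        n += 1
    return n


def extend_table(last_prime, count):
    """Repeat the 'multiply by 1.2, round up to a prime' step `count` times."""
    out = []
    for _ in range(count):
        # ceil(last_prime * 1.2), done in integers to avoid float rounding.
        target = (last_prime * 6 + 4) // 5
        last_prime = next_prime_at_least(target)
        out.append(last_prime)
    return out


# Start from 7199369, the entry that precedes the new values in the diff.
print(extend_table(7199369, 3))
```

The first two values printed can be compared against the 8639249 and 10367101 entries that follow 7199369 in the diff excerpt.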
This is going to change allocations in certain cases. For example, before your change, adding 8M items to a HashSet<int> would result in the _slots and _buckets arrays ending at 11998949 in length, and after your change they'll be 12440537 in length (~400K elements each, ~4% increase). Whether that's good or bad for the scenarios that hit this, I don't know. I do like that we can simplify the code by expanding out the table, but we should understand the full ramifications of doing so before such a change is made.
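The size difference described above can be reproduced with a small sketch. It rests on assumptions that are consistent with the quoted numbers but not stated in the thread: that the collection asks for a prime at least twice its current size when it grows, that the size before the final resize is 5999471, and that 12440537 is an entry in the expanded table. Only the tails of the tables are modelled, and `get_prime` and `expand` are hypothetical stand-ins, not the library's code; the fallback is a plain trial-division search that may differ in detail from the removed IsPrime-based path.

```python
from bisect import bisect_left
from math import isqrt

# Tails of the two tables only; smaller entries are omitted for brevity.
OLD_TAIL = [4999559, 5999471, 7199369]
# 8639249 and 10367101 come from the diff; 12440537 is inferred from the comment.
NEW_TAIL = OLD_TAIL + [8639249, 10367101, 12440537]


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def get_prime(minimum, table, allow_search):
    """First table entry >= minimum, else (optionally) search for a prime."""
    i = bisect_left(table, minimum)
    if i < len(table):
        return table[i]
    if not allow_search:
        raise ValueError("minimum exceeds the largest table entry")
    candidate = minimum | 1  # test odd candidates only
    while not is_prime(candidate):
        candidate += 2
    return candidate


def expand(old_size, table, allow_search):
    """Assumed growth policy: request a prime at least twice the old size."""
    return get_prime(2 * old_size, table, allow_search)


before = expand(5999471, OLD_TAIL, allow_search=True)   # falls past the old table
after = expand(5999471, NEW_TAIL, allow_search=False)   # served from the new table
print(before, after, f"{(after - before) / before:.1%}")
```

Under these assumptions the old path searches upward from the doubled size, while the new path snaps to the next table entry, which is where the extra elements come from.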
I tried these values in August and found that the collections tests were appreciably and consistently slower to run, which was enough to make me suspect this wasn't a win.
In general I think this is a good change, because it makes the code simpler. I don't think the issue Stephen notes, that we might use slightly different table sizes, is really a problem. The effect of any particular size is small and will average out, so we should not get too hung up on it. @JonHanna's data is concerning, but also very surprising; I strongly suspect some outside effect perturbing the results. Can you describe the data you collected in more detail? First, I would expect NO change for any perf test that used dictionary sizes below roughly 7M entries. Is that what you saw? Frankly, I would also expect no interesting change even above this, since the new algorithm is roughly like the old one and the performance of a dictionary is only very weakly affected by how the table is grown. So we should investigate whether we have truly negative data, but in the absence of that, this does not seem like a scary change at all.
@JonHanna, I don't see how you could have gotten different results, since the values in the table and the way they are used are EXACTLY the same for any hash table of size 7199369 and below.
It's perfectly possible that I was just unlucky.
@JonHanna Micro-bench tests are indeed tricky.
Yep, and that wasn't even a micro-bench really. I'm happy if someone says "well I compared the two here, and I don't know what Jon's talking about, they turn out the same" 😄
Test Innerloop CentOS7.1 Release Build and Test please |
#2727 Removed unnecessary IsPrime function after expanding table.
* Reuse HashHelpers for BinaryFormatter objectholder hashes
* Revert "Merge pull request #6203 from SunnyWar/master"
  This reverts commit ddf8ca0, reversing changes made to 0a0ea7f.
* Change resource string, make HashTable reuse existing HashHelper
* Add comment describing hash number growth
* Add hash number growth tests for BinaryFormatter & HashSet
* Disable tests on x86 because of OOMs
dotnet/corefx#2727 Removed unnecessary IsPrime function after expanding table.
Commit migrated from dotnet/corefx@ddf8ca0
…efx#25509)
* Reuse HashHelpers for BinaryFormatter objectholder hashes
* Revert "Merge pull request dotnet/corefx#6203 from SunnyWar/master"
  This reverts commit dotnet/corefx@ddf8ca0, reversing changes made to dotnet/corefx@0a0ea7f.
* Change resource string, make HashTable reuse existing HashHelper
* Add comment describing hash number growth
* Add hash number growth tests for BinaryFormatter & HashSet
* Disable tests on x86 because of OOMs
Commit migrated from dotnet/corefx@b6b5982
The IsPrime function is only used when the table does not contain a large enough value. This change extends the table up to the maximum size and removes the now-unneeded IsPrime function.
I suggest deferring the exploration of alternative bucket sizes, and of other ways to compute the bucket size, to another time.