-
Notifications
You must be signed in to change notification settings - Fork 846
[CompilerPerf] Changed Map.count + Set.count from O(n) to O(1) (AVL logic based on size not height) #5365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This gist was about the same: https://gist.github.com/manofstick/11b5ac3c3cf993ce32e69d04a6549161 This gist was slightly better - but less than 5% across the board (but possibly performance improvement is being masked by the lack of #5307 - as much more time than necessary will be lost in https://gist.github.com/manofstick/275fe8ed62091aec52cd382548719f2a And this gist was consistently better with the new balancing https://gist.github.com/manofstick/e97dc9775bf01fd22b2f238cac9f1c27
|
|
can you please apply the same to set.fs and TaggedCollections.fs (2 times) |
| let empty = MapEmpty | ||
|
|
||
| let height = function | ||
| let size = function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please inline size function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No really keen to inline whilst MapOne exists. size as it stands has to do two attempted types casts which easily outweight a function call.
|
Gist for quick Set check: https://gist.github.com/manofstick/b25efcf69fb7791357918d5a780ac438 Seems to be ~10% faster in this gist... (create + union + element check) . But haven't done exhaustive testings...
|
|
You can see in |
|
yes setone and mapone can probably go as well. but please do in separate PR. it will keep things easier for VF# team to accept |
|
@forki yes, yes, I just meant I would push the |
|
@manofstick Just curious about this statement:
I don't think this is true, as the current implementation re-balances once the height of one sub-tree is 2 higher than the other. This implementation re-balances after it's twice as high. I'm not familiar with the performance of AVL trees beyond what I learned in college, so I don't know what the long-term ramifications of this would be. But I'm certainly not opposed to basing |
|
@cartemp Yes, I don't mean the same trees. That was stated. I was probably a bit strong without the word equivalent, but I was meaning computational complexity. Still getting balanced binary trees that are still created using the same AVL transforms. But yes the rest is a bit hand wavey! (I've actually sent it to a mate at University of British Columbia to do an analysis of, but he's usually pretty busy. But we'll see... At the moment I'm trusting intuition and tests) |
|
I wonder if we now rebalance more often.
Paul Westcott <notifications@github.com> schrieb am Di., 24. Juli 2018,
21:59:
… @cartemp
Yes, I don't mean the same trees. That was stated. I was probably a bit
strong without the word equivalent, but I was meaning computational
complexity. Still getting balanced binary trees that are still created
using the same AVL transforms. But yes the rest is a bit hand wavey! (I've
actually sent it to a mate at University of Vancouver to do an analysis of,
but he's usually pretty busy. But we'll see... At the moment I'm trusting
intuition and tests)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5365 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AADgNCLHLIB4EOBtqXARpxP3lurOogBKks5uJ3y6gaJpZM4VZ5Vk>
.
|
|
It's possible. I haven't run the numbers. Anyway am seeing a non-insignificant performance improvement for "smallish" (thousands) of node trees - so it's possible that it's just doing better rebalancing. Anyway, this is why I'm running random and sequential data in tests in case there are degenerate sequences. ...nd I'm willing to accept that in 1962 when AVL trees were first described, they were more interested in saving memory, and so the cost of carrying the size with each node would of been decadent. But considering we were already carrying a Anyway, this branch only has this change in it, so it can be tested in isolation... |
|
Another test: https://gist.github.com/manofstick/f285aa7b16025aabd4f60a4b8413ab81
|
|
what about very large size? are we now restricting the count since we track the size and not height as an int? |
|
The first gist mentioned in #5365 (comment) deals with large n, where it seems to return to about the same performance. |
|
@forki - but I'll create some more tests over the days ahead... |
|
didn't mean perf - I meant are getting issues with the count of elements when the size integer overflows? In theory the height overflows much later. |
|
@forki - yes. Could be a showstopper? |
|
dunno. Not even sure if it is a real problem or just imaginary |
|
This data structure looks like a Weight-balanced tree. @forki It seems that the other Weight-balanced tree implementations solved it by using the logarithmic of the size: "In order to ensure performance, the algorithm keeps the height of a tree How the balancing implemented? "A weight-balanced tree (WBT) is a binary search tree, whose balance is based on the sizes [1] https://yoichihirai.com/bst.pdf |
|
Closing this. Was always kind of a side thing. Better to follow the path to #5463 |
Splitting #5360.
Currently Map stores the height of each node, and uses that to determine if it should rebalance itself. This change stores the count of values instead of the height - and rebalances a node when one side is over twice the size of the other - I believe algorithmically equivalent - although in practice it may slightly change performance, as for particular cases a better/worse tree could be in an inner loop. Intuitively I think this should build better trees with more information, but I haven't attempted to mathematically prove this.
This makes Map.count an O(1) operation rather than an O(n) operation.