
Make Prio queues more compact #118

Merged
treeowl merged 8 commits into lspitzner:master from treeowl:compact-prio
May 31, 2023

Conversation

@treeowl
Collaborator

@treeowl treeowl commented May 4, 2023

Store the value associated with each key as its rightmost child,
which saves one word per element.

As a result, the binomial trees must become lazy, which should be
good for maps and lazy traversals. The downside is that we will
need tag checks to know that we have realized `Succ` constructors.

Benchmarks indicate this improves performance.

Closes #115.
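As a rough illustration of the layout change (type and constructor names here are invented for the sketch, not the library's actual definitions): in the old layout each element node carries a separate value field plus a pointer to a shared nullary `Zero` leaf; in the new one, the rank-0 child slot itself holds the value.

```haskell
-- Hypothetical sketch of the representation change (names invented
-- for illustration; not the actual pqueue definitions).

-- Old layout: every element node stores its key, its value, and a
-- child structure; at rank 0 the child structure is a nullary Zero
-- shared by every element.
data ZeroOld k a = ZeroOld
data TreeOld rk k a = TreeOld !k a (rk k a)   -- header + 3 fields

-- New layout: the value *is* the rightmost (rank-0) child, so the
-- separate value field disappears; the newtype adds no runtime box.
newtype ZeroNew k a = ZeroNew a
data TreeNew rk k a = TreeNew !k (rk k a)     -- header + 2 fields

-- Extracting the value of a rank-0 tree in each representation:
valueOld :: TreeOld ZeroOld k a -> a
valueOld (TreeOld _ v ZeroOld) = v

valueNew :: TreeNew ZeroNew k a -> a
valueNew (TreeNew _ (ZeroNew v)) = v
```

Under GHC's usual layout (one header word plus one word per boxed field), an old-style node takes four words and a new-style node three, which is where a one-word-per-element saving would come from; the rank-0 child field must stay lazy to keep values lazy, which is why the trees as a whole become lazy.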

@treeowl treeowl force-pushed the compact-prio branch 5 times, most recently from 5e6790d to 87bdf15 on May 4, 2023 22:07
@treeowl treeowl requested a review from konsumlamm May 4, 2023 22:08
@treeowl
Collaborator Author

treeowl commented May 4, 2023

@konsumlamm This is obviously not quite ready to actually merge, but before I make it so, I was hoping you'd give it a glance and see if it's something you'd be okay with if indeed it improves performance.

@treeowl treeowl force-pushed the compact-prio branch 4 times, most recently from c2df822 to 88c8c90 on May 4, 2023 22:25
@treeowl
Collaborator Author

treeowl commented May 4, 2023

Note: along with reducing the total queue size by $n$ words, this reduces the number of words allocated on insertion by $2$ (amortized) and the number of words allocated on deletion by $O(\log n)$, thanks to the slightly more compact Cons nodes.

[Comment thread on src/Data/PQueue/Min.hs]
@treeowl
Collaborator Author

treeowl commented May 7, 2023

@konsumlamm I believe this is ready to go.

@treeowl treeowl force-pushed the compact-prio branch 3 times, most recently from e77449e to 11e01c4 on May 7, 2023 05:30
@treeowl
Collaborator Author

treeowl commented May 7, 2023

I guess there's still the question of whether to let maps be lazy (better for performance in general) or whether to make them stricter (maintaining theoretical worst-case bounds). I'd lean toward the former in this instance, but I hate having to decide.

@treeowl
Collaborator Author

treeowl commented May 15, 2023

@konsumlamm, this still awaits your comments.

@treeowl
Collaborator Author

treeowl commented May 21, 2023

@konsumlamm It's been two weeks now.

Collaborator

@konsumlamm konsumlamm left a comment


Sorry for taking so long to review; this is not a simple PR.

I find the Nattish approach a bit confusing, but the numbers speak for themselves. I'm actually surprised that the benchmarks improve so much, though: all that really changed is Zero, wasn't it? I'd expect `data Zero k a = Zero` to be a single global value, so why does this make such a big difference?

Did you benchmark the folds? They look like they might get slower with this change.

[Comment threads on src/Data/PQueue/Prio/Internals.hs]
@treeowl
Collaborator Author

treeowl commented May 21, 2023

I don't remember the numbers I got from benchmarks. What did they look like to you? The space savings aren't in the Zero value itself; they're in reclaiming the space we previously used to point to it (all over the place) and using that space to store attached values instead.

@konsumlamm
Collaborator

> I don't remember the numbers I got from benchmarks. What did they look like to you?

The benchmarks take 10-20% less time for me.

> The space savings aren't in the Zero value itself; they're in reclaiming the space we previously used to point to it (all over the place) and using that space to store attached values instead.

I see, so we save one word per tree. This is $\log n$ in total though, not $n$, as you said, right?

@konsumlamm
Collaborator

Ok, the folds seem to improve as well.

@treeowl
Collaborator Author

treeowl commented May 21, 2023

> I see, so we save one word per tree. This is $\log n$ in total though, not $n$, as you said, right?

Incorrect. You can count out the Zeros in a tree of a given size to get a better understanding. Or you can just notice that we have enough room to fit all $n$ of the values where we used to have nullary Zeros.
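The count can be made concrete with a small sketch (illustration only, not library code): every element node's child list terminates in exactly one Zero, so a rank-$r$ tree, which holds $2^r$ elements, also holds $2^r$ Zeros.

```haskell
-- Sketch: count the Zero leaves in a binomial tree of rank r.
-- A rank-r tree is one element node whose children are trees of
-- ranks r-1, r-2, ..., 0, plus the node's own terminating Zero
-- at the end of its child list.
zeros :: Int -> Integer
zeros r = 1 + sum [zeros i | i <- [0 .. r - 1]]

-- Number of elements in a rank-r binomial tree.
elems :: Int -> Integer
elems r = 2 ^ r
```

Since `zeros r == elems r` for every rank, a queue of $n$ elements contains $n$ Zeros, and repurposing each one to hold a value recovers one word per element, not one per tree.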

@konsumlamm
Collaborator

> I see, so we save one word per tree. This is $\log n$ in total though, not $n$, as you said, right?

> Incorrect. You can count out the Zeros in a tree of a given size to get a better understanding. Or you can just notice that we have enough room to fit all $n$ of the values where we used to have nullary Zeros.

Ah right, the Zeros are the leaves, not the roots, that's where my mistake was.

@konsumlamm
Collaborator

> I guess there's still the question of whether to let maps be lazy (better for performance in general) or whether to make them stricter (maintaining theoretical worst-case bounds). I'd lean toward the former in this instance, but I hate having to decide.

You mean lazy in the spine? What is the current situation? Does this PR change that?

@treeowl
Collaborator Author

treeowl commented May 21, 2023

What was off were my allocation calculations, which I believe I overstated. I'll have to recalculate them if anyone cares.

@treeowl
Collaborator Author

treeowl commented May 21, 2023

Lazy all through. If we want to be strict instead, I can do that, but the code will be more complicated and, I expect, mostly slower.

@konsumlamm
Collaborator

So this basically reverts #103? Did you change your mind on #100? I'd be fine with lazy maps, if you say that's more efficient.

@treeowl
Collaborator Author

treeowl commented May 21, 2023

Well ... for now it keeps the plain (non-Prio) ones the same, as the strict constructors are better there. It more than reverts that for Prio queues, since it extends laziness into the trees (not just the spine). It all falls out of the representation change, which leads to different trade-offs.

@konsumlamm
Collaborator

I don't like having different laziness for Prio and non-Prio queues.

Can you explain why the new representation necessitates fully lazy maps? Why can't we just keep the old implementation?

I'm a bit worried that fully lazy maps might lead to situations where you build up a lot of computations which slow down a subsequent deleteMin, or will that not happen?

@treeowl
Collaborator Author

treeowl commented May 22, 2023

It doesn't necessitate fully lazy maps. Keeping them strict in the spine is easy, but not very useful for predictable performance. Keeping them entirely strict in the structure is quite possible too, but it's slightly trickier. That said, I just realized it should be somewhat less tricky with the Nattish maps than the old ones. We just need to stop the strict mapping at Succy Zeroy, and be lazy at Zeroy. It's still more code, but should be pretty readable. I'll give it a whirl and see how that goes.
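For anyone following along, the Succy/Zeroy matching can be pictured with a stripped-down rank witness (a hypothetical sketch of the Nattish idea; the names and shapes here are assumptions for illustration, not the actual definitions):

```haskell
{-# LANGUAGE GADTs #-}

-- Hypothetical sketch: a singleton witness for the rank of a tree,
-- letting code branch at runtime on whether the rank is zero or a
-- successor.
data Z
data S n

data Nattish n where
  Zeroy :: Nattish Z
  Succy :: Nattish n -> Nattish (S n)

rank :: Nattish n -> Int
rank Zeroy     = 0
rank (Succy n) = 1 + rank n

-- The boundary described above: while the witness matches Succy _,
-- the corresponding field holds tree structure that a strict map
-- would force; at Zeroy it holds a (lazy) value, so forcing stops.
holdsValue :: Nattish n -> Bool
holdsValue Zeroy = True
holdsValue _     = False
```

The point of the witness is that a structure-strict map can recurse eagerly as long as it sees `Succy`, and switch to the lazy case exactly when it reaches `Zeroy`.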

@konsumlamm
Collaborator

> Keeping them strict in the spine is easy, but not very useful for predictable performance.

So why was that useful before, but not anymore?

@treeowl
Collaborator Author

treeowl commented May 22, 2023

The old representation had entirely strict trees, which held lazy values, so calculating the spine strictly calculated everything (except the values) strictly. The new representation uses the same fields for values as for lists of trees, so those fields must be lazy. To calculate everything but the values strictly, we must calculate strictly down to Succy Zeroy, then calculate lazily. I suspect this won't be a big deal. With the old auxiliary class approach, it would've required an extra "layer" of computational structure, but I don't think we'll need that now.

Store the value associated with each key as its rightmost child,
which saves one word per element.

As a result, the binomial trees must become lazy, which should be
good for maps and lazy traversals. The downside is that we will
need tag checks to know that we have realized `Succ` constructors.

Benchmarking suggests this implementation is substantially faster
than the previous one.
treeowl added 2 commits May 22, 2023 22:22
* Make maps and unordered traversals build the structure
  eagerly again, and restore the key strictness of `mapKeysMonotonic`.

* Fix the documentation of `mapKeysMonotonic`. The given function need
  only be weakly monotonic; strict monotonicity is not required.
treeowl added 2 commits May 22, 2023 22:44
We no longer require the function to be *strictly* monotonic, so
we should test with weakly monotonic functions.
@treeowl
Collaborator Author

treeowl commented May 23, 2023

@konsumlamm I think maps should be back to the way they were now.

treeowl added 3 commits May 23, 2023 18:18
Document `mapU` and strengthen its test.
Collaborator

@konsumlamm konsumlamm left a comment


Great work!

[Comment thread on src/Data/PQueue/Prio/Internals.hs]
@treeowl
Collaborator Author

treeowl commented May 31, 2023

Sorry, I confused myself twice over. The allocation reduction is as big as I originally thought.

@treeowl treeowl merged commit e4fbf94 into lspitzner:master May 31, 2023
