ARROW-16713: [C++] Pull join accumulation outside of HashJoinImpl #13332
0ffad81 to 9810c0d
Looks good to me. I have only a few comments:
Re AccumulationQueue: then it becomes just a vector, and maybe it doesn't need to be promoted to a separate class in a separate file.
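To illustrate the point, here is a minimal sketch of a queue whose every member simply forwards to `std::vector`. The names `Batch` and `BatchQueue` are hypothetical stand-ins for illustration and are not the code in this PR.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a batch of rows; not Arrow's ExecBatch.
struct Batch {
  std::size_t length = 0;
};

// Hypothetical thin wrapper: each member forwards to the underlying vector,
// so the class adds a name but no behavior of its own.
class BatchQueue {
 public:
  void Append(Batch batch) { batches_.push_back(std::move(batch)); }
  std::size_t batch_count() const { return batches_.size(); }
  bool empty() const { return batches_.empty(); }
  Batch& operator[](std::size_t i) { return batches_[i]; }
  const Batch& operator[](std::size_t i) const { return batches_[i]; }
  void Clear() { batches_.clear(); }

 private:
  std::vector<Batch> batches_;
};
```

If a wrapper like this never grows extra state or invariants, a plain `std::vector<Batch>` would serve callers equally well.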
westonpace
left a comment
Looks good. A few minor nits and some questions but otherwise I think this helps clear things up.
This is true. The only difference I see between the accumulation queue and a vector of batches at the moment is
7cf02ea to 709084c
westonpace
left a comment
A few minor nits but this seems ready. Thanks for doing this, it's a good cleanup, above and beyond its potential utility for spillover. I find this easier to follow than the old implementation.
It would be nice to get some consistency with how we use int64_t, uint64_t, and size_t within the engine. Do you have any convention suggestions?
I'd say we should probably be using size_t for most things that are non-negative. That guy on the mailing list had an epic struggle building for 32-bit because we were using int64_t where we should've used size_t.
BTW here I use size_t because that's what std::vector::size() returns.
I think row_count can never be negative either, so it should be a size_t. ExecBatch::length should probably be size_t as well.
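As a rough sketch of why the choice matters: on a typical 32-bit target `size_t` is 32 bits wide, so converting an `int64_t` length to `size_t` narrows and needs a check. The example below assumes a hypothetical `SignedBatch` with a signed length and a hypothetical helper `TotalRows`; neither is an Arrow API.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical batch whose length is stored as a signed 64-bit integer.
struct SignedBatch {
  std::int64_t length = 0;
};

// Hypothetical helper: sums batch lengths into a size_t row count.
// Returns false (leaving *out untouched) if a length is negative or the
// total would not fit in size_t, which can happen on 32-bit builds.
inline bool TotalRows(const std::vector<SignedBatch>& batches,
                      std::size_t* out) {
  std::size_t total = 0;
  // Index with size_t because that is what std::vector::size() returns.
  for (std::size_t i = 0; i < batches.size(); ++i) {
    const std::int64_t len = batches[i].length;
    if (len < 0) return false;
    const std::size_t room = std::numeric_limits<std::size_t>::max() - total;
    if (static_cast<std::uint64_t>(len) > room) return false;
    total += static_cast<std::size_t>(len);
  }
  *out = total;
  return true;
}
```

Keeping the signed-to-unsigned conversion in one checked place like this is one possible convention; storing lengths as `size_t` from the start would make the check unnecessary.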
5ec4ff5 to b43cada
The only failures seem to be with S3FS (unrelated to hash join completely) and the tracing span thingy which is being addressed in #13108
I wish I knew why this PR triggers the span thing to flare up. It doesn't seem that any of the code you have is touching spans. I suspect it is something rather subtle. At the moment AppVeyor passes on other PRs, so if I merge this it will break things, even if there is nothing in this PR that is related. I've proposed a solution on #13108 that should work, so hopefully we can get that merged pretty quickly. Otherwise we'll have to figure out what change here is causing the span bug to trigger.
b43cada to 8d29ddb
8d29ddb to 790089a
westonpace
left a comment
Now that the span issue is resolved it seems MSVC is happy so let's merge.