Skip to content

Conversation

@mbrobbel
Copy link
Member

@mbrobbel mbrobbel commented Dec 2, 2021

This PR adds support for concatenation of union arrays.

For sparse union arrays this is trivial: the type buffers and child arrays are concatenated like the other concatenate implementations.
For dense union arrays the following approach is used:

  • The type buffers are concatenated.
  • The child arrays are concatenated, but slicing is ignored. This makes building the offsets buffer more simple.
  • For every input array the length of the different child arrays (concatenated up to that point) is tracked. By iterating over the types buffers these offsets can be applied to the values in the concatenated offsets buffer.

Does this make sense or should we slice child arrays (when required) and reflect this in the concatenated offsets buffer?

This PR also removes a check in DenseUnionArray::Make that rejected empty offsets buffers. This made it impossible to construct empty dense union arrays. I discussed removing this check with @bkietz.

@github-actions
Copy link

github-actions bot commented Dec 2, 2021

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky indeed. I think just concatenating the dense union children (instead of trying to slice out only those elements which are referenced by an offset) is fine.

{child_one_sliced, child_two_sliced}));
ASSERT_OK(expected_sliced->ValidateFull());
AssertArraysEqual(*expected_sliced, *concat_sliced_arrays);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we also test concatenation of an array 1) which is not sliced, but whose children are sliced/have an offset? 2) which is sliced, whose children additionally have an offset?

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great, thanks!

Copy link
Member

@bkietz bkietz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

CI failures seemed like unrelated flakes so I'm restarting those jobs to see if we can get to all-green

@bkietz
Copy link
Member

bkietz commented Dec 6, 2021

merging

@bkietz bkietz closed this in a93c493 Dec 6, 2021
@ursabot
Copy link

ursabot commented Dec 6, 2021

Benchmark runs are scheduled for baseline = e903a21 and contender = a93c493. a93c493 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.9% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.49% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants