ARROW-4975: [C++] Support concatenation of UnionArrays #11843

mbrobbel · 2021-12-02T21:24:25Z

This PR adds support for concatenation of union arrays.

For sparse union arrays this is straightforward: the type ID buffers and the child arrays are concatenated, just as in the other concatenate implementations.
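As a rough illustration of why the sparse case needs no bookkeeping, here is a minimal sketch over a hypothetical, simplified model (`SparseUnionModel` and `ConcatenateSparse` are illustrative names, and every child is stored as a `std::vector<int64_t>`). It is not Arrow's `Concatenate` implementation. In a sparse union every child has the same length as the union itself, so slot i always refers to slot i of the selected child, and plain appending is enough.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a sparse union array (NOT Arrow's API).
// Every child has exactly the same length as the type_ids buffer.
struct SparseUnionModel {
  std::vector<int8_t> type_ids;
  std::vector<std::vector<int64_t>> children;
};

// Concatenates sparse unions by appending the type IDs and each child
// independently. No offsets need adjusting: slot i of the union always
// refers to slot i of the selected child.
SparseUnionModel ConcatenateSparse(const std::vector<SparseUnionModel>& inputs) {
  SparseUnionModel out;
  if (inputs.empty()) return out;
  out.children.resize(inputs.front().children.size());
  for (const auto& in : inputs) {
    assert(in.children.size() == out.children.size());
    out.type_ids.insert(out.type_ids.end(), in.type_ids.begin(), in.type_ids.end());
    for (std::size_t c = 0; c < in.children.size(); ++c) {
      // Sparse layout invariant: each child is as long as the union.
      assert(in.children[c].size() == in.type_ids.size());
      out.children[c].insert(out.children[c].end(), in.children[c].begin(),
                             in.children[c].end());
    }
  }
  return out;
}
```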
For dense union arrays the following approach is used:

1. The type ID buffers are concatenated.
2. The child arrays are concatenated, but slicing is ignored: each child is appended in full rather than only the range referenced by the (possibly sliced) input. This makes building the offsets buffer simpler.
3. For every input array, the lengths of the child arrays concatenated up to that point are tracked. While iterating over the type ID buffers, these lengths are added to the corresponding values in the concatenated offsets buffer.
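The three steps can be sketched as follows. This is a minimal sketch over a hypothetical model (`DenseUnionModel` and `ConcatenateDense` are illustrative, with children stored as `std::vector<int64_t>`), not the code in concatenate.cc. It assumes type codes 0..n-1 so a type code can index the children directly, and it omits overflow checks. `offset`/`length` describe a slice of the union; the children are always appended in full.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of a dense union array (NOT Arrow's API).
// Type codes are assumed to be 0..n-1, so they double as child indices.
// [offset, offset + length) is the logical slice of type_ids/value_offsets;
// the children themselves are never sliced.
struct DenseUnionModel {
  std::vector<int8_t> type_ids;
  std::vector<int32_t> value_offsets;
  std::vector<std::vector<int64_t>> children;
  std::size_t offset = 0;
  std::size_t length = 0;
};

DenseUnionModel ConcatenateDense(const std::vector<DenseUnionModel>& inputs) {
  DenseUnionModel out;
  if (inputs.empty()) return out;
  const std::size_t num_children = inputs.front().children.size();
  out.children.resize(num_children);
  // Length of each output child accumulated so far, i.e. the shift that
  // must be added to the value offsets of the current input.
  std::vector<int64_t> child_base(num_children, 0);
  for (const auto& in : inputs) {
    assert(in.children.size() == num_children);
    // Step 1 and 3: copy the type IDs of the slice and shift each offset.
    for (std::size_t i = in.offset; i < in.offset + in.length; ++i) {
      const auto child = static_cast<std::size_t>(in.type_ids[i]);
      out.type_ids.push_back(in.type_ids[i]);
      // Overflow check omitted for brevity.
      out.value_offsets.push_back(
          static_cast<int32_t>(in.value_offsets[i] + child_base[child]));
    }
    // Step 2: append every child in full, ignoring the parent's slice.
    for (std::size_t c = 0; c < num_children; ++c) {
      out.children[c].insert(out.children[c].end(), in.children[c].begin(),
                             in.children[c].end());
      child_base[c] += static_cast<int64_t>(in.children[c].size());
    }
  }
  out.length = out.type_ids.size();
  return out;
}
```

For example, concatenating an input with type IDs {0, 1, 0} and offsets {0, 0, 1} (children of lengths 2 and 1) with an input with type IDs {1, 0} and offsets {0, 0} yields offsets {0, 0, 1, 1, 2}: the second input's entries are shifted by the two elements child 0 already holds and the one element child 1 already holds.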

Does this make sense, or should we slice the child arrays (when required) and reflect that in the concatenated offsets buffer?

This PR also removes a check in DenseUnionArray::Make that rejected empty offsets buffers, which made it impossible to construct empty dense union arrays. I discussed removing this check with @bkietz.
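To illustrate why an empty offsets buffer is legitimate, here is a hypothetical validation sketch. `DenseUnionOffsetsValid` is illustrative only and is not the check that was removed from `DenseUnionArray::Make`; type codes are again assumed to equal child indices. A dense union stores one offset per slot, so a length-0 array has no offsets at all; a check that compares counts accepts that case naturally, whereas rejecting an empty buffer outright does not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical validation sketch (not the actual Arrow check).
// A dense union stores one int32 offset per slot, so the offsets buffer must
// have exactly as many entries as the type_ids buffer. For a length-0 array
// both are empty, which is valid rather than an error.
bool DenseUnionOffsetsValid(const std::vector<int8_t>& type_ids,
                            const std::vector<int32_t>& value_offsets,
                            const std::vector<std::size_t>& child_lengths) {
  if (value_offsets.size() != type_ids.size()) return false;
  for (std::size_t i = 0; i < type_ids.size(); ++i) {
    const int8_t type_id = type_ids[i];  // assumed to be a child index 0..n-1
    if (type_id < 0 || static_cast<std::size_t>(type_id) >= child_lengths.size()) {
      return false;
    }
    // Each offset must point at an existing element of the selected child.
    const int32_t off = value_offsets[i];
    if (off < 0 || static_cast<std::size_t>(off) >= child_lengths[type_id]) {
      return false;
    }
  }
  return true;
}
```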

github-actions · 2021-12-02T21:24:45Z

https://issues.apache.org/jira/browse/ARROW-4975

lidavidm

Tricky indeed. I think just concatenating the dense union children (instead of trying to slice out only those elements which are referenced by an offset) is fine.

lidavidm · 2021-12-03T15:00:20Z

cpp/src/arrow/array/concatenate_test.cc

+                                             {child_one_sliced, child_two_sliced}));
+  ASSERT_OK(expected_sliced->ValidateFull());
+  AssertArraysEqual(*expected_sliced, *concat_sliced_arrays);
+}


Can we also test concatenation of 1) an array that is not sliced but whose children are sliced (i.e. have an offset), and 2) an array that is sliced and whose children additionally have an offset?
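Both cases matter because a dense union element is resolved through two independent offsets. The following is a minimal sketch over a hypothetical model (`SlicedDenseUnionModel`, `ChildModel` and `ValueAt` are illustrative, not Arrow types), assuming, as in the Arrow format, that value offsets are logical indices into the child, so the child's own slice offset is applied on top of them.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of a child array with its own slice offset.
struct ChildModel {
  std::vector<int64_t> values;
  std::size_t offset = 0;  // child's own slice offset
};

// Hypothetical model of a (possibly sliced) dense union. type_ids and
// value_offsets are the physical buffers; type codes equal child indices.
struct SlicedDenseUnionModel {
  std::vector<int8_t> type_ids;
  std::vector<int32_t> value_offsets;  // logical indices into the children
  std::vector<ChildModel> children;
  std::size_t offset = 0;              // parent's slice offset
};

// Returns logical element i: the parent offset selects the slot, the value
// offset selects a logical child index, and the child's own offset maps that
// logical index to a physical position.
int64_t ValueAt(const SlicedDenseUnionModel& u, std::size_t i) {
  const std::size_t slot = u.offset + i;
  assert(slot < u.type_ids.size());
  const ChildModel& child = u.children[static_cast<std::size_t>(u.type_ids[slot])];
  return child.values[child.offset + static_cast<std::size_t>(u.value_offsets[slot])];
}
```

A concatenation that only honors one of the two offsets would still pass tests in which the other one is zero, which is what the two requested cases guard against.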

bkietz

This is looking great, thanks!

cpp/src/arrow/array/array_nested.cc

cpp/src/arrow/array/concatenate.cc

cpp/src/arrow/array/concatenate_test.cc

bkietz

LGTM

CI failures seem to be unrelated flakes, so I'm restarting those jobs to see if we can get to all green.

bkietz · 2021-12-06T15:56:53Z

merging

ursabot · 2021-12-06T16:00:59Z

Benchmark runs are scheduled for baseline = e903a21 and contender = a93c493. a93c493 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Finished ⬇️0.9% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.49% ⬆️0.09%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Support concatenation of UnionArrays

099c645

github-actions bot added the Component: C++ label Dec 2, 2021

Handle child array length overflow

15204d4
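Dense union value offsets are 32-bit signed integers, so the concatenated children must stay addressable by an `int32_t` offset. The commit's actual change isn't shown here; the following is only a minimal sketch of one way to guard the running child length (`AddChildLengthChecked` is a hypothetical helper), rejecting an input instead of letting offsets wrap around.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper (not Arrow's code). Dense union value offsets are
// int32_t, so the concatenated length of each child is kept within int32_t
// range. This is conservative by one element but keeps every offset valid.
bool AddChildLengthChecked(int64_t* total, int64_t child_length) {
  assert(total != nullptr);
  constexpr int64_t kMaxLength = std::numeric_limits<int32_t>::max();
  if (child_length < 0 || *total < 0 || *total > kMaxLength) return false;
  // Compare against the remaining headroom instead of adding first, so the
  // check itself can never overflow.
  if (child_length > kMaxLength - *total) return false;
  *total += child_length;
  return true;
}
```

On failure the running total is left unchanged, so the caller can report an error for the offending input.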

lidavidm approved these changes Dec 3, 2021

bkietz requested changes Dec 3, 2021

cpp/src/arrow/array/array_nested.cc

cpp/src/arrow/array/concatenate.cc (outdated)

cpp/src/arrow/array/concatenate_test.cc (outdated)

mbrobbel and others added 4 commits December 6, 2021 10:12

Use ArrayFromJSON for dense union concatenate test

4473426

This makes it more readable. Co-authored-by: Benjamin Kietzman <bengilgit@gmail.com>

Use ArrayOf for sparse union array concatenate test

57ce777

Add more tests for concatenation of dense union arrays

cff97cb

Handle concatenation of dense union arrays with type codes other than 0..n

a269df9
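With arbitrary type codes (for example a union declared with codes 5 and 10), a type code can no longer index the per-child bookkeeping directly. A minimal sketch, assuming a hypothetical lookup table built from the declared type codes (`BuildChildIds` and `AppendShiftedOffsets` are illustrative names, not Arrow functions; type codes are assumed to lie in 0..127):

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: maps each declared type code (0..127) to the index of
// its child. Undeclared codes map to -1.
std::array<int, 128> BuildChildIds(const std::vector<int8_t>& type_codes) {
  std::array<int, 128> child_ids;
  child_ids.fill(-1);
  for (std::size_t c = 0; c < type_codes.size(); ++c) {
    child_ids[static_cast<std::size_t>(type_codes[c])] = static_cast<int>(c);
  }
  return child_ids;
}

// Shifts one input's value offsets by the accumulated child lengths. The
// per-child bookkeeping (child_base) is indexed by child index, never by the
// raw type code, so codes like {5, 10} work the same as {0, 1}.
void AppendShiftedOffsets(const std::vector<int8_t>& type_ids,
                          const std::vector<int32_t>& value_offsets,
                          const std::array<int, 128>& child_ids,
                          const std::vector<int64_t>& child_base,
                          std::vector<int32_t>* out_offsets) {
  for (std::size_t i = 0; i < type_ids.size(); ++i) {
    const int child = child_ids[static_cast<std::size_t>(type_ids[i])];
    assert(child >= 0);  // every type ID must be a declared type code
    out_offsets->push_back(static_cast<int32_t>(
        value_offsets[i] + child_base[static_cast<std::size_t>(child)]));
  }
}
```

With codes {5, 10} and children that already hold 3 and 2 elements, the offsets {0, 0, 1} for type IDs {10, 5, 10} become {2, 3, 3}.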

bkietz approved these changes Dec 6, 2021

bkietz closed this in a93c493 Dec 6, 2021

asfimport mentioned this pull request Dec 6, 2021

[C++] Support concatenation of UnionArrays #16035

Closed
