-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-8866: [C++] Split UNION into SPARSE_UNION and DENSE_UNION #7378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
652e5fb to
bb5ce46
Compare
|
Ah wish I had synced up with you about compute/kernels/vector_selection_internal.h, hope that didn't take up too much of your time. I've deleted all the Take-related union code for now since I'm revamping the entire approach to performing takes. Don't worry about it -- I will rebase my patch since this one should probably go in first. |
|
@wesm no problem it didn't take much time. Sparse and Dense were basically distinct code paths already in vector_selection_internal. No need to rebase yours; rebasing this one across that file shouldn't be a problem |
8ec562a to
5eb29ef
Compare
|
@bkietz OK, I likely won't be done with what I'm doing until tomorrow. To be clear I am removing Union take functionality altogether until it can be adapted to the new approach at a later time (and disabling the existing unit tests). This is such a seldom traveled path that I don't feel it's worth investing the effort right now. |
5eb29ef to
290ad5b
Compare
|
My patch #7382 should not conflict with this so no need to worry about rebasing |
| const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width(); } | ||
|
|
||
| protected: | ||
| static constexpr int32_t byte_width() { return sizeof(TypeClass::DayMilliseconds); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure why you made this protected? The byte width may be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not used anywhere except in this class. If needed it can easily be reintroduced but it seemed better to minimize the public API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, it was public so could be used by anybody really. I'm ok with not making the public API too plethoric but this seems a rather useful function (and trivial to maintain).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think this change belongs in this PR
cpp/src/arrow/type_fwd.h
Outdated
| @@ -351,39 +358,21 @@ std::shared_ptr<DataType> ARROW_EXPORT time64(TimeUnit::type unit); | |||
| std::shared_ptr<DataType> ARROW_EXPORT | |||
| struct_(const std::vector<std::shared_ptr<Field>>& fields); | |||
|
|
|||
| /// \brief Create a UnionType instance | |||
| std::shared_ptr<DataType> ARROW_EXPORT | |||
| union_(const std::vector<std::shared_ptr<Field>>& child_fields, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep this factory function, for compatibility and for convenience. UnionMode still exists, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
UnionMode still exists but I've already factored out usage of union_. If we need it later it'll be straightforward to add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is more of a "don't break third-party code" concern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quickly restoring backward compatibility functions
cpp/src/arrow/type.h
Outdated
| static constexpr int8_t kMaxTypeCode = 127; | ||
| static constexpr int kInvalidChildId = -1; | ||
|
|
||
| static constexpr const char* type_name() { return "union"; } | ||
|
|
||
| UnionType(const std::vector<std::shared_ptr<Field>>& fields, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure it's a good idea to remove these constructors, while external code may rely on them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what you mean; UnionType is an abstract base class now and should never be constructed except as a base object of SparseUnionType or DenseUnionType. If we want to maintain compatibility of code which constructs UnionType directly then we'll need to have a single concrete UnionType which corresponds to both Type::SPARSE_UNION and Type::DENSE_UNION. That's possible but very much not our pattern with DataType subclasses.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. At least keep the static factory function then?
|
@bkietz can you rebase and address @pitrou's comments tomorrow? This is going to collide with ARROW-9075 so would prefer that this go in first and then I can rebase my patch on that |
10a6b31 to
f1612d0
Compare
wesm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. Awaiting CI
|
@wesm thanks, I've been trying to get those to pass locally. |
|
Sorry about the noise I think I've got it now. We might need to create an "arrow-deprecated-test" at some point where we verify that deprecated APIs still work as advertised |
|
Appveyor was passing two commits ago: https://ci.appveyor.com/project/BenjaminKietzman/arrow/builds/33489614. The Ursabot CI failure looks transient. I'll go ahead and merge this so I can rebase what I'm working on |
|
thanks @bkietz! |
Also splits
UnionType -> SparseUnionType and DenseUnionTypeand similar forUnionArray,UnionScalar.SparseUnionArrayno longer includes the unused offsets buffer