@bkietz
Member

@bkietz bkietz commented Jun 8, 2020

Also splits UnionType into SparseUnionType and DenseUnionType, and similarly for UnionArray and UnionScalar.

SparseUnionArray no longer includes the unused offsets buffer.
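
The layout difference behind the dropped offsets buffer can be sketched as follows. This is a minimal standalone illustration, not Arrow's implementation: `SparseUnion`, `DenseUnion`, their members, and the `Make*Example` helpers are hypothetical names, children are reduced to plain vectors of `int64_t`, and type ids are assumed to be child indices directly. In the sparse layout every child has the full length of the union, so slot `i` is read from position `i` of the selected child; in the dense layout each child stores only its own values, so an offsets buffer is needed to locate them.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for union arrays; not Arrow classes.
// Assumption: type ids are used directly as child indices.
struct SparseUnion {
  std::vector<int8_t> type_ids;                // which child holds slot i
  std::vector<std::vector<int64_t>> children;  // every child has full length

  // Slot i lives at position i of the selected child: no offsets needed.
  int64_t Value(int64_t i) const { return children[type_ids[i]][i]; }
};

struct DenseUnion {
  std::vector<int8_t> type_ids;                // which child holds slot i
  std::vector<int32_t> offsets;                // position inside that child
  std::vector<std::vector<int64_t>> children;  // children hold only own values

  // Slot i lives at position offsets[i] of the selected child.
  int64_t Value(int64_t i) const { return children[type_ids[i]][offsets[i]]; }
};

// Both helpers encode the same logical sequence [10, 20, 11].
inline SparseUnion MakeSparseExample() {
  // Unused slots in each child are padded with 0.
  return SparseUnion{{0, 1, 0}, {{10, 0, 11}, {0, 20, 0}}};
}

inline DenseUnion MakeDenseExample() {
  return DenseUnion{{0, 1, 0}, {0, 0, 1}, {{10, 11}, {20}}};
}
```

In this sketch the sparse form never consults an offsets vector, which is why carrying one around would be dead weight.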

@bkietz bkietz force-pushed the 8866-Split-TypeUNION-into-Type branch from 652e5fb to bb5ce46 Compare June 8, 2020 19:01
@github-actions

github-actions bot commented Jun 8, 2020

@wesm
Member

wesm commented Jun 8, 2020

Ah, I wish I had synced up with you about compute/kernels/vector_selection_internal.h; I hope that didn't take up too much of your time. I've deleted all the Take-related union code for now since I'm revamping the entire approach to performing takes. Don't worry about it -- I will rebase my patch since this one should probably go in first.

@bkietz
Member Author

bkietz commented Jun 8, 2020

@wesm no problem, it didn't take much time. Sparse and Dense were basically distinct code paths already in vector_selection_internal. No need to rebase yours; rebasing this one across that file shouldn't be a problem.

@bkietz bkietz force-pushed the 8866-Split-TypeUNION-into-Type branch from 8ec562a to 5eb29ef Compare June 8, 2020 21:40
@wesm
Member

wesm commented Jun 9, 2020

@bkietz OK, I likely won't be done with what I'm doing until tomorrow. To be clear, I am removing Union take functionality altogether until it can be adapted to the new approach at a later time (and disabling the existing unit tests). This is such a seldom-traveled path that I don't feel it's worth investing the effort right now.

@bkietz bkietz force-pushed the 8866-Split-TypeUNION-into-Type branch from 5eb29ef to 290ad5b Compare June 9, 2020 16:16
@wesm
Member

wesm commented Jun 9, 2020

My patch #7382 should not conflict with this, so no need to worry about rebasing.

const uint8_t* raw_values() const { return raw_values_ + data_->offset * byte_width(); }

protected:
static constexpr int32_t byte_width() { return sizeof(TypeClass::DayMilliseconds); }
Member

Not sure why you made this protected? The byte width may be useful.

Member Author

It's not used anywhere except in this class. If needed, it can easily be reintroduced, but it seemed better to minimize the public API.

Member

@pitrou pitrou Jun 12, 2020

Well, it was public so could be used by anybody really. I'm ok with not making the public API too plethoric but this seems a rather useful function (and trivial to maintain).

Member

I also don't think this change belongs in this PR.

@@ -351,39 +358,21 @@ std::shared_ptr<DataType> ARROW_EXPORT time64(TimeUnit::type unit);
std::shared_ptr<DataType> ARROW_EXPORT
struct_(const std::vector<std::shared_ptr<Field>>& fields);

/// \brief Create a UnionType instance
std::shared_ptr<DataType> ARROW_EXPORT
union_(const std::vector<std::shared_ptr<Field>>& child_fields,
Member

I think we should keep this factory function, for compatibility and for convenience. UnionMode still exists, right?

Member Author

UnionMode still exists, but I've already factored out usage of union_. If we need it later, it'll be straightforward to add.

Member

This is more of a "don't break third-party code" concern.

Member

I'm quickly restoring the backward compatibility functions.
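
One way to address the "don't break third-party code" concern raised in this thread is a thin factory that takes a mode and forwards to per-mode factories. The following is only a sketch under stated assumptions: `DataType`, `SparseUnionType`, `DenseUnionType`, `sparse_union`, `dense_union`, and this `union_` are simplified stand-ins written for illustration, fields are reduced to strings, and none of the signatures reproduce Arrow's real ones.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; real Arrow signatures differ.
enum class UnionMode { SPARSE, DENSE };

struct DataType {
  virtual ~DataType() = default;
  virtual std::string name() const = 0;
};

struct SparseUnionType : DataType {
  explicit SparseUnionType(std::vector<std::string> f) : fields(std::move(f)) {}
  std::string name() const override { return "sparse_union"; }
  std::vector<std::string> fields;
};

struct DenseUnionType : DataType {
  explicit DenseUnionType(std::vector<std::string> f) : fields(std::move(f)) {}
  std::string name() const override { return "dense_union"; }
  std::vector<std::string> fields;
};

// New-style factories: one per concrete union type.
inline std::shared_ptr<DataType> sparse_union(std::vector<std::string> fields) {
  return std::make_shared<SparseUnionType>(std::move(fields));
}

inline std::shared_ptr<DataType> dense_union(std::vector<std::string> fields) {
  return std::make_shared<DenseUnionType>(std::move(fields));
}

// Compatibility shim: callers that still pass a mode keep compiling,
// while the actual construction goes through the per-mode factories.
inline std::shared_ptr<DataType> union_(std::vector<std::string> fields,
                                        UnionMode mode) {
  return mode == UnionMode::SPARSE ? sparse_union(std::move(fields))
                                   : dense_union(std::move(fields));
}
```

A shim like this costs a few lines to maintain and lets existing call sites migrate on their own schedule.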

static constexpr int8_t kMaxTypeCode = 127;
static constexpr int kInvalidChildId = -1;

static constexpr const char* type_name() { return "union"; }

UnionType(const std::vector<std::shared_ptr<Field>>& fields,
Member

I'm not sure it's a good idea to remove these constructors, while external code may rely on them.

Member Author

I'm not sure what you mean; UnionType is an abstract base class now and should never be constructed except as a base object of SparseUnionType or DenseUnionType. If we want to maintain compatibility of code which constructs UnionType directly then we'll need to have a single concrete UnionType which corresponds to both Type::SPARSE_UNION and Type::DENSE_UNION. That's possible but very much not our pattern with DataType subclasses.

Member

Ah, I see. At least keep the static factory function then?
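
The design described in this thread (an abstract `UnionType` that is only ever constructed as the base of a concrete sparse or dense class, with one concrete class per type id) might look roughly like the sketch below. It is an illustration only: the enum, the members, and the constructor signatures are simplified assumptions and are not taken from Arrow's headers.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for the type id enum.
enum class Type { SPARSE_UNION, DENSE_UNION };

// Shared state and behavior live in the base, which cannot be instantiated.
class UnionType {
 public:
  virtual ~UnionType() = default;
  virtual Type id() const = 0;  // pure virtual: makes the class abstract
  const std::vector<int8_t>& type_codes() const { return type_codes_; }

 protected:
  // Protected constructor: only subclasses can construct the base part.
  explicit UnionType(std::vector<int8_t> type_codes)
      : type_codes_(std::move(type_codes)) {}

 private:
  std::vector<int8_t> type_codes_;
};

// Each concrete class corresponds to exactly one type id.
class SparseUnionType : public UnionType {
 public:
  static constexpr Type type_id = Type::SPARSE_UNION;
  explicit SparseUnionType(std::vector<int8_t> codes)
      : UnionType(std::move(codes)) {}
  Type id() const override { return type_id; }
};

class DenseUnionType : public UnionType {
 public:
  static constexpr Type type_id = Type::DENSE_UNION;
  explicit DenseUnionType(std::vector<int8_t> codes)
      : UnionType(std::move(codes)) {}
  Type id() const override { return type_id; }
};

static_assert(std::is_abstract<UnionType>::value,
              "UnionType must not be directly constructible");
```

Because the base is abstract, code that used to construct `UnionType` directly has to pick a concrete subclass, which is the compatibility break the reviewers are weighing.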

@wesm
Member

wesm commented Jun 12, 2020

@bkietz can you rebase and address @pitrou's comments tomorrow? This is going to collide with ARROW-9075, so I would prefer that this go in first; then I can rebase my patch on it.

@bkietz bkietz force-pushed the 8866-Split-TypeUNION-into-Type branch from 10a6b31 to f1612d0 Compare June 12, 2020 15:15
Member

@wesm wesm left a comment

+1. Awaiting CI

@bkietz
Member Author

bkietz commented Jun 12, 2020

@wesm thanks, I've been trying to get those to pass locally.

@wesm
Member

wesm commented Jun 12, 2020

Sorry about the noise; I think I've got it now. We might need to create an "arrow-deprecated-test" at some point where we verify that deprecated APIs still work as advertised.

@wesm
Member

wesm commented Jun 12, 2020

Appveyor was passing two commits ago: https://ci.appveyor.com/project/BenjaminKietzman/arrow/builds/33489614. The Ursabot CI failure looks transient. I'll go ahead and merge this so I can rebase what I'm working on

@wesm wesm closed this in 89cf7bd Jun 12, 2020
@wesm
Member

wesm commented Jun 12, 2020

thanks @bkietz!

@bkietz bkietz deleted the 8866-Split-TypeUNION-into-Type branch February 25, 2021 16:32