ARROW-5336: [C++] Implement arrow::Concatenate for dictionary-encoded arrays with unequal dictionaries #8984
```diff
@@ -1367,6 +1367,12 @@ class ARROW_EXPORT DictionaryUnifier {
   /// after this is called
   virtual Status GetResult(std::shared_ptr<DataType>* out_type,
                            std::shared_ptr<Array>* out_dict) = 0;
+
+  /// \brief Return a unified dictionary with the given index type. If
+  /// the index type is not large enough then an invalid status will be returned.
+  /// The unifier cannot be used after this is called
+  virtual Status GetResultWithIndexType(std::shared_ptr<DataType> index_type,
+                                        std::shared_ptr<Array>* out_dict) = 0;
 };
 
 // ----------------------------------------------------------------------
```
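To illustrate what `GetResultWithIndexType` adds over `GetResult`, here is a minimal, self-contained sketch in plain C++ (not Arrow code). `ToyDictionaryUnifier`, its `Unify` method, and the `IndexType` template parameter standing in for the `index_type` argument are all hypothetical. The sketch assumes string dictionaries, and where the declared API returns an invalid `Status` for an index type that is too small, it returns `std::nullopt`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a dictionary unifier; not Arrow's implementation.
class ToyDictionaryUnifier {
 public:
  // Adds one input dictionary and returns its transpose map:
  // transpose[i] is the position of dictionary[i] in the unified dictionary.
  std::vector<int32_t> Unify(const std::vector<std::string>& dictionary) {
    std::vector<int32_t> transpose;
    transpose.reserve(dictionary.size());
    for (const auto& value : dictionary) {
      auto it = memo_.find(value);
      if (it == memo_.end()) {
        // New value: append it and record its position (32-bit, like a memo table).
        const auto new_index = static_cast<int32_t>(unified_.size());
        it = memo_.emplace(value, new_index).first;
        unified_.push_back(value);
      }
      transpose.push_back(it->second);
    }
    return transpose;
  }

  // Analogue of GetResultWithIndexType: the largest index, size() - 1, must be
  // representable in IndexType; otherwise no dictionary is returned.
  template <typename IndexType>
  std::optional<std::vector<std::string>> GetResultWithIndexType() const {
    const auto max_index =
        static_cast<uint64_t>(std::numeric_limits<IndexType>::max());
    if (!unified_.empty() && unified_.size() - 1 > max_index) {
      return std::nullopt;
    }
    return unified_;
  }

 private:
  std::unordered_map<std::string, int32_t> memo_;
  std::vector<std::string> unified_;
};
```

Merging {"a", "b"} and {"b", "c"} yields the unified dictionary {"a", "b", "c"} with transpose maps {0, 1} and {1, 2}. With 200 distinct values, requesting `int8_t` indices fails while `int16_t` succeeds.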
The other concatenation functions simply `memcpy` the buffers. Dictionary concatenation, however, must map the indices in each buffer to potentially new values in the unified dictionary, so this function needs to know the type of the index buffer for the reinterpret case on line 190. Separately, memo table indices are 32-bit while dictionary indices can be 64-bit. That is a potential problem, but one that already existed, and it seems unlikely that a dictionary array would be used when there are 4 billion unique values.
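The reason the index type must be known can be sketched as follows: unlike a plain `memcpy`, remapping requires reading each element at its actual width. This is a hypothetical illustration, not the code from this PR. `IndexWidth`, `TransposeIndices`, and `AppendTransposed` are made-up names, the transpose map is assumed to hold 32-bit entries (matching the memo table width mentioned above), and null slots are not handled. Elements are copied with `std::memcpy` rather than `reinterpret_cast` to avoid alignment and aliasing concerns in a standalone example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical runtime tag for the dictionary's index type.
enum class IndexWidth { kInt8, kInt16, kInt32, kInt64 };

// Reads each index at its real width, maps it through the (assumed 32-bit)
// transpose map, and appends the remapped index to out_bytes.
template <typename IndexType>
void TransposeIndices(const std::vector<uint8_t>& in_bytes,
                      const std::vector<int32_t>& transpose,
                      std::vector<uint8_t>* out_bytes) {
  const std::size_t length = in_bytes.size() / sizeof(IndexType);
  const std::size_t offset = out_bytes->size();
  out_bytes->resize(offset + length * sizeof(IndexType));
  for (std::size_t i = 0; i < length; ++i) {
    IndexType old_index;
    std::memcpy(&old_index, in_bytes.data() + i * sizeof(IndexType),
                sizeof(IndexType));
    // at() throws std::out_of_range for indices outside the input dictionary.
    const auto new_index =
        static_cast<IndexType>(transpose.at(static_cast<std::size_t>(old_index)));
    std::memcpy(out_bytes->data() + offset + i * sizeof(IndexType), &new_index,
                sizeof(IndexType));
  }
}

// A plain memcpy cannot do this: the element width must be dispatched on first.
void AppendTransposed(IndexWidth width, const std::vector<uint8_t>& in_bytes,
                      const std::vector<int32_t>& transpose,
                      std::vector<uint8_t>* out_bytes) {
  switch (width) {
    case IndexWidth::kInt8:
      return TransposeIndices<int8_t>(in_bytes, transpose, out_bytes);
    case IndexWidth::kInt16:
      return TransposeIndices<int16_t>(in_bytes, transpose, out_bytes);
    case IndexWidth::kInt32:
      return TransposeIndices<int32_t>(in_bytes, transpose, out_bytes);
    case IndexWidth::kInt64:
      return TransposeIndices<int64_t>(in_bytes, transpose, out_bytes);
  }
}
```

For example, an `int16_t` chunk holding indices {1, 0, 1} with transpose map {2, 0} is rewritten to {0, 2, 0}.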
While we are at it, could all non-public functions and classes in this module be put in an anonymous namespace? Doing so reduces the number of exported symbols and can also give the compiler more optimization opportunities.
Done
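As a generic illustration of the anonymous-namespace request (not the actual module from this PR), the sketch below keeps a helper with internal linkage and exposes a single function. The `toy` namespace, `SumLengths`, and `ConcatenateChunks` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace toy {  // hypothetical module namespace, not Arrow's

namespace {

// Internal linkage: this helper is not exported from the translation unit,
// so it adds no symbol to the library and the compiler may inline it freely.
int64_t SumLengths(const std::vector<std::vector<int32_t>>& chunks) {
  int64_t total = 0;
  for (const auto& chunk : chunks) total += static_cast<int64_t>(chunk.size());
  return total;
}

}  // namespace

// Public entry point: the only function in this sketch with external linkage.
std::vector<int32_t> ConcatenateChunks(
    const std::vector<std::vector<int32_t>>& chunks) {
  std::vector<int32_t> out;
  out.reserve(static_cast<std::size_t>(SumLengths(chunks)));
  for (const auto& chunk : chunks) out.insert(out.end(), chunk.begin(), chunk.end());
  return out;
}

}  // namespace toy
```

Because `SumLengths` has internal linkage, it does not appear among the exported symbols, and the compiler is free to inline it at its single call site.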