-
Notifications
You must be signed in to change notification settings - Fork 4k
ARROW-6861: [C++] Fix length/null_count/capacity accounting through Reset and AppendIndices in DictionaryBuilder #5643
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
cpp/src/arrow/array/builder_base.h
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to provide some documentation for how/when this method should be used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Ideally we should be able to avoid having this new method?
cpp/src/arrow/array_dict_test.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not disable this with
| // TEST(TestStringDictionaryBuilder, BigDeltaDictionary) { | |
| // TEST(DISABLED_TestStringDictionaryBuilder, BigDeltaDictionary) { |
|
I think if this is disabling the current delta support then this shouldn't go into 0.15.1. Is there a way to solve the Parquet issue without touching delta support (which can be done in another PR for a later version)? For example by adding e.g. a |
|
@pitrou I just pushed a much simpler fix that does not touch the dictionary delta logic. I'll rebase a patch I have done for ARROW-6869 after this is merged into master |
pitrou
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on the principle, just one suggestion in the tests.
cpp/src/arrow/array_dict_test.cc
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you try an actual re-use of the builder below? In case some internal state (e.g. the memo) is not properly reinitialized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, no problem.
|
+1. Merging this |
…eset and AppendIndices in DictionaryBuilder Some more work is needed in ARROW-6869 to move the memo table resetting logic out of `DictionaryBuilder::Reset` but we can leave that for 1.0.0 work Closes apache#5643 from wesm/ARROW-6861 and squashes the following commits: 9a19773 <Wes McKinney> Test using dictionary builder again after finish 890a6dc <Wes McKinney> Simpler fix for ARROW-6861 Authored-by: Wes McKinney <wesm+git@apache.org> Signed-off-by: Wes McKinney <wesm+git@apache.org>
…eset and AppendIndices in DictionaryBuilder Some more work is needed in ARROW-6869 to move the memo table resetting logic out of `DictionaryBuilder::Reset` but we can leave that for 1.0.0 work Closes apache#5643 from wesm/ARROW-6861 and squashes the following commits: 9a19773 <Wes McKinney> Test using dictionary builder again after finish 890a6dc <Wes McKinney> Simpler fix for ARROW-6861 Authored-by: Wes McKinney <wesm+git@apache.org> Signed-off-by: Wes McKinney <wesm+git@apache.org>
Some more work is needed in ARROW-6869 to move the memo table resetting logic out of
DictionaryBuilder::Resetbut we can leave that for 1.0.0 work