fix thread safety issue with nested column global dictionaries #13265
Merged
clintropolis merged 3 commits into apache:master on Oct 28, 2022
clintropolis commented Oct 26, 2022
imply-cheddar approved these changes Oct 28, 2022
Description
Fixes a pretty good blunder I made which resulted in the `FixedIndexed` implementations used by nested columns not being thread safe, leading to undefined behavior when multiple readers access the nested column global dictionaries at the same time. My intention was that `FixedIndexed` used the positional read methods of `TypeStrategy`, so it should have been thread safe. However, the `TypeStrategy` implementations employed here were not actually overriding these methods, so they fell back to a default positional read that sets the buffer position, reads, and then resets it to the original position, which is most unchill when several threads share the same buffer.

In addition to implementing these methods for the long, double, and int strategies used here, I also switched nested columns to wrap `FixedIndexed` in a supplier to save heap footprint, per #12277 (comment). Because the lazy creation of `FixedIndexed` now happens per thread, this change alone would also have made it thread safe, even had I not overridden these methods of `TypeStrategy`.

I added a more direct test on `NestedDataColumnSupplier`, which before this PR was only tested indirectly through native and SQL query tests, to perform basic selector and bitmap index operations, and included a concurrent read test. The concurrency test failed consistently before the changes in this PR, but I ran it on repeat ~200 times after the changes with no failures.

Release note
Fixes a bug with concurrent reads of nested columns with numeric global value dictionaries which could lead to undefined behavior when returning or filtering on nested numeric values.
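The failure mode described above can be reproduced in miniature with plain `java.nio.ByteBuffer`. The sketch below assumes only what the description states: the fallback positional read sets the buffer position, reads, and restores it. `LongReader`, `FALLBACK_ONLY`, `ABSOLUTE`, `buildDictionary`, and `countBadReads` are hypothetical names for illustration, not Druid's `TypeStrategy` or `FixedIndexed` API. It contrasts the two fixes discussed in this PR: overriding the positional read with an absolute `getLong(int)`, and giving each reader its own buffer view (here via `duplicate()`, loosely standing in for per-thread lazy creation through a supplier).

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

public class PositionalReadSketch {
  // Hypothetical TypeStrategy-like reader for longs; NOT Druid's actual interface.
  interface LongReader {
    // Relative read: consumes 8 bytes starting at the buffer's current position.
    long read(ByteBuffer buffer);

    // Fallback "positional" read: moves the buffer's position, reads, then restores it.
    // Two threads sharing one buffer can clobber each other's position in between.
    default long read(ByteBuffer buffer, int offset) {
      final int oldPosition = buffer.position();
      try {
        buffer.position(offset);
        return read(buffer);
      } finally {
        buffer.position(oldPosition);
      }
    }
  }

  // Implements only the relative read, so it inherits the position-mutating fallback.
  static final LongReader FALLBACK_ONLY = buffer -> buffer.getLong();

  // Overrides the positional read with an absolute get that never touches the position.
  static final LongReader ABSOLUTE = new LongReader() {
    @Override
    public long read(ByteBuffer buffer) {
      return buffer.getLong();
    }

    @Override
    public long read(ByteBuffer buffer, int offset) {
      return buffer.getLong(offset);
    }
  };

  // Builds a toy "dictionary" buffer holding the values 0, 10, 20, ... as 8-byte longs.
  static ByteBuffer buildDictionary(int size) {
    ByteBuffer buffer = ByteBuffer.allocate(size * Long.BYTES);
    for (int i = 0; i < size; i++) {
      buffer.putLong(i * Long.BYTES, i * 10L);
    }
    return buffer;
  }

  // Runs `threads` concurrent readers; each takes its buffer from `view`, reads every
  // entry 1,000 times, and counts values that differ from what was written.
  static long countBadReads(LongReader reader, Supplier<ByteBuffer> view, int size, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> results = new ArrayList<>();
      for (int t = 0; t < threads; t++) {
        results.add(pool.submit(() -> {
          ByteBuffer buffer = view.get();
          long bad = 0;
          for (int round = 0; round < 1_000; round++) {
            for (int i = 0; i < size; i++) {
              if (reader.read(buffer, i * Long.BYTES) != i * 10L) {
                bad++;
              }
            }
          }
          return bad;
        }));
      }
      long total = 0;
      for (Future<Long> result : results) {
        total += result.get();
      }
      return total;
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    final int size = 64;
    final ByteBuffer shared = buildDictionary(size);
    // Safe: absolute reads leave the shared buffer's position untouched.
    System.out.println(countBadReads(ABSOLUTE, () -> shared, size, 8));
    // Also safe: each reader lazily gets its own duplicate() with an independent position.
    System.out.println(countBadReads(FALLBACK_ONLY, shared::duplicate, size, 8));
    // Unsafe, so not run: countBadReads(FALLBACK_ONLY, () -> shared, size, 8) shares one
    // position across threads, and the outcome of that data race is not guaranteed.
  }
}
```

Both executed calls should report 0 bad reads. The shared-buffer fallback case is deliberately left as a comment: a data race has no guaranteed outcome, so it may pass on some runs and fail on others, which is also why a concurrency test like the one described here is worth running on repeat.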
This PR has: