Skip to content

fix thread safety issue with nested column global dictionaries#13265

Merged
clintropolis merged 3 commits intoapache:masterfrom
clintropolis:nested-column-fixed-indexed-thread-safety
Oct 28, 2022
Merged

fix thread safety issue with nested column global dictionaries#13265
clintropolis merged 3 commits intoapache:masterfrom
clintropolis:nested-column-fixed-indexed-thread-safety

Conversation

@clintropolis
Copy link
Copy Markdown
Member

@clintropolis clintropolis commented Oct 26, 2022

Description

Fixes a pretty good blunder I made which resulted in the FixedIndexed implementations used by the nested columns not be thread safe, leading to undefined behavior when multiple readers are accessing the nested column global dictionaries at the same time. My intention was that FixedIndexed was using the positional read methods of TypeStrategy, so it should be thread-safe, however, the TypeStrategy employed here were not actually overriding these methods, so it was falling back to a positional read that sets the buffer position, reads, and then resets to the original position, which is most unchill in this scenario.

In addition to implementing these methods for long/double/int strategies being used here, I also made a change to switch nested columns to use FixedIndexed in a supplier, to save heap footprint per #12277 (comment). This change also would have made it thread safe had I not overridden these methods of TypeStrategy since the lazy creation of FixedIndexed now happens per thread.

I added a more direct test on NestedDataColumnSupplier, which was prior to this PR only tested indirectly through native and SQL query tests, to perform basic selector and bitmap index operations, and included a concurrent read test. The concurrency test failed consistently prior to the changes in this PR, but i ran on repeat ~200 times after the changes with no failures.

Release note

Fixes a bug with concurrent reads of nested columns with numeric global value dictionaries which could lead to undefined behavior when returning or filtering on nested numeric values.


This PR has:

  • been self-reviewed.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • been tested in a test Druid cluster.

Comment thread processing/src/main/java/org/apache/druid/segment/data/FixedIndexed.java Outdated
@clintropolis clintropolis merged commit acb9cb0 into apache:master Oct 28, 2022
@clintropolis clintropolis deleted the nested-column-fixed-indexed-thread-safety branch October 28, 2022 00:58
@kfaraz kfaraz added this to the 25.0 milestone Nov 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants