Skip to content

More efficient join filter rewrites#9516

Merged
jon-wei merged 8 commits intoapache:masterfrom
jon-wei:join_rewrite_adjust2
Mar 17, 2020
Merged

More efficient join filter rewrites#9516
jon-wei merged 8 commits intoapache:masterfrom
jon-wei:join_rewrite_adjust2

Conversation

@jon-wei
Copy link
Copy Markdown
Contributor

@jon-wei jon-wei commented Mar 14, 2020

This PR adjusts the join filter rewrite/pushdown logic in JoinFilterAnalyzer to avoid redundant computation/memory waste for filter analysis information that's common across segments (converting filters to conjunctive normal form, and determining + storing correlated values for filter rewrites).

A new computeJoinFilterPreAnalysis method has been added which handles the computations described above (called once per query on each node). The result of this method is passed to the splitFilters method (called once per segment).

The pre-analysis step is called in Joinables.createSegmentMapFn.

Two new query context parameters are added:

  • enableJoinFilterRewriteValueColumnFilters : Controls whether we rewrite RHS filters on non-key columns. False by default for performance reasons, since rewriting such filters requires a scan of the RHS table.
  • joinFilterRewriteMaxSize: Controls the maximum size of the correlated value set used for filter rewrites. This limit is used to prevent excessive memory use. The default limit is 10000.

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths.
  • added integration tests.
  • been tested in a test Druid cluster.

@jon-wei jon-wei force-pushed the join_rewrite_adjust2 branch from 4682c24 to b4ad07f Compare March 16, 2020 18:23
@jon-wei
Copy link
Copy Markdown
Contributor Author

jon-wei commented Mar 16, 2020

Rebased and fixed conflicts

String searchColumnValue,
String retrievalColumnName
String retrievalColumnName,
long maxCorrelationSetSize,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please update javadoc with new parameters

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated javadocs

final boolean enableFilterPushDown,
final boolean enableFilterRewrite
final boolean enableFilterRewrite,
final boolean enableRewriteValueColumnFilters,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto javadoc

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated javadocs

regionToCountry(JoinType.LEFT)
);

JoinFilterPreAnalysis joinFilterPreAnalysis = JoinFilterAnalyzer.computeJoinFilterPreAnalysis(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you make a utility method that makes a JoinFilterPreAnalysis from a joinableClauses and originalFilter? seems like a lot of these blocks and all the other arguments are the same. Alternatively a builder for JoinFilterPreAnalysis would work, though idk if worth the effort since it would be mostly used by tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Made a utility method

* takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a
* portion that should be applied after the join.
*
* <p>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you mean to put these <p> tags?

Copy link
Copy Markdown
Contributor Author

@jon-wei jon-wei Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the <p> tags

* @param enableFilterPushDown Whether to enable filter push down
* @return A JoinFilterSplit indicating what parts of the filter should be applied pre-join
* and post-join.
* See {@link JoinFilterPreAnalysis} for details on the result of this pre-analysis step.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: could you retain the old formatting where the descriptions are offset to the right and aligned? I find it a bit easier to read

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted the alignment

* Given a list of JoinFilterColumnCorrelationAnalysis, prune the list so that we only have one
* JoinFilterColumnCorrelationAnalysis for each unique combination of base columns.
*
* <p>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto question if <p> are intentional

Copy link
Copy Markdown
Contributor Author

@jon-wei jon-wei Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed the <p> tags

*/
private static Optional<List<JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(
Set<String> baseColumnNames,
private static Optional<Map<String, JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remove javadocs?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, restored the missing javadocs

*/
private static void getCorrelationForRHSColumn(
Set<String> baseColumnNames,
List<JoinableClause> joinableClauses,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question about removing javadocs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, restored the docs

// will return false if there any correlated expressions on the base table.
// Pushdown of such filters is disabled until the expressions system supports converting an expression
// into a String representation that can be reparsed into the same expression.
// https://github.com/apache/druid/issues/9326 tracks this expressions issue.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, this is already done in #9367, should you make the modification to make this comment no longer true?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do that in a follow-on PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I changed my mind, I re-enabled filter rewrites when there are LHS expressions in the join condition, and removed the @Ignore annotations on the tests for that

*
* @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter
*/
private static JoinFilterAnalysis rewriteSelectorFilter(
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question about removal of javadocs

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Restored missing javadocs

@lgtm-com
Copy link
Copy Markdown

lgtm-com Bot commented Mar 17, 2020

This pull request introduces 1 alert when merging e93d282 into 7626be2 - view on LGTM.com

new alerts:

  • 1 for Spurious Javadoc @param tags

final List<VirtualColumn> preJoinVirtualColumns = new ArrayList<>();
final List<VirtualColumn> postJoinVirtualColumns = new ArrayList<>();

final Set<String> baseColumns = determineBaseColumnsWithPreAndPostJoinVirtualColumns(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the deletion below, CI (spotbugs and intellij inspections) is flagging this as unused now

normalizedOrClauses = ((AndFilter) normalizedFilter).getFilters();
} else {
normalizedOrClauses = Collections.singletonList(normalizedFilter);
List<VirtualColumn> pushDownVirtualColumns = new ArrayList<>();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI (spotbugs and intellij inspections) is flagging this as unused

@codecov-io
Copy link
Copy Markdown

Codecov Report

Merging #9516 into master will increase coverage by 1.01%.
The diff coverage is 88.31%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master    #9516      +/-   ##
============================================
+ Coverage     63.59%    64.6%   +1.01%     
- Complexity    23583    23639      +56     
============================================
  Files          3056     2939     -117     
  Lines        126205   120119    -6086     
  Branches      17456    16099    -1357     
============================================
- Hits          80254    77608    -2646     
+ Misses        38756    35528    -3228     
+ Partials       7195     6983     -212
Impacted Files Coverage Δ Complexity Δ
...a/org/apache/druid/query/groupby/GroupByQuery.java 92.16% <ø> (ø) 128 <0> (ø) ⬇️
...in/java/org/apache/druid/query/topn/TopNQuery.java 49.23% <ø> (ø) 18 <0> (ø) ⬇️
...apache/druid/query/timeseries/TimeseriesQuery.java 47.05% <ø> (ø) 18 <0> (ø) ⬇️
...in/java/org/apache/druid/query/scan/ScanQuery.java 56.97% <ø> (ø) 26 <0> (ø) ⬇️
...g/apache/druid/server/LocalQuerySegmentWalker.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...ng/src/main/java/org/apache/druid/query/Query.java 0% <0%> (ø) 0 <0> (ø) ⬇️
...ache/druid/segment/join/lookup/LookupJoinable.java 47.61% <0%> (+16.04%) 6 <0> (+2) ⬆️
...druid/segment/join/table/IndexedTableJoinable.java 79.41% <0%> (+11.55%) 12 <0> (+2) ⬆️
...ain/java/org/apache/druid/query/QueryContexts.java 56.09% <0%> (-1.41%) 24 <0> (ø)
.../java/org/apache/druid/segment/join/Joinables.java 97.61% <100%> (+0.05%) 23 <0> (ø) ⬇️
... and 146 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update e7b3dd9...3a449c7. Read the comment docs.

Copy link
Copy Markdown
Member

@clintropolis clintropolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

whew, this is complicated, but i think this lgtm 😅:+1:

@jon-wei jon-wei merged commit b184736 into apache:master Mar 17, 2020
@jihoonson jihoonson added this to the 0.18.0 milestone Mar 26, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants