More efficient join filter rewrites by jon-wei · Pull Request #9516 · apache/druid

jon-wei · 2020-03-14T00:31:50Z

This PR adjusts the join filter rewrite/pushdown logic in JoinFilterAnalyzer to avoid redundant computation and wasted memory for filter analysis information that is common across segments (converting filters to conjunctive normal form, and determining and storing correlated values for filter rewrites).

A new computeJoinFilterPreAnalysis method has been added which handles the computations described above (called once per query on each node). The result of this method is passed to the splitFilters method (called once per segment).

The pre-analysis step is called in Joinables.createSegmentMapFn.
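For readers skimming the description, the compute-once, apply-per-segment shape can be sketched as below. This is only an illustration under stated assumptions, not the code in this PR: `PreAnalysis`, `computePreAnalysis`, `splitForSegment`, and this `createSegmentMapFn` are hypothetical stand-ins, and splitting on " AND " is a toy substitute for the real CNF conversion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class PreAnalysisSketch
{
  // Counts how often the expensive step runs, to make the sharing visible.
  static final AtomicInteger PRE_ANALYSIS_RUNS = new AtomicInteger();

  // Hypothetical stand-in for JoinFilterPreAnalysis: holds segment-independent results.
  static final class PreAnalysis
  {
    final List<String> normalizedClauses;

    PreAnalysis(List<String> normalizedClauses)
    {
      this.normalizedClauses = normalizedClauses;
    }
  }

  // Hypothetical stand-in for the per-query step. Splitting on " AND " is a toy
  // substitute for CNF conversion, used only to have something to compute.
  static PreAnalysis computePreAnalysis(String filter)
  {
    PRE_ANALYSIS_RUNS.incrementAndGet();
    List<String> clauses = new ArrayList<>();
    for (String clause : filter.split(" AND ")) {
      clauses.add(clause.trim());
    }
    return new PreAnalysis(clauses);
  }

  // Hypothetical stand-in for the per-segment step: it only reads the shared result.
  static String splitForSegment(PreAnalysis preAnalysis, String segmentId)
  {
    return segmentId + ":" + preAnalysis.normalizedClauses.size();
  }

  // The pre-analysis runs when the mapping function is created, not on each call.
  static Function<String, String> createSegmentMapFn(String filter)
  {
    final PreAnalysis preAnalysis = computePreAnalysis(filter);
    return segmentId -> splitForSegment(preAnalysis, segmentId);
  }

  public static void main(String[] args)
  {
    Function<String, String> segmentMapFn = createSegmentMapFn("a = 1 AND b = 2");
    for (String segment : new String[]{"seg1", "seg2", "seg3"}) {
      System.out.println(segmentMapFn.apply(segment));
    }
    System.out.println("pre-analysis runs: " + PRE_ANALYSIS_RUNS.get());
  }
}
```

Capturing the pre-analysis in the returned function is what keeps the per-segment call cheap: each segment reuses the same object instead of recomputing it.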

Two new query context parameters are added:

enableJoinFilterRewriteValueColumnFilters: Controls whether we rewrite RHS filters on non-key columns. False by default for performance reasons, since rewriting such filters requires a scan of the RHS table.
joinFilterRewriteMaxSize: Controls the maximum size of the correlated value set used for filter rewrites. This limit is used to prevent excessive memory use. The default limit is 10000.
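A rough sketch of how these two parameters might be read and applied follows. The parameter names and defaults (false and 10000) come from the description above; everything else is an assumption for illustration: the class, the helper methods, and in particular the choice to abandon the rewrite (returning an empty Optional) once the value set exceeds the limit, which the description does not spell out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class RewriteContextSketch
{
  static final String ENABLE_VALUE_COLUMN_FILTERS_KEY = "enableJoinFilterRewriteValueColumnFilters";
  static final String MAX_SIZE_KEY = "joinFilterRewriteMaxSize";

  // Defaults as stated in the PR description.
  static final boolean DEFAULT_ENABLE_VALUE_COLUMN_FILTERS = false;
  static final long DEFAULT_MAX_SIZE = 10_000L;

  // Hypothetical helper: read the boolean flag from a query context map.
  static boolean valueColumnFiltersEnabled(Map<String, Object> context)
  {
    Object value = context.get(ENABLE_VALUE_COLUMN_FILTERS_KEY);
    return value == null ? DEFAULT_ENABLE_VALUE_COLUMN_FILTERS : Boolean.parseBoolean(String.valueOf(value));
  }

  // Hypothetical helper: read the size limit from a query context map.
  static long maxCorrelationSetSize(Map<String, Object> context)
  {
    Object value = context.get(MAX_SIZE_KEY);
    if (value == null) {
      return DEFAULT_MAX_SIZE;
    }
    return value instanceof Number ? ((Number) value).longValue() : Long.parseLong(String.valueOf(value));
  }

  // Collect distinct correlated values, giving up once the set grows past the limit.
  // Abandoning the rewrite on overflow is an assumption of this sketch.
  static Optional<Set<String>> collectCorrelatedValues(List<String> values, long maxSize)
  {
    Set<String> correlated = new LinkedHashSet<>();
    for (String value : values) {
      correlated.add(value);
      if (correlated.size() > maxSize) {
        return Optional.empty();
      }
    }
    return Optional.of(correlated);
  }

  public static void main(String[] args)
  {
    Map<String, Object> context = Map.of(MAX_SIZE_KEY, 2);
    long limit = maxCorrelationSetSize(context);
    System.out.println(valueColumnFiltersEnabled(context));
    System.out.println(collectCorrelatedValues(List.of("a", "b", "a", "c"), limit).isPresent());
  }
}
```

Checking the size inside the loop stops the scan early, so memory use stays bounded by the limit rather than by the size of the RHS table.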

This PR has:

been self-reviewed.
added documentation for new or modified features or behaviors.
added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
added or updated version, license, or notice information in licenses.yaml
added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
added unit tests or modified existing tests to cover new code paths.
added integration tests.
been tested in a test Druid cluster.

jon-wei · 2020-03-16T18:23:23Z

Rebased and fixed conflicts

clintropolis · 2020-03-16T18:48:28Z

      String searchColumnValue,
-      String retrievalColumnName
+      String retrievalColumnName,
+      long maxCorrelationSetSize,


please update javadoc with new parameters

Updated javadocs

clintropolis · 2020-03-16T18:48:50Z

      final boolean enableFilterPushDown,
-      final boolean enableFilterRewrite
+      final boolean enableFilterRewrite,
+      final boolean enableRewriteValueColumnFilters,


ditto javadoc

Updated javadocs

clintropolis · 2020-03-16T19:23:55Z

+        regionToCountry(JoinType.LEFT)
+    );
+
+    JoinFilterPreAnalysis joinFilterPreAnalysis = JoinFilterAnalyzer.computeJoinFilterPreAnalysis(


could you make a utility method that makes a JoinFilterPreAnalysis from a joinableClauses and originalFilter? seems like a lot of these blocks and all the other arguments are the same. Alternatively a builder for JoinFilterPreAnalysis would work, though idk if worth the effort since it would be mostly used by tests.

Made a utility method

clintropolis · 2020-03-16T19:30:35Z

- * takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a
- * portion that should be applied after the join.
- *
+ * <p>


did you mean to put these <p> tags?

I removed the <p> tags

clintropolis · 2020-03-16T19:32:29Z

-   * @param enableFilterPushDown          Whether to enable filter push down
-   * @return A JoinFilterSplit indicating what parts of the filter should be applied pre-join
-   *         and post-join.
+   * See {@link JoinFilterPreAnalysis} for details on the result of this pre-analysis step.


super nit: could you retain the old formatting where the descriptions are offset to the right and aligned? I find it a bit easier to read

Adjusted the alignment

clintropolis · 2020-03-16T20:14:19Z

   * Given a list of JoinFilterColumnCorrelationAnalysis, prune the list so that we only have one
   * JoinFilterColumnCorrelationAnalysis for each unique combination of base columns.
-   *
+   * <p>


ditto question if <p> are intentional

Removed the <p> tags
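As a side note on the pruning step this Javadoc describes, here is a minimal sketch of keeping one entry per unique combination of base columns. It is not the method under review: `Correlation` is a hypothetical stand-in for JoinFilterColumnCorrelationAnalysis, and treating a "combination" as an unordered set, with the first entry winning, is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class PruneCorrelationsSketch
{
  // Hypothetical stand-in for JoinFilterColumnCorrelationAnalysis.
  static final class Correlation
  {
    final String joinColumn;
    final List<String> baseColumns;

    Correlation(String joinColumn, List<String> baseColumns)
    {
      this.joinColumn = joinColumn;
      this.baseColumns = baseColumns;
    }
  }

  // Keep the first correlation seen for each distinct set of base columns.
  // Column order is ignored here, which is an assumption of this sketch.
  static List<Correlation> prune(List<Correlation> correlations)
  {
    Map<Set<String>, Correlation> byBaseColumns = new LinkedHashMap<>();
    for (Correlation correlation : correlations) {
      byBaseColumns.putIfAbsent(new HashSet<>(correlation.baseColumns), correlation);
    }
    return new ArrayList<>(byBaseColumns.values());
  }

  public static void main(String[] args)
  {
    List<Correlation> input = List.of(
        new Correlation("r1.k", List.of("a", "b")),
        new Correlation("r2.k", List.of("b", "a")),
        new Correlation("r3.k", List.of("c"))
    );
    for (Correlation kept : prune(input)) {
      System.out.println(kept.joinColumn + " -> " + kept.baseColumns);
    }
  }
}
```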

clintropolis · 2020-03-16T20:15:36Z

-   */
-  private static Optional<List<JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(
-      Set<String> baseColumnNames,
+  private static Optional<Map<String, JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns(


why remove javadocs?

Whoops, restored the missing javadocs

clintropolis · 2020-03-16T20:15:57Z

-   */
  private static void getCorrelationForRHSColumn(
-      Set<String> baseColumnNames,
+      List<JoinableClause> joinableClauses,


same question about removing javadocs

Fixed, restored the docs

clintropolis · 2020-03-16T20:18:11Z

+          // will return false if there any correlated expressions on the base table.
+          // Pushdown of such filters is disabled until the expressions system supports converting an expression
+          // into a String representation that can be reparsed into the same expression.
+          // https://github.com/apache/druid/issues/9326 tracks this expressions issue.


hmm, this is already done in #9367, should you make the modification to make this comment no longer true?

I'll do that in a follow-on PR

Actually, I changed my mind, I re-enabled filter rewrites when there are LHS expressions in the join condition, and removed the @Ignore annotations on the tests for that

clintropolis · 2020-03-16T20:18:53Z

-   *
-   * @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter
-   */
  private static JoinFilterAnalysis rewriteSelectorFilter(


same question about removal of javadocs

Restored missing javadocs

lgtm-com · 2020-03-17T01:53:33Z

This pull request introduces 1 alert when merging e93d282 into 7626be2.

new alerts:

1 for Spurious Javadoc @param tags

ccaominh · 2020-03-17T01:52:21Z

    final List<VirtualColumn> preJoinVirtualColumns = new ArrayList<>();
    final List<VirtualColumn> postJoinVirtualColumns = new ArrayList<>();
+
    final Set<String> baseColumns = determineBaseColumnsWithPreAndPostJoinVirtualColumns(


With the deletion below, CI (spotbugs and intellij inspections) is flagging this as unused now

ccaominh · 2020-03-17T01:53:54Z

-      normalizedOrClauses = ((AndFilter) normalizedFilter).getFilters();
-    } else {
-      normalizedOrClauses = Collections.singletonList(normalizedFilter);
+    List<VirtualColumn> pushDownVirtualColumns = new ArrayList<>();


CI (spotbugs and intellij inspections) is flagging this as unused

codecov-io · 2020-03-17T03:37:08Z

Codecov Report

Merging #9516 into master will increase coverage by 1.01%.
The diff coverage is 88.31%.

@@             Coverage Diff              @@
##             master    #9516      +/-   ##
============================================
+ Coverage     63.59%    64.6%   +1.01%     
- Complexity    23583    23639      +56     
============================================
  Files          3056     2939     -117     
  Lines        126205   120119    -6086     
  Branches      17456    16099    -1357     
============================================
- Hits          80254    77608    -2646     
+ Misses        38756    35528    -3228     
+ Partials       7195     6983     -212

Impacted Files	Coverage Δ	Complexity Δ
...a/org/apache/druid/query/groupby/GroupByQuery.java	`92.16% <ø> (ø)`	`128 <0> (ø)`	⬇️
...in/java/org/apache/druid/query/topn/TopNQuery.java	`49.23% <ø> (ø)`	`18 <0> (ø)`	⬇️
...apache/druid/query/timeseries/TimeseriesQuery.java	`47.05% <ø> (ø)`	`18 <0> (ø)`	⬇️
...in/java/org/apache/druid/query/scan/ScanQuery.java	`56.97% <ø> (ø)`	`26 <0> (ø)`	⬇️
...g/apache/druid/server/LocalQuerySegmentWalker.java	`0% <0%> (ø)`	`0 <0> (ø)`	⬇️
...ng/src/main/java/org/apache/druid/query/Query.java	`0% <0%> (ø)`	`0 <0> (ø)`	⬇️
...ache/druid/segment/join/lookup/LookupJoinable.java	`47.61% <0%> (+16.04%)`	`6 <0> (+2)`	⬆️
...druid/segment/join/table/IndexedTableJoinable.java	`79.41% <0%> (+11.55%)`	`12 <0> (+2)`	⬆️
...ain/java/org/apache/druid/query/QueryContexts.java	`56.09% <0%> (-1.41%)`	`24 <0> (ø)`
.../java/org/apache/druid/segment/join/Joinables.java	`97.61% <100%> (+0.05%)`	`23 <0> (ø)`	⬇️
... and 146 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update e7b3dd9...3a449c7.

clintropolis

whew, this is complicated, but i think this lgtm 😅:+1:

jon-wei added the Area - Querying and Improvement labels Mar 14, 2020

jon-wei added 2 commits March 16, 2020 11:13

More efficient join filter rewrites

ddae773

Rebase

b4ad07f

jon-wei force-pushed the join_rewrite_adjust2 branch from 4682c24 to b4ad07f Compare March 16, 2020 18:23

Remove unused functions

0684717

clintropolis reviewed Mar 16, 2020

jon-wei added 3 commits March 16, 2020 17:03

PR comments, fix compile

1041ca2

Adjust comment

7969358

Allow filter rewrite when join condition has LHS expression

e93d282

ccaominh reviewed Mar 17, 2020

jon-wei added 2 commits March 16, 2020 19:06

Fix inspections

c984512

Fix tests

3a449c7

clintropolis approved these changes Mar 17, 2020

jon-wei merged commit b184736 into apache:master Mar 17, 2020

jihoonson added this to the 0.18.0 milestone Mar 26, 2020


5 participants
