More efficient join filter rewrites #9516
Rebased and fixed conflicts
| String searchColumnValue, | ||
| String retrievalColumnName | ||
| String retrievalColumnName, | ||
| long maxCorrelationSetSize, |
please update javadoc with new parameters
| final boolean enableFilterPushDown, | ||
| final boolean enableFilterRewrite | ||
| final boolean enableFilterRewrite, | ||
| final boolean enableRewriteValueColumnFilters, |
| regionToCountry(JoinType.LEFT) | ||
| ); | ||
|
|
||
| JoinFilterPreAnalysis joinFilterPreAnalysis = JoinFilterAnalyzer.computeJoinFilterPreAnalysis( |
could you make a utility method that makes a JoinFilterPreAnalysis from a joinableClauses and originalFilter? seems like a lot of these blocks and all the other arguments are the same. Alternatively a builder for JoinFilterPreAnalysis would work, though idk if worth the effort since it would be mostly used by tests.
Made a utility method
| * takes a filter and splits it into a portion that should be applied to the base table prior to the join, and a | ||
| * portion that should be applied after the join. | ||
| * | ||
| * <p> |
did you mean to put these <p> tags?
I removed the <p> tags
| * @param enableFilterPushDown Whether to enable filter push down | ||
| * @return A JoinFilterSplit indicating what parts of the filter should be applied pre-join | ||
| * and post-join. | ||
| * See {@link JoinFilterPreAnalysis} for details on the result of this pre-analysis step. |
super nit: could you retain the old formatting where the descriptions are offset to the right and aligned? I find it a bit easier to read
Adjusted the alignment
| * Given a list of JoinFilterColumnCorrelationAnalysis, prune the list so that we only have one | ||
| * JoinFilterColumnCorrelationAnalysis for each unique combination of base columns. | ||
| * | ||
| * <p> |
ditto question if <p> are intentional
Removed the <p> tags
| */ | ||
| private static Optional<List<JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns( | ||
| Set<String> baseColumnNames, | ||
| private static Optional<Map<String, JoinFilterColumnCorrelationAnalysis>> findCorrelatedBaseTableColumns( |
Whoops, restored the missing javadocs
| */ | ||
| private static void getCorrelationForRHSColumn( | ||
| Set<String> baseColumnNames, | ||
| List<JoinableClause> joinableClauses, |
same question about removing javadocs
Fixed, restored the docs
| // will return false if there any correlated expressions on the base table. | ||
| // Pushdown of such filters is disabled until the expressions system supports converting an expression | ||
| // into a String representation that can be reparsed into the same expression. | ||
| // https://github.com/apache/druid/issues/9326 tracks this expressions issue. |
hmm, this is already done in #9367, should you make the modification to make this comment no longer true?
I'll do that in a follow-on PR
Actually, I changed my mind: I re-enabled filter rewrites when there are LHS expressions in the join condition and removed the @Ignore annotations on the tests for that.
| * | ||
| * @return A JoinFilterAnalysis that indicates how to handle the potentially rewritten filter | ||
| */ | ||
| private static JoinFilterAnalysis rewriteSelectorFilter( |
same question about removal of javadocs
Restored missing javadocs
This pull request introduces 1 alert when merging e93d282 into 7626be2 - view on LGTM.com new alerts:
| final List<VirtualColumn> preJoinVirtualColumns = new ArrayList<>(); | ||
| final List<VirtualColumn> postJoinVirtualColumns = new ArrayList<>(); | ||
|
|
||
| final Set<String> baseColumns = determineBaseColumnsWithPreAndPostJoinVirtualColumns( |
With the deletion below, CI (spotbugs and intellij inspections) is flagging this as unused now
| normalizedOrClauses = ((AndFilter) normalizedFilter).getFilters(); | ||
| } else { | ||
| normalizedOrClauses = Collections.singletonList(normalizedFilter); | ||
| List<VirtualColumn> pushDownVirtualColumns = new ArrayList<>(); |
CI (spotbugs and intellij inspections) is flagging this as unused
Codecov Report
@@ Coverage Diff @@
## master #9516 +/- ##
============================================
+ Coverage 63.59% 64.6% +1.01%
- Complexity 23583 23639 +56
============================================
Files 3056 2939 -117
Lines 126205 120119 -6086
Branches 17456 16099 -1357
============================================
- Hits 80254 77608 -2646
+ Misses 38756 35528 -3228
+ Partials 7195 6983 -212
clintropolis
left a comment
whew, this is complicated, but i think this lgtm 😅 👍
This PR adjusts the join filter rewrite/pushdown logic in JoinFilterAnalyzer to avoid redundant computation/memory waste for filter analysis information that's common across segments (converting filters to conjunctive normal form, and determining + storing correlated values for filter rewrites).

A new computeJoinFilterPreAnalysis method has been added which handles the computations described above (called once per query on each node). The result of this method is passed to the splitFilters method (called once per segment).

The pre-analysis step is called in Joinables.createSegmentMapFn.

Two new query context parameters are added:
- enableJoinFilterRewriteValueColumnFilters: Controls whether we rewrite RHS filters on non-key columns. False by default for performance reasons, since rewriting such filters requires a scan of the RHS table.
- joinFilterRewriteMaxSize: Controls the maximum size of the correlated value set used for filter rewrites. This limit is used to prevent excessive memory use. The default limit is 10000.

This PR has:
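
To illustrate the idea in the PR summary (analyze once per query, reuse per segment, and cap the correlated value set), here is a minimal, self-contained Java sketch. It is not Druid code: `PreAnalysisSketch`, `PreAnalysis`, `computePreAnalysis`, `rowsPassedToJoin`, and `rhsScans` are hypothetical names invented for this example, the RHS table is modeled as a list of `{key, value}` string pairs, and the exact cutoff semantics (skip the rewrite once the set exceeds the limit) are an assumption rather than a statement of how `joinFilterRewriteMaxSize` is enforced.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Illustrative sketch only: none of these types are Druid classes.
public class PreAnalysisSketch {

  // Hypothetical per-query result. An empty Optional means the rewrite was skipped.
  public static final class PreAnalysis {
    public final Optional<Set<String>> correlatedKeys;

    PreAnalysis(Optional<Set<String>> correlatedKeys) {
      this.correlatedKeys = correlatedKeys;
    }
  }

  // Counts RHS scans so the "once per query" behavior is observable.
  public static int rhsScans = 0;

  // Once per query: scan the RHS rows ({key, value}) and collect the keys whose
  // value matches the filter, giving up if the set grows past maxSize (assumed cutoff).
  public static PreAnalysis computePreAnalysis(List<String[]> rhsRows, String filterValue, long maxSize) {
    rhsScans++;
    Set<String> keys = new LinkedHashSet<>();
    for (String[] row : rhsRows) {
      if (row[1].equals(filterValue)) {
        keys.add(row[0]);
        if (keys.size() > maxSize) {
          return new PreAnalysis(Optional.empty());
        }
      }
    }
    return new PreAnalysis(Optional.of(keys));
  }

  // Once per segment: reuse the precomputed keys to filter base-table rows
  // before the join. Without a rewrite, every row goes on to the join.
  public static List<String> rowsPassedToJoin(PreAnalysis pre, List<String> segmentKeys) {
    if (!pre.correlatedKeys.isPresent()) {
      return segmentKeys;
    }
    List<String> kept = new ArrayList<>();
    for (String key : segmentKeys) {
      if (pre.correlatedKeys.get().contains(key)) {
        kept.add(key);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String[]> rhs = Arrays.asList(
        new String[]{"US", "United States"},
        new String[]{"DE", "Germany"},
        new String[]{"FR", "France"}
    );
    // The RHS scan happens here, once, regardless of how many segments follow.
    PreAnalysis pre = computePreAnalysis(rhs, "Germany", 10000);
    List<List<String>> segments = Arrays.asList(
        Arrays.asList("US", "DE", "DE"),
        Arrays.asList("FR", "US")
    );
    for (List<String> segment : segments) {
      System.out.println(rowsPassedToJoin(pre, segment));
    }
    System.out.println("RHS scans: " + rhsScans);
  }
}
```

The sketch keeps the expensive step (the RHS scan) out of the per-segment path, which is the trade-off the summary describes. When the size limit is hit, it falls back to passing all rows to the join, so the filter would then have to be applied after the join instead.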