Merged
1 change: 1 addition & 0 deletions docs/querying/query-context.md
@@ -54,6 +54,7 @@ These parameters apply to all query types.
|parallelMergeParallelism|`druid.processing.merge.pool.parallelism`|Maximum number of parallel threads to use for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
|parallelMergeInitialYieldRows|`druid.processing.merge.task.initialYieldNumRows`|Number of rows to yield per ForkJoinPool merge task for parallel result merging on the Broker, before forking off a new task to continue merging sequences. See [Broker configuration](../configuration/index.html#broker) for more details.|
|parallelMergeSmallBatchRows|`druid.processing.merge.task.smallBatchNumRows`|Size of result batches to operate on in ForkJoinPool merge tasks for parallel result merging on the Broker. See [Broker configuration](../configuration/index.html#broker) for more details.|
|useFilterCNF|`false`|If true, Druid will attempt to convert the query filter to Conjunctive Normal Form (CNF). During query processing, columns can be pre-filtered by intersecting the bitmap indexes of all values that match the eligible filters, often greatly reducing the raw number of rows which need to be scanned. But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter. As such, filters in CNF potentially have a higher chance to utilize a large number of bitmap indexes on string columns during pre-filtering. However, this setting should be used with great caution, as it can sometimes have a negative effect on performance, and in some cases, the act of computing CNF of a filter can be expensive. We recommend hand tuning your filters to produce an optimal form if possible, or at least verifying through experimentation that using this parameter actually improves your query performance with no ill effects.|
Contributor

But this effect only happens for the top level filter, or individual clauses of a top level 'and' filter.

Suggest providing a few examples to clarify:

  • An OR filter A || B where A can be resolved using bitmap indexes but B cannot will prevent the whole OR filter from being considered for pre-filtering
  • If it were A && B instead, A would be considered for pre-filtering but B would not.
  • If it were A && (C || D) where C and D can be resolved using bitmap indexes, then the whole filter can be considered for pre-filtering
  • If it were A && (B || C), only A would be considered for pre-filtering
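
The four cases above can be modeled with a small toy sketch. It assumes the rule quoted from the docs: only the top-level filter, or each clause of a top-level AND, is a pre-filter candidate, and a candidate qualifies only if every leaf under it can use bitmap indexes. `PreFilterSketch`, `Leaf`, `Group`, and `preFilterClauses` are hypothetical names for illustration and are not Druid classes; A, C, and D are assumed to be bitmap-capable and B is not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;

public class PreFilterSketch
{
  // Toy filter tree; hypothetical types, not Druid's Filter interface.
  interface Node
  {
    boolean usesBitmap();
  }

  static final class Leaf implements Node
  {
    final String name;
    final boolean bitmap;

    Leaf(String name, boolean bitmap)
    {
      this.name = name;
      this.bitmap = bitmap;
    }

    @Override
    public boolean usesBitmap()
    {
      return bitmap;
    }

    @Override
    public String toString()
    {
      return name;
    }
  }

  static final class Group implements Node
  {
    final boolean isAnd;
    final List<Node> children;

    Group(boolean isAnd, List<Node> children)
    {
      this.isAnd = isAnd;
      this.children = children;
    }

    // A group can use bitmaps only if every child can.
    @Override
    public boolean usesBitmap()
    {
      return children.stream().allMatch(Node::usesBitmap);
    }

    @Override
    public String toString()
    {
      String separator = isAnd ? " && " : " || ";
      return children.stream().map(Object::toString).collect(Collectors.joining(separator, "(", ")"));
    }
  }

  static Node and(Node... children)
  {
    return new Group(true, Arrays.asList(children));
  }

  static Node or(Node... children)
  {
    return new Group(false, Arrays.asList(children));
  }

  // Candidates are the clauses of a top-level AND, or else the whole filter.
  static List<String> preFilterClauses(Node top)
  {
    boolean topIsAnd = top instanceof Group && ((Group) top).isAnd;
    List<Node> candidates = topIsAnd ? ((Group) top).children : Collections.singletonList(top);
    List<String> eligible = new ArrayList<>();
    for (Node candidate : candidates) {
      if (candidate.usesBitmap()) {
        eligible.add(candidate.toString());
      }
    }
    return eligible;
  }

  public static void main(String[] args)
  {
    Node a = new Leaf("A", true);
    Node b = new Leaf("B", false);
    Node c = new Leaf("C", true);
    Node d = new Leaf("D", true);
    System.out.println(preFilterClauses(or(a, b)));         // []
    System.out.println(preFilterClauses(and(a, b)));        // [A]
    System.out.println(preFilterClauses(and(a, or(c, d)))); // [A, (C || D)]
    System.out.println(preFilterClauses(and(a, or(b, c)))); // [A]
  }
}
```

This only mirrors the reasoning in the comment; how Druid actually partitions filters into pre- and post-filters may differ in detail.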

Member Author

Hmm, I think how filters work with query processing and how mechanically filters are split into pre and post filters should be documented somewhere, but I don't think this setting is quite the correct avenue. Additionally, since which filters can and can't use bitmaps isn't exactly documented anywhere, I'm not sure how much the examples would help.

If we added this general description of query processing and how filters are involved, it could link to this setting, and could also link to the documentation we would add for the filterTuning added in #8209, as ways the user can help influence how filter processing behaves. Maybe segments.md would be an appropriate place since it mentions bitmaps and their role in filtering, or segment-optimization.md since it involves how to tune segment sizes? Or perhaps we need an advanced-tuning.md for this and other settings that users shouldn't really touch unless they are prepared to roll up their sleeves and experimentally verify the settings to fine-tune them to their workload?

Should this be part of this PR, or done as a follow-up? It sort of blows up the scope a bit of what I was looking to do as part of this PR, but it also seems useful so I'm fine either way.

Contributor

Should this be part of this PR, or done as a follow-up? It sort of blows up the scope a bit of what I was looking to do as part of this PR, but it also seems useful so I'm fine either way.

I think the filter tuning guide could be done in a follow-up; it sounds like something that would be much larger than this PR. This PR LGTM.

(There's a merge conflict in the spelling exclusions now)


## Query-type-specific parameters

@@ -52,6 +52,7 @@ public class QueryContexts
public static final String JOIN_FILTER_REWRITE_ENABLE_KEY = "enableJoinFilterRewrite";
public static final String JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS_ENABLE_KEY = "enableJoinFilterRewriteValueColumnFilters";
public static final String JOIN_FILTER_REWRITE_MAX_SIZE_KEY = "joinFilterRewriteMaxSize";
public static final String USE_FILTER_CNF_KEY = "useFilterCNF";

public static final boolean DEFAULT_BY_SEGMENT = false;
public static final boolean DEFAULT_POPULATE_CACHE = true;
@@ -67,7 +68,8 @@ public class QueryContexts
public static final boolean DEFAULT_ENABLE_JOIN_FILTER_PUSH_DOWN = true;
public static final boolean DEFAULT_ENABLE_JOIN_FILTER_REWRITE = true;
public static final boolean DEFAULT_ENABLE_JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS = false;
public static final long DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY = 10000;
Contributor

👍

public static final long DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE = 10000;
public static final boolean DEFAULT_USE_FILTER_CNF = false;

@SuppressWarnings("unused") // Used by Jackson serialization
public enum Vectorize
@@ -249,7 +251,7 @@ public static <T> boolean getEnableJoinFilterRewriteValueColumnFilters(Query<T>

public static <T> long getJoinFilterRewriteMaxSize(Query<T> query)
{
return parseLong(query, JOIN_FILTER_REWRITE_MAX_SIZE_KEY, DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY);
return parseLong(query, JOIN_FILTER_REWRITE_MAX_SIZE_KEY, DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE);
}

public static <T> boolean getEnableJoinFilterPushDown(Query<T> query)
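
The pattern this hunk tidies up, a context key constant paired with a default constant and read through a typed getter, can be sketched in isolation. This is a minimal illustration that assumes the query context is a plain `Map<String, Object>`; `ContextDefaultSketch` and `getBoolean` are hypothetical and do not reproduce how `QueryContexts` or `Query.getContextBoolean` are implemented.

```java
import java.util.HashMap;
import java.util.Map;

public class ContextDefaultSketch
{
  // Names mirror the constants in the diff; the lookup logic below is an assumption.
  static final String USE_FILTER_CNF_KEY = "useFilterCNF";
  static final boolean DEFAULT_USE_FILTER_CNF = false;

  // Returns the context value for key, or defaultValue when the key is absent.
  static boolean getBoolean(Map<String, Object> context, String key, boolean defaultValue)
  {
    Object value = context == null ? null : context.get(key);
    if (value == null) {
      return defaultValue;
    }
    if (value instanceof Boolean) {
      return (Boolean) value;
    }
    if (value instanceof String) {
      // JSON clients sometimes send booleans as strings.
      return Boolean.parseBoolean((String) value);
    }
    throw new IllegalArgumentException("Expected a boolean for context key " + key);
  }

  public static void main(String[] args)
  {
    Map<String, Object> context = new HashMap<>();
    System.out.println(getBoolean(context, USE_FILTER_CNF_KEY, DEFAULT_USE_FILTER_CNF));

    context.put(USE_FILTER_CNF_KEY, true);
    System.out.println(getBoolean(context, USE_FILTER_CNF_KEY, DEFAULT_USE_FILTER_CNF));
  }
}
```

Keeping the key and its default next to each other in one class is what lets the `Filters` change later in this diff drop its private copy of the key string.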
@@ -28,6 +28,7 @@
import org.apache.druid.collections.bitmap.ImmutableBitmap;
import org.apache.druid.query.BitmapResultFactory;
import org.apache.druid.query.Query;
import org.apache.druid.query.QueryContexts;
import org.apache.druid.query.filter.BitmapIndexSelector;
import org.apache.druid.query.filter.DimFilter;
import org.apache.druid.query.filter.DruidPredicateFactory;
@@ -59,7 +60,6 @@
*/
public class Filters
{
private static final String CTX_KEY_USE_FILTER_CNF = "useFilterCNF";

/**
* Convert a list of DimFilters to a list of Filters.
@@ -423,7 +423,7 @@ public static Filter convertToCNFFromQueryContext(Query query, @Nullable Filter
if (filter == null) {
return null;
}
boolean useCNF = query.getContextBoolean(CTX_KEY_USE_FILTER_CNF, false);
boolean useCNF = query.getContextBoolean(QueryContexts.USE_FILTER_CNF_KEY, QueryContexts.DEFAULT_USE_FILTER_CNF);
return useCNF ? Filters.toCnf(filter) : filter;
}
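
For readers unfamiliar with what a CNF conversion does, and why the docs warn that computing it can be expensive, here is a toy sketch that distributes OR over AND. It is not `Filters.toCnf`: `CnfSketch`, `Node`, and this `toCnf` are hypothetical, the tree has no NOT nodes, and the result is represented as a list of clauses where each clause is a list of leaf names joined by OR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CnfSketch
{
  enum Kind { LEAF, AND, OR }

  // Toy boolean expression node; hypothetical, not a Druid type.
  static final class Node
  {
    final Kind kind;
    final String name;
    final List<Node> children;

    private Node(Kind kind, String name, List<Node> children)
    {
      this.kind = kind;
      this.name = name;
      this.children = children;
    }

    static Node leaf(String name)
    {
      return new Node(Kind.LEAF, name, new ArrayList<>());
    }

    static Node and(Node... children)
    {
      return new Node(Kind.AND, null, Arrays.asList(children));
    }

    static Node or(Node... children)
    {
      return new Node(Kind.OR, null, Arrays.asList(children));
    }
  }

  // Returns the AND of the returned clauses; each clause is an OR of leaf names.
  static List<List<String>> toCnf(Node node)
  {
    List<List<String>> result = new ArrayList<>();
    switch (node.kind) {
      case LEAF:
        result.add(new ArrayList<>(Arrays.asList(node.name)));
        return result;
      case AND:
        // The CNF of an AND is the concatenation of its children's clauses.
        for (Node child : node.children) {
          result.addAll(toCnf(child));
        }
        return result;
      default:
        // OR: combine every clause so far with every clause of the next child.
        result.add(new ArrayList<>());
        for (Node child : node.children) {
          List<List<String>> next = new ArrayList<>();
          for (List<String> left : result) {
            for (List<String> right : toCnf(child)) {
              List<String> merged = new ArrayList<>(left);
              merged.addAll(right);
              next.add(merged);
            }
          }
          result = next;
        }
        return result;
    }
  }

  public static void main(String[] args)
  {
    Node a = Node.leaf("A");
    Node b = Node.leaf("B");
    Node c = Node.leaf("C");
    Node d = Node.leaf("D");
    // (A && B) || C becomes (A || C) && (B || C)
    System.out.println(toCnf(Node.or(Node.and(a, b), c))); // [[A, C], [B, C]]
    // (A && B) || (C && D) grows to four clauses
    System.out.println(toCnf(Node.or(Node.and(a, b), Node.and(c, d))).size()); // 4
  }
}
```

The cross product in the OR branch is where the cost comes from: an OR over several ANDs multiplies the clause counts, which fits the caution in the docs about enabling `useFilterCNF` without measuring.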

@@ -193,7 +193,7 @@ protected HashJoinSegmentStorageAdapter makeFactToCountrySegment()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

return new HashJoinSegmentStorageAdapter(
@@ -307,7 +307,7 @@ public void test_makeCursors_factToCountryLeft()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -375,7 +375,7 @@ public void test_makeCursors_factToCountryInner()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -437,7 +437,7 @@ public void test_makeCursors_factToCountryInnerUsingLookup()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -501,7 +501,7 @@ public void test_makeCursors_factToCountryInnerUsingCountryNumber()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -571,7 +571,7 @@ public void test_makeCursors_factToCountryInnerUsingCountryNumberUsingLookup()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -637,7 +637,7 @@ public void test_makeCursors_factToCountryLeftWithFilterOnFacts()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -678,7 +678,7 @@ public void test_makeCursors_factToCountryRightWithFilterOnLeftIsNull()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -721,7 +721,7 @@ public void test_makeCursors_factToCountryFullWithFilterOnLeftIsNull()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -769,7 +769,7 @@ public void test_makeCursors_factToCountryRightWithFilterOnJoinable()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -817,7 +817,7 @@ public void test_makeCursors_factToCountryLeftWithFilterOnJoinable()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -864,7 +864,7 @@ public void test_makeCursors_factToCountryLeftWithFilterOnJoinableUsingLookup()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -924,7 +924,7 @@ public void test_makeCursors_factToCountryInnerWithFilterInsteadOfRealJoinCondit
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -990,7 +990,7 @@ public void test_makeCursors_factToRegionToCountryLeft()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -1069,7 +1069,7 @@ public void test_makeCursors_factToCountryAlwaysTrue()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);
JoinTestHelper.verifyCursors(
new HashJoinSegmentStorageAdapter(
@@ -1136,7 +1136,7 @@ public void test_makeCursors_factToCountryAlwaysFalse()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -1187,7 +1187,7 @@ public void test_makeCursors_factToCountryAlwaysTrueUsingLookup()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -1255,7 +1255,7 @@ public void test_makeCursors_factToCountryAlwaysFalseUsingLookup()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -1315,7 +1315,7 @@ public void test_makeCursors_factToCountryUsingVirtualColumn()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -1373,7 +1373,7 @@ public void test_makeCursors_factToCountryUsingExpression()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -1433,7 +1433,7 @@ public void test_makeCursors_factToRegionTheWrongWay()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -1493,7 +1493,7 @@ public void test_makeCursors_errorOnNonEquiJoin()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.readCursors(
@@ -1539,7 +1539,7 @@ public void test_makeCursors_errorOnNonKeyBasedJoin()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.readCursors(
@@ -1572,7 +1572,7 @@ public void test_makeCursors_factToCountryLeft_filterExcludesAllLeftRows()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

JoinTestHelper.verifyCursors(
@@ -88,7 +88,7 @@ public void setUp() throws IOException
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

hashJoinSegment = new HashJoinSegment(
@@ -113,7 +113,7 @@ public void test_constructor_noClauses()
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

final HashJoinSegment ignored = new HashJoinSegment(
@@ -471,7 +471,7 @@ public void test_filterPushDown_factToRegionFilterOnRHSRegionNameExprVirtualColu
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);

HashJoinSegmentStorageAdapter adapter = new HashJoinSegmentStorageAdapter(
@@ -1476,7 +1476,7 @@ public void test_filterPushDown_factToRegionToCountryLeftFilterOnPageDisablePush
false,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);
HashJoinSegmentStorageAdapter adapter = new HashJoinSegmentStorageAdapter(
factSegment.asStorageAdapter(),
@@ -1548,7 +1548,7 @@ public void test_filterPushDown_factToRegionToCountryLeftEnablePushDownDisableRe
true,
false,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);
HashJoinSegmentStorageAdapter adapter = new HashJoinSegmentStorageAdapter(
factSegment.asStorageAdapter(),
@@ -1752,7 +1752,7 @@ private static JoinFilterPreAnalysis simplePreAnalysis(
true,
true,
true,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE
);
}
}
@@ -102,7 +102,7 @@ public void test_createSegmentMapFn_noClauses()
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_PUSH_DOWN,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE,
null,
VirtualColumns.EMPTY
);
@@ -131,7 +131,7 @@ public void test_createSegmentMapFn_unusableClause()
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_PUSH_DOWN,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE,
null,
VirtualColumns.EMPTY
);
@@ -168,7 +168,7 @@ public void test_createSegmentMapFn_usableClause()
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_PUSH_DOWN,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_VALUE_COLUMN_FILTERS,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE_KEY,
QueryContexts.DEFAULT_ENABLE_JOIN_FILTER_REWRITE_MAX_SIZE,
null,
VirtualColumns.EMPTY
);
4 changes: 4 additions & 0 deletions website/.spelling
@@ -40,6 +40,7 @@ CORS
CPUs
CSVs
Ceph
CNF
ColumnDescriptor
Corretto
DDL
@@ -307,6 +308,8 @@ pre-computation
pre-compute
pre-computing
pre-configured
pre-filtered
pre-filtering
pre-generated
pre-made
pre-processing
@@ -380,6 +383,7 @@ unmergeable
unmerged
unparseable
unparsed
useFilterCNF
uptime
uris
useFieldDiscovery