add single input string expression dimension vector selector and better expression planning #11213
| Original file line number | Diff line number | Diff line change |
|---|---|---|
@@ -117,4 +117,25 @@ public GroupByVectorColumnSelector makeObjectProcessor( | |
| } | ||
| return NilGroupByVectorColumnSelector.INSTANCE; | ||
| } | ||
| | ||
| /** | ||
| * The group by engine vector processor has a more relaxed approach to choosing to use a dictionary encoded string | ||
| * selector over an object selector than some of the other {@link VectorColumnProcessorFactory} implementations. | ||
| * | ||
| * Basically, if a valid dictionary exists, we will use it to group on dictionary ids (so that we can use | ||
| * {@link SingleValueStringGroupByVectorColumnSelector} whenever possible instead of | ||
| * {@link DictionaryBuildingSingleValueStringGroupByVectorColumnSelector}). | ||
| * | ||
| * We do this even for things like virtual columns that have a single string input, because it allows deferring | ||
| * accessing any of the actual string values, which involves at minimum reading utf8 byte values and converting | ||
| * them to string form (if not already cached), and in the case of expressions, computing the expression output for | ||
| * the string input. | ||
| */ | ||
| @Override | ||
| public boolean useDictionaryEncodedSelector(ColumnCapabilities capabilities) | ||
| { | ||
| Preconditions.checkArgument(capabilities != null, "Capabilities must not be null"); | ||
| Preconditions.checkArgument(capabilities.getType() == ValueType.STRING, "Must only be called on a STRING column"); | ||
| return capabilities.isDictionaryEncoded().isTrue(); | ||
| } | ||
| } | ||

Contributor
Just for the record (and to help myself remember later), this means that the groupBy vector engine will use the dictionary IDs to compute per-segment results and decode them when merging those results, which is what the non-vectorized engine does today. When the column is dictionary encoded but not unique, this optimization might not always be good, because there could be a tradeoff depending on the column cardinality after expression evaluation. Even though I think this optimization is likely good in most cases, it could be worth investigating further later to understand the tradeoff better.
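
To make the idea of grouping on dictionary IDs and deferring the string work more concrete, here is a minimal, self-contained sketch. It is not Druid code: the class `DictionaryIdGroupingSketch`, the method `countByExpression`, and the "first character" expression are all hypothetical names chosen for illustration, and the real selectors in this PR work on vectors and group-by buffers rather than plain arrays and maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Illustrative only: none of these names are Druid APIs.
class DictionaryIdGroupingSketch
{
  /**
   * Counts rows per group by aggregating on dictionary ids first, then
   * applying the expression once per distinct id when the result is decoded.
   */
  static Map<String, Long> countByExpression(
      String[] dictionary,
      int[] rowIds,
      Function<String, String> expression
  )
  {
    // Per-segment phase: aggregate on ids only; no string is read or transformed here.
    long[] countsById = new long[dictionary.length];
    for (int id : rowIds) {
      countsById[id]++;
    }

    // Decode phase: look up each id that occurred, evaluate the expression once
    // for it, and combine ids whose expression outputs are equal.
    Map<String, Long> merged = new LinkedHashMap<>();
    for (int id = 0; id < dictionary.length; id++) {
      if (countsById[id] > 0) {
        merged.merge(expression.apply(dictionary[id]), countsById[id], Long::sum);
      }
    }
    return merged;
  }

  public static void main(String[] args)
  {
    String[] dictionary = {"apple", "avocado", "banana"};
    int[] rowIds = {0, 2, 1, 0, 2, 2};
    // Hypothetical expression: take the first character of the input string.
    System.out.println(countByExpression(dictionary, rowIds, s -> s.substring(0, 1)));
  }
}
```

With this input the sketch prints `{a=3, b=3}`. The expression runs three times (once per dictionary entry) instead of six times (once per row), but the first phase tracks three id groups even though only two output groups exist. That gap between dictionary cardinality and post-expression cardinality is the tradeoff mentioned in the comment above.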
Please add `@Nullable` for `capabilities`.
Hmm, it should never be null for this method though (nor for any of the other `VectorColumnProcessorFactory` methods). `ColumnProcessors` will return a nil vector selector if the capabilities are null, since null capabilities in the vectorized engine mean that the column doesn't exist.
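
As a rough illustration of that division of responsibility, the sketch below handles the missing-column case in the dispatcher, so the factory-style method can assume non-null capabilities. All names here (`MissingColumnDispatchSketch`, `Capabilities`, `chooseSelector`, `makeProcessor`) are hypothetical stand-ins and not the actual `ColumnProcessors` or `VectorColumnProcessorFactory` code; the string return values only label which kind of selector would be chosen.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: simplified stand-ins, not Druid classes.
class MissingColumnDispatchSketch
{
  /** Hypothetical stand-in for column capabilities. */
  static final class Capabilities
  {
    final boolean dictionaryEncoded;

    Capabilities(boolean dictionaryEncoded)
    {
      this.dictionaryEncoded = dictionaryEncoded;
    }
  }

  /** Factory-style method: callers guarantee capabilities are non-null. */
  static String chooseSelector(Capabilities capabilities)
  {
    if (capabilities == null) {
      throw new IllegalArgumentException("Capabilities must not be null");
    }
    return capabilities.dictionaryEncoded ? "dictionary" : "object";
  }

  /** Dispatcher: a column with no capabilities is treated as missing. */
  static String makeProcessor(Map<String, Capabilities> columns, String column)
  {
    Capabilities capabilities = columns.get(column);
    if (capabilities == null) {
      // The column does not exist, so the factory is never consulted.
      return "nil";
    }
    return chooseSelector(capabilities);
  }

  public static void main(String[] args)
  {
    Map<String, Capabilities> columns = new HashMap<>();
    columns.put("dim", new Capabilities(true));
    System.out.println(makeProcessor(columns, "dim"));
    System.out.println(makeProcessor(columns, "missing"));
  }
}
```

Under this arrangement the precondition check inside the factory-style method acts as a guard against programming errors rather than a code path that normal queries are expected to reach.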