snapshot column capabilities for realtime cursors#17386
snapshot column capabilities for realtime cursors#17386gianm merged 14 commits intoapache:masterfrom
Conversation
changes: * adds `CursorBuildSpec.getPhysicalColumns()` to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan) * `IncrementalIndexCursorFactory`/`IncrementalIndexCursorHolder` uses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, allowing expression selectors on realtime queries to no longer be required to treat selectors from `StringDimensionIndexer` as multi-valued unless they truly are multi-valued. this fixes several bugs with expressions on realtime queries that change a value from `StringDimensionIndexer` to some type other than string, which would often result in a single element array from the column being handled as multi-valued * `StringDimensionIndexer.setSparseIndexed()` now adds the default value to the dictionary when set * `StringDimensionIndexer` column value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct) * fixed a mistake that expression selectors for realtime queries with no null values could not use dictionary encoded selectors
…-cursor-snapshot-capabilities
…-cursor-snapshot-capabilities
…-cursor-snapshot-capabilities
…-cursor-snapshot-capabilities
| public CursorBuildSpecBuilder setGroupingColumns( | ||
| @Nullable List<String> groupingColumns | ||
| ) | ||
| public CursorBuildSpecBuilder setPhyiscalColumns(@Nullable Set<String> physicalColumns) |
| cursorBuildSpecBuilder.setVirtualColumns(preJoinVirtualColumns); | ||
|
|
||
| // add all physical columns columns if they were originally set | ||
| if (physicalColumns != null) { |
There was a problem hiding this comment.
Do we need the condition columns from clauses too?
There was a problem hiding this comment.
this CursorBuildSpec is only for the base table so we only need to include all the stuff that is getting pushed down to the cursor we build on the base table, i'll update the comment
There was a problem hiding this comment.
I meant, if we have a condition like x = j0.y, doesn't the base table cursor need to have x added as a physical column? I would think yes but I don't see the code that adds it.
There was a problem hiding this comment.
this was a good point and missing, and the way things were also resulted in some join only columns getting pushed down to the base table cursor as physical columns. I've updated things to add all of the columns from the clauses and also filter out all of the columns with a join clause prefix to ensure that only base table columns are specified.
| public ColumnCapabilities getColumnCapabilities(String column) | ||
| { | ||
| return IncrementalIndexCursorFactory.snapshotColumnCapabilities(rowSelector, column); | ||
| return capabilitiesMap.get(column); |
There was a problem hiding this comment.
I wonder if we should have a defensive check for calling getColumnCapabilities on a column that wasn't part of physicalColumns. At least one that is activated during tests, even if it's not activated during production.
There was a problem hiding this comment.
i added validation for getColumnCapabilities. I think I would actually like to take it a step further and add this validation for all the selector factory methods for IncrementalIndexColumnSelectorFactory, QueryableIndexColumnSelectorFactory and QueryableIndexVectorColumnSelectorFactory, but have decided to do this as a follow-up since I feel like this PR already has enough stuff going on with it.
…-cursor-snapshot-capabilities
…-cursor-snapshot-capabilities
|
|
||
| DruidException.conditionalDefensive( | ||
| cursorBuildSpec.getPhysicalColumns() == null || capabilities != null || capabilitiesMap.containsKey(column), | ||
| "Asked for physical column capabilities for column[%s] which wasn't specified as required by the query, this is a bug", |
There was a problem hiding this comment.
Not necessary to say "this is a bug"; it's implied by the category being "defensive".
| } | ||
| if (filter != null) { | ||
| for (String column : filter.getRequiredColumns()) { | ||
| if (spec.getVirtualColumns().getVirtualColumn(column) == null) { |
There was a problem hiding this comment.
nit: could use !spec.getVirtualColumns().exists(column)
Description
changes:
CursorBuildSpec.getPhysicalColumns()to allow specifying the set of required physical columns from a segment. if null, all columns are assumed to be required (e.g. full scan)IncrementalIndexCursorFactory/IncrementalIndexCursorHolderuses the physical columns from the cursor build spec to know which set of dimensions to 'snapshot' the capabilities for, allowing expression selectors on realtime queries to no longer be required to treat selectors fromStringDimensionIndexeras multi-valued unless they truly are multi-valued. this fixes several bugs with expressions on realtime queries that change a value fromStringDimensionIndexerto some type other than string, which would often result in a single element array from the column being handled as multi-valuedStringDimensionIndexer.setSparseIndexed()now adds the default value to the dictionary when setStringDimensionIndexercolumn value selectors now always report that they are dictionary encoded, and that name lookup is possible in advance on their selectors (since set sparse adds the null value so the cardinality is correct)Release note
todo
This PR has: