add support for filtered projections#18342
Conversation
| @@ -553,6 +555,7 @@ public void testInsertOnExternalDataSourceWithCatalogProjections(String contextN | |||
| new AggregateProjectionMetadata.Schema( | |||
There was a problem hiding this comment.
add a builder to avoid the need for null filter everywhere?
| } | ||
|
|
||
| // if the projection has a filter, the query must contain this filter match | ||
| if (filter != null) { |
There was a problem hiding this comment.
IMO, put this logic and rewriteFilter in their own dedicated file. The logic will only grow more complex over time.
There was a problem hiding this comment.
hmm, this whole method is giant so I started to split all the sub parts out into separate functions to like do the grouping match, aggregator match, and realized there is another place we are doing the query filter match, i think they need to be consolidated to be able to support projections where the filter inputs aren't actually stored in the projection as well as it would be more efficient to look at filter stuff once. I think we would need to shift the filter matching to happen last since it needs the remapped columns the be populated
Will consider splitting into a separate file, though in that case should almost all of the logic be moved out of this file into like Projections which already has findMatchingProjection which is what calls this matches method? That seems fine to me, just pass in the AggregateProjectionMetadata.Schema to most of these methods. Or I guess i could make a utility file per match type, like a GroupingMatcher and FilterMatcher and AggregatorMatcher 🤷
There was a problem hiding this comment.
ok, locally i've moved most of the logic into Projections (and moved the ConstantTimeColumn stuff in Projections to its own file), and it seems nicer so far, more to do tho so havent pushed yet
| } | ||
| // at least one child must have been rewritten to rewrite the AND | ||
| if (childRewritten) { | ||
| if (newChildren.size() > 1) { |
There was a problem hiding this comment.
can newChildren be empty, if all filters under the AND of queryFilter were part of projectionFilter?
There was a problem hiding this comment.
i think not with the current rather weak equals based matching logic - either the parent AND would match a projection AND, or at most a single child of this AND can match. Doing an overlap of AND children would be an improvement, and run into the problem you're asking about I think. I'll see if I can wire that up tho since it seems important for a query with something like x = 'a' and y = 'b' and z = 'c' to match a projection with x = 'a' and y = 'b'
| @Nullable | ||
| public static Filter rewriteFilter(@Nullable Filter projectionFilter, @Nullable Filter queryFilter) | ||
| { | ||
| if (projectionFilter == null || queryFilter == null) { |
There was a problem hiding this comment.
can queryFilter really be null? It seems like this method is only called for nonnull query filters.
| this.timeColumnName = timeColumnName; | ||
| this.timeColumnPosition = foundTimePosition; | ||
| this.granularity = granularity == null ? Granularities.ALL : granularity; | ||
| this.effectiveGranularity = granularity == null ? Granularities.ALL : granularity; |
There was a problem hiding this comment.
i think we're expecting the same size/ ordering bwteen groupingColumns and OrderBy between line 223 and 224, this assumption is guaranteed from ProjectionsSpec but not obvious here, maybe worth adding some checks.
There was a problem hiding this comment.
also it seems like this effectiveGranularity and orderingWithTimeSubstitution logic is very similar to the computeOrdering function, maybe it could be merged?
There was a problem hiding this comment.
i think we're expecting the same size/ ordering bwteen groupingColumns and OrderBy between line 223 and 224, this assumption is guaranteed from ProjectionsSpec but not obvious here, maybe worth adding some checks.
yea, good spot, it is true they unfortunately are coupled right now, though in the future I think i would like to make them more independent rather than consolidate to keep us free to potentially allow user choice in ordering in the future. Given that this is not new in this PR I would rather deal with this in a follow-up
| } | ||
| AggregateProjectionSpec that = (AggregateProjectionSpec) o; | ||
| return Objects.equals(name, that.name) | ||
| && Objects.equals(filter, that.filter) |
There was a problem hiding this comment.
Is it intentional timeColumnName is not included in equals/ hashCode/ toString method, but ordering is? Both are derived so seems like not needed.
There was a problem hiding this comment.
this is sort of aspirational - I think it would be possible someday for ordering to not be fixed to grouping if we ever open up the comparator used to sort data when building segments. However, no such plans for the time column.
(however the other part you noticed about relation between grouping and ordering maybe sort of breaks this)
gianm
left a comment
There was a problem hiding this comment.
Approved, with some thoughts on the tests.
| import java.util.Map; | ||
| import java.util.Set; | ||
|
|
||
| class ProjectionsTest |
There was a problem hiding this comment.
public class? not sure it matters, just odd not to see the keyword.
There was a problem hiding this comment.
used intellij to generate a test file and saw it without the public, i looked into it and apparently in junit5 it is not necessary so have been rolling with it 🤷 https://docs.junit.org/current/user-guide/#writing-tests-classes-and-methods
| } | ||
|
|
||
| @Test | ||
| void testSchemaMatchFilter() |
There was a problem hiding this comment.
We're going to need easier, more concise ways to write tests for filter matching. As the filter matching gets more sophisticated, there will be a lot of variations to test to ensure everything is working properly.
Maybe we can define a few "standard" datasources for the tests, each one with a base table and some projection(s). Then each test would just need to define a CursorBuildSpec and the expected ProjectionMatch.
There was a problem hiding this comment.
ah yea, i was writing these tests as mainly synthetic to smoke test stuff as they are not backed by real data. I think for the bulk of tests I would agree we want something like what CursorFactoryProjectionTest is doing with a matrix of incremental and immutable segments but with a more cursor + filter focus and make it test that results are the same with and without the projection similar to what the native filter tests are doing
| cursorBuildSpec, | ||
| new RowSignatureChecker(baseTable) | ||
| ); | ||
| Projections.ProjectionMatch expected = new Projections.ProjectionMatch( |
There was a problem hiding this comment.
nit- moving ProjectionMatch out into its own class would help a little with making the test code more concise.
Description
This PR adds a
filterproperty toAggregateProjectionSpecto allow projections to be pre-filtered to further reduce row counts for views to power queries with commonly used filters. The filter is built into aValueMatcherinOnHeapAggregateProjectionand rows are checked to match before adding to the facts table. At query time, if a projection has a filter we check to see that the query either contains the exact filter, or is part of an AND filter to ensure that the projection contains all of the rows which would normally match if querying the base table instead. If the query filter matches, the projection filter is removed from the query since it is no longer necessary.The filter matching is pretty brittle right now and requires an exact match, but this can be improved in follow-up work.
This PR has: