fix filtering bug in filtering unnest cols and dim cols: Received a non-applicable rewrite#14587
Conversation
a3dc3ac to
89aa0b5
Compare
somu-imply
left a comment
There was a problem hiding this comment.
Looks good. Thanks for the PR. The recursive one is a nice touch to handle the complex filters
209c889 to
93afd47
Compare
| final OrFilter preOrFilter = new OrFilter(preFilterList); | ||
| filterSplitter.addPreFilter(preOrFilter); |
There was a problem hiding this comment.
why do we always make an OrFilter if we are only checking for BooleanFilter to break down? Isn't this going to turn AndFilter into OrFilter? Am I missing something?
There was a problem hiding this comment.
you are previously this code was dealign with only OR filters, we need to change the final filter based on type. I Need refactor the code to create new PreFilterList.
There was a problem hiding this comment.
Oh shoot, I missed that. Good catch. We are extracting the filters and adding each to the pre-filter list. Maybe at each level of recursion, we need to recreate the And/Or filter and add it back to the pre-filter list. In such a case the number of filters in pre-filter list will be equal to the original filter size.
There was a problem hiding this comment.
@somu-imply @clintropolis Pushed a change to address the comment. Can you please review?
There was a problem hiding this comment.
There seems to be an issue with the test test_pushdown_or_filters_unnested_and_original_dimension_with_unnest_adapters as the expected results have changed. I'm going through the code today
35bee11 to
42c0a61
Compare
| )) | ||
| )); | ||
|
|
||
| final OrFilter expectedPushDownFilter = new OrFilter(ImmutableList.of( |
There was a problem hiding this comment.
Can we add a similar test case on AND of OR filters as well?
There was a problem hiding this comment.
adding more test case
|
Almost there. Suggesting 1 more change: There are 2 cases:
Here since d3 can be mapped directly to an input column dim3, the filter can be reconstructed as a pre-filter with dim3 IN (a,b) or m1<10 and pushed to the base cursor.
since the filter outside contains an OR filter and |
42c0a61 to
7d9518e
Compare
| bound("dimZipf", "27", "27", false, false, null, StringComparators.NUMERIC), | ||
| new SelectorDimFilter("j0.unnest", "Baz", null) |
There was a problem hiding this comment.
since #14542 has been merged, this should use range and equality instead of bound and making a SelectorDimFilter directly. same for other tests added in this PR
3b27907 to
dd1f92d
Compare
26c4fb1 to
77b62a6
Compare
somu-imply
left a comment
There was a problem hiding this comment.
Thanks for updating the code and adding those tests. I have some comments so far, mostly cosmetic.
| return new SelectorDimFilter(dimension, value, extractionFn).toFilter(); | ||
| } | ||
|
|
||
| public static DimFilter sdfd(String dimension, String value, ExtractionFn extractionFn) |
There was a problem hiding this comment.
nit. maybe some better names...but am good with these too
| * 3. Filters on unnest column which is derived from Array or any other Expression can not be pushe down to base. | ||
| * for example: a=1 and vc=3 , lets say vc is ExpressionVirtualColumn, and vc=3 can not be push down to base even if top level is AND filter. | ||
| * 4. | ||
| * |
There was a problem hiding this comment.
Thanks for the comments, for future developers can we make it a bit clear with why in case 1, the entire filter (b=2 OR c=2) cannot be pushed down to the preFilter. Probably we can add an example saying that either b2 or c2 references a column that is not a part of the input for the case of unnest.
Something like, in unnest we come across 2 cases:
- unnesting a single dimension e.g. select * from foo, UNNEST(MV_TO_ARRAY(dim3)) as u(d3)
- Unnesting an expression from multiple columns e.g. select * from foo, UNNEST(ARRAY[dim1,dim2]) as u(d12)
In case 1, d3 is a direct reference to dim3 so any filter using d3 can be rewritten using dim3 and added to pre filter while in case2, due to presence of the expression virtual column expressionVirtualColumn("j0.unnest", "array(\"dim1\",\"dim2\")", ColumnType.STRING_ARRAY) the filters on d12 cannot be pushed to the pre filters
There was a problem hiding this comment.
updated the comments
77b62a6 to
ccabe2b
Compare
ccabe2b to
7210ae7
Compare
clintropolis
left a comment
There was a problem hiding this comment.
would be nice to add some tests with deeper nesting to UnnestStorageAdapterTest just to flex the recursive stuff a bit more
| NullHandling.sqlCompatible() | ||
| ? equality("dimZipf", "27", ColumnType.LONG) | ||
| : bound("dimZipf", "27", "27", false, false, null, StringComparators.NUMERIC), |
There was a problem hiding this comment.
you can use numericEquality instead of this
| return new SelectorFilter(fieldName, value, null); | ||
| } | ||
|
|
||
| public static Filter sdf(String dimension, String value) |
There was a problem hiding this comment.
why does this need to exist instead of just using selector? I don't see any callers using the version that takes extractionFn, and only one caller using sdfd which seems overkill..
There was a problem hiding this comment.
removed static functions that I added
|
|
||
| public FilterSplitter( | ||
| String inputColumn, | ||
| ColumnCapabilities inputColumnCapabilites, VirtualColumns queryVirtualColumns |
There was a problem hiding this comment.
nit formatting, VirtualColumns should be on a separate line (the style checker has trouble enforcing this sometimes)
c3e0a33 to
0508ec4
Compare
| preFilterList.add(filter); | ||
| // requires check for OR and And filters, disqualify rewrite for non-unnest filters | ||
| if (queryFilter instanceof BooleanFilter) { | ||
| int originalFilterCount = Filters.countNumberOfFilters(queryFilter); |
There was a problem hiding this comment.
it seems like it would be a lot more efficient to do both the before and after count as part of the traversal that the filter splitter does, any reason I'm missing not to do that? Otherwise it seems like we have to basically traverse the filter tree 3 times per segment, before to get a count, then to try to rewrite, then after to get the new count. It seems like this could all just be baked into the FilterSplitter itself. Apologies for not spotting this earlier on in review... but if we do this then i think we no longer need Filters.countNumberOfFilters
| return retVal; | ||
| } | ||
|
|
||
| public static int countNumberOfFilters(Filter filter) |
There was a problem hiding this comment.
nit: if this method needs to stay (re if other comment suggesting pushing counting inside of filter splitter does not work out for some reason) then this argument should be marked @Nullable
| )); | ||
| testComputeBaseAndPostUnnestFilters( | ||
| testQueryFilter, | ||
| "unnested-multi-string1 = 3", |
There was a problem hiding this comment.
this doesn't seem right, shouldn't it be pushing down multi-string1 = 3 since unnested-multi-string1 doesn't exist on the base adapter? I think the problem is that testComputeBaseAndPostUnnestFilters is not getting the virtual column capabilities correctly, which makes it use the default capabilities to define it as float typed rather than string typed, so it isn't rewritten correctly in rewriteFilterOnUnnestColumnIfPossible because it is not string typed.
If i change the line to get the capabilities from the unnest storage adapter
vc.capabilities(UNNEST_STORAGE_ADAPTER, inputColumn)
, the test fails because the pushed down filter is (multi-string1 = 3 && multi-string1 = 2) which seems like the correct answer? Also makes most of the other tests fail expectations. I think we should probably add a check to ensure that filters on the unnest column never get pushed down, since I think that would effectively make the results null since the filter would match nothing since the column doesn't exist (unless it was filtering for nulls i guess)
| null, | ||
| VirtualColumns.EMPTY, | ||
| inputColumn, | ||
| vc.capabilities(inputColumn) |
There was a problem hiding this comment.
this should be
| vc.capabilities(inputColumn) | |
| vc.capabilities(UNNEST_STORAGE_ADAPTER, inputColumn) |
| )); | ||
| testComputeBaseAndPostUnnestFilters( | ||
| testQueryFilter, | ||
| "multi-string1 = 3", |
There was a problem hiding this comment.
why is it that only the filter on the unnest column pushed down instead of the whole and? same question for other test cases involving an AND? I think I would expect that in the case of AND especially, everything which can be pushed down is pushed down. This is what the comments in computeBaseAndPostUnnestFilters indicates should happen..
In the case of an AND filter, I think that everything can be pushed down except for the case where the unnest output column is not a simple direct access to a MVD. Since it is an AND though, I think all of the other clauses could still be pushed down in that case, leaving only the unnest filter which could not be rewritten as a post filter.
But in the case of OR filters, I think its either all or nothing can be pushed down. So if there is a filter on the unnest output column where the virtual column is not a simple direct column access to a multi-value string, then nothing can be pushed down, else if the unnest column filters can be rewritten because they are simple mvd direct access, everything can be pushed down.
There was a problem hiding this comment.
in this test, it is case of And filters, and We only push down the filters to base where rewriteFilterOnUnnestColumnIfPossible is successful.
There was a problem hiding this comment.
fixed the bug in filter counts and now tests looks good, Can you please verify again?
| filterSplitter.addPreFilter(newFilter != null ? newFilter : filter); | ||
| filterSplitter.addToPreFilterCount(1); | ||
| } | ||
| filterSplitter.addToOriginalFilterCount(1); |
There was a problem hiding this comment.
Isn't the filter getting added to the preFilter list twice or counted twice with addToPreFilterCount ? It already got added to preFilterList once in line 503. This would cause the count to go up and when you make the comparison of count on line 330, it would give the wrong count.
b4b0ec1 to
a0a8324
Compare
a0a8324 to
96a8208
Compare
somu-imply
left a comment
There was a problem hiding this comment.
Looks good, two optimizations that need to be done are:
- Make sure not filters on unnested column are not pushed to base
- Update post filters for and filters to include only the subset not pushed to base
These can be done in a followup PR to not delay this one
There was a problem hiding this comment.
👍
per comment (and agreeing with @somu-imply) I think there is still some follow-up work to be done. Also it would be nice to have a few more tests where it can't push anything down, such as when filter is on unnest column that is wrapped in some expression like array_slice which would make the virtual column more than a direct access and so shouldn't be eligible for pushdown.
| testComputeBaseAndPostUnnestFilters( | ||
| testQueryFilter, | ||
| "(multi-string1 = 3 && multi-string1 = 2)", | ||
| "(unnested-multi-string1 = 3 && multi-string1 = 2)" | ||
| ); |
There was a problem hiding this comment.
I think there is still some work to do here, but is ok to do as a follow-up.
Side note, this would be a very strange query since there is like coincidentally a filter on the underlying unnest column that is referring to it directly alongside a filter on the unnested column itself. I think it would make sense to migrate a lot of these tests away from having filters on both the unnest column and its underlying column, since I cant imagine its going to be common in practice.
But thinking about the more general case, like if it was WHERE unnested-multi-string1 = 3 && other-column = 2 i the best rewrite would be
testQueryFilter,
"(multi-string1 = 3 && other-column= 2)",
"unnested-multi-string1 = 3"
);```
but the current code leaves the `other-column = 2` in the post-filters, which isn't necessary. It would be a pretty big optimization to leave these off and only have to do post-filters when absolutely necessary. OR filters cannot have the same optimization.
But, i think it should be ok to do this in a follow-up PR, since there is some other correctness stuff to be done as well, such as skipping pushdown of NOT filters, which could incorrectly filter out too much before the post-filter has a chance to match stuff.
Fixes #XXXX.
Description
This PR fixes the bug in the following query, currently it throws the exception
org.apache.druid.query.QueryException: Received a non-applicable rewrite: {j0.unnest=c2}, filter's dimension: c1Whenever we have And / OR filters on unnest columns, we rewrite the filters on unnest cols.
We were covering OR filters and disqualifying the rewrite for non unnest cols and we have to to do same thing with AND filters as well. This PR checks for Boolean filter and disqualify the rewrite for non unnest cols.
Adding the 3 tests that covers the given scenario.
3rd test case cover the nested filer scenario for unnest filters, This PR address the unpack the nested OR/AND filters and attempt to rewrite only for qualified filters.
Thank you
Fixed the bug ...
Renamed the class ...
Added a forbidden-apis entry ...
Release note
Key changed/added classes in this PR
MyFooOurBarTheirBazThis PR has: