Deduce type from the aggregators when materializing subquery results #16703
LakshSingla merged 3 commits into apache:master
@MethodSource("constructorFeeder")
@ParameterizedTest(name = "{0}")
public void testTimeseriesSubqueryWithEarliestAggregator(String testName, Map<String, Object> queryContext)
Check notice (Code scanning / CodeQL): Useless parameter
@MethodSource("constructorFeeder")
@ParameterizedTest(name = "{0}")
public void testTopNSubqueryWithEarliestAggregator(String testName, Map<String, Object> queryContext)
Check notice (Code scanning / CodeQL): Useless parameter
@MethodSource("constructorFeeder")
@ParameterizedTest(name = "{0}")
public void testGroupBySubqueryWithEarliestAggregator(String testName, Map<String, Object> queryContext)
Check notice (Code scanning / CodeQL): Useless parameter
RowSignature rowSignature = query.getResultRowSignature(
    query.context().isFinalize(true)
    ? RowSignature.Finalization.YES
    : RowSignature.Finalization.NO
);
I believe this logic is supposed to be inside query.getResultRowSignature; the query already knows its context(), so why should we tell it the Finalization value from the outside?
Doesn't that work?
It should work, but I'm wary of making that change, since it will affect everything from native queries to SQL queries. Let me try making the change and see whether any tests fail.
I realised that it shouldn't work. For example, look at GroupByPreShuffleFrameProcessor and GroupByPostShuffleFrameProcessor. The same query requires different finalization modes: one partially aggregates and needs the intermediate type, while the other completely aggregates and finalizes. This information isn't fully captured by the query, so something outside it has to say which finalization mode to use. Therefore we can't reliably determine it from the query context alone.
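To make the distinction concrete, here is a minimal, self-contained sketch. It uses hypothetical stand-ins (`Finalization`, `AggSpec`, `FinalizationSketch`) rather than Druid's real `RowSignature` API. The caller chooses the finalization mode (as the `query.context().isFinalize(true)` ternary above does, defaulting to finalizing), and the same aggregators then yield the intermediate type for a partial-aggregation stage, the final type for a finalizing stage, or `null` when the mode is unknown and the two types differ.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a finalization mode; assumed to mirror the
// YES / NO / UNKNOWN shape of RowSignature.Finalization.
enum Finalization { YES, NO, UNKNOWN }

// Hypothetical aggregator description, e.g. a StringFirst-like aggregator whose
// intermediate type differs from its final type.
record AggSpec(String name, String intermediateType, String finalType) {}

final class FinalizationSketch {
  private FinalizationSketch() {}

  // Mirrors the caller-side ternary: the "finalize" context flag defaults to true.
  static Finalization fromContext(Map<String, Object> context) {
    Object flag = context.getOrDefault("finalize", true);
    return Boolean.parseBoolean(String.valueOf(flag)) ? Finalization.YES : Finalization.NO;
  }

  // Picks one aggregator's column type; null means "type unknown".
  static String columnType(AggSpec agg, Finalization finalization) {
    return switch (finalization) {
      case YES -> agg.finalType();
      case NO -> agg.intermediateType();
      // Unknown mode: only safe when both types agree.
      case UNKNOWN -> agg.intermediateType().equals(agg.finalType()) ? agg.finalType() : null;
    };
  }

  // Builds a name -> type map for all aggregators under one finalization mode.
  static Map<String, String> signature(List<AggSpec> aggs, Finalization finalization) {
    Map<String, String> sig = new LinkedHashMap<>();
    for (AggSpec agg : aggs) {
      sig.put(agg.name(), columnType(agg, finalization));
    }
    return sig;
  }
}
```

In this model a partial (pre-shuffle) stage would call `signature(aggs, Finalization.NO)` and a finalizing (post-shuffle) stage `signature(aggs, Finalization.YES)` on the same aggregator list, which is why the mode has to come from the caller. Type names used with it are illustrative.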
};
}
private RowSignature resultSignature(final TimeseriesQuery query, final RowSignature.Finalization finalization)
Can this method be moved to TimeseriesQuery#getResultSignature (like for GroupByQuery)
or TimeseriesQuery#getRowSignature (like for ScanQuery)?
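A hedged sketch of what the suggested move could look like, using hypothetical simplified classes (`TimeseriesQuerySketch`, `Agg`, `Column`) rather than Druid's real `TimeseriesQuery`. The query derives its own result columns, while the finalization mode is still passed in by the caller. Post-aggregators are omitted, and the `__time`/`LONG` timestamp column is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical finalization modes (stand-in for RowSignature.Finalization).
enum Finalization { YES, NO, UNKNOWN }

// Hypothetical result column; a null type means "unknown".
record Column(String name, String type) {}

// Hypothetical aggregator whose intermediate and final types may differ.
record Agg(String name, String intermediateType, String finalType) {
  String typeFor(Finalization finalization) {
    return switch (finalization) {
      case YES -> finalType;
      case NO -> intermediateType;
      case UNKNOWN -> intermediateType.equals(finalType) ? finalType : null;
    };
  }
}

// Hypothetical, simplified Timeseries query that owns its signature logic.
final class TimeseriesQuerySketch {
  // Assumed name of the timestamp column, for illustration only.
  static final String TIME_COLUMN = "__time";

  private final List<Agg> aggregators;

  TimeseriesQuerySketch(List<Agg> aggregators) {
    this.aggregators = List.copyOf(aggregators);
  }

  // The finalization mode comes from the caller, not from the query's context.
  List<Column> getResultSignature(Finalization finalization) {
    List<Column> columns = new ArrayList<>();
    columns.add(new Column(TIME_COLUMN, "LONG"));
    for (Agg agg : aggregators) {
      columns.add(new Column(agg.name(), agg.typeFor(finalization)));
    }
    return columns;
  }
}
```

Keeping the mode as a parameter is consistent with the earlier point that pre- and post-shuffle stages need different signatures for the same query.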
)
{
  final RowSignature rowSignature = resultArraySignature(query);
  final RowSignature rowSignature =
Similarly to Timeseries: shouldn't this be TopNQuery#getResultRowSignature?
Please also update resultArraySignature to use that method so we don't duplicate logic.
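The de-duplication being asked for can be sketched as follows, again with hypothetical stand-ins (`TopNQuerySketch`, `TopNToolChestSketch`) rather than Druid's real `TopNQuery` and toolchest. The query owns the column layout and both toolchest methods delegate to it. Treating the finalization-agnostic array signature as `UNKNOWN` reflects the behavior the PR description attributes to the original code; the layout (`__time`, one STRING dimension, then aggregators) is an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical finalization modes (stand-in for RowSignature.Finalization).
enum Finalization { YES, NO, UNKNOWN }

// Hypothetical result column; a null type means "unknown".
record Column(String name, String type) {}

// Hypothetical aggregator whose intermediate and final types may differ.
record Agg(String name, String intermediateType, String finalType) {
  String typeFor(Finalization finalization) {
    return switch (finalization) {
      case YES -> finalType;
      case NO -> intermediateType;
      case UNKNOWN -> intermediateType.equals(finalType) ? finalType : null;
    };
  }
}

// Hypothetical TopN query: the single place that defines the result columns.
final class TopNQuerySketch {
  private final String dimension;
  private final List<Agg> aggregators;

  TopNQuerySketch(String dimension, List<Agg> aggregators) {
    this.dimension = dimension;
    this.aggregators = List.copyOf(aggregators);
  }

  List<Column> getResultRowSignature(Finalization finalization) {
    List<Column> columns = new ArrayList<>();
    columns.add(new Column("__time", "LONG"));
    // Assumption: the dimension is a STRING column in this sketch.
    columns.add(new Column(dimension, "STRING"));
    for (Agg agg : aggregators) {
      columns.add(new Column(agg.name(), agg.typeFor(finalization)));
    }
    return columns;
  }
}

// Hypothetical toolchest: both methods delegate, so the layout is defined once.
final class TopNToolChestSketch {
  private TopNToolChestSketch() {}

  // No finalization information available here, so the mode is UNKNOWN.
  static List<Column> resultArraySignature(TopNQuerySketch query) {
    return query.getResultRowSignature(Finalization.UNKNOWN);
  }

  // Used when the caller knows the mode, e.g. from the query context.
  static List<Column> resultSignature(TopNQuerySketch query, Finalization finalization) {
    return query.getResultRowSignature(finalization);
  }
}
```

Because both entry points go through `getResultRowSignature`, they can only differ in the aggregator types chosen by the finalization mode, never in column order or names.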
Thanks for the review @kgyrtkirk.
Description
For aggregators like StringFirst/Last, whose intermediate type isn't the same as the final type, using them in GroupBy, TopN or Timeseries subqueries causes a fallback when maxSubqueryBytes is set. This is because we assume that the finalization is not known, so the row signature cannot determine whether to use the intermediate or the final type, and it sets the type to null. This PR figures out the finalization from the query context and uses the intermediate or the final type appropriately.
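A minimal sketch of why a `null` column type matters here, under stated assumptions: `SubqueryMaterializationSketch` and its `Materialization` outcomes are hypothetical, and the byte-limited path is modelled as requiring a concrete type for every column, with anything else falling back to a row-limited path (assumed to correspond to the row-count limit). With the finalization unknown, a StringFirst/Last column gets a `null` type and triggers the fallback; deducing the finalization gives it a concrete type.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical ways of materializing subquery results.
enum Materialization { BYTE_LIMITED, ROW_LIMITED }

final class SubqueryMaterializationSketch {
  private SubqueryMaterializationSketch() {}

  // signature maps column name -> type; a null type means the type is unknown.
  static Materialization choose(Map<String, String> signature, boolean maxSubqueryBytesSet) {
    if (!maxSubqueryBytesSet) {
      // No byte limit configured: results are only limited by row count.
      return Materialization.ROW_LIMITED;
    }
    // The byte-limited path needs every column's type to serialize the rows.
    boolean allTypesKnown = signature.values().stream().allMatch(Objects::nonNull);
    return allTypesKnown ? Materialization.BYTE_LIMITED : Materialization.ROW_LIMITED;
  }
}
```

Under this model, a signature built with an unknown finalization (StringFirst column typed `null`) takes the fallback, while one built with the finalization deduced from the context (column typed, say, `STRING`) stays on the byte-limited path.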