Deserialize dimensions in group by queries to their respective types when reading from their serialized format #16511
Conversation
| throw DruidException.defensive("Not implemented. Check groupable() first");
| }
|
| default Class<?> complexDimensionType()
nit: maybe getComplexClass or the classic getClazz would be a better name, since it probably isn't a different type when used as a dimension (and we have the separate groupable() to determine whether it can be used as a dimension or not)
| TypeSignature<?> signature,
| @Nullable final Hash.Strategy<T> hashStrategy
| @Nullable final Hash.Strategy<T> hashStrategy,
| @Nullable final Class<?> complexDimensionType
should this just use getClazz of ObjectStrategy?
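The suggestion above can be sketched as follows. This is a simplified, self-contained illustration with hypothetical names (ObjectStrategyLike, ComplexKeyReader are not the actual Druid interfaces): instead of threading a separate complexDimensionType constructor argument alongside the strategy, the class could be derived from the strategy's getClazz().

```java
// Simplified sketch (hypothetical names, not the actual Druid interfaces):
// derive the dimension's class from the object strategy instead of passing
// it as a redundant extra constructor parameter.
interface ObjectStrategyLike<T>
{
  Class<? extends T> getClazz();
}

class ComplexKeyReader<T>
{
  private final Class<?> dimensionClass;

  ComplexKeyReader(ObjectStrategyLike<T> strategy)
  {
    // The strategy already knows the concrete class of the values it handles.
    this.dimensionClass = strategy.getClazz();
  }

  Class<?> dimensionClass()
  {
    return dimensionClass;
  }
}
```

The design question is whether the strategy's declared class is always the right class for the dimension; if it is, a single source of truth avoids the two parameters drifting apart.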
| final ObjectMapper jsonMapper,
| final ObjectMapper spillMapper,
should jsonMapper and spillMapper be different? I know spillMapper is usually smile, though aren't results from historicals also smile? Regardless, it's probably worth a comment somewhere explaining why we have different mappers
When writing the code, this seemed natural, since we were using spillMapper only for the spilling runner and nothing else. After going through the group by code again, I think we should still be using the jsonMapper.
The input rows which get further deserialized according to the complex types are sourced from:
- https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByRowProcessor.java#L90 - The input rows are from the processed subqueries.
- https://github.com/apache/druid/blob/master/processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/GroupByMergingQueryRunner.java#L254C54-L254C55 - The input rows are passed to the mergeRunner, which has already read them from the segments via the createRunner.
Retested stuff, serde in (2) is not required.
26ef2a0 to 7d70e53
Thanks for the review @clintropolis @asdf2014
…e types when reading from spilled files and cached results (apache#16620)

Like apache#16511, but for keys that have been spilled or cached during the grouping process
Description
Grouping on complex columns allows the dimensions to be of complex types. When reading the dimensions from their serialized format, we must deserialize them into their respective complex types. Otherwise, Druid deserializes the dimension into a generic Java object, equivalent to `objectMapper.readValue(serializedValue, Object.class)`.

We read the dimensions from their serialized form when reading dimensions over the wire (in native queries).
This wasn't discovered at the time of the original PR because only JSON columns support groupability at the moment, and they are designed to handle the generic Java object that the object mapper deserializes to by default. However, this is required for most other complex types that require groupability.
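The failure mode described above can be shown with plain Jackson (a standalone illustration, not Druid code): deserializing to Object.class produces a generic Java container, while supplying the expected class restores the intended type.

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import java.util.List;

// Standalone Jackson illustration (not Druid code) of the problem described
// above: without a target class the mapper falls back to generic Java
// containers; with the complex type's class, the value round-trips intact.
public class GenericVsTypedDeserialization
{
  public static void main(String[] args) throws Exception
  {
    final ObjectMapper mapper = new ObjectMapper();
    final byte[] serialized = mapper.writeValueAsBytes(new long[]{1L, 2L, 3L});

    // Equivalent to objectMapper.readValue(serializedValue, Object.class):
    // Jackson picks a generic container (a List here), losing the real type.
    final Object generic = mapper.readValue(serialized, Object.class);
    System.out.println(generic instanceof List);   // true
    System.out.println(generic instanceof long[]); // false

    // Supplying the expected class restores the intended complex type.
    final long[] typed = mapper.readValue(serialized, long[].class);
    System.out.println(typed.length);              // 3
  }
}
```

A JSON column's handler can cope with the generic container, which is why only other complex types surface the mismatch.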
Interface change
Benchmarking - I have run the `GroupByBenchmark.queryMultiQueryableIndexWithSerde` benchmark with limited iterations, and here are the results before and after the change.

Release note
(none)
Key changed/added classes in this PR
MyFoo
OurBar
TheirBaz

This PR has: