Quantiles sketch agg fails on inner query numeric post-agg columns #7660

@jon-wei

Affected Version

0.12.0+

Description

When a quantiles sketch aggregator (http://druid.io/docs/latest/development/extensions-core/datasketches-quantiles.html) in the outer query of a nested groupBy references a numeric column generated by a post-aggregator in the inner query, the following exception occurs:

java.lang.ClassCastException: java.lang.Double cannot be cast to com.yahoo.sketches.quantiles.DoublesSketch
	at org.apache.druid.query.aggregation.datasketches.quantiles.DoublesSketchMergeBufferAggregator.aggregate(DoublesSketchMergeBufferAggregator.java:65) ~[?:?]
	at org.apache.druid.query.groupby.epinephelinae.AbstractBufferHashGrouper.aggregate(AbstractBufferHashGrouper.java:165) ~[druid-processing-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.query.groupby.epinephelinae.SpillingGrouper.aggregate(SpillingGrouper.java:167) ~[druid-processing-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.query.groupby.epinephelinae.Grouper.aggregate(Grouper.java:82) ~[druid-processing-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.query.groupby.epinephelinae.RowBasedGrouperHelper$1.accumulate(RowBasedGrouperHelper.java:270) ~[druid-processing-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.query.groupby.epinephelinae.RowBasedGrouperHelper$1.accumulate(RowBasedGrouperHelper.java:247) ~[druid-processing-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.java.util.common.guava.FilteringAccumulator.accumulate(FilteringAccumulator.java:41) ~[druid-core-0.14.2-incubating.jar:0.14.2-incubating]
	at org.apache.druid.java.util.common.guava.MappingAccumulator.accumulate(MappingAccumulator.java:40) ~[druid-core-0.14.2-incubating.jar:0.14.2-incubating]

This occurs because the factorizeBuffered method in DoublesSketchAggregatorFactory relies on metricFactory.getColumnCapabilities(fieldName) to determine whether an input column is numeric. If the column is not reported as numeric, the aggregator assumes the input is a complex DoublesSketch object. For columns produced by post-aggregators, the type information is not available, so the numeric Double value is treated as a sketch and the cast fails.
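
The decision described above can be modeled in isolation. The following is a minimal sketch and does not use Druid code: SketchInputDemo, FakeSketch, CAPABILITIES, usesBuildPath, aggregateUnchecked and aggregateChecked are all hypothetical names, and a missing map entry stands in for a column with no capabilities. The checked variant only illustrates one possible defensive approach (inspecting the runtime value); it is not necessarily how the project would fix this.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for the aggregator selection described in this issue.
// None of these types are Druid classes; every name here is hypothetical.
class SketchInputDemo {
    // Stand-in for a complex sketch object.
    static final class FakeSketch {
        final double value;

        FakeSketch(double value) {
            this.value = value;
        }
    }

    // Column name -> "is numeric". A missing entry models a column whose
    // capabilities are unknown, such as the output of an inner post-agg.
    static final Map<String, Boolean> CAPABILITIES = new HashMap<>();

    static {
        CAPABILITIES.put("added", Boolean.TRUE);
    }

    // Only a column that is known to be numeric takes the numeric path.
    static boolean usesBuildPath(String fieldName) {
        Boolean numeric = CAPABILITIES.get(fieldName);
        return numeric != null && numeric;
    }

    // Trusts the capabilities lookup: anything not known to be numeric
    // is cast to a sketch, which fails for a plain Double.
    static double aggregateUnchecked(String fieldName, Object value) {
        if (usesBuildPath(fieldName)) {
            return ((Number) value).doubleValue();
        }
        return ((FakeSketch) value).value;
    }

    // Defensive variant (an assumption, not the project's fix):
    // decide from the runtime type of the value instead.
    static double aggregateChecked(String fieldName, Object value) {
        if (value instanceof Number) {
            return ((Number) value).doubleValue();
        }
        return ((FakeSketch) value).value;
    }

    public static void main(String[] args) {
        Object postAggValue = Double.valueOf(42.0);
        try {
            aggregateUnchecked("innerMedian", postAggValue);
        } catch (ClassCastException e) {
            System.out.println("unchecked path failed: " + e.getClass().getSimpleName());
        }
        System.out.println("checked path: " + aggregateChecked("innerMedian", postAggValue));
    }
}
```

Here "innerMedian" has no entry in the map, so the unchecked path attempts the sketch cast on a Double, which mirrors the ClassCastException in the stack trace.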

This issue may also be present in other aggregator types; I have not searched through the other implementations.

The following query structure will reproduce the issue:

{
  "queryType": "groupBy",
  "intervals": [
    "2015-09-12/2015-09-13"
  ],
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "wikipedia",
      "intervals": [
        "2015-09-12/2015-09-13"
      ],
      "dimensions": [
        "page"
      ],
      "aggregations": [
        {
          "type": "quantilesDoublesSketch",
          "name": "innerSketch",
          "fieldName": "added"
        },
        {
          "type": "count",
          "name": "sampleCount"
        }
      ],
      "postAggregations": [
        {
          "type": "quantilesDoublesSketchToQuantile",
          "name": "innerMedian",
          "field": {
            "type": "fieldAccess",
            "fieldName": "innerSketch"
          },
          "fraction": 0.5
        }
      ],
      "granularity": "all"
    }
  },
  "dimensions": [
    "page"
  ],
  "aggregations": [
    {
      "type": "quantilesDoublesSketch",
      "name": "outerSketch",
      "fieldName": "innerMedian"
    },
    {
      "type": "count",
      "name": "clientCount"
    }
  ],
  "postAggregations": [
    {
      "type": "quantilesDoublesSketchToQuantile",
      "name": "outerMedian",
      "field": {
        "type": "fieldAccess",
        "fieldName": "outerSketch"
      },
      "fraction": 0.5
    }
  ],
  "granularity": "all",
  "context": {
    "skipEmptyBuckets": "true"
  }
}
