HLLSketch to support String[] as produced by transformationSpec #8613

@gipeshka

Description

The current implementation of HllSketchBuildAggregator already supports List<String>, so it works with multi-value fields when building an aggregate. It would be nice if the aggregator also supported String[].

Motivation

I tried the following ingestion spec:

"dataSchema": {
  "transformSpec": {
    "transforms": [
      {
        "type": "expression",
        "name": "all_hashes",
        "expression": "array_concat(hashes_column_1, hashes_column_2)"
      }
    ]
  },
  "metricsSpec": [
    {
      "type": "HLLSketchBuild",
      "name": "hashes_sketch",
      "fieldName": "all_hashes"
    }
  ]
}

This resulted in the following error:

org.apache.druid.java.util.common.IAE: Unsupported type class [Ljava.lang.String;
	at org.apache.druid.query.aggregation.datasketches.hll.HllSketchBuildAggregator.updateSketch(HllSketchBuildAggregator.java:119) ~[?:?]
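
For readers less familiar with JVM type names: `[Ljava.lang.String;` is the runtime class name of `String[]`, so the message says the aggregator received a string array rather than a `List`. A small self-contained check of that reading follows. The class and method names are illustrative only and are not part of Druid.

```java
import java.util.List;

final class ArrayClassNameDemo {
    // Returns the runtime class name the JVM reports for a value,
    // which is the kind of name that appears in the error message above.
    static String runtimeName(Object value) {
        return value.getClass().getName();
    }

    // True only for java.util.List instances; a String[] does not qualify.
    static boolean isList(Object value) {
        return value instanceof List;
    }
}
```

Calling `ArrayClassNameDemo.runtimeName(new String[] {"a"})` yields the same `[Ljava.lang.String;` name seen in the stack trace, and `isList` returns false for that value, which is consistent with an existing `List` branch not being taken.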

The required change could presumably be applied to HllSketchBuildAggregator as follows:

} else if (value instanceof String[]) {
  // Multi-value output of an expression transform arrives as String[]
  String[] array = (String[]) value;

  for (String s : array) {
    sketch.update(s.toCharArray());
  }
}

This mirrors how List<String> is currently handled.
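
To make the proposal easier to try in isolation, here is a minimal, self-contained sketch of the type dispatch with a `String[]` branch added. It is not the actual Druid code. `CharArraySink` is a hypothetical stand-in for the sketch's `update(char[])` call, `IllegalArgumentException` stands in for Druid's `IAE`, and the null guards on individual elements are an added assumption, not something the snippet above does.

```java
import java.util.List;

final class StringArrayUpdateSketch {
    /** Hypothetical stand-in for the HLL sketch's update(char[]) method. */
    interface CharArraySink {
        void update(char[] chars);
    }

    /** Feeds a single value, a List, or a String[] into the sink. */
    static void updateSketch(CharArraySink sketch, Object value) {
        if (value instanceof String) {
            sketch.update(((String) value).toCharArray());
        } else if (value instanceof List) {
            // Existing multi-value case: one update per element
            for (Object entry : (List<?>) value) {
                if (entry != null) { // assumption: skip null elements
                    sketch.update(entry.toString().toCharArray());
                }
            }
        } else if (value instanceof String[]) {
            // Proposed case: arrays produced by an expression transform
            for (String s : (String[]) value) {
                if (s != null) { // assumption: skip null elements
                    sketch.update(s.toCharArray());
                }
            }
        } else {
            // Stand-in for the "Unsupported type" IAE thrown by the aggregator
            throw new IllegalArgumentException("Unsupported type " + value.getClass());
        }
    }
}
```

With this shape, a `String[]` and a `List<String>` holding the same elements produce the same sequence of updates, which is the behavior the issue asks for.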
