Skip to content

Aggregation support improvement #124

@dai-chen

Description

@dai-chen

Is your feature request related to a problem? Please describe.
Currently there are limitations in aggregate query support in both SQL and PPL, including:

  1. Sort by aggregate function results: Composite Aggregation API only supports natural ordering of group fields. Reference. Issue: Select global top N records with grouping by multiple fields #121, ORDER BY doesn't work as expected on cardinality aggregations opendistro-for-elasticsearch/sql#378
  2. Sort null first or last independently: Composite Aggregation API doesn't have such option and only does ASC+NULL_FIRST or DESC+NULL_LAST. Reference.
  3. Only first 1000 buckets returned and users have no control. Issue: Aggregation doesn't honor default query limit size opendistro-for-elasticsearch/sql#1039

The use cases, especially #1, are very common in analytics. The limitation blocks the broad adoption of SQL/PPL.

Describe the solution you'd like
Either provide more complete aggregation or workarounds for different use cases specifically.

Describe alternatives you've considered
N/A

Additional context
Examples:

POST http://localhost:9200/_plugins/_sql/_explain
{
  "query": "SELECT state, AVG(balance) FROM accounts GROUP BY state"
}
{
  "root": {
    "name": "ProjectOperator",
    "description": {
      "fields": "[state, AVG(balance)]"
    },
    "children": [
      {
        "name": "OpenSearchIndexScan",
        "description": {
          "request": "OpenSearchQueryRequest(indexName=accounts, sourceBuilder={\"from\":0,\"size\":0,\"timeout\":\"1m\",\"aggregations\":{\"composite_buckets\":{\"composite\":{\"size\":1000,\"sources\":[{\"state\":{\"terms\":{\"field\":\"state.keyword\",\"missing_bucket\":true,\"order\":\"asc\"}}}]},\"aggregations\":{\"AVG(balance)\":{\"avg\":{\"field\":\"balance\"}}}}}}, searchDone=false)"
        },
        "children": []
      }
    ]
  }
}

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    Status

    Not Started

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions