[BUG] Span query in PPL is slower than date histogram aggregation in query DSL #3528

@gaobinlong

What is the bug?

On the Discover page of OSD, a PPL query executed against an index pattern containing a very large amount of data (about 750B documents) has much higher latency than the similar query in DQL: roughly one hundred seconds versus seconds.

The PPL query is: source = index* | where @timestamp >= '2025-03-25 03:31:32' and @timestamp <= '2025-04-09 03:31:32' | stats count() by span(@timestamp, 12h),

and the similar query DSL is:

{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "2025-03-25T03:31:32.935Z",
              "lte": "2025-04-09T03:31:32.935Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  },
  "size": 500,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "12h",
        "time_zone": "+00:00",
        "min_doc_count": 1
      }
    }
  }
}

I see that PPL converts the span query into a composite aggregation, like this:

"aggregations": {
    "composite_buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          {
            "span(@timestamp,12h)": {
              "date_histogram": {
                "field": "@timestamp",
                "missing_bucket": true,
                "missing_order": "first",
                "order": "asc",
                "fixed_interval": "12h"
              }
            }
          }
        ]
      },
      "aggregations": {
        "count()": {
          "value_count": {
            "field": "_index"
          }
        }
      }
    }
  }

It seems that the composite aggregation is slower than the date histogram aggregation.
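To make the difference between the two request shapes concrete, here is a minimal sketch of how a composite aggregation with a single `date_histogram` source could be rewritten into a plain `date_histogram` aggregation. This only illustrates the idea and is not how the PPL plugin works; the function name `composite_to_date_histogram` and the set of copied keys are assumptions made for this example. Note that the rewrite is not fully equivalent: with `missing_bucket: true` the composite form also buckets documents that have no `@timestamp`, while the plain form drops them.

```python
# Hypothetical helper (not part of OpenSearch or the SQL/PPL plugin).
# Keys that a composite date_histogram source shares with a plain
# date_histogram aggregation; chosen for this sketch.
SHARED_KEYS = ("field", "fixed_interval", "calendar_interval", "time_zone", "format", "offset")

# The aggregation that PPL generated, copied from the issue.
ISSUE_AGGS = {
    "composite_buckets": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "span(@timestamp,12h)": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "missing_bucket": True,
                            "missing_order": "first",
                            "order": "asc",
                            "fixed_interval": "12h",
                        }
                    }
                }
            ],
        },
        "aggregations": {"count()": {"value_count": {"field": "_index"}}},
    }
}


def composite_to_date_histogram(aggs: dict) -> dict:
    """Rewrite single-source date_histogram composites; leave everything else as-is."""
    rewritten = {}
    for name, body in aggs.items():
        sources = body.get("composite", {}).get("sources", [])
        if len(sources) != 1:
            rewritten[name] = body
            continue
        source_name, source_body = next(iter(sources[0].items()))
        histogram = source_body.get("date_histogram")
        if histogram is None:
            rewritten[name] = body
            continue
        plain = {k: v for k, v in histogram.items() if k in SHARED_KEYS}
        # Composite only returns non-empty buckets, so mirror that here.
        plain["min_doc_count"] = 1
        new_body = {"date_histogram": plain}
        if "aggregations" in body:
            new_body["aggregations"] = body["aggregations"]
        rewritten[source_name] = new_body
    return rewritten


if __name__ == "__main__":
    import json
    print(json.dumps(composite_to_date_histogram(ISSUE_AGGS), indent=2))
```

The result has the same shape as the `date_histogram` in the query DSL above, apart from `time_zone`, which the generated composite source does not set.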

How can one reproduce the bug?
Steps to reproduce the behavior:

  1. Find a big dataset which contains billions of documents
  2. Execute both the PPL query and the query DSL above, and compare their latencies
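The comparison in the steps above can be sketched as a small timing script. This is a rough harness under stated assumptions, not a benchmark: the base URL `http://localhost:9200`, the absence of authentication, and the helper names (`build_request`, `timed`, `compare`) are all assumptions for illustration. It posts the PPL query to the `_plugins/_ppl` endpoint and the query DSL to `_search`, and reports wall-clock time for each.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local, unsecured cluster
INDEX_PATTERN = "index*"            # index pattern used in the issue

PPL_QUERY = (
    "source = index* "
    "| where @timestamp >= '2025-03-25 03:31:32' and @timestamp <= '2025-04-09 03:31:32' "
    "| stats count() by span(@timestamp, 12h)"
)

DSL_BODY = {
    "query": {"bool": {"must": [], "filter": [{"range": {"@timestamp": {
        "gte": "2025-03-25T03:31:32.935Z",
        "lte": "2025-04-09T03:31:32.935Z",
        "format": "strict_date_optional_time",
    }}}]}},
    "size": 500,
    "aggs": {"2": {"date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "12h",
        "time_zone": "+00:00",
        "min_doc_count": 1,
    }}},
}


def build_request(path: str, body: dict) -> urllib.request.Request:
    """Build a JSON POST request against the cluster."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_http(request: urllib.request.Request) -> bytes:
    """Send the request and read the full response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()


def timed(send, request) -> float:
    """Return the wall-clock seconds taken by send(request)."""
    start = time.perf_counter()
    send(request)
    return time.perf_counter() - start


def compare(send=send_http) -> dict:
    """Time the PPL query and the query DSL once each."""
    ppl = build_request("/_plugins/_ppl", {"query": PPL_QUERY})
    dsl = build_request(f"/{INDEX_PATTERN}/_search", DSL_BODY)
    return {"ppl_seconds": timed(send, ppl), "dsl_seconds": timed(send, dsl)}


if __name__ == "__main__":
    print(compare())
```

A single run is sensitive to caching, so repeating each query several times and comparing medians would give a fairer picture.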

What is the expected behavior?
The PPL span query should perform comparably to the similar date histogram aggregation in query DSL.

What is your host/environment?
OpenSearch 3.0.0

Metadata

Labels

PPL (Piped processing language), enhancement (New feature or request), performance (Make it fast!)

Projects

Status

Done
