Description
What is the bug?
On the Discover page of OSD, when executing a PPL query against an index pattern that contains a very large amount of data (about 750B documents), the latency of the PPL query is much higher than that of a similar DQL query: on the order of one hundred seconds versus seconds.
The PPL query is: source = index* | where @timestamp >= '2025-03-25 03:31:32' and @timestamp <= '2025-04-09 03:31:32' | stats count() by span(@timestamp, 12h)
The similar query DSL is:
{
  "query": {
    "bool": {
      "must": [],
      "filter": [
        {
          "range": {
            "@timestamp": {
              "gte": "2025-03-25T03:31:32.935Z",
              "lte": "2025-04-09T03:31:32.935Z",
              "format": "strict_date_optional_time"
            }
          }
        }
      ]
    }
  },
  "size": 500,
  "aggs": {
    "2": {
      "date_histogram": {
        "field": "@timestamp",
        "fixed_interval": "12h",
        "time_zone": "+00:00",
        "min_doc_count": 1
      }
    }
  }
}
I see that PPL converts the span query into a composite aggregation, like this:
"aggregations": {
  "composite_buckets": {
    "composite": {
      "size": 1000,
      "sources": [
        {
          "span(@timestamp,12h)": {
            "date_histogram": {
              "field": "@timestamp",
              "missing_bucket": true,
              "missing_order": "first",
              "order": "asc",
              "fixed_interval": "12h"
            }
          }
        }
      ]
    },
    "aggregations": {
      "count()": {
        "value_count": {
          "field": "_index"
        }
      }
    }
  }
}
It seems that the composite aggregation is slower than the date_histogram aggregation.
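To make the difference between the two request shapes concrete, here is a minimal Python sketch of how a composite aggregation with a single date_histogram source could be rewritten as a plain date_histogram aggregation. The helper name `composite_to_date_histogram` is hypothetical and is not part of the SQL/PPL plugin. The sketch drops the composite-only options (`missing_bucket`, `missing_order`, `order`), so documents without `@timestamp` would no longer get their own bucket, and it assumes that each bucket's `doc_count` can stand in for `count()`.

```python
import json

# Source options that only make sense inside a composite aggregation.
COMPOSITE_ONLY_KEYS = {"missing_bucket", "missing_order", "order"}


def composite_to_date_histogram(aggs):
    """Rewrite a composite agg that has exactly one date_histogram source.

    Hypothetical helper for illustration only. Returns a new aggregations
    dict, or None when the input does not have the supported shape.
    """
    if len(aggs) != 1:
        return None
    _, body = next(iter(aggs.items()))
    sources = body.get("composite", {}).get("sources", [])
    if len(sources) != 1 or len(sources[0]) != 1:
        return None
    source_name, source_body = next(iter(sources[0].items()))
    histogram = source_body.get("date_histogram")
    if histogram is None:
        return None
    rewritten = {k: v for k, v in histogram.items() if k not in COMPOSITE_ONLY_KEYS}
    # Composite returns only non-empty buckets; min_doc_count=1 mirrors that.
    # Assumption: the bucket doc_count replaces the count() sub-aggregation.
    rewritten["min_doc_count"] = 1
    return {source_name: {"date_histogram": rewritten}}


composite = {
    "composite_buckets": {
        "composite": {
            "size": 1000,
            "sources": [
                {
                    "span(@timestamp,12h)": {
                        "date_histogram": {
                            "field": "@timestamp",
                            "missing_bucket": True,
                            "missing_order": "first",
                            "order": "asc",
                            "fixed_interval": "12h",
                        }
                    }
                }
            ],
        },
        "aggregations": {"count()": {"value_count": {"field": "_index"}}},
    }
}

rewritten_aggs = composite_to_date_histogram(composite)
print(json.dumps(rewritten_aggs, indent=2))
```

The two forms are not strictly equivalent (null handling and pagination differ), so this only illustrates the shape of the request that was fast in the DSL case.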
How can one reproduce the bug?
Steps to reproduce the behavior:
- Find a big dataset which contains billions of documents
- Execute both the PPL query and the query DSL above and compare their latency
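The steps above could be scripted roughly as follows. This is a sketch under stated assumptions: the cluster URL `http://localhost:9200`, the absence of authentication, and the helper names (`build_request`, `timed`, `send_http`, `compare`) are all hypothetical; only the PPL text, the DSL body, and the `_plugins/_ppl` and `_search` endpoints come from the report and from OpenSearch itself.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local cluster without auth

PPL_QUERY = (
    "source = index* "
    "| where @timestamp >= '2025-03-25 03:31:32' "
    "and @timestamp <= '2025-04-09 03:31:32' "
    "| stats count() by span(@timestamp, 12h)"
)

DSL_BODY = {
    "query": {
        "bool": {
            "must": [],
            "filter": [
                {
                    "range": {
                        "@timestamp": {
                            "gte": "2025-03-25T03:31:32.935Z",
                            "lte": "2025-04-09T03:31:32.935Z",
                            "format": "strict_date_optional_time",
                        }
                    }
                }
            ],
        }
    },
    "size": 500,
    "aggs": {
        "2": {
            "date_histogram": {
                "field": "@timestamp",
                "fixed_interval": "12h",
                "time_zone": "+00:00",
                "min_doc_count": 1,
            }
        }
    },
}


def build_request(path, body):
    """Build a JSON POST request for the given path (hypothetical helper)."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def timed(send, request):
    """Call send(request) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = send(request)
    return result, time.perf_counter() - start


def send_http(request):
    """Send the request over HTTP and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def compare(send=send_http):
    """Run the PPL query and the DSL query once each and report latencies."""
    ppl_request = build_request("/_plugins/_ppl", {"query": PPL_QUERY})
    dsl_request = build_request("/index*/_search", DSL_BODY)
    _, ppl_seconds = timed(send, ppl_request)
    _, dsl_seconds = timed(send, dsl_request)
    return {"ppl_seconds": ppl_seconds, "dsl_seconds": dsl_seconds}
```

Calling `compare()` against a test cluster returns both wall-clock latencies; running it several times helps separate the effect of caching from the difference between the two aggregation types.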
What is the expected behavior?
The PPL query should perform better, ideally with latency close to that of the equivalent query DSL.
What is your host/environment?
OpenSearch 3.0.0
Do you have any screenshots?
If applicable, add screenshots to help explain your problem.
Do you have any additional context?
Add any other context about the problem.