Affected Version
If SQL Null Handling feature is turned on:
- Default since v28
- Since v17 with feature flag enabled
Description
I am currently running v27 image on my clusters. As part of process to upgrade to later versions, I tried to test out the SQL Null-Compatible feature.
For the case where my table does not have any null values, I come across a performance degradation.
The following benchmark is conducted for a cluster with the same resource. It is found that the Historical + Peon querying latency is a contributor for the difference in querying speed.
Cache Disabled
| Description |
V27 (JDK17, Null Disabled) |
V27 (JDK17, Null Enabled) |
V37 (JDK17) |
V37 (JDK21) |
| Select 1 Random Row |
5.455 |
5.142 |
5.416 |
5.414 |
| Select Count (1h) |
5.446 |
5.154 |
5.428 |
5.469 |
| Cardinality (1h Unique) |
6.600 |
6.056 |
6.003 |
5.808 |
| Time-Series (12h PT5M) |
5.592 |
5.623 |
7.769 |
5.552 |
| Heavy Scan (24h Agg) |
14.957 |
16.325 |
19.847 |
11.830 |
| TopN Ranking (24h) |
12.159 |
17.245 |
18.903 |
16.707 |
Cache Enabled
| Description |
V27 (JDK17, Null Disabled) |
V27 (JDK17, Null Enabled) |
V37 (JDK17) |
V37 (JDK21) |
| Select 1 Random Row |
5.318 |
5.198 |
5.288 |
5.433 |
| Select Count (1h) |
5.347 |
5.215 |
5.317 |
5.504 |
| Cardinality (1h Unique) |
7.795 |
6.119 |
6.086 |
5.844 |
| Time-Series (12h PT5M) |
5.643 |
5.362 |
5.392 |
5.550 |
| Heavy Scan (24h Agg) |
5.441 |
5.686 |
5.412 |
5.581 |
| TopN Ranking (24h) |
5.628 |
6.170 |
6.076 |
6.237 |
SQL Statements
Select 1 Random Row
SELECT __time, id
FROM "<TABLE_NAME>"
ORDER BY __time DESC
LIMIT 1;
Select Count (1h)
SELECT COUNT(*)
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR;
Cardinality (1h Unique)
SELECT COUNT(DISTINCT id)
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR;
Time-Series (12h PT5M)
SELECT TIME_FLOOR(__time, 'PT5M') AS bucket_5m, idc, SUM("count") AS total_count
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '12' HOUR
GROUP BY 1, 2;
Heavy Scan (24h Agg)
SELECT idc, SUM("count") AS total_count, SUM(sum_rtt) / SUM("count") AS avg_rtt
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1;
TopN Ranking (24h)
SELECT id, SUM(sum_rtt) AS total_sum_rtt
FROM "<TABLE_NAME>"
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1
ORDER BY 2 DESC
LIMIT 100;
Cause Analysis
When enabling null handling, we will need an extra check to deduce whether the current row in the column is null. While the isNull method is cheap, it is called for every single row.
// NullableNumericTopNColumnAggregatesProcessor.java
if (hasNulls && selector.isNull()) { // V27
if (selector.isNull()) { // V37
// REDACTED for brevity
} else {
Aggregator[] valueAggregates = getValueAggregators(query, selector, cursor);
for (Aggregator aggregator : valueAggregates) {
aggregator.aggregate();
}
}
Asking AI to estimate the latency for each isNull method, for a long value, the execution time is 1~3ns. This may be fast, but since the check is run for every row, this resulted in substantial increase for querying time.
For example, the total number of rows in 1 day for the TopN / Heavy Scan query is 2,147,995,191. If we estimate each operation to take 1-3ns, this matches expectations of the query being 2-6s faster.
Suggestions
Regarding suggestions for the fix, please verify if these are feasible:
- Store something in the metadata of segments to indicate which columns will not have any null values? We can then provide non-null implementations that fast-track to processing.
- Some work on the compaction side, prioritize compacting segments with null values together if possible?
- Other suggestions / feedback?
Thanks!
Affected Version
If SQL Null Handling feature is turned on:
Description
I am currently running v27 image on my clusters. As part of process to upgrade to later versions, I tried to test out the SQL Null-Compatible feature.
For the case where my table does not have any null values, I come across a performance degradation.
The following benchmark is conducted for a cluster with the same resource. It is found that the Historical + Peon querying latency is a contributor for the difference in querying speed.
Cache Disabled
Cache Enabled
SQL Statements
Select 1 Random Row
Select Count (1h)
Cardinality (1h Unique)
Time-Series (12h PT5M)
Heavy Scan (24h Agg)
TopN Ranking (24h)
Cause Analysis
When enabling null handling, we will need an extra check to deduce whether the current row in the column is null. While the
isNullmethod is cheap, it is called for every single row.Asking AI to estimate the latency for each
isNullmethod, for a long value, the execution time is 1~3ns. This may be fast, but since the check is run for every row, this resulted in substantial increase for querying time.For example, the total number of rows in 1 day for the TopN / Heavy Scan query is 2,147,995,191. If we estimate each operation to take 1-3ns, this matches expectations of the query being 2-6s faster.
Suggestions
Regarding suggestions for the fix, please verify if these are feasible:
Thanks!