benchmark: Support sort_tpch10 for benchmark #16671

zhuqi-lucas · 2025-07-03T11:30:22Z

Which issue does this PR close?

Currently the benchmark script only has sort_tpch. While optimizing sort recently, I found that sort_tpch10 sometimes shows more stable results, so I added it to the benchmark script as well.

cc @alamb @Dandandan

Rationale for this change

I found that sort_tpch10 sometimes gives more accurate results when we optimize the merge part. Because our in-memory sort buffer is 1MB, sort_tpch performs fewer merge comparisons, so changes to the merge path are harder to see there. I added sort_tpch10 to bench.sh and hope it will be helpful.
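
To make the reasoning above concrete, here is a rough back-of-the-envelope sketch. It uses a textbook external merge sort model (input split into sorted runs of about one buffer each, then a k-way merge costing roughly log2(k) comparisons per row). This is not DataFusion's actual implementation, and the 16 bytes per row is a purely hypothetical figure chosen for illustration. The SF10 row count comes from the benchmark output in this PR; 6,001,215 is the standard lineitem row count at TPC-H scale factor 1.

```python
import math


def estimate_merge_comparisons(num_rows, bytes_per_row, buffer_bytes=1 << 20):
    """Return (sorted runs, approximate merge comparisons) for a simple model.

    Assumption: each sorted run holds about `buffer_bytes` of data, and a
    k-way merge needs about ceil(log2(k)) comparisons per output row.
    """
    total_bytes = num_rows * bytes_per_row
    num_runs = max(1, math.ceil(total_bytes / buffer_bytes))
    comparisons_per_row = math.ceil(math.log2(num_runs)) if num_runs > 1 else 0
    return num_runs, num_rows * comparisons_per_row


SF1_ROWS = 6_001_215        # lineitem rows at TPC-H scale factor 1
SF10_ROWS = 59_986_052      # lineitem rows reported by the sort_tpch10 run
ASSUMED_BYTES_PER_ROW = 16  # hypothetical sort-key width, for illustration only

for name, rows in (("sort_tpch", SF1_ROWS), ("sort_tpch10", SF10_ROWS)):
    runs, comparisons = estimate_merge_comparisons(rows, ASSUMED_BYTES_PER_ROW)
    print(f"{name}: ~{runs} runs, ~{comparisons:,} merge comparisons")
```

Under these assumptions the larger data set produces about ten times as many runs and more comparisons per row, so the merge path accounts for a larger share of the work. The real numbers depend on the actual row width and on how DataFusion batches and merges data.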

What changes are included in this PR?

Adds sort_tpch10 to bench.sh. It runs the same sort-tpch benchmark as sort_tpch, but against the TPC-H scale factor 10 data (tpch_sf10).

Are these changes tested?

Yes

./bench.sh run sort_tpch10
***************************
DataFusion Benchmark Script
COMMAND: run
BENCHMARK: sort_tpch10
QUERY: All
DATAFUSION_DIR: /Users/zhuqi/arrow-datafusion/benchmarks/..
BRANCH_NAME: sort_tpch_10_benchmark_support
DATA_DIR: /Users/zhuqi/arrow-datafusion/benchmarks/data
RESULTS_DIR: /Users/zhuqi/arrow-datafusion/benchmarks/results/sort_tpch_10_benchmark_support
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************
RESULTS_FILE: /Users/zhuqi/arrow-datafusion/benchmarks/results/sort_tpch_10_benchmark_support/sort_tpch10.json
Running sort tpch benchmark...
+ cargo run --release --bin dfbench -- sort-tpch --iterations 5 --path /Users/zhuqi/arrow-datafusion/benchmarks/data/tpch_sf10 -o /Users/zhuqi/arrow-datafusion/benchmarks/results/sort_tpch_10_benchmark_support/sort_tpch10.json
    Finished `release` profile [optimized] target(s) in 0.35s
     Running `/Users/zhuqi/arrow-datafusion/target/release/dfbench sort-tpch --iterations 5 --path /Users/zhuqi/arrow-datafusion/benchmarks/data/tpch_sf10 -o /Users/zhuqi/arrow-datafusion/benchmarks/results/sort_tpch_10_benchmark_support/sort_tpch10.json`
Q1 iteration 0 took 1554.1 ms and returned 59986052 rows
Q1 iteration 1 took 1532.3 ms and returned 59986052 rows
Q1 iteration 2 took 1518.5 ms and returned 59986052 rows
Q1 iteration 3 took 1528.3 ms and returned 59986052 rows
Q1 iteration 4 took 1533.0 ms and returned 59986052 rows
Q1 avg time: 1533.22 ms
Q2 iteration 0 took 1406.7 ms and returned 59986052 rows
Q2 iteration 1 took 1431.7 ms and returned 59986052 rows
Q2 iteration 2 took 1398.6 ms and returned 59986052 rows
Q2 iteration 3 took 1394.7 ms and returned 59986052 rows
Q2 iteration 4 took 1414.2 ms and returned 59986052 rows
Q2 avg time: 1409.18 ms
Q3 iteration 0 took 6564.8 ms and returned 59986052 rows
Q3 iteration 1 took 6535.8 ms and returned 59986052 rows
Q3 iteration 2 took 6638.9 ms and returned 59986052 rows
Q3 iteration 3 took 6677.1 ms and returned 59986052 rows
Q3 iteration 4 took 6712.8 ms and returned 59986052 rows
Q3 avg time: 6625.86 ms
Q4 iteration 0 took 1963.8 ms and returned 59986052 rows
Q4 iteration 1 took 1967.9 ms and returned 59986052 rows
Q4 iteration 2 took 1931.0 ms and returned 59986052 rows
Q4 iteration 3 took 1925.3 ms and returned 59986052 rows
Q4 iteration 4 took 1946.5 ms and returned 59986052 rows
Q4 avg time: 1946.88 ms
Q5 iteration 0 took 2431.6 ms and returned 59986052 rows
Q5 iteration 1 took 2472.9 ms and returned 59986052 rows
Q5 iteration 2 took 2504.5 ms and returned 59986052 rows
Q5 iteration 3 took 2485.8 ms and returned 59986052 rows
Q5 iteration 4 took 2350.9 ms and returned 59986052 rows
Q5 avg time: 2449.12 ms
Q6 iteration 0 took 2623.9 ms and returned 59986052 rows
Q6 iteration 1 took 2580.0 ms and returned 59986052 rows
Q6 iteration 2 took 2622.9 ms and returned 59986052 rows
Q6 iteration 3 took 2622.1 ms and returned 59986052 rows
Q6 iteration 4 took 2579.9 ms and returned 59986052 rows
Q6 avg time: 2605.76 ms
Q7 iteration 0 took 4385.8 ms and returned 59986052 rows
Q7 iteration 1 took 4385.7 ms and returned 59986052 rows
Q7 iteration 2 took 4233.3 ms and returned 59986052 rows
Q7 iteration 3 took 4209.4 ms and returned 59986052 rows
Q7 iteration 4 took 4233.9 ms and returned 59986052 rows
Q7 avg time: 4289.63 ms
Q8 iteration 0 took 2797.1 ms and returned 59986052 rows
Q8 iteration 1 took 2781.4 ms and returned 59986052 rows
Q8 iteration 2 took 2882.1 ms and returned 59986052 rows
Q8 iteration 3 took 2784.8 ms and returned 59986052 rows
Q8 iteration 4 took 2883.5 ms and returned 59986052 rows
Q8 avg time: 2825.80 ms
Q9 iteration 0 took 2897.6 ms and returned 59986052 rows
Q9 iteration 1 took 3006.3 ms and returned 59986052 rows
Q9 iteration 2 took 2968.0 ms and returned 59986052 rows
Q9 iteration 3 took 2965.9 ms and returned 59986052 rows
Q9 iteration 4 took 2964.7 ms and returned 59986052 rows
Q9 avg time: 2960.51 ms
Q10 iteration 0 took 7662.0 ms and returned 59986052 rows
Q10 iteration 1 took 7396.4 ms and returned 59986052 rows
Q10 iteration 2 took 8013.9 ms and returned 59986052 rows
Q10 iteration 3 took 7553.1 ms and returned 59986052 rows
Q10 iteration 4 took 6845.6 ms and returned 59986052 rows
Q10 avg time: 7494.20 ms
Q11 iteration 0 took 3567.1 ms and returned 59986052 rows
Q11 iteration 1 took 3424.9 ms and returned 59986052 rows
Q11 iteration 2 took 3425.2 ms and returned 59986052 rows
Q11 iteration 3 took 3375.3 ms and returned 59986052 rows
Q11 iteration 4 took 3357.2 ms and returned 59986052 rows
Q11 avg time: 3429.94 ms
+ set +x
Done
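
Since the motivation here is result stability, a small helper like the following could summarize the spread per query from output in the format shown above. This is a sketch, not part of bench.sh or dfbench. The function name `summarize` and the choice of (max - min) / mean as the spread measure are assumptions made for illustration.

```python
import re
import statistics
from collections import defaultdict

# Matches lines such as:
# "Q1 iteration 0 took 1554.1 ms and returned 59986052 rows"
ITERATION_RE = re.compile(
    r"^(Q\d+) iteration (\d+) took ([\d.]+) ms and returned (\d+) rows$"
)


def summarize(log_text):
    """Return {query: (mean_ms, relative_spread)} from benchmark output text."""
    timings = defaultdict(list)
    for line in log_text.splitlines():
        match = ITERATION_RE.match(line.strip())
        if match:
            timings[match.group(1)].append(float(match.group(3)))
    summary = {}
    for query, values in timings.items():
        mean = statistics.mean(values)
        # Relative spread: range of the iterations divided by their mean.
        summary[query] = (mean, (max(values) - min(values)) / mean)
    return summary


SAMPLE = """\
Q1 iteration 0 took 1554.1 ms and returned 59986052 rows
Q1 iteration 1 took 1532.3 ms and returned 59986052 rows
Q1 iteration 2 took 1518.5 ms and returned 59986052 rows
Q1 iteration 3 took 1528.3 ms and returned 59986052 rows
Q1 iteration 4 took 1533.0 ms and returned 59986052 rows
Q1 avg time: 1533.22 ms
"""

for query, (mean_ms, spread) in summarize(SAMPLE).items():
    print(f"{query}: mean {mean_ms:.2f} ms, spread {spread:.1%}")
```

The mean recomputed from the printed values can differ slightly from the reported "avg time" line, because the per-iteration timings are rounded to one decimal place in the output.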

Are there any user-facing changes?

No

alamb · 2025-07-03T12:59:17Z

🚀

alamb · 2025-07-03T12:59:30Z

Thanks @zhuqi-lucas and @Dandandan

benchmark: Support sort_tpch10 for benchmark

3686461

zhuqi-lucas mentioned this pull request Jul 3, 2025

Reuse Rows allocation in RowCursorStream #16647

Merged

Dandandan approved these changes Jul 3, 2025

Dandandan merged commit 3ca09a6 into apache:main Jul 3, 2025
29 checks passed
