@comphead
Contributor

Which issue does this PR close?

Closes #10100.

Rationale for this change

The fix for #10380 essentially resolved the issue; this PR also fixes the usage info for SMJ.

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

@comphead
Contributor Author

I checked that the TPCH benchmarks pass with SMJ enabled and that the row counts are the same as with hash join:

RUST_BACKTRACE=1 RESULTS_NAME=smj ./benchmarks/bench.sh run tpch_smj
RUST_BACKTRACE=1 RESULTS_NAME=hj ./benchmarks/bench.sh run tpch
RUST_BACKTRACE=1 RESULTS_NAME=smj10 ./benchmarks/bench.sh run tpch_smj10
RUST_BACKTRACE=1 RESULTS_NAME=hj10 ./benchmarks/bench.sh run tpch10

tpch_mem: TPCH inspired benchmark on Scale Factor (SF) 1 (~1GB), query from memory
tpch10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table
tpch10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table, hash join
tpch_smj10: TPCH inspired benchmark on Scale Factor (SF) 10 (~10GB), single parquet file per table, sort merge join
Contributor Author

I'm planning to get rid of tpch_smj* soon and take the join type from user input instead, so that any bench can be run with a choice of join type.

@github-actions

Benchmark results

Benchmarks comparing d6ddd23 (main) and 8353d20 (PR)
Comparing d6ddd23 and 8353d20
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  d6ddd23 ┃  8353d20 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 311.23ms │ 314.48ms │    no change │
│ QQuery 2     │  39.75ms │  44.90ms │ 1.13x slower │
│ QQuery 3     │  58.71ms │  59.99ms │    no change │
│ QQuery 4     │  83.26ms │  85.53ms │    no change │
│ QQuery 5     │  97.94ms │ 100.15ms │    no change │
│ QQuery 6     │  15.20ms │  15.67ms │    no change │
│ QQuery 7     │ 215.63ms │ 217.48ms │    no change │
│ QQuery 8     │  40.10ms │  40.95ms │    no change │
│ QQuery 9     │ 117.77ms │ 118.47ms │    no change │
│ QQuery 10    │ 104.43ms │ 101.81ms │    no change │
│ QQuery 11    │  75.79ms │  77.27ms │    no change │
│ QQuery 12    │  60.18ms │  59.87ms │    no change │
│ QQuery 13    │ 112.28ms │ 109.35ms │    no change │
│ QQuery 14    │  18.76ms │  18.58ms │    no change │
│ QQuery 15    │  30.72ms │  30.86ms │    no change │
│ QQuery 16    │  46.01ms │  45.91ms │    no change │
│ QQuery 17    │ 167.60ms │ 164.57ms │    no change │
│ QQuery 18    │ 465.70ms │ 545.63ms │ 1.17x slower │
│ QQuery 19    │  61.25ms │  60.38ms │    no change │
│ QQuery 20    │ 116.80ms │ 120.37ms │    no change │
│ QQuery 21    │ 335.60ms │ 342.52ms │    no change │
│ QQuery 22    │  30.19ms │  30.47ms │    no change │
└──────────────┴──────────┴──────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 2604.90ms │
│ Total Time (8353d20)   │ 2705.21ms │
│ Average Time (d6ddd23) │  118.40ms │
│ Average Time (8353d20) │  122.96ms │
│ Queries Faster         │         0 │
│ Queries Slower         │         2 │
│ Queries with No Change │        20 │
└────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query        ┃  d6ddd23 ┃  8353d20 ┃       Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1     │ 453.39ms │ 459.54ms │    no change │
│ QQuery 2     │  55.09ms │  57.11ms │    no change │
│ QQuery 3     │ 142.59ms │ 145.54ms │    no change │
│ QQuery 4     │  88.22ms │  89.39ms │    no change │
│ QQuery 5     │ 200.19ms │ 204.77ms │    no change │
│ QQuery 6     │ 105.65ms │ 105.17ms │    no change │
│ QQuery 7     │ 273.96ms │ 287.56ms │    no change │
│ QQuery 8     │ 182.75ms │ 179.28ms │    no change │
│ QQuery 9     │ 283.66ms │ 295.45ms │    no change │
│ QQuery 10    │ 228.35ms │ 233.43ms │    no change │
│ QQuery 11    │  41.04ms │  41.67ms │    no change │
│ QQuery 12    │ 127.32ms │ 129.31ms │    no change │
│ QQuery 13    │ 177.37ms │ 183.10ms │    no change │
│ QQuery 14    │ 124.31ms │ 124.03ms │    no change │
│ QQuery 15    │ 183.90ms │ 186.47ms │    no change │
│ QQuery 16    │  49.60ms │  49.47ms │    no change │
│ QQuery 17    │ 313.09ms │ 321.38ms │    no change │
│ QQuery 18    │ 447.60ms │ 493.86ms │ 1.10x slower │
│ QQuery 19    │ 226.96ms │ 228.16ms │    no change │
│ QQuery 20    │ 189.06ms │ 195.03ms │    no change │
│ QQuery 21    │ 317.85ms │ 315.83ms │    no change │
│ QQuery 22    │  40.09ms │  40.57ms │    no change │
└──────────────┴──────────┴──────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary      ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 4252.06ms │
│ Total Time (8353d20)   │ 4366.11ms │
│ Average Time (d6ddd23) │  193.28ms │
│ Average Time (8353d20) │  198.46ms │
│ Queries Faster         │         0 │
│ Queries Slower         │         1 │
│ Queries with No Change │        21 │
└────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Query        ┃   d6ddd23 ┃   8353d20 ┃    Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━┩
│ QQuery 1     │ 4470.71ms │ 4454.97ms │ no change │
│ QQuery 2     │  512.15ms │  491.60ms │ no change │
│ QQuery 3     │ 1709.79ms │ 1718.17ms │ no change │
│ QQuery 4     │  835.04ms │  831.02ms │ no change │
│ QQuery 5     │ 2157.94ms │ 2179.48ms │ no change │
│ QQuery 6     │ 1005.84ms │ 1005.22ms │ no change │
│ QQuery 7     │ 3452.80ms │ 3556.18ms │ no change │
│ QQuery 8     │ 2463.24ms │ 2497.21ms │ no change │
│ QQuery 9     │ 3975.42ms │ 3996.31ms │ no change │
│ QQuery 10    │ 2480.86ms │ 2486.30ms │ no change │
│ QQuery 11    │  343.56ms │  346.09ms │ no change │
│ QQuery 12    │ 1222.34ms │ 1224.75ms │ no change │
│ QQuery 13    │ 2313.42ms │ 2286.39ms │ no change │
│ QQuery 14    │ 1249.23ms │ 1263.20ms │ no change │
│ QQuery 15    │ 1908.59ms │ 1903.24ms │ no change │
│ QQuery 16    │  516.33ms │  509.42ms │ no change │
│ QQuery 17    │ 5413.51ms │ 5443.66ms │ no change │
│ QQuery 18    │ 6777.95ms │ 6896.24ms │ no change │
│ QQuery 19    │ 2243.45ms │ 2267.72ms │ no change │
│ QQuery 20    │ 2615.21ms │ 2579.24ms │ no change │
│ QQuery 21    │ 4479.53ms │ 4403.21ms │ no change │
│ QQuery 22    │  468.13ms │  451.80ms │ no change │
└──────────────┴───────────┴───────────┴───────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary      ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (d6ddd23)   │ 52615.04ms │
│ Total Time (8353d20)   │ 52791.42ms │
│ Average Time (d6ddd23) │  2391.59ms │
│ Average Time (8353d20) │  2399.61ms │
│ Queries Faster         │          0 │
│ Queries Slower         │          0 │
│ Queries with No Change │         22 │
└────────────────────────┴────────────┘
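
For readers checking the Change column by hand, here is a minimal sketch of how such a label can be derived from the two timings. The function name `classify_change` and the 5% noise threshold are assumptions for illustration; the threshold is merely consistent with the rows above (for example, 273.96ms to 287.56ms is reported as "no change", while 447.60ms to 493.86ms is "1.10x slower") and is not taken from the benchmark tooling itself.

```python
def classify_change(baseline_ms: float, candidate_ms: float,
                    noise_threshold: float = 0.05) -> str:
    """Label a (baseline, candidate) timing pair in the style of the Change column.

    noise_threshold is an assumed value (5%), not read from the real tooling.
    """
    if baseline_ms <= 0 or candidate_ms <= 0:
        raise ValueError("timings must be positive")

    # Ratio > 1 means the candidate (PR) run took longer than the baseline.
    ratio = candidate_ms / baseline_ms

    if ratio > 1 + noise_threshold:
        return f"{ratio:.2f}x slower"
    if ratio < 1 - noise_threshold:
        # Report speedups as baseline/candidate so the factor is also > 1.
        return f"{baseline_ms / candidate_ms:.2f}x faster"
    return "no change"


if __name__ == "__main__":
    # Timings copied from the tables above: (baseline d6ddd23, PR 8353d20).
    samples = {
        "tpch_mem_sf1 Q2": (39.75, 44.90),
        "tpch_mem_sf1 Q18": (465.70, 545.63),
        "tpch_sf1 Q7": (273.96, 287.56),
        "tpch_sf1 Q18": (447.60, 493.86),
        "tpch_sf10 Q18": (6777.95, 6896.24),
    }
    for name, (base, pr) in samples.items():
        print(f"{name}: {classify_change(base, pr)}")
```

Under that assumed threshold, the sketch reproduces the labels for the sampled rows, including the Q18 slowdown at SF1 that no longer registers at SF10.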

Contributor

@alamb alamb left a comment

Thank you @comphead

@alamb alamb merged commit 3777114 into apache:main Jun 1, 2024
findepi pushed a commit to findepi/datafusion that referenced this pull request Jul 16, 2024
* Fix: Sort Merge Join crashes on TPCH Q21

* Fix LeftAnti SMJ join when the join filter is set

* rm dbg

* Add SMJ to TPCH benchmark usage
Successfully merging this pull request may close these issues.

fix Sort Merge Join to pass TPCH tests