datafusion.optimizer.repartition_file_scans enabled by default #5295
alamb merged 1 commit into apache:main
What do you think, @tustvold @Dandandan @andygrove? Any concerns about turning on automatic repartitioning of file scans (which allows a single large Parquet file to be scanned in parallel)?
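To illustrate what scanning a single large Parquet file in parallel involves, here is a minimal Rust sketch. It is not DataFusion's actual code. It assumes the file is split into contiguous, near-equal byte ranges (one per target partition) and that each Parquet row group is read by whichever partition's range contains the row group's midpoint, so every row group is read exactly once. The functions `split_byte_ranges` and `assign_row_groups` and the row-group layout used below are hypothetical.

```rust
use std::ops::Range;

/// Split `file_len` bytes into at most `target_partitions` contiguous,
/// non-overlapping byte ranges of near-equal size (hypothetical helper).
fn split_byte_ranges(file_len: u64, target_partitions: usize) -> Vec<Range<u64>> {
    if file_len == 0 || target_partitions == 0 {
        return Vec::new();
    }
    // Never create more ranges than there are bytes.
    let n = (target_partitions as u64).min(file_len);
    let chunk = file_len / n;
    let remainder = file_len % n;
    let mut ranges = Vec::with_capacity(n as usize);
    let mut start = 0;
    for i in 0..n {
        // Spread the leftover bytes over the first `remainder` ranges.
        let len = chunk + if i < remainder { 1 } else { 0 };
        ranges.push(start..start + len);
        start += len;
    }
    ranges
}

/// Assign each row group, given as (byte offset, byte length), to the
/// partition whose range contains the row group's midpoint. Returns the
/// row-group indices read by each partition (hypothetical helper).
fn assign_row_groups(row_groups: &[(u64, u64)], ranges: &[Range<u64>]) -> Vec<Vec<usize>> {
    let mut assignment = vec![Vec::new(); ranges.len()];
    for (idx, &(offset, len)) in row_groups.iter().enumerate() {
        let mid = offset + len / 2;
        if let Some(p) = ranges.iter().position(|r| r.contains(&mid)) {
            assignment[p].push(idx);
        }
    }
    assignment
}

fn main() {
    // A 1,000-byte file with four 250-byte row groups, scanned by 3 partitions.
    let ranges = split_byte_ranges(1_000, 3);
    let row_groups = [(0, 250), (250, 250), (500, 250), (750, 250)];
    let assignment = assign_row_groups(&row_groups, &ranges);
    for (i, (r, groups)) in ranges.iter().zip(&assignment).enumerate() {
        println!("partition {i}: bytes {}..{}, row groups {:?}", r.start, r.end, groups);
    }
}
```

With these inputs the ranges are 0..334, 334..667 and 667..1000, and the middle partition reads two row groups. Assigning by midpoint means a row group that straddles a range boundary is still read by exactly one partition.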
This should help mitigate the "low-performance" impression: many people will not dig into configuration options and will simply try things out with the out-of-the-box (OOTB) defaults.
I plan to merge this sometime over the weekend unless anyone would like more time to comment or offer more thoughts.
Looks great to me.
Benchmark runs are scheduled for baseline = cfbb14d and contender = 222205d. 222205d is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Which issue does this PR close?
Closes #5125.
Rationale for this change
I guess it's fine to enable repartitioning by default in 19.0.0 (or, more likely, in the first release candidate for 19.0.0).
What changes are included in this PR?
The default value of `datafusion.optimizer.repartition_file_scans` is now `true`.
Are these changes tested?
Covered by existing tests.
Are there any user-facing changes?
Repartitioning of file scans will be enabled by default.
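Because the default flips to `true`, users who prefer the previous behavior now have to opt out explicitly. The Rust sketch below is a hypothetical, simplified model of that: the `OptimizerOptions` struct and its `apply` method are illustrative stand-ins, not DataFusion's types, and only the single flag changed by this PR is modeled. In DataFusion itself the equivalent opt-out is presumably a SQL statement such as `SET datafusion.optimizer.repartition_file_scans = false`; check the configuration documentation for your version.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the optimizer options; only the flag changed by
/// this PR is modeled.
#[derive(Debug, Clone, PartialEq)]
struct OptimizerOptions {
    repartition_file_scans: bool,
}

impl Default for OptimizerOptions {
    fn default() -> Self {
        // New default: file scans are repartitioned unless explicitly disabled.
        Self { repartition_file_scans: true }
    }
}

impl OptimizerOptions {
    /// Apply `key -> value` overrides (e.g. collected from `SET` statements).
    /// Only "true" and "false" are accepted for the boolean flag.
    fn apply(&mut self, overrides: &HashMap<&str, &str>) -> Result<(), String> {
        for (key, value) in overrides {
            match *key {
                "datafusion.optimizer.repartition_file_scans" => {
                    self.repartition_file_scans = value
                        .parse::<bool>()
                        .map_err(|e| format!("invalid value for {key}: {e}"))?;
                }
                other => return Err(format!("unknown option: {other}")),
            }
        }
        Ok(())
    }
}

fn main() {
    let mut opts = OptimizerOptions::default();
    println!("default: {}", opts.repartition_file_scans);

    // Opt out of the new default.
    let mut overrides = HashMap::new();
    overrides.insert("datafusion.optimizer.repartition_file_scans", "false");
    opts.apply(&overrides).expect("valid override");
    println!("after override: {}", opts.repartition_file_scans);
}
```

Rejecting unknown keys and non-boolean values up front means a typo in an opt-out fails loudly instead of silently leaving repartitioning enabled.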