[SPARK-48030][SQL] SPJ: cache rowOrdering and structType for InternalRowComparableWrapper#46265
advancedxy wants to merge 1 commit into apache:master
Before applying the changes in this PR, the benchmark code (in this PR) took:
After this PR:
@sunchao, @szehon-ho and @yabola, would you mind taking a look at this and helping to review it?
1024 should be sufficient, though it could be a SQL configuration.
Makes sense to me to have a config, though I'm not familiar with Spark's preference for these things.
Hmm, maybe. However, the memory usage of the cache should be relatively low. Let's wait for other people's opinions.
@advancedxy It appears that this PR can enhance performance when there are a large number of partitions. Could you please share the test results from a real DatasourceV2 table, such as Iceberg?
I cannot share the exact numbers. However, I have described the estimated number of partitions (~N00_000, where N <= 2) and the time to plan in the JIRA. After this patch, the planning time should drop to seconds (from tens of minutes). Hope that helps.
szehon-ho left a comment
Sounds like a good find, with promising results on a real table.
sunchao left a comment
LGTM. It would be nice to have something to configure this but I don't think it is super important. I feel the default value should be more than enough for most use cases? Similarly for expireAfterAccess.
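To make the two knobs mentioned in this thread concrete (a maximum entry count and an expire-after-access window), here is a minimal Python sketch. It is not the code from this PR: `BoundedExpiringCache`, its parameters, and the one-hour default are hypothetical names and values chosen for illustration. Only the 1024 bound comes from the discussion above.

```python
import time
from collections import OrderedDict


class BoundedExpiringCache:
    """Loading cache with an LRU size bound and expire-after-access (hypothetical sketch)."""

    def __init__(self, loader, max_entries=1024, expire_after_access_s=3600.0,
                 clock=time.monotonic):
        self._loader = loader                # builds a value on a miss
        self._max_entries = max_entries      # 1024 is the bound from the thread
        self._ttl = expire_after_access_s    # assumed default, not from the PR
        self._clock = clock                  # injectable so tests can fake time
        self._entries = OrderedDict()        # key -> (value, last_access_time)

    def get(self, key):
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[1] < self._ttl:
            value = hit[0]                   # fresh hit: reuse the cached value
        else:
            value = self._loader(key)        # miss or expired: rebuild
        # Record the access time and mark the key as most recently used.
        self._entries[key] = (value, now)
        self._entries.move_to_end(key)
        # Evict least recently used entries beyond the size bound.
        while len(self._entries) > self._max_entries:
            self._entries.popitem(last=False)
        return value

    def __len__(self):
        return len(self._entries)
```

Exposing `max_entries` and `expire_after_access_s` as constructor arguments is what "making it a config" would amount to in this sketch. With keys as small as a list of column types, the memory cost of 1024 entries stays low, which is consistent with the view in this thread that the defaults are enough for most use cases.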
Thanks! Merged to master.
Thanks for reviewing this.
### What changes were proposed in this pull request?

This PR aims to regenerate benchmark results (except `ExternalAppendOnlyUnsafeRowArrayBenchmark`) as a preparation for Apache Spark 4.0.0-preview2.

- During the testing, it's observed that `ExternalAppendOnlyUnsafeRowArrayBenchmark` hangs in both CI and local environment. SPARK-49228 is filed for its investigation.
- In addition, `Storage Partition Join`-related benchmark are generated for the following commits.
  - #46265
  - #47426

### Why are the changes needed?

To check the performance regression.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

This is generated by
- https://github.com/dongjoon-hyun/spark/actions/runs/10364365815 (Java 17)
- https://github.com/dongjoon-hyun/spark/actions/runs/10364368441 (Java 21)

Manual review.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47743 from dongjoon-hyun/SPARK-49224.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
What changes were proposed in this pull request?
Cache rowOrdering and structType for InternalRowComparableWrapper
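The idea can be illustrated with a small, language-agnostic sketch in Python. This is not the Scala code from the PR: `RowWrapper`, `ordering_for`, `struct_type_for`, and the string type names are hypothetical stand-ins, and the sketch assumes that the ordering and struct type depend only on the sequence of column types, so they can be shared by every wrapper with the same schema.

```python
from functools import lru_cache

# Counts how often the per-schema helpers are actually built (illustration only).
BUILD_COUNT = {"ordering": 0, "struct_type": 0}


@lru_cache(maxsize=1024)  # 1024 mirrors the bound discussed in the review
def ordering_for(data_types):
    """Build the comparison key function once per distinct tuple of types."""
    BUILD_COUNT["ordering"] += 1
    return lambda row: tuple(row)  # stand-in for a real row ordering


@lru_cache(maxsize=1024)
def struct_type_for(data_types):
    """Build the schema descriptor once per distinct tuple of types."""
    BUILD_COUNT["struct_type"] += 1
    return tuple((f"col{i}", t) for i, t in enumerate(data_types))


class RowWrapper:
    """Makes a row hashable and comparable so it can be used as a map key."""

    def __init__(self, row, data_types):
        self.row = tuple(row)
        self.data_types = tuple(data_types)
        # Shared, cached helpers instead of fresh ones per wrapper instance.
        self._key = ordering_for(self.data_types)
        self._struct_type = struct_type_for(self.data_types)

    def __eq__(self, other):
        return (isinstance(other, RowWrapper)
                and self.data_types == other.data_types
                and self._key(self.row) == other._key(other.row))

    def __lt__(self, other):
        return self._key(self.row) < other._key(other.row)

    def __hash__(self):
        return hash((self._struct_type, self.row))


# Usage: many partition keys, one schema, so each helper is built only once.
SCHEMA = ("int", "string")
wrapped = [RowWrapper((i % 10, "p"), SCHEMA) for i in range(1000)]
distinct = set(wrapped)
```

In this sketch, wrapping 1000 rows that share one schema builds each helper a single time. Without the cache, every wrapper would build its own, and that cost grows with the number of partitions.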
Why are the changes needed?
To improve performance, particularly planning time for tables with a large number of partitions.
Does this PR introduce any user-facing change?
No.
How was this patch tested?
Added a new benchmark to verify the performance improvement.
Was this patch authored or co-authored using generative AI tooling?
No.