fix: Sort on single struct should fallback to Spark by viirya · Pull Request #811 · apache/datafusion-comet

viirya · 2024-08-11T23:13:00Z

Which issue does this PR close?

Closes #807.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

codecov-commenter · 2024-08-12T00:04:50Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.80%. Comparing base (4fe43ad) to head (44855de).

Additional details and impacted files

@@             Coverage Diff              @@
##               main     #811      +/-   ##
============================================
- Coverage     33.94%   33.80%   -0.14%     
+ Complexity      874      870       -4     
============================================
  Files           112      112              
  Lines         42916    42914       -2     
  Branches       9464     9452      -12     
============================================
- Hits          14567    14507      -60     
- Misses        25379    25428      +49     
- Partials       2970     2979       +9


huaxingao · 2024-08-12T05:45:21Z

 | spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. By default is disabled. | false |
 | spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
 | spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
+| spark.comet.sparkToColumnar.supportedOperatorList | A comma-separated list of operators that will be converted to Comet columnar format when 'spark.comet.sparkToColumnar.enabled' is true | Range,InMemoryTableScan |


nit: Shall we use ` instead of '?

This was not changed by this PR. I think a previous PR changed it but didn't update the document.

The document is updated automatically when running `make release` locally.

viirya · 2024-08-12T05:48:03Z

Thanks @huaxingao

andygrove · 2024-08-12T13:22:02Z

          if isCometOperatorEnabled(op.conf, CometConf.OPERATOR_SORT) =>
+        // TODO: Remove this constraint when we upgrade to new arrow-rs including
+        // https://github.com/apache/arrow-rs/pull/6225
+        if (child.output.length == 1 && child.output.head.dataType.isInstanceOf[StructType]) {


As we add support for other types, do we need to update this to make it recursive so that we check for Map or Array containing struct?

Let me add more data types here according to arrow-rs.
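
For illustration, the recursive check asked about above could look roughly like the sketch below. It is not the code merged in this PR or in its follow-ups: `containsStruct` and `shouldFallBack` are hypothetical names, and the sketch assumes only Spark's public `org.apache.spark.sql.types` classes (so it needs spark-sql on the classpath). Which types actually require a fallback depends on what the arrow-rs version in use supports.

```scala
import org.apache.spark.sql.types._

object SortFallbackSketch {
  // Hypothetical helper (not from the PR): true if the type is a struct,
  // or an array/map that nests a struct at any depth.
  def containsStruct(dt: DataType): Boolean = dt match {
    case _: StructType => true
    case ArrayType(elementType, _) => containsStruct(elementType)
    case MapType(keyType, valueType, _) =>
      containsStruct(keyType) || containsStruct(valueType)
    case _ => false
  }

  // Hypothetical variant of the PR's condition: fall back to Spark when the
  // child has a single output column whose type is or contains a struct.
  def shouldFallBack(outputTypes: Seq[DataType]): Boolean =
    outputTypes.length == 1 && containsStruct(outputTypes.head)
}
```

With this sketch, `SortFallbackSketch.shouldFallBack(Seq(ArrayType(StructType(Seq(StructField("a", IntegerType))))))` evaluates to `true`, while a single `IntegerType` column, or two columns of any type, evaluates to `false`.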

(cherry picked from commit 071c780)

fix: Sort on single struct should fallback to Spark

44855de

viirya requested review from andygrove, huaxingao and kazuyukitanimura August 12, 2024 02:21

huaxingao reviewed Aug 12, 2024

huaxingao approved these changes Aug 12, 2024

viirya merged commit 071c780 into apache:main Aug 12, 2024

viirya deleted the fix_sort branch August 12, 2024 05:48

andygrove reviewed Aug 12, 2024

This was referenced Aug 13, 2024

Check sort order of SortExec instead of child output #822

Closed

fix: Check sort order of SortExec instead of child output #821

Merged

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024

fix: Sort on single struct should fallback to Spark (apache#811)

8452ef8

(cherry picked from commit 071c780)

mbutrovich mentioned this pull request Jun 6, 2025

Relax sort fallback constraints #1854

Closed
