fix: Sort on single struct should fallback to Spark #811

Merged
viirya merged 1 commit into apache:main from viirya:fix_sort on Aug 12, 2024

viirya (Member) commented Aug 11, 2024

Which issue does this PR close?

Closes #807.

Rationale for this change

What changes are included in this PR?

How are these changes tested?

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 33.80%. Comparing base (4fe43ad) to head (44855de).

Additional details and impacted files
@@             Coverage Diff              @@
##               main     #811      +/-   ##
============================================
- Coverage     33.94%   33.80%   -0.14%     
+ Complexity      874      870       -4     
============================================
  Files           112      112              
  Lines         42916    42914       -2     
  Branches       9464     9452      -12     
============================================
- Hits          14567    14507      -60     
- Misses        25379    25428      +49     
- Partials       2970     2979       +9     

| spark.comet.scan.preFetch.enabled | Whether to enable pre-fetching feature of CometScan. By default is disabled. | false |
| spark.comet.scan.preFetch.threadNum | The number of threads running pre-fetching for CometScan. Effective if spark.comet.scan.preFetch.enabled is enabled. By default it is 2. Note that more pre-fetching threads means more memory requirement to store pre-fetched row groups. | 2 |
| spark.comet.shuffle.preferDictionary.ratio | The ratio of total values to distinct values in a string column to decide whether to prefer dictionary encoding when shuffling the column. If the ratio is higher than this config, dictionary encoding will be used on shuffling string column. This config is effective if it is higher than 1.0. By default, this config is 10.0. Note that this config is only used when `spark.comet.exec.shuffle.mode` is `jvm`. | 10.0 |
| spark.comet.sparkToColumnar.supportedOperatorList | A comma-separated list of operators that will be converted to Comet columnar format when 'spark.comet.sparkToColumnar.enabled' is true | Range,InMemoryTableScan |
Contributor

nit: Shall we use ` instead of '?

Member Author

This is not changed by this PR. I think a previous PR changed it but didn't update the document.

Member Author

The document is updated automatically when running `make release` locally.
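
The `spark.comet.shuffle.preferDictionary.ratio` row quoted above describes a threshold rule: dictionary encoding is preferred when the ratio of total values to distinct values exceeds the configured value, and the setting only takes effect when it is above 1.0. A minimal sketch of that rule as worded in the table follows. The names `DictionaryHeuristic` and `preferDictionary` are hypothetical and this is not Comet's actual implementation, which may compute the ratio differently.

```scala
// Hypothetical illustration of the rule described for
// spark.comet.shuffle.preferDictionary.ratio; not taken from Comet's source.
object DictionaryHeuristic {
  // Returns true when the total/distinct ratio of a string column exceeds
  // the threshold. The default of 10.0 matches the documented default.
  def preferDictionary(values: Seq[String], ratioThreshold: Double = 10.0): Boolean = {
    // The config is documented as effective only when it is higher than 1.0.
    if (ratioThreshold <= 1.0 || values.isEmpty) {
      false
    } else {
      val distinctCount = values.distinct.size
      values.size.toDouble / distinctCount > ratioThreshold
    }
  }
}
```

Under this reading, a column of 100 identical strings has a ratio of 100 and would prefer dictionary encoding, while a column of 100 distinct strings has a ratio of 1 and would not.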

@viirya viirya merged commit 071c780 into apache:main Aug 12, 2024
viirya (Member Author) commented Aug 12, 2024

Thanks @huaxingao

@viirya viirya deleted the fix_sort branch August 12, 2024 05:48
if isCometOperatorEnabled(op.conf, CometConf.OPERATOR_SORT) =>
// TODO: Remove this constraint when we upgrade to new arrow-rs including
// https://github.com/apache/arrow-rs/pull/6225
if (child.output.length == 1 && child.output.head.dataType.isInstanceOf[StructType]) {
Member

As we add support for other types, do we need to update this to make it recursive so that we check for Map or Array containing struct?

Member Author

Let me add more data types here according to arrow-rs.
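
A minimal sketch of the recursive check raised in this thread, assuming the goal is to fall back whenever the single sort input column is a struct or nests one inside an array or map. To keep the snippet runnable without Spark on the classpath, it uses simplified stand-in types (hypothetical, not Spark's real classes); with Spark, the same pattern match would be written against `org.apache.spark.sql.types`. The names `SortFallback`, `containsStruct`, and `shouldFallback` are also hypothetical, and the thread does not establish which nested types actually hit the arrow-rs limitation.

```scala
// Simplified stand-ins for Spark's DataType hierarchy (hypothetical).
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType) extends DataType
final case class MapType(keyType: DataType, valueType: DataType) extends DataType
final case class StructType(fields: Seq[DataType]) extends DataType

object SortFallback {
  // True if the type is a struct, or an array/map that nests a struct
  // at any depth.
  def containsStruct(dt: DataType): Boolean = dt match {
    case _: StructType => true
    case ArrayType(elementType) => containsStruct(elementType)
    case MapType(keyType, valueType) =>
      containsStruct(keyType) || containsStruct(valueType)
    case _ => false
  }

  // Mirrors the shape of the guard in the diff: only the single-column
  // case is considered, but the struct test is now recursive.
  def shouldFallback(childOutput: Seq[DataType]): Boolean =
    childOutput.length == 1 && containsStruct(childOutput.head)
}
```

For example, `SortFallback.shouldFallback(Seq(ArrayType(StructType(Seq(StringType)))))` is true under this sketch, whereas the original non-recursive guard would let that case through to the native sort.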

himadripal pushed a commit to himadripal/datafusion-comet that referenced this pull request Sep 7, 2024
Development

Successfully merging this pull request may close these issues.

Fallback to Spark if sort on unsupported cases