fix: CometScanExec on Spark 3.5.2 #915

Merged

andygrove merged 7 commits into apache:main from Kimahriman:scan-operator-id on Sep 24, 2024

Conversation

@Kimahriman (Contributor) commented Sep 5, 2024

Which issue does this PR close?

Closes #912

Rationale for this change

Fixes CometScanExec on Spark 3.5.2+. Currently it fails with a runtime exception, and it fails to compile when building against 3.5.2, with:

```
[ERROR] /Users/abinford/projects/arrow-datafusion-comet/spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala:144: value OP_ID_TAG is not a member of object org.apache.spark.sql.catalyst.plans.QueryPlan
```

This is because `OP_ID_TAG` was removed in Spark 3.5.2+, and the operator ID tracking was replaced with a separate internal map of plan -> ID, so there is no way to manually pass the ID on to a delegating plan. Instead, this PR simply copies the implementation of `DataSourceScanExec`'s method.

What changes are included in this PR?

The only effect of the change is on the verbose string output for `CometScanExec`. Instead of delegating to the underlying `DataSourceScanExec`, the implementation is copied over. This is in line with other Comet operators that implement their own verbose string, and it makes more sense in the formatted explain since the operator names line up.
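The copied approach can be illustrated with a minimal, self-contained sketch (plain Scala, not the actual Spark or Comet code; the class, field, and metadata names here are made up for illustration). The idea is that the node renders its own node name plus its sorted, non-empty metadata entries, instead of forwarding an operator-ID tag to a wrapped scan node:

```scala
// Illustrative standalone model only -- NOT the real CometScanExec code.
// Instead of forwarding an operator-ID tag to a wrapped DataSourceScanExec,
// the node builds its own verbose string from its own name and metadata,
// so the formatted explain shows "CometScan parquet" with a real operator ID.
object VerboseStringSketch {
  final case class ScanNode(nodeName: String, metadata: Map[String, String]) {
    def verboseStringWithOperatorId(opId: Int): String = {
      // Sort metadata entries and drop empty ones, mirroring how Spark's
      // DataSourceScanExec renders its metadata lines.
      val metadataStr = metadata.toSeq.sorted
        .filterNot { case (_, v) => v.isEmpty || v == "[]" }
        .map { case (k, v) => s"$k: $v" }
      s"($opId) $nodeName\n" + metadataStr.mkString("\n")
    }
  }

  def main(args: Array[String]): Unit = {
    val scan = ScanNode(
      "CometScan parquet",
      Map("Batched" -> "true", "PushedFilters" -> "[]"))
    // Prints "(1) CometScan parquet" followed by the non-empty metadata,
    // matching the node name shown in the physical plan tree.
    println(scan.verboseStringWithOperatorId(1))
  }
}
```

This is only a sketch of the rendering logic; the real implementation lives in `CometScanExec.scala` and uses Spark's `ExplainUtils` machinery.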

Before:

```
== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)


(1) Scan parquet 
Output [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
Batched: true
Location: InMemoryFileIndex [file:.../arrow-datafusion-comet/spark/target/tmp/spark-a0b56a3c-caa2-4631-ad15-ec83e8522948/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
```

After:

```
== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)


(1) CometScan parquet 
Output [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
Batched: true
Location: InMemoryFileIndex [file:.../arrow-datafusion-comet/spark/target/tmp/spark-36667e9a-75e0-4f90-8442-5f47ecd1cf4b/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
```

How are these changes tested?

Manually verified by building with Spark 3.5.2: `./mvnw clean package -Pspark-3.5 -Dspark.version=3.5.2 -DskipTests`.

@Kimahriman (Contributor Author)

Oof, this breaks a lot of explain plan comparison tests. If this change is OK, I can try to work on updating them.

@kazuyukitanimura (Contributor)

Is the change in the metadata? If so, should we fix the metadata instead?

@kazuyukitanimura (Contributor) commented Sep 11, 2024

And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)

@Kimahriman (Contributor Author)

> Is the change in the metadata? If so, should we fix the metadata instead?

Are you referring to the compilation error or the change in explain output?

> And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)

Agreed. I thought about doing that here, but I wasn't sure of the best way to go about updating the diff for the Spark SQL tests.

@kazuyukitanimura (Contributor) commented Sep 13, 2024

> Is the change in the metadata? If so, should we fix the metadata instead?
>
> Are you referring to the compilation error or the change in explain output?

I meant `CometScanExec.metadata`, inherited from `DataSourceScanExec`.

```scala
override def verboseStringWithOperatorId(): String = {
  getTagValue(QueryPlan.OP_ID_TAG).foreach(id => wrapped.setTagValue(QueryPlan.OP_ID_TAG, id))
  wrapped.verboseStringWithOperatorId()
  val metadataStr = metadata.toSeq.sorted
```
Contributor

Since the original issue was that `OP_ID_TAG` has been removed, is it not sufficient to just remove the offending line?

Contributor Author

Just removing the setting of `OP_ID_TAG` leads to a formatted explain of:

```
== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)


(unknown) Scan parquet 
Output [20]: [_1#0, _2#1, _3#2, _4#3, _5#4L, _6#5, _7#6, _8#7, _9#8, _10#9, _11#10L, _12#11, _13#12, _14#13, _15#14, _16#15, _17#16, _18#17, _19#18, _20#19]
Batched: true
Location: InMemoryFileIndex [file:/Users/abinford/projects/arrow-datafusion-comet/spark/target/tmp/spark-c0b82b7c-3de1-431b-96bf-56cb37f3a463/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#0, _2#1, _3#2, _4#3, _5#4L, _6#5, _7#6, _8#7, _9#8, _10#9, _11#10L, _12#11, _13#12, _14#13, _15#14, _16#15, _17#16, _18#17, _19#18, _20#19]
```

which is the whole reason it was added in the first place: to fix the unknown operator ID (bc35fa5).

Contributor Author

Plus, from what I could see, every other operator is prefixed with `Comet` in the formatted explain, so it's weird for the scan to be the one thing that doesn't actually match up with the physical plan.

Contributor

I see, that makes sense.
On an unrelated note, I wonder why `CometScanExec` extends `DataSourceScanExec` instead of `FileSourceScanExec` (if it did, we would have gotten the verbose string for free).

@parthchandra (Contributor) left a comment

lgtm

@andygrove (Member) left a comment

LGTM. Thanks @Kimahriman

@andygrove (Member)

@Kimahriman would you be able to rebase this PR so that we can merge it?

@Kimahriman (Contributor Author)

> @Kimahriman would you be able to rebase this PR so that we can merge it?

Oof, 95 conflicts I would have to manually resolve; let me just regenerate all these plans again tonight.

@Kimahriman (Contributor Author)

OK, it wasn't too bad to fix the issues again with find/replace; we'll see if I messed anything up in CI.

@codecov-commenter commented Sep 20, 2024

Codecov Report

❌ Patch coverage is 70.58824% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.05%. Comparing base (fa275f1) to head (2757186).
⚠️ Report is 783 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ala/org/apache/spark/sql/comet/CometScanExec.scala | 70.58% | 0 Missing and 5 partials ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff              @@
##               main     #915      +/-   ##
============================================
- Coverage     34.16%   34.05%   -0.12%     
+ Complexity      880      879       -1     
============================================
  Files           112      112              
  Lines         43286    43301      +15     
  Branches       9572     9578       +6     
============================================
- Hits          14789    14745      -44     
- Misses        25478    25518      +40     
- Partials       3019     3038      +19     
```
@Kimahriman (Contributor Author)

Done. I think the CI failures are unrelated; it looks like failures downloading dependencies in the Hive tests.

@andygrove (Member)

> Done, I think CI failures are unrelated, looks like failures downloading dependencies in Hive tests

Thanks. I am re-running the failed jobs now.

andygrove merged commit 5b3f7bc into apache:main on Sep 24, 2024
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
## Which issue does this PR close?


N/A

## Rationale for this change


Apply OSS 0.3.0 changes.

## What changes are included in this PR?


```
84cccf7 docs: Add notes for IntelliJ code size limits for code inspections. (apache#985)
dcc4a8a fix: The spilled_bytes metric of CometSortExec should be size instead of time (apache#984)
f64553b chore: fix compatibility guide (apache#978)
0ee7df8 chore: Enable additional CreateArray tests (apache#928)
a690e9d perf: Remove one redundant CopyExec for SMJ (apache#962)
a8156b5 chore: update rem expression guide (apache#976)
317a534 fix: Use the number of rows from underlying arrays instead of logical row count from RecordBatch (apache#972)
22561c4 doc: add documentation interlinks (apache#975)
b4de8e0 chore: Update benchmarks results based on 0.3.0-rc1 (apache#969)
94093f3 chore: fix publish-to-maven script (apache#966)
f31f6cc Generate changelog for 0.3.0 release (apache#964)
5663fc2 fix: div and rem by negative zero (apache#960)
50517f6 perf: Optimize decimal precision check in decimal aggregates (sum and avg) (apache#952)
5b3f7bc fix: CometScanExec on Spark 3.5.2 (apache#915)
8410c71 chore: clarify tarball installation (apache#959)
459b2b0 fix: window function range offset should be long instead of int (apache#733)
```


## How are these changes tested?


Successfully merging this pull request may close these issues: CometScanExec fails on Spark 3.5.2

5 participants