fix: CometScanExec on Spark 3.5.2 #915

Merged

andygrove merged 7 commits into apache:main from Kimahriman:scan-operator-id on Sep 24, 2024

Conversation

@Kimahriman (Contributor) commented Sep 5, 2024

Which issue does this PR close?

Closes #912

Rationale for this change

Fixes CometScanExec on Spark 3.5.2+. Currently it fails with a runtime exception, and it fails to compile when building against 3.5.2, with:

```
[ERROR] /Users/abinford/projects/arrow-datafusion-comet/spark/src/main/scala/org/apache/spark/sql/comet/CometScanExec.scala:144: value OP_ID_TAG is not a member of object org.apache.spark.sql.catalyst.plans.QueryPlan
```

This is because `OP_ID_TAG` was removed in Spark 3.5.2+, and the operator ID tracking was replaced with a separate internal map of plan -> ID, so there is no way to manually pass the ID on to a delegating plan. Instead, this PR simply copies the implementation of `DataSourceScanExec`'s method.

What changes are included in this PR?

The only effect of the change is on the verbose string output for `CometScanExec`. Instead of delegating to the underlying `DataSourceScanExec`, the implementation is copied over. This is in line with other Comet operators that implement their own verbose string, and it makes more sense in the formatted explain since the operator names line up.
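The copied approach can be illustrated with a minimal, self-contained sketch (plain Scala, not the actual Spark or Comet code; the class, field, and metadata names here are made up for illustration). The idea is that the node renders its own node name plus its sorted, non-empty metadata entries, instead of forwarding an operator-ID tag to a wrapped scan node:

```scala
// Illustrative standalone model only -- NOT the real CometScanExec code.
// Instead of forwarding an operator-ID tag to a wrapped DataSourceScanExec,
// the node builds its own verbose string from its own name and metadata,
// so the formatted explain shows "CometScan parquet" with a real operator ID.
object VerboseStringSketch {
  final case class ScanNode(nodeName: String, metadata: Map[String, String]) {
    def verboseStringWithOperatorId(opId: Int): String = {
      // Sort metadata entries and drop empty ones, mirroring how Spark's
      // DataSourceScanExec renders its metadata lines.
      val metadataStr = metadata.toSeq.sorted
        .filterNot { case (_, v) => v.isEmpty || v == "[]" }
        .map { case (k, v) => s"$k: $v" }
      s"($opId) $nodeName\n" + metadataStr.mkString("\n")
    }
  }

  def main(args: Array[String]): Unit = {
    val scan = ScanNode(
      "CometScan parquet",
      Map("Batched" -> "true", "PushedFilters" -> "[]"))
    // Prints "(1) CometScan parquet" followed by the non-empty metadata,
    // matching the node name shown in the physical plan tree.
    println(scan.verboseStringWithOperatorId(1))
  }
}
```

This is only a sketch of the rendering logic; the real implementation lives in `CometScanExec.scala` and uses Spark's `ExplainUtils` machinery.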

Before:

```
== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)


(1) Scan parquet 
Output [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
Batched: true
Location: InMemoryFileIndex [file:.../arrow-datafusion-comet/spark/target/tmp/spark-a0b56a3c-caa2-4631-ad15-ec83e8522948/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
```

After:

```
== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)


(1) CometScan parquet 
Output [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
Batched: true
Location: InMemoryFileIndex [file:.../arrow-datafusion-comet/spark/target/tmp/spark-36667e9a-75e0-4f90-8442-5f47ecd1cf4b/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#146, _2#147, _3#148, _4#149, _5#150L, _6#151, _7#152, _8#153, _9#154, _10#155, _11#156L, _12#157, _13#158, _14#159, _15#160, _16#161, _17#162, _18#163, _19#164, _20#165]
```

How are these changes tested?

Manually verified by building with Spark 3.5.2: `./mvnw clean package -Pspark-3.5 -Dspark.version=3.5.2 -DskipTests`.

@Kimahriman (Contributor Author)

Oof, this breaks a lot of explain plan comparison tests. If this change is OK, I can try to work on updating them.

@kazuyukitanimura (Contributor)

Is the change in the metadata? If so, should we fix the metadata instead?

@kazuyukitanimura (Contributor) commented Sep 11, 2024

And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)

@Kimahriman (Contributor Author)

> Is the change in the metadata? If so, should we fix the metadata instead?

Are you referring to the compilation error or the change in explain output?

> And perhaps we can try to upgrade the Comet dependency to Spark 3.5.2 (separately)

Agreed. I thought about doing that here, but I wasn't sure of the best way to go about updating the diff for the Spark SQL tests.

@kazuyukitanimura (Contributor) commented Sep 13, 2024

> Is the change in the metadata? If so, should we fix the metadata instead?
>
> Are you referring to the compilation error or the change in explain output?

I meant `CometScanExec.metadata`, inherited from `DataSourceScanExec`.

```scala
override def verboseStringWithOperatorId(): String = {
  getTagValue(QueryPlan.OP_ID_TAG).foreach(id => wrapped.setTagValue(QueryPlan.OP_ID_TAG, id))
  wrapped.verboseStringWithOperatorId()
  val metadataStr = metadata.toSeq.sorted
```
Contributor

Since the original issue was that `OP_ID_TAG` has been removed, is it not sufficient to just remove the offending line?

Contributor Author

Just removing the setting of `OP_ID_TAG` leads to a formatted explain of:

```
== Physical Plan ==
* ColumnarToRow (2)
+- CometScan parquet  (1)


(unknown) Scan parquet 
Output [20]: [_1#0, _2#1, _3#2, _4#3, _5#4L, _6#5, _7#6, _8#7, _9#8, _10#9, _11#10L, _12#11, _13#12, _14#13, _15#14, _16#15, _17#16, _18#17, _19#18, _20#19]
Batched: true
Location: InMemoryFileIndex [file:/Users/abinford/projects/arrow-datafusion-comet/spark/target/tmp/spark-c0b82b7c-3de1-431b-96bf-56cb37f3a463/test.parquet]
ReadSchema: struct<_1:boolean,_2:tinyint,_3:smallint,_4:int,_5:bigint,_6:float,_7:double,_8:string,_9:smallint,_10:int,_11:bigint,_12:decimal(20,0),_13:string,_14:binary,_15:decimal(5,2),_16:decimal(18,10),_17:decimal(38,37),_18:timestamp,_19:timestamp,_20:date>

(2) ColumnarToRow [codegen id : 1]
Input [20]: [_1#0, _2#1, _3#2, _4#3, _5#4L, _6#5, _7#6, _8#7, _9#8, _10#9, _11#10L, _12#11, _13#12, _14#13, _15#14, _16#15, _17#16, _18#17, _19#18, _20#19]
```

which is the whole reason it was added in the first place: to fix the unknown operator ID (bc35fa5).

Contributor Author

Plus, from what I could see, every other operator is prefixed with `Comet` in the formatted explain, so it's weird for the scan to be the one thing that doesn't actually match up with the physical plan.

Contributor

I see, that makes sense.
On an unrelated note, I wonder why `CometScanExec` extends `DataSourceScanExec` instead of `FileSourceScanExec` (if it did, we would have gotten the verbose string for free).

@parthchandra (Contributor) left a comment

lgtm

@andygrove (Member) left a comment

LGTM. Thanks @Kimahriman

@andygrove (Member)

@Kimahriman would you be able to rebase this PR so that we can merge it?

@Kimahriman (Contributor Author)

> @Kimahriman would you be able to rebase this PR so that we can merge it?

Oof, 95 conflicts I would have to manually resolve; let me just regenerate all these plans again tonight.

@Kimahriman (Contributor Author)

OK, it wasn't too bad to fix the issues again with find/replace; we'll see if I messed anything up in CI.

@codecov-commenter commented Sep 20, 2024

Codecov Report

❌ Patch coverage is 70.58824% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.05%. Comparing base (fa275f1) to head (2757186).
⚠️ Report is 783 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...ala/org/apache/spark/sql/comet/CometScanExec.scala | 70.58% | 0 Missing and 5 partials ⚠️ |
Additional details and impacted files
```
@@             Coverage Diff              @@
##               main     #915      +/-   ##
============================================
- Coverage     34.16%   34.05%   -0.12%     
+ Complexity      880      879       -1     
============================================
  Files           112      112              
  Lines         43286    43301      +15     
  Branches       9572     9578       +6     
============================================
- Hits          14789    14745      -44     
- Misses        25478    25518      +40     
- Partials       3019     3038      +19     
```
@Kimahriman (Contributor Author)

Done. I think the CI failures are unrelated; it looks like failures downloading dependencies in the Hive tests.

@andygrove (Member)

> Done, I think CI failures are unrelated, looks like failures downloading dependencies in Hive tests

Thanks. I am re-running the failed jobs now.

andygrove merged commit 5b3f7bc into apache:main on Sep 24, 2024
coderfender pushed a commit to coderfender/datafusion-comet that referenced this pull request Dec 13, 2025
## Which issue does this PR close?


N/A

## Rationale for this change


Apply OSS 0.3.0 changes.

## What changes are included in this PR?


```
84cccf7 docs: Add notes for IntelliJ code size limits for code inspections. (apache#985)
dcc4a8a fix: The spilled_bytes metric of CometSortExec should be size instead of time (apache#984)
f64553b chore: fix compatibility guide (apache#978)
0ee7df8 chore: Enable additional CreateArray tests (apache#928)
a690e9d perf: Remove one redundant CopyExec for SMJ (apache#962)
a8156b5 chore: update rem expression guide (apache#976)
317a534 fix: Use the number of rows from underlying arrays instead of logical row count from RecordBatch (apache#972)
22561c4 doc: add documentation interlinks (apache#975)
b4de8e0 chore: Update benchmarks results based on 0.3.0-rc1 (apache#969)
94093f3 chore: fix publish-to-maven script (apache#966)
f31f6cc Generate changelog for 0.3.0 release (apache#964)
5663fc2 fix: div and rem by negative zero (apache#960)
50517f6 perf: Optimize decimal precision check in decimal aggregates (sum and avg) (apache#952)
5b3f7bc fix: CometScanExec on Spark 3.5.2 (apache#915)
8410c71 chore: clarify tarball installation (apache#959)
459b2b0 fix: window function range offset should be long instead of int (apache#733)
```


## How are these changes tested?


Successfully merging this pull request may close these issues: CometScanExec fails on Spark 3.5.2

5 participants