fix: CometExec's outputPartitioning might not be same as Spark expects after AQE interferes#299
Conversation
45b4588 to
d1257ea
Compare
| override val serializedPlanOpt: SerializedPlan) | ||
| extends CometUnaryExec { | ||
| extends CometUnaryExec | ||
| with PartitioningPreservingUnaryExecNode { |
There was a problem hiding this comment.
PartitioningPreservingUnaryExecNode implements outputPartitioning for ProjectExec.
| override val serializedPlanOpt: SerializedPlan) | ||
| extends CometUnaryExec { | ||
| extends CometUnaryExec | ||
| with PartitioningPreservingUnaryExecNode { |
There was a problem hiding this comment.
PartitioningPreservingUnaryExecNode implements outputPartitioning for HashAggregateExec.
| * This is copied from Spark's `PartitioningPreservingUnaryExecNode` because it is only available | ||
| * in Spark 3.4+. This is a workaround to make it available in Spark 3.2+. | ||
| */ | ||
| trait PartitioningPreservingUnaryExecNode extends UnaryExecNode with AliasAwareOutputExpression { |
There was a problem hiding this comment.
Spark 3.4 has a bit more change on outputPartitioning of some nodes like ProjectExec. Copied it from Spark 3.4.
| extends CometUnaryExec { | ||
|
|
||
| override def outputPartitioning: Partitioning = child.outputPartitioning |
There was a problem hiding this comment.
Would it make sense to add this outputPartitioning to CometUnaryExec as the default?
There was a problem hiding this comment.
Because Spark's default outputPartitioning is UnknownPartitioning, if we add child.outputPartitioning to CometUnaryExec as default, it will possibly change outputPartitioning if we don't notice it.
Currently CometExec uses original Spark plan's outputPartitioning as default which is safer option, I think. Except for the case that Spark dynamically changes output partitioning during execution like AQE, it should be correct because Comet doesn't change output partitioning from Spark.
andygrove
left a comment
There was a problem hiding this comment.
I'm not an expert on the specific changes but it looks like we are just porting Spark logic over, so LGTM.
208d9e2 to
f0cb1cb
Compare
|
Thank you @andygrove |
|
Merged. Thanks. |
…s after AQE interferes (apache#299) * fix: CometExec's outputPartitioning might not be same as Spark expects after AQE interferes * Add compatibility with Spark 3.2 and 3.3 * Remove unused import
Which issue does this PR close?
Closes #298.
Rationale for this change
What changes are included in this PR?
How are these changes tested?