[CORE] Refactor columnar noop write rule #8422
Conversation
Thanks for opening a pull request! Could you open an issue for this pull request on GitHub Issues? https://github.com/apache/incubator-gluten/issues Then could you also rename the commit message and pull request title in the following format? See also:
Run Gluten Clickhouse CI on x86
```scala
super.sparkConf
  .set("spark.gluten.sql.columnar.forceShuffledHashJoin", "false")
  .set(GlutenConfig.COLUMNAR_FORCE_SHUFFLED_HASH_JOIN_ENABLED.key, "false")
  .set(GlutenConfig.NOOP_WRITER_ENABLED.key, "false")
```
The following test will report an error, because GlutenNoopWriterRule will add a FakeRowAdaptor node, which causes the test check to fail; that is why we default it to false here:
SPARK-30953: InsertAdaptiveSparkPlan should apply AQE on child plan of v2 write commands
```scala
case class NativeWritePostRule(session: SparkSession) extends Rule[SparkPlan] {
  private[datasources] def injectFakeRowAdaptor(command: SparkPlan, child: SparkPlan): SparkPlan = {
    child match {
      // if the child is columnar, we can just wrap&transfer the columnar data
```
shims/common/src/main/scala/org/apache/gluten/config/GlutenConfig.scala
```scala
case class GlutenNoopWriterRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(p: SparkPlan): SparkPlan = p match {
    case rc @ AppendDataExec(_, _, NoopWrite) if GlutenConfig.get.enableNoopWriter =>
```
I note the check below was removed. Could you clarify this change?

```scala
write.getClass.getName == NOOP_WRITE && BackendsApiManager.getSettings.enableNativeWriteFiles()
```

We can directly match NoopWrite here, so the class-name check is no longer needed. As for BackendsApiManager.getSettings.enableNativeWriteFiles(), we have a better config now.
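As a hedged illustration of the reply above (a sketch only, not the PR's exact code; the names NOOP_WRITE, AppendDataExec, NoopWrite, and GlutenConfig.get.enableNoopWriter are taken from the snippets quoted in this thread), the old string comparison versus the new pattern match might look like:

```scala
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.noop.NoopWrite
import org.apache.spark.sql.execution.datasources.v2.AppendDataExec

// Old style (removed): compare the write's class name as a string, and
// also consult the backend settings:
//   write.getClass.getName == NOOP_WRITE &&
//     BackendsApiManager.getSettings.enableNativeWriteFiles()

// New style: match the NoopWrite singleton directly in the extractor
// pattern, gated by the dedicated config flag.
def matchesNoopWrite(plan: SparkPlan): Boolean = plan match {
  case AppendDataExec(_, _, NoopWrite) => GlutenConfig.get.enableNoopWriter
  case _ => false
}
```

Matching the singleton is both safer (the compiler checks the reference) and clearer than a string comparison against a class name.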
Run Gluten Clickhouse CI on x86
```scala
override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
  qe.executedPlan match {
    case plan @ (_: DataWritingCommandExec | _: V2TableWriteExec) =>
      noLocalread = collect(plan) {
```
Removed the child plan check, as we now add FakeRowAdaptor, and the check has already been removed since 3.4.0.
```scala
assert(plan.isInstanceOf[V2TableWriteExec])
val childPlan = plan.asInstanceOf[V2TableWriteExec].child
assert(childPlan.isInstanceOf[FakeRowAdaptor])
assert(childPlan.asInstanceOf[FakeRowAdaptor].child.isInstanceOf[AdaptiveSparkPlanExec])
```
Refined the child plan check.
```scala
var fakeRowAdaptor: Option[FakeRowAdaptor] = None

override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
  fakeRowAdaptor = qe.executedPlan.collectFirst { case f: FakeRowAdaptor => f }
```
@jackylee-ch FakeRowAdaptor is used in Spark 3.2 and 3.3. Why do we need to add this check in the Spark 3.5 test folder?

The GlutenNoopWriterRule adds a FakeRowAdaptor after the v2 write command when writing to the noop source. This PR lets GlutenNoopWriterRule work for all Spark versions.
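A minimal sketch of the behavior described above, assuming the node and config names shown in this PR's diffs (AppendDataExec, NoopWrite, FakeRowAdaptor, GlutenConfig.get.enableNoopWriter); the real rule may differ in detail:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan
import org.apache.spark.sql.execution.datasources.noop.NoopWrite
import org.apache.spark.sql.execution.datasources.v2.AppendDataExec

case class GlutenNoopWriterRule(session: SparkSession) extends Rule[SparkPlan] {
  override def apply(p: SparkPlan): SparkPlan = p match {
    // NoopWrite discards its input, so it can accept either row-based or
    // columnar data; wrap the child in FakeRowAdaptor so no extra
    // ColumnarToRow transition is forced before the write.
    case w @ AppendDataExec(_, _, NoopWrite) if GlutenConfig.get.enableNoopWriter =>
      w.withNewChildren(Seq(FakeRowAdaptor(w.child)))
    case other => other
  }
}
```

Because the rule matches only on the v2 write node and the NoopWrite singleton, it is version-agnostic, which is what allows it to apply across all supported Spark versions.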
backends-velox/src/main/scala/org/apache/gluten/backendsapi/velox/VeloxRuleApi.scala
JkSelf left a comment:
LGTM. Thanks for your work.
Run Gluten Clickhouse CI on x86
Any more questions about this PR? @philo-he
Run Gluten Clickhouse CI on x86
```scala
 * ColumnarToRow operation for NoopWrite. Since NoopWrite does not actually perform any data
 * operations, it can accept input data in either row-based or columnar format.
 */
case class GlutenNoopWriterRule(session: SparkSession) extends Rule[SparkPlan] {
```
We cannot move it to that folder, as NoopWrite can only be accessed under org.apache.spark.sql.execution.datasources.noop.
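A short sketch of the visibility constraint mentioned above: since NoopWrite is not visible outside its defining package in Spark, a rule that pattern-matches on it has to be declared under the same package path. The file layout below is hypothetical, for illustration only:

```scala
// Declaring the rule inside Spark's noop datasource package gives it
// access to the package-restricted NoopWrite object.
package org.apache.spark.sql.execution.datasources.noop

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.SparkPlan

case class GlutenNoopWriterRule(session: SparkSession) extends Rule[SparkPlan] {
  // The actual match on AppendDataExec(_, _, NoopWrite) is elided here;
  // the point is only that NoopWrite is in scope inside this package.
  override def apply(p: SparkPlan): SparkPlan = p
}
```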
```scala
}

case class NativeWritePostRule(session: SparkSession) extends Rule[SparkPlan] {
  private[datasources] def injectFakeRowAdaptor(command: SparkPlan, child: SparkPlan): SparkPlan = {
```
Is this API only called by GlutenNoopWriterRule after the change? If so, it could move to the rule file.

This API is also needed in NativeWritePostRule.
@jackylee-ch would you please write some comments for your PR? Thanks!
What changes were proposed in this pull request?

Refactor NoopWrite support: move the NoopWrite rule from NativeWritePostRule to GlutenNoopWriterRule to support all Spark versions, and change the class-name check to pattern matching.

How was this patch tested?

CI and newly added tests.
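Following the listener snippet quoted in the review thread, a hedged sketch of how such a test might capture the executed plan and assert the adaptor (the listener class name is made up for illustration; FakeRowAdaptor and the listener hooks follow this PR's snippets):

```scala
import org.apache.spark.sql.execution.{QueryExecution, SparkPlan}
import org.apache.spark.sql.util.QueryExecutionListener

// Capture the executed plan on success and look for a FakeRowAdaptor node.
class NoopWriterCheckListener extends QueryExecutionListener {
  @volatile var fakeRowAdaptor: Option[SparkPlan] = None

  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit = {
    fakeRowAdaptor = qe.executedPlan.collectFirst { case f: FakeRowAdaptor => f }
  }

  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit = ()
}

// Usage sketch: register the listener, run an INSERT into the noop source,
// then assert(listener.fakeRowAdaptor.isDefined).
```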