fix: enable Spark 4 SQL tests previously ignored for issues #3313 and #3314 (#4092)
Merged: andygrove merged 1 commit into apache:main on Apr 26, 2026
Conversation
… and apache#3314

Issue 3313 was already resolved by recent non-AQE DPP work (apache#4011 and apache#4053); the test "Subquery reuse across the whole plan" now passes.

Issue 3314 covered three tests. Two of them (ignoreMissingFiles parquet, alter temporary view) now pass because Spark 4's ShimSparkErrorConverter translates native FileNotFound into the expected FAILED_READ_FILE.FILE_NOT_EXIST. The third test (ParquetV1QuerySuite "Enabling/disabling ignoreCorruptFiles") still failed because CometExecIterator wraps native Parquet errors using _LEGACY_ERROR_TEMP_2254 with a "message" parameter, but Spark 4 strictly checks that this error class has no placeholders and raises INTERNAL_ERROR during construction, masking the underlying "is not a Parquet file" cause that the test asserts on. Drop the message parameter so the SparkException can be constructed, allowing the cause chain to surface as expected.

Regenerate dev/diffs/4.0.1.diff to remove the four IgnoreCometNativeDataFusion tags.
mbutrovich (Contributor) approved these changes on Apr 26, 2026, leaving a comment:

LGTM, thanks for keeping tabs on this, @andygrove!
Which issue does this PR close?
Closes #3313.
Closes #3314.
Rationale for this change
Issues #3313 and #3314 tracked four Spark 4 SQL tests ignored via `IgnoreCometNativeDataFusion` because they failed with `native_datafusion` in auto scan mode. Three of the four now pass on `main` thanks to recent work:

- `DynamicPartitionPruningV1SuiteAEOff`: "Subquery reuse across the whole plan" passes after the non-AQE DPP / `CometSubqueryBroadcastExec` work in #4011 (feat: non-AQE DPP for native Parquet scans, broadcast exchange reuse for DPP subqueries) and the subquery reuse fix in #4053 (fix: scalar subquery pushdown and reuse for CometNativeScanExec, SPARK-43402).
- `FileBasedDataSourceSuite`: "Enabling/disabling ignoreMissingFiles using parquet" passes because `ShimSparkErrorConverter` translates the native `FileNotFound` JSON payload into Spark's `FAILED_READ_FILE.FILE_NOT_EXIST` error class.
- `SimpleSQLViewSuite`: "alter temporary view should follow current storeAnalyzedPlanForView config" passes for the same reason.
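To make the translation concrete, here is a minimal toy sketch of the kind of mapping described above. All names (`ErrorTranslationSketch`, `translate`) and the message text are illustrative assumptions, not Comet's actual `ShimSparkErrorConverter` code:

```java
// Toy illustration only: rewrite a native "FileNotFound" error kind into
// Spark 4's FAILED_READ_FILE.FILE_NOT_EXIST error class, which is the
// string the ignoreMissingFiles tests assert on.
public class ErrorTranslationSketch {
    static String translate(String nativeKind, String path) {
        if ("FileNotFound".equals(nativeKind)) {
            // Matched: surface the Spark error class the test expects.
            return "[FAILED_READ_FILE.FILE_NOT_EXIST] File " + path + " does not exist";
        }
        // Anything else passes through unmapped in this sketch.
        return "[UNKNOWN] " + nativeKind;
    }

    public static void main(String[] args) {
        System.out.println(translate("FileNotFound", "/tmp/part-0.parquet"));
    }
}
```

The real converter works on a structured JSON payload from the native side; this sketch only shows why the translated error class, rather than the raw native message, is what the test sees.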
The fourth test (`ParquetV1QuerySuite`: "Enabling/disabling ignoreCorruptFiles") still failed because of a bug in `CometExecIterator` exposed by Spark 4's strict error-parameter validation. When `native_datafusion` raises a Parquet error, the iterator wraps it with `_LEGACY_ERROR_TEMP_2254` and a `message` parameter. That error class has zero placeholders, so Spark 4's `SparkException` constructor raises `INTERNAL_ERROR` ("Found unused message parameters of the error class '_LEGACY_ERROR_TEMP_2254'") before the intended exception is ever thrown, hiding the cause-chain entry that carries "is not a Parquet file", which the test asserts on.
What changes are included in this PR?
- `CometExecIterator.scala`: drop the unused `message` map entry so the `SparkException` constructs successfully under Spark 4's strict checks. The cause chain (which already carried the underlying error and the "File is not a Parquet file." marker) is preserved.
- `dev/diffs/4.0.1.diff`: remove the four `IgnoreCometNativeDataFusion` tags for the now-passing tests, regenerated against `v4.0.1`.

How are these changes tested?
Verified locally with Spark 4.0.1 source plus the regenerated diff, running each test under `ENABLE_COMET=true ENABLE_COMET_ONHEAP=true` so `native_datafusion` is used in auto scan mode:

- `sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -- -z "Subquery reuse across the whole plan"`
- `sql/testOnly org.apache.spark.sql.FileBasedDataSourceSuite -- -z "Enabling/disabling ignoreMissingFiles using parquet"`
- `sql/testOnly org.apache.spark.sql.execution.SimpleSQLViewSuite -- -z "alter temporary view should follow current storeAnalyzedPlanForView config"`
- `sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetV1QuerySuite -- -z "Enabling/disabling ignoreCorruptFiles"`
- `sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetV2QuerySuite -- -z "Enabling/disabling ignoreCorruptFiles"`

All five pass. CI Spark SQL test runs cover the same matrix.