fix: enable Spark 4 SQL tests previously ignored for issues #3313 and #3314 #4092

Merged
andygrove merged 1 commit into apache:main from andygrove:fix-issues-3313-3314-spark4-tests
Apr 26, 2026

@andygrove
Member

Which issue does this PR close?

Closes #3313.
Closes #3314.

Rationale for this change

Issues #3313 and #3314 tracked four Spark 4 SQL tests ignored via
IgnoreCometNativeDataFusion because they failed with native_datafusion
in auto scan mode.

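Three of the four now pass on main thanks to recent work: the non-AQE DPP
changes (#4011 and #4053) and the Spark 4 ShimSparkErrorConverter, which
translates native FileNotFound into FAILED_READ_FILE.FILE_NOT_EXIST.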

The fourth test (ParquetV1QuerySuite: "Enabling/disabling ignoreCorruptFiles")
still failed because of a bug in CometExecIterator exposed by Spark 4's
strict error-parameter validation. When native_datafusion raises a parquet
error, the iterator wraps it with _LEGACY_ERROR_TEMP_2254 and a message
parameter. That error class has zero placeholders, so Spark 4's
SparkException constructor raises INTERNAL_ERROR ("Found unused message
parameters of the error class '_LEGACY_ERROR_TEMP_2254'") before the intended
exception is ever thrown, hiding the cause-chain entry that carries
"is not a Parquet file" which the test asserts on.
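
The failure mode described above can be illustrated with a small standalone model. This is only a sketch of the idea of strict parameter validation, not Spark's implementation: the object name, the template names, and the template text are all hypothetical, and the error message merely mimics the one quoted above.

```scala
// Simplified model of strict error-parameter validation.
// NOT Spark's code: every name and template below is hypothetical.
object ErrorTemplateSketch {
  // Hypothetical templates; placeholders are written as <name>.
  val templates: Map[String, String] = Map(
    "NO_PLACEHOLDERS" -> "Encountered error while reading file.",
    "ONE_PLACEHOLDER" -> "Failed to read file: <message>"
  )

  private val placeholder = "<([a-zA-Z0-9_]+)>".r

  /** Renders a template, rejecting any parameter the template never references. */
  def render(errorClass: String, params: Map[String, String]): String = {
    val template = templates(errorClass)
    val used = placeholder.findAllMatchIn(template).map(_.group(1)).toSet
    val unused = params.keySet -- used
    if (unused.nonEmpty) {
      // The strict check fires here, before the intended exception can exist.
      throw new IllegalStateException(
        s"Found unused message parameters of the error class '$errorClass': " +
          unused.mkString(", "))
    }
    // Substitute each placeholder; leave it untouched if no value was supplied.
    placeholder.replaceAllIn(template, m => {
      val name = m.group(1)
      java.util.regex.Matcher.quoteReplacement(params.getOrElse(name, s"<$name>"))
    })
  }
}
```

In this model, `render("NO_PLACEHOLDERS", Map("message" -> "boom"))` throws during validation, which mirrors why passing a message parameter to a zero-placeholder error class prevents the wrapping exception from being constructed at all.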

What changes are included in this PR?

  • CometExecIterator.scala: drop the unused message map entry so the
    SparkException constructs successfully under Spark 4's strict checks. The
    cause-chain (which already carried the underlying error and the
    "File is not a Parquet file." marker) is preserved.
  • dev/diffs/4.0.1.diff: remove the four IgnoreCometNativeDataFusion tags
    for the now-passing tests, regenerated against v4.0.1.
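
For readers unfamiliar with why preserving the cause-chain matters, here is a minimal sketch of how a check for a marker string anywhere in an exception's cause-chain could look. The object and helper names are hypothetical and this is not the code used by the Spark test or by Comet; it assumes the chain is acyclic.

```scala
// Hypothetical helpers for inspecting an exception's cause-chain.
object CauseChainSketch {
  /** Returns the throwable followed by each successive cause, outermost first. */
  def causeChain(t: Throwable): List[Throwable] =
    Iterator.iterate(t)(_.getCause).takeWhile(_ != null).toList

  /** True if any throwable in the chain has a message containing the marker. */
  def chainMentions(t: Throwable, marker: String): Boolean =
    causeChain(t).exists(e => Option(e.getMessage).exists(_.contains(marker)))
}
```

With a wrapper whose cause has the message "File is not a Parquet file.", `chainMentions(wrapper, "is not a Parquet file")` returns true even when the wrapper's own message says nothing about Parquet, so the outer exception does not need to repeat the message for the marker to remain reachable.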

How are these changes tested?

Verified locally with Spark 4.0.1 source plus the regenerated diff, running
each test under ENABLE_COMET=true ENABLE_COMET_ONHEAP=true so
native_datafusion is used in auto scan mode:

  • sql/testOnly org.apache.spark.sql.DynamicPartitionPruningV1SuiteAEOff -- -z "Subquery reuse across the whole plan"
  • sql/testOnly org.apache.spark.sql.FileBasedDataSourceSuite -- -z "Enabling/disabling ignoreMissingFiles using parquet"
  • sql/testOnly org.apache.spark.sql.execution.SimpleSQLViewSuite -- -z "alter temporary view should follow current storeAnalyzedPlanForView config"
  • sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetV1QuerySuite -- -z "Enabling/disabling ignoreCorruptFiles"
  • sql/testOnly org.apache.spark.sql.execution.datasources.parquet.ParquetV2QuerySuite -- -z "Enabling/disabling ignoreCorruptFiles"

All five pass. CI Spark SQL test runs cover the same matrix.

… and apache#3314

Issue 3313 was already resolved by recent non-AQE DPP work (apache#4011 and
apache#4053). The test "Subquery reuse across the whole plan" now passes.

Issue 3314 covered three tests. Two of them (ignoreMissingFiles parquet,
alter temporary view) now pass because Spark 4's
ShimSparkErrorConverter translates native FileNotFound into the expected
FAILED_READ_FILE.FILE_NOT_EXIST. The third test
(ParquetV1QuerySuite "Enabling/disabling ignoreCorruptFiles") still
failed because CometExecIterator wraps native Parquet errors using
_LEGACY_ERROR_TEMP_2254 with a "message" parameter. That error class has
no placeholders, so Spark 4's strict parameter check raises INTERNAL_ERROR
during construction, masking the underlying "is not a Parquet file"
cause that the test asserts on.

Drop the message parameter so the SparkException can be constructed,
allowing the cause-chain to surface as expected.

Regenerate dev/diffs/4.0.1.diff to remove the four
IgnoreCometNativeDataFusion tags.
Contributor

@mbutrovich mbutrovich left a comment

LGTM, thanks for keeping tabs on this, @andygrove!

@andygrove andygrove merged commit bf3cf9b into apache:main Apr 26, 2026
134 checks passed
@andygrove andygrove deleted the fix-issues-3313-3314-spark4-tests branch April 26, 2026 15:01