Summary
With the introduction of native_datafusion in auto scan mode (PR #3307), several test helpers that check the scan implementation config are broken when running in auto mode. The root cause is that helpers like usingDataSourceExec check if the config string is literally native_datafusion or native_iceberg_compat, but in auto mode the config reads as "auto" even though it resolves to native_datafusion at plan time.
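The mismatch can be reproduced with a minimal sketch. This is an illustration only: the object name and the plain-string modeling of the config are assumptions, and the real helper reads the value from Comet's configuration rather than taking it as a parameter.

```scala
// Minimal sketch of the root cause; not the actual Comet test helper.
object ScanModeCheckSketch {
  // Config values named in this issue, modeled here as plain strings.
  val NativeDataFusion = "native_datafusion"
  val NativeIcebergCompat = "native_iceberg_compat"
  val Auto = "auto"

  // Hypothetical stand-in for the current helper: a literal string comparison.
  // The scanImpl parameter is an assumption made so the sketch is self-contained.
  def usingDataSourceExec(scanImpl: String): Boolean =
    scanImpl == NativeDataFusion || scanImpl == NativeIcebergCompat
}
```

With this shape, usingDataSourceExec("auto") is false even when auto resolves to native_datafusion at plan time, so tests guarded by the helper take the wrong branch.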
Failing Tests (in auto mode)
- "schema evolution" (ParquetReadSuite.scala:1256): expects SparkException but native_datafusion handles type widening gracefully
- "row group skipping doesn't overflow when reading into larger type" (ParquetReadSuite.scala:1523): same issue
Proposed Fix
Since native_comet is deprecated and the default path is now DataSource-based (via auto), invert the check:
- Rename usingDataSourceExec → usingLegacyNativeCometScan which returns true only when config is explicitly native_comet
- Flip all ~40 call sites accordingly
- Update usingDataSourceExecWithIncompatTypes similarly
- Fix the explicit SCAN_NATIVE_DATAFUSION check in the schema evolution test
This avoids needing to enumerate all non-legacy modes and is forward-compatible with future scan implementations.
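A rough sketch of the inverted helper, under the same simplifying assumptions (the object name is hypothetical, and the config value is passed in as a string instead of being read from Comet's configuration):

```scala
// Minimal sketch of the proposed inverted check; not the final implementation.
object LegacyScanCheckSketch {
  // The deprecated scan implementation named in this issue.
  val NativeComet = "native_comet"

  // True only when the config explicitly selects the legacy scan.
  // Every other value, including "auto" and any mode added later, is non-legacy.
  def usingLegacyNativeCometScan(scanImpl: String): Boolean =
    scanImpl == NativeComet
}
```

A call site that previously read usingDataSourceExec(...) would become !usingLegacyNativeCometScan(...), which holds for auto, native_datafusion, and native_iceberg_compat without listing any of them.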