Description
When running the Spark SQL tests with the native_datafusion scan, tests that expect errors for duplicate or ambiguous fields in case-insensitive mode fail, because DataFusion's Parquet reader does not enforce Spark's case-sensitivity validation rules.
Affected Tests
Spark native readers should respect spark.sql.caseSensitive (FileBasedDataSourceSuite)
Writes a Parquet file with columns A, b, B, then reads it with caseSensitive=false. Spark expects a SparkException when selecting b (ambiguous between b and B), but native_datafusion reads it without error.
SPARK-25207: exception when duplicate fields in case-insensitive mode (ParquetFilterSuite, V1 and V2)
Writes a Parquet file with columns A, B, b, then reads it with caseSensitive=false. Spark expects a SparkException whose cause is a RuntimeException with a message containing Found duplicate field(s) "B": [B, b]. The native reader either does not detect the duplicate or wraps the error in a different exception type or cause than the test expects.
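Both tests exercise the same rule. When spark.sql.caseSensitive is false, each requested column must match at most one Parquet field when case is ignored: no match means the column is absent from the file (and is read as nulls), one match is used, and two or more is an error. Below is a minimal Rust sketch of that rule, assuming the reader can see the file's field names as plain strings. The function name resolve_field and its signature are hypothetical and are not Comet or DataFusion APIs. The error text mimics the message the SPARK-25207 test checks for, but this is not Spark's implementation.

```rust
/// Hypothetical helper: resolve a requested column name against the field
/// names stored in a Parquet file, following Spark's case-sensitivity rules.
/// Returns Ok(None) when the column is absent, Ok(Some(name)) for a unique
/// match, and Err(message) when a case-insensitive lookup is ambiguous.
fn resolve_field(
    requested: &str,
    file_fields: &[&str],
    case_sensitive: bool,
) -> Result<Option<String>, String> {
    if case_sensitive {
        // Exact match only; b and B are simply different columns.
        return Ok(file_fields
            .iter()
            .copied()
            .find(|f| *f == requested)
            .map(String::from));
    }

    // Collect every file field that matches ignoring case, in file order.
    let key = requested.to_lowercase();
    let matches: Vec<&str> = file_fields
        .iter()
        .copied()
        .filter(|f| f.to_lowercase() == key)
        .collect();

    match matches.len() {
        0 => Ok(None),
        1 => Ok(Some(matches[0].to_string())),
        // Ambiguous: report the requested name and all matching file fields.
        _ => Err(format!(
            "Found duplicate field(s) \"{}\": [{}] in case-insensitive mode",
            requested,
            matches.join(", ")
        )),
    }
}
```

With file fields [A, b, B] and case sensitivity off, resolving a returns A, while resolving b fails because it matches both b and B. For file fields [A, B, b], resolving B produces the "B": [B, b] message from the SPARK-25207 test. Surfacing that error as the SparkException with a RuntimeException cause that the tests expect is a separate concern on the JVM side.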
Context
PR #3687 added a fallback from native_datafusion to the Spark reader for duplicate fields in case-insensitive mode, which avoids the test failures. These tests remain ignored because the native reader itself still does not implement the validation.
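A fallback of this kind only needs to know whether a schema contains names that collide when case is ignored. Below is a hypothetical Rust sketch of such a check, which groups field names by their lowercase form. It is an assumption for illustration, not the code from PR #3687, and the function name find_case_insensitive_duplicates is made up.

```rust
use std::collections::BTreeMap;

/// Hypothetical planning-time check: group field names by lowercase form and
/// return every group with more than one member. A non-empty result means a
/// case-insensitive read of this schema could hit an ambiguous column.
fn find_case_insensitive_duplicates(fields: &[&str]) -> Vec<Vec<String>> {
    // BTreeMap keeps the output deterministic (sorted by lowercase key);
    // names inside each group keep their original schema order.
    let mut groups: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for f in fields {
        groups.entry(f.to_lowercase()).or_default().push(f.to_string());
    }
    groups.into_values().filter(|g| g.len() > 1).collect()
}
```

A schema-wide check like this is conservative. In the FileBasedDataSourceSuite case, selecting only a from [A, b, B] is legal in Spark, yet the schema still contains the b/B collision, so a scan guarded by this check would fall back even though the query would succeed. Doing the validation natively, per requested column, avoids that.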
Related