Description
When all struct fields are missing from a Parquet file, Spark's vectorized reader returns struct<> (empty struct) as the schema, but Comet's native scan returns the full schema with null values (e.g., struct<_1:struct<_3:int,_4:bigint>>).
This causes the following five tests in ParquetIOSuite to fail when running the Spark 4.1.1 SQL tests with Comet enabled:
vectorized reader: missing all struct fields
SPARK-53535: vectorized reader: missing all struct fields, struct with complex fields
SPARK-53535: vectorized reader: missing all struct fields, struct with map field only
SPARK-53535: vectorized reader: missing all struct fields, struct with cheap map and more expensive array field
SPARK-54220: vectorized reader: missing all struct fields, struct with NullType only
These tests are new in Spark 4.1 (SPARK-53535, SPARK-54220).
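A minimal standalone sketch of the scenario these tests exercise, for comparing results with and without Comet. It is not the test code from ParquetIOSuite: the object name, temp path, and sample rows are illustrative assumptions, and the Comet plugin configuration is intentionally left out (add it to the session builder as appropriate for your setup). The file is written with struct fields `_1` and `_2`, then read back with a schema that requests only `_3` and `_4`, so every requested struct field is missing from the file.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, LongType, StructType}

// Hypothetical repro harness; the name and layout are assumptions, not from the issue.
object MissingStructFieldsRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("missing-struct-fields-repro")
      // Use the vectorized reader for nested columns, as the failing tests do.
      .config("spark.sql.parquet.enableNestedColumnVectorizedReader", "true")
      .getOrCreate()
    import spark.implicits._

    val path = Files.createTempDirectory("missing-struct-fields").resolve("data").toString

    // File schema: _1: struct<_1:int,_2:string>
    Seq(Tuple1((1, "a")), Tuple1((2, "b"))).toDF().write.parquet(path)

    // Read schema: _1: struct<_3:int,_4:bigint>. Neither _3 nor _4 exists in the file.
    val readSchema = new StructType().add(
      "_1",
      new StructType()
        .add("_3", IntegerType, nullable = true)
        .add("_4", LongType, nullable = true),
      nullable = true)

    val df = spark.read.schema(readSchema).parquet(path)

    // Compare this output between a plain Spark session and one with Comet enabled.
    df.printSchema()
    df.show(truncate = false)

    spark.stop()
  }
}
```

Running this once against plain Spark 4.1.1 and once with Comet's native scan enabled should make any difference in the returned struct visible without running the full SQL test suite.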