For CSV and Parquet files that are read via DuckDB (S3, GCS, Azure and local environment), the check `field_is_present` always passes, regardless of whether the column is present in the CSV or not.
Cause: In `create_view_with_schema_union(...)`, we create an empty table based on the data contract and later insert the actual data, e.g. from a CSV file. If a column is not present in the CSV file, it will still be in the checked table (but filled with NULLs).
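The effect can be reproduced in isolation. The following is a minimal sketch, not the actual implementation: it uses Python's built-in `sqlite3` as a stand-in for DuckDB, and the contract fields, CSV content, and helper names (`load_into_contract_table`, `table_columns`) are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical contract with two fields; "nickname" is the optional one.
CONTRACT_FIELDS = ["id", "nickname"]

# Hypothetical CSV data that lacks the optional "nickname" column.
CSV_TEXT = "id\n1\n2\n"


def load_into_contract_table(csv_text, contract_fields):
    """Create an empty table from the contract, then insert the CSV rows."""
    con = sqlite3.connect(":memory:")
    columns = ", ".join(f'"{name}" TEXT' for name in contract_fields)
    con.execute(f"CREATE TABLE data ({columns})")

    reader = csv.DictReader(io.StringIO(csv_text))
    # Only the columns that really exist in the CSV header get inserted.
    present = [name for name in contract_fields if name in reader.fieldnames]
    column_list = ", ".join(f'"{name}"' for name in present)
    placeholders = ", ".join("?" for _ in present)
    for row in reader:
        con.execute(
            f"INSERT INTO data ({column_list}) VALUES ({placeholders})",
            [row[name] for name in present],
        )
    return con


def table_columns(con):
    """Return the column names of the table that the checks run against."""
    return [row[1] for row in con.execute("PRAGMA table_info(data)")]


con = load_into_contract_table(CSV_TEXT, CONTRACT_FIELDS)
columns_in_table = table_columns(con)
nicknames = [row[0] for row in con.execute("SELECT nickname FROM data")]
```

Here `columns_in_table` still contains `nickname` and every value in `nicknames` is NULL (`None`), so a presence check that inspects the table rather than the file cannot see that the column was missing.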
Possible solutions:
- Remove the `field_is_present` check (for non-required fields). However, it is useful to check for field presence, as discussed in "Support for historical data validation causes error in CSV files without headers" (#1018).
- Fix the check and enforce field presence in the header even for non-required fields. This would break, e.g., `test_csv_optional_field_missing_from_old_data()` in `test_test_schema_evolution.py`, which assumes that a CSV file missing a non-required field should pass all tests.
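For illustration, enforcing presence against the header could look roughly like the sketch below. It assumes the CSV header is read directly from the file content instead of from the table built from the contract; `missing_fields` and the sample data are hypothetical and not part of the existing code base. Parquet files would need the equivalent lookup in the file's schema metadata, which is not shown.

```python
import csv
import io


def missing_fields(csv_text, contract_fields):
    """Return the contract fields that do not appear in the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in contract_fields if name not in header]


# Hypothetical old data file without the optional "nickname" column.
old_csv = "id,name\n1,Alice\n"
new_csv = "id,name,nickname\n1,Alice,Al\n"

missing_in_old = missing_fields(old_csv, ["id", "name", "nickname"])
missing_in_new = missing_fields(new_csv, ["id", "name", "nickname"])
```

With this approach `missing_in_old` reports `nickname` even though the field is not required, which is exactly the behaviour that would conflict with the schema evolution test mentioned above.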