-
Notifications
You must be signed in to change notification settings - Fork 3k
Add nested column schema support in DeleteFilter file projection #14847
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
7713ba8 to
96a9db1
Compare
This commit resolves the TODO in DeleteFilter.fileProjection() that previously prevented equality deletes from working with nested columns. The original code used tableSchema.asStruct().field(fieldId) which only finds top-level fields, causing 'Cannot find required field for ID X' errors when equality delete files referenced nested fields. Changes: - Use Schema.findColumnName() to get full column path for nested fields - Use Schema.select() to preserve nested structure in projection - Separate data column IDs from metadata column IDs to avoid passing metadata column names to Schema.select() - Add metadata columns explicitly after projection (they don't exist in tableSchema) - Add MetadataColumns.metadataColumns() to expose metadata columns dynamically - Add test coverage for nested columns and metadata columns Fix DeleteFilter projection to preserve requested schema column order The previous implementation used Schema.select() which returns columns in the original table schema order, not the requested order. This caused TestSparkReaderDeletes.testEqualityDeleteWithDifferentScanAndDeleteColumns to fail because it expected columns in the requested schema order. The fix directly builds the schema by: 1. Adding requested schema columns first (preserves order) 2. Appending missing fields needed for equality deletes 3. Appending metadata columns This ensures the final projection maintains the requested schema's column ordering while still including all fields required for delete filtering.
96a9db1 to
71f20d9
Compare
| Types.NestedField field = tableSchema.asStruct().field(fieldId); | ||
| // Add missing data columns from table schema | ||
| for (int fieldId : missingDataIds) { | ||
| Types.NestedField field = tableSchema.findField(fieldId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does using tableSchema.findField(fieldId) here lose the nested structure?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch! We need to call findColumnName() to preserve the nested column names.
a37fd10 to
3a591c5
Compare
This change addresses the issue where using tableSchema.findField(fieldId) in fileProjection() would lose the nested structure when adding fields from equality deletes. The previous implementation would extract only the leaf field without preserving its parent struct/list/map containers. Changes: - Use Schema.findColumnName() to get full dotted path for nested fields - Use Schema.select() with dotted paths to preserve nested structure - Manually rebuild columns in requested schema order to maintain consistency - Add separate handling for metadata columns - Extract rebuildSchemaInRequestedOrder helper to reduce cyclomatic complexity - Add comprehensive test coverage for nested structures Tests added include: - Basic nested struct fields - Deep nesting (4 levels) - Array element nested fields - Map value nested fields - Multiple nested fields from different parent structures
1d1ede1 to
1ba01c4
Compare
This commit resolves the TODO in DeleteFilter.fileProjection() that previously prevented equality deletes from working with nested columns.
The original code used tableSchema.asStruct().field(fieldId) which only finds top-level fields, causing 'Cannot find required field for ID X' errors when equality delete files referenced nested fields.
Changes: