@kosiew kosiew commented May 20, 2025

Which issue does this PR close?

This is part of a series of PRs re-implementing #15295 to close #14657 by adding schema‐evolution support for listing‐based tables with nested structs in DataFusion.

Rationale for this change

This refactor improves the flexibility and robustness of schema adaptation in DataFusion by extracting casting logic into a reusable helper. It enhances clarity, testability, and reusability of core logic for mapping file schemas to table schemas—especially important for supporting schema evolution with nested fields.

What changes are included in this PR?

  • Introduced can_cast_field to encapsulate field-level type casting logic with clear error messaging.
  • Added create_field_mapping, a helper for generating field mappings and projections between file and table schemas.
  • Refactored DefaultSchemaAdapter::map_schema to use create_field_mapping, reducing duplication and improving readability.
  • Added a SchemaMapping::new constructor for cleaner instantiation.
  • Significantly expanded unit test coverage:
    • Verified casting logic, including valid/invalid cast scenarios.
    • Confirmed behavior of create_field_mapping under various mapping strategies.
    • Validated end-to-end schema mapping behavior via integration tests.
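
For readers without the diff at hand, the shape of the two helpers can be sketched roughly as follows. This is a simplified, std-only illustration, not the code from this PR: `ToyType`, `ToyField`, and `toy_can_cast` are hypothetical stand-ins for Arrow's data types, fields, and cast-compatibility check, and the cast rules are assumptions chosen for the example.

```rust
// Hypothetical stand-ins for Arrow's DataType and Field (illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum ToyType {
    Int32,
    Int64,
    Utf8,
    Boolean,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToyField {
    pub name: String,
    pub data_type: ToyType,
}

impl ToyField {
    pub fn new(name: &str, data_type: ToyType) -> Self {
        Self { name: name.to_string(), data_type }
    }
}

// Assumed cast rules for this sketch: identical types, Int32 -> Int64
// widening, and any type -> Utf8. The real rules come from Arrow.
fn toy_can_cast(from: &ToyType, to: &ToyType) -> bool {
    from == to
        || matches!((from, to), (ToyType::Int32, ToyType::Int64) | (_, ToyType::Utf8))
}

/// Field-level check: Ok(true) when the file field can be cast to the
/// table field, otherwise an error naming the field and both types.
pub fn can_cast_field(file_field: &ToyField, table_field: &ToyField) -> Result<bool, String> {
    if toy_can_cast(&file_field.data_type, &table_field.data_type) {
        Ok(true)
    } else {
        Err(format!(
            "Cannot cast file schema field {} of type {:?} to table schema field of type {:?}",
            file_field.name, file_field.data_type, table_field.data_type
        ))
    }
}

/// Returns (field_mappings, projection):
/// - field_mappings[i] is the position, within the projected file columns,
///   of the column feeding table column i, or None if the file lacks it;
/// - projection lists the file column indices that need to be read.
pub fn create_field_mapping<F>(
    file_schema: &[ToyField],
    table_schema: &[ToyField],
    can_map_field: F,
) -> Result<(Vec<Option<usize>>, Vec<usize>), String>
where
    F: Fn(&ToyField, &ToyField) -> Result<bool, String>,
{
    let mut projection = Vec::with_capacity(file_schema.len());
    let mut field_mappings = vec![None; table_schema.len()];

    for (file_idx, file_field) in file_schema.iter().enumerate() {
        // File fields are matched to table fields by name.
        if let Some(table_idx) = table_schema.iter().position(|f| f.name == file_field.name) {
            if can_map_field(file_field, &table_schema[table_idx])? {
                field_mappings[table_idx] = Some(projection.len());
                projection.push(file_idx);
            }
        }
    }
    Ok((field_mappings, projection))
}
```

Passing the field check as a closure is what lets the same mapping loop be reused with a different compatibility rule, for example one that understands nested structs.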

Are these changes tested?

✅ Yes, comprehensive tests are included for:

  • Field casting logic (can_cast_field)
  • Field mapping creation (create_field_mapping)
  • Full integration of schema adaptation via map_schema and SchemaMapping::map_batch

These tests cover both happy-path and failure scenarios.
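
To illustrate what the integration path exercises, here is a minimal sketch of applying a field mapping to a batch of columns. It assumes the mapping has one entry per table column, holding either the index of the projected file column that feeds it or `None`. `ToyColumn` and `map_columns` are hypothetical names for this example; the real `SchemaMapping::map_batch` operates on Arrow record batches and also casts each column to the table type, which is omitted here.

```rust
/// A toy column of nullable 64-bit integers; a hypothetical stand-in
/// for an Arrow array.
pub type ToyColumn = Vec<Option<i64>>;

/// Builds the table-shaped output: one column per entry in
/// `field_mappings`. Mapped entries reuse the projected file column;
/// unmapped entries become all-null columns of `num_rows` rows.
/// (Type casting is deliberately left out of this sketch.)
pub fn map_columns(
    file_columns: &[ToyColumn],
    field_mappings: &[Option<usize>],
    num_rows: usize,
) -> Vec<ToyColumn> {
    field_mappings
        .iter()
        .map(|mapping| match mapping {
            Some(idx) => file_columns[*idx].clone(),
            None => vec![None; num_rows],
        })
        .collect()
}
```

A table column that is absent from the file therefore still appears in the output, filled with nulls, so every file yields batches with the same table schema.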

Are there any user-facing changes?

No user-facing changes. This is an internal refactor that improves schema handling logic and prepares the codebase for supporting complex schema evolution use cases in a modular way.

@github-actions github-actions bot added the datasource Changes to the datasource crate label May 20, 2025
Comment on lines +270 to +274
let (field_mappings, projection) = create_field_mapping(
    file_schema,
    &self.projected_table_schema,
    can_cast_field,
)?;

@kosiew kosiew May 20, 2025

Refactored into a helper function so that we can reuse it in later PRs for the deeply nested SchemaAdapter.


@adriangb adriangb left a comment

This looks very nice to me. The additional tests, despite this being existing code, are a great contribution.

kosiew commented May 23, 2025

@adriangb
Thanks for the review.

@kosiew kosiew force-pushed the schema-adapter-helper branch from b128412 to b598814 Compare May 26, 2025 08:58
@kosiew kosiew force-pushed the schema-adapter-helper branch from b598814 to 9d93dac Compare May 26, 2025 09:55

@alamb alamb left a comment

Thanks @kosiew and @adriangb

@alamb alamb merged commit bf7859e into apache:main Jun 4, 2025
27 checks passed
alamb commented Jun 4, 2025

Thanks again @kosiew and @adriangb

kosiew commented Jun 5, 2025

You're welcome @alamb
Thank you for the review.

@kosiew kosiew deleted the schema-adapter-helper branch July 16, 2025 03:22