Stabilize FFI Boundary #17374

@timsaucer

Is your feature request related to a problem or challenge?

Between DataFusion 48 and 49, there was a breaking change in the FFI boundary for record batch streams. Another breaking change is planned from 49 to 50, this time in the user-defined functions.

These changes make it impossible to use these features across different library versions. For Python, this means that the datafusion-python version must exactly match the version of the Rust DataFusion crate used in the library. This is problematic because users either need a Python environment dedicated to the one project that uses this interface, or they cannot adopt more recent advances in DataFusion.

Describe the solution you'd like

It would be helpful to identify any gaps in the current API and to add basic implementations for them now, so that we do not keep introducing breaking changes. For the memory issue with the RecordBatches, we can require the release method on all FFI interfaces, even those that do not seem to need it.
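
To make the release idea concrete, here is a minimal sketch of what a mandatory release callback on an FFI struct could look like, loosely following the convention used by the Arrow C Data Interface. The names `FfiHandle`, `release_boxed_u64`, and `new_handle` are hypothetical and are not the actual datafusion-ffi layout; the boxed `u64` simply stands in for whatever state the provider owns.

```rust
use std::ffi::c_void;

/// Hypothetical FFI-safe handle (an assumption for illustration only).
#[repr(C)]
pub struct FfiHandle {
    /// Opaque pointer to state owned by the library that created the handle.
    pub private_data: *mut c_void,
    /// Frees `private_data`. Present on every handle, even trivial ones,
    /// so the consumer never has to know how the state was allocated.
    pub release: Option<unsafe extern "C" fn(handle: *mut FfiHandle)>,
}

/// Release callback for a handle whose private data is a boxed `u64`.
unsafe extern "C" fn release_boxed_u64(handle: *mut FfiHandle) {
    // SAFETY: the caller guarantees `handle` points to a valid FfiHandle.
    let handle = unsafe { &mut *handle };
    if !handle.private_data.is_null() {
        // SAFETY: `private_data` was created by `Box::into_raw` in `new_handle`.
        drop(unsafe { Box::from_raw(handle.private_data as *mut u64) });
        handle.private_data = std::ptr::null_mut();
    }
    // Mark the handle as released so a second call is a no-op.
    handle.release = None;
}

/// Producer side: allocate the state and attach the matching release callback.
pub fn new_handle(value: u64) -> FfiHandle {
    FfiHandle {
        private_data: Box::into_raw(Box::new(value)) as *mut c_void,
        release: Some(release_boxed_u64),
    }
}

impl Drop for FfiHandle {
    fn drop(&mut self) {
        // Consumer side: always go through the callback, never free directly.
        if let Some(release) = self.release {
            unsafe { release(self) };
        }
    }
}
```

Because the callback clears itself after running, dropping a handle that was already released manually does nothing, which keeps ownership unambiguous on both sides of the boundary.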

It is worth evaluating whether there are ways to add additional traits without breaking the existing API. It is unclear to me at this point how that could be done.

Describe alternatives you've considered

The alternative I know of is to keep the current hard requirement that all libraries be built against the same DataFusion version.

We could also require that any API additions come at the end of the struct. I would need to dive deeper to know if this is guaranteed to work.
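
As a rough illustration of the append-only idea, the sketch below uses two hypothetical struct versions, `ProviderV1` and `ProviderV2`, plus an assumed leading `struct_size` field. None of these names come from datafusion-ffi. With `#[repr(C)]`, fields are laid out in declaration order, so appending a field leaves the offsets of the existing fields unchanged. The consumer still needs some way to tell whether the appended field is present, which is what the size check is for. This only applies when the struct is passed by pointer, not by value.

```rust
use std::mem::size_of;

/// Layout as an older producer would have compiled it (hypothetical).
#[repr(C)]
pub struct ProviderV1 {
    /// Size in bytes of the struct the producer actually allocated.
    pub struct_size: usize,
    pub num_columns: extern "C" fn() -> u32,
}

/// Newer layout: identical prefix, with one field appended at the end.
#[repr(C)]
pub struct ProviderV2 {
    pub struct_size: usize,
    pub num_columns: extern "C" fn() -> u32,
    /// Appended in the hypothetical newer version.
    pub num_rows: extern "C" fn() -> u64,
}

/// Consumer compiled against V2 that may be handed a V1 struct.
///
/// # Safety
/// `ptr` must point to a live struct whose first `struct_size` bytes
/// follow the `ProviderV2` prefix layout.
pub unsafe fn num_rows_if_available(ptr: *const ProviderV2) -> Option<u64> {
    // Read only the leading size field, which every version has.
    let size = unsafe { ptr.cast::<usize>().read() };
    if size >= size_of::<ProviderV2>() {
        // The producer allocated the full V2 layout, so the field exists.
        let f = unsafe { std::ptr::addr_of!((*ptr).num_rows).read() };
        Some(f())
    } else {
        // Older producer: the appended field is not there.
        None
    }
}

// Example callbacks a producer might supply.
pub extern "C" fn three_columns() -> u32 {
    3
}

pub extern "C" fn ten_rows() -> u64 {
    10
}
```

Whether this is acceptable in practice depends on details the sketch ignores, such as nested structs that are embedded by value and would shift later offsets if they grew.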

Additional context

Also useful would be some way to know which interfaces have remained stable across versions. This is less clear because a change to RecordBatchStream impacts TableProvider, but the inverse is not true.
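
One way to reason about this one-directional impact is to record which interfaces each interface depends on and treat an interface as stable only if neither it nor anything it depends on has changed. The sketch below is purely illustrative: the `Interface` enum and both functions are hypothetical, and the single dependency edge encodes only the relationship described above.

```rust
/// Hypothetical list of FFI interfaces, reduced to the two named in this issue.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Interface {
    RecordBatchStream,
    TableProvider,
}

/// Interfaces whose layout is embedded in, or returned by, the given one.
pub fn dependencies(interface: Interface) -> &'static [Interface] {
    match interface {
        Interface::RecordBatchStream => &[],
        // A table provider hands out record batch streams.
        Interface::TableProvider => &[Interface::RecordBatchStream],
    }
}

/// An interface is stable across a release if it did not change itself
/// and none of its (transitive) dependencies changed.
pub fn is_stable(interface: Interface, changed: &[Interface]) -> bool {
    !changed.contains(&interface)
        && dependencies(interface)
            .iter()
            .all(|dep| is_stable(*dep, changed))
}
```

With this model, a change to `RecordBatchStream` marks `TableProvider` as unstable, while a change to `TableProvider` alone leaves `RecordBatchStream` stable.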

Labels: enhancement, ffi
