Is your feature request related to a problem or challenge?
Between DataFusion 48 and 49 there was a breaking change at the FFI boundary for record batch streams. Another breaking change is planned from 49 to 50, this time in the user-defined functions.
These changes prevent us from using these features across different library versions. For Python, this means the datafusion-python version must exactly match the DataFusion version that the Rust library was built against. This is problematic because users must either maintain a Python environment dedicated to the one project that uses this interface, or forgo more recent advances in DataFusion.
Describe the solution you'd like
It would be helpful to identify any gaps in the current API and to add basic implementations for them now, so that we stop introducing breaking changes. For the memory issue with the RecordBatches, we can require the release method on all FFI interfaces, even those that do not seem to need it.
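To make the release idea concrete, here is a minimal sketch of what a mandatory release callback on an FFI struct could look like. `FfiHandle`, `release_boxed_string`, and `release_now` are hypothetical names for illustration only, not the actual datafusion-ffi definitions, and the `String` payload stands in for whatever state a real producer would own.

```rust
use std::ffi::c_void;

/// Hypothetical FFI-safe handle (illustration only, not a datafusion-ffi type).
#[repr(C)]
pub struct FfiHandle {
    /// Destructor supplied by the producing library; `None` once released.
    pub release: Option<unsafe extern "C" fn(handle: *mut FfiHandle)>,
    /// Opaque state owned by the producing library.
    pub private_data: *mut c_void,
}

/// Producer-side release: frees the boxed payload and marks the handle released.
unsafe extern "C" fn release_boxed_string(handle: *mut FfiHandle) {
    let handle = unsafe { &mut *handle };
    if !handle.private_data.is_null() {
        // Reclaim the allocation with the same allocator that created it.
        drop(unsafe { Box::from_raw(handle.private_data as *mut String) });
        handle.private_data = std::ptr::null_mut();
    }
    handle.release = None;
}

impl FfiHandle {
    /// Producer-side constructor: every handle carries a release callback,
    /// even if the payload looks trivial.
    pub fn new(payload: String) -> Self {
        Self {
            release: Some(release_boxed_string),
            private_data: Box::into_raw(Box::new(payload)) as *mut c_void,
        }
    }

    /// Consumer-side release; safe to call more than once.
    pub fn release_now(&mut self) {
        if let Some(release) = self.release.take() {
            unsafe { release(self) };
        }
    }

    pub fn is_released(&self) -> bool {
        self.release.is_none()
    }
}

impl Drop for FfiHandle {
    fn drop(&mut self) {
        self.release_now();
    }
}
```

Because the consumer always frees through the callback, the allocation is returned by the library that made it, whichever version that library is. Requiring the field everywhere up front means a struct never has to change layout later just to gain a destructor.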
It is worth evaluating if there are ways we can add additional traits without breaking the existing API. It is unclear to me at this point how that could be done.
Describe alternatives you've considered
The alternative I know of is to keep the hard requirement that all libraries are built against the same DataFusion version.
We could also require that any API additions come at the end of the struct. I would need to dive deeper to know if this is guaranteed to work.
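As a rough sketch of the append-only idea, the example below assumes every struct is `#[repr(C)]`, is always passed by pointer, and starts with a size field filled in by the producer. `TableInfoV1`, `TableInfoV2`, and `read_num_columns` are hypothetical names, not part of datafusion-ffi, and this does not settle whether the approach is guaranteed to work for the real interfaces.

```rust
use std::mem::{offset_of, size_of};
use std::ptr::addr_of;

/// Layout as an older producer would know it (hypothetical).
#[repr(C)]
pub struct TableInfoV1 {
    /// Size in bytes of the struct the producer actually allocated.
    pub struct_size: usize,
    pub num_rows: u64,
}

/// Newer layout: same prefix, with one field appended at the end.
#[repr(C)]
pub struct TableInfoV2 {
    pub struct_size: usize,
    pub num_rows: u64,
    /// Added in the newer version; must stay after all older fields.
    pub num_columns: u64,
}

impl TableInfoV1 {
    pub fn new(num_rows: u64) -> Self {
        Self { struct_size: size_of::<Self>(), num_rows }
    }
}

impl TableInfoV2 {
    pub fn new(num_rows: u64, num_columns: u64) -> Self {
        Self { struct_size: size_of::<Self>(), num_rows, num_columns }
    }
}

/// Consumer compiled against V2 reading a struct that may come from a V1
/// producer. Returns `None` when the producer's struct is too short to
/// contain the appended field.
///
/// Safety: `info` must point to a live struct whose prefix matches this
/// layout and whose `struct_size` is accurate.
pub unsafe fn read_num_columns(info: *const TableInfoV2) -> Option<u64> {
    // `struct_size` sits at offset 0 in every version, so it is always readable.
    let produced_size = unsafe { addr_of!((*info).struct_size).read() };
    let needed = offset_of!(TableInfoV2, num_columns) + size_of::<u64>();
    if produced_size >= needed {
        Some(unsafe { addr_of!((*info).num_columns).read() })
    } else {
        None
    }
}
```

In this sketch a newer consumer degrades gracefully when handed an older struct instead of reading past its end. It says nothing about removing or reordering fields, which would still be breaking, and it only covers a newer consumer reading from an older producer.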
Additional context
It would also be useful to have some way to know which interfaces have remained stable across versions. This is not straightforward because the interfaces depend on one another: a change to RecordBatchStream impacts TableProvider, but the inverse is not true.