Closed
Labels: enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
The recent INSERT work (#6049) is a good example of a useful DataFusion feature that has an extensibility story (a new function on a trait).
However, it takes a nontrivial effort to add such support (it requires a new physical operator).
Describe the solution you'd like
Thus I would like to propose the following API to support writing to sources:
DataSink trait
A new trait that exposes just the information needed for writing. Something like:
/// The DataSink implements writing streams of [`RecordBatch`]es to
/// partitioned destinations
pub trait DataSink: std::fmt::Debug + std::fmt::Display + Send + Sync {
    /// How does this sink want its input distributed?
    fn required_input_distribution(&self) -> Distribution;

    /// return a future which writes a RecordBatchStream to a particular partition
    /// and return the number of rows written
    fn write_stream(&self, partition: usize, input: SendableRecordBatchStream) -> BoxFuture<Result<u64>>;
}

Change signature of TableProvider
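Before touching TableProvider, it may help to see roughly what implementing such a sink could involve. The following is only a sketch using the standard library: `Batch`, the two-variant `Distribution` enum, and `MemorySink` are hypothetical stand-ins (not DataFusion types), and `write_stream` is synchronous here and takes a boxed iterator instead of returning a `BoxFuture` over a `SendableRecordBatchStream`.

```rust
use std::fmt;
use std::sync::Mutex;

/// Stand-in for an Arrow `RecordBatch` (assumption): a single column of integers.
type Batch = Vec<i64>;

/// Simplified stand-in for DataFusion's `Distribution` (assumption).
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum Distribution {
    SinglePartition,
    Unspecified,
}

/// Synchronous analogue of the proposed trait (assumption: the real proposal
/// returns a future and consumes an async record batch stream).
trait DataSink: fmt::Debug + fmt::Display + Send + Sync {
    fn required_input_distribution(&self) -> Distribution;
    fn write_stream(
        &self,
        partition: usize,
        input: Box<dyn Iterator<Item = Batch> + Send>,
    ) -> Result<u64, String>;
}

/// Hypothetical sink that appends batches to one in-memory buffer per partition.
#[derive(Debug)]
struct MemorySink {
    partitions: Vec<Mutex<Vec<Batch>>>,
}

impl MemorySink {
    fn new(num_partitions: usize) -> Self {
        Self {
            partitions: (0..num_partitions).map(|_| Mutex::new(Vec::new())).collect(),
        }
    }
}

impl fmt::Display for MemorySink {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MemorySink(partitions={})", self.partitions.len())
    }
}

impl DataSink for MemorySink {
    fn required_input_distribution(&self) -> Distribution {
        // This sink accepts whatever partitioning the input already has.
        Distribution::Unspecified
    }

    fn write_stream(
        &self,
        partition: usize,
        input: Box<dyn Iterator<Item = Batch> + Send>,
    ) -> Result<u64, String> {
        let slot = self
            .partitions
            .get(partition)
            .ok_or_else(|| format!("no partition {partition}"))?;
        let mut buf = slot.lock().map_err(|e| e.to_string())?;
        let mut rows = 0u64;
        for batch in input {
            rows += batch.len() as u64;
            buf.push(batch);
        }
        Ok(rows) // number of rows written to this partition
    }
}

/// Usage: write two batches to partition 0 and return the row count.
fn demo_write() -> Result<u64, String> {
    let sink = MemorySink::new(2);
    let batches: Vec<Batch> = vec![vec![1, 2, 3], vec![4, 5]];
    sink.write_stream(0, Box::new(batches.into_iter()))
}
```

The point of the sketch is that a sink author only has to say how input should be distributed and how one partition's stream is written; nothing about plan construction appears in the implementation.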
Then if we change the signature of TableProvider from
/// Insert into this table
async fn insert_into(
    &self,
    _state: &SessionState,
    _input: Arc<dyn ExecutionPlan>,
) -> Result<Arc<dyn ExecutionPlan>> {
    let msg = "Insertion not implemented for this table".to_owned();
    Err(DataFusionError::NotImplemented(msg))
}

To something like
/// Get a sink to use to write to this table, if supported
async fn sink(
    &self,
) -> Result<Arc<dyn DataSink>> {
    let msg = "Insertion not implemented for this table".to_owned();
    Err(DataFusionError::NotImplemented(msg))
}

I think almost all of the insert plans can share a common ExecutionPlan.
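To illustrate how one shared insert operator could drive any sink, here is a minimal sketch using only the standard library. Everything in it is a simplified assumption rather than DataFusion's API: `Batch` and `BatchStream` stand in for record batches and streams, `sink()` is synchronous, errors are plain strings, `required_input_distribution` is omitted, and `insert_into`, `CountingSink`, `CountingTable`, and `ReadOnlyTable` are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Stand-ins for `RecordBatch` and a record batch stream (assumptions).
type Batch = Vec<i64>;
type BatchStream = Box<dyn Iterator<Item = Batch> + Send>;

/// Reduced, synchronous version of the proposed sink trait.
trait DataSink: Send + Sync {
    fn write_stream(&self, partition: usize, input: BatchStream) -> Result<u64, String>;
}

/// Reduced table trait: by default a table does not support writes.
trait TableProvider {
    fn sink(&self) -> Result<Arc<dyn DataSink>, String> {
        Err("Insertion not implemented for this table".to_owned())
    }
}

/// The shared logic: one partition at a time, hand each input stream to the
/// table's sink and add up the row counts it reports.
fn insert_into(table: &dyn TableProvider, inputs: Vec<BatchStream>) -> Result<u64, String> {
    let sink = table.sink()?;
    let mut total = 0u64;
    for (partition, stream) in inputs.into_iter().enumerate() {
        total += sink.write_stream(partition, stream)?;
    }
    Ok(total)
}

/// Hypothetical sink that only counts the rows it receives.
struct CountingSink {
    rows: AtomicU64,
}

impl DataSink for CountingSink {
    fn write_stream(&self, _partition: usize, input: BatchStream) -> Result<u64, String> {
        let n: u64 = input.map(|batch| batch.len() as u64).sum();
        self.rows.fetch_add(n, Ordering::Relaxed);
        Ok(n)
    }
}

/// Hypothetical writable table: overrides `sink()` and nothing else.
struct CountingTable {
    sink: Arc<CountingSink>,
}

impl TableProvider for CountingTable {
    fn sink(&self) -> Result<Arc<dyn DataSink>, String> {
        let sink: Arc<dyn DataSink> = self.sink.clone();
        Ok(sink)
    }
}

/// Hypothetical read-only table: keeps the default `sink()`.
struct ReadOnlyTable;
impl TableProvider for ReadOnlyTable {}

/// Two input partitions holding 3 and 3 rows respectively.
fn demo_inputs() -> Vec<BatchStream> {
    let p0: Vec<Batch> = vec![vec![1, 2], vec![3]];
    let p1: Vec<Batch> = vec![vec![4, 5, 6]];
    vec![
        Box::new(p0.into_iter()) as BatchStream,
        Box::new(p1.into_iter()),
    ]
}
```

In this sketch, calling `insert_into` with a `CountingTable` and `demo_inputs()` writes both partitions through the same code path, while `ReadOnlyTable` surfaces the "not implemented" error without any table-specific operator.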
Describe alternatives you've considered
Do nothing.
Additional context
No response