
Simplified TableProvider::Insert API #6339

@alamb

Description


Is your feature request related to a problem or challenge?

Recent INSERT work (#6049) is a good example of a useful DataFusion feature that has an extensibility story (a new function on a trait).

However, adding such support takes non-trivial effort, because it requires a new physical operator.

Describe the solution you'd like

Thus, I would like to propose the following API to support writing to sources.

DataSink trait

A new trait that exposes just the information needed for writing. Something like:

use datafusion::error::Result;
use datafusion::physical_plan::{Distribution, SendableRecordBatchStream};
use futures::future::BoxFuture;

/// A DataSink writes streams of [`RecordBatch`]es to
/// partitioned destinations.
pub trait DataSink: std::fmt::Debug + std::fmt::Display + Send + Sync {
    /// How does this sink want its input distributed?
    fn required_input_distribution(&self) -> Distribution;

    /// Returns a future that writes the input stream to the given partition
    /// and resolves to the number of rows written.
    fn write_stream(
        &self,
        partition: usize,
        input: SendableRecordBatchStream,
    ) -> BoxFuture<'_, Result<u64>>;
}
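To make the contract concrete, here is a minimal, std-only sketch of a sink that buffers batches in memory. It is synchronous rather than returning a BoxFuture, and it uses simplified stand-ins (a RecordBatch holding a Vec<i64>, a boxed iterator in place of SendableRecordBatchStream, a two-variant Distribution enum, String errors) instead of the real DataFusion and Arrow types. MemorySink is a hypothetical name used only for illustration.

```rust
use std::fmt;
use std::sync::Mutex;

// Hypothetical, simplified stand-ins for the DataFusion/Arrow types so the
// sketch compiles with only the standard library.
#[derive(Debug, Clone, PartialEq)]
pub struct RecordBatch {
    pub rows: Vec<i64>,
}

impl RecordBatch {
    pub fn num_rows(&self) -> usize {
        self.rows.len()
    }
}

/// Stand-in for SendableRecordBatchStream: a boxed, fallible iterator of batches.
pub type BatchStream = Box<dyn Iterator<Item = Result<RecordBatch, String>> + Send>;

/// Stand-in for Distribution, reduced to two variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Distribution {
    UnspecifiedDistribution,
    SinglePartition,
}

/// Synchronous version of the proposed DataSink trait (assumption: no futures).
pub trait DataSink: fmt::Debug + fmt::Display + Send + Sync {
    fn required_input_distribution(&self) -> Distribution;
    fn write_stream(&self, partition: usize, input: BatchStream) -> Result<u64, String>;
}

/// Hypothetical sink that appends every incoming batch to an in-memory buffer.
#[derive(Debug, Default)]
pub struct MemorySink {
    batches: Mutex<Vec<RecordBatch>>,
}

impl fmt::Display for MemorySink {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "MemorySink")
    }
}

impl DataSink for MemorySink {
    fn required_input_distribution(&self) -> Distribution {
        // Each partition is appended independently, so any partitioning works.
        Distribution::UnspecifiedDistribution
    }

    fn write_stream(&self, _partition: usize, input: BatchStream) -> Result<u64, String> {
        let mut written = 0u64;
        for batch in input {
            // Stop at the first error reported by the input stream.
            let batch = batch?;
            written += batch.num_rows() as u64;
            self.batches.lock().unwrap().push(batch);
        }
        // Report how many rows this partition contributed.
        Ok(written)
    }
}
```

The sink only declares how it wants its input partitioned and how to consume one partition; it never has to build an execution plan itself.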

Change signature of TableProvider

Then we could change the signature of TableProvider::insert_into from

    /// Insert into this table
    async fn insert_into(
        &self,
        _state: &SessionState,
        _input: Arc<dyn ExecutionPlan>,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        let msg = "Insertion not implemented for this table".to_owned();
        Err(DataFusionError::NotImplemented(msg))
    }

to something like:

    /// Get a sink to use to write to this table, if supported
    async fn sink(
        &self,
    ) -> Result<Arc<dyn DataSink>> {
        let msg = "Insertion not implemented for this table".to_owned();
        Err(DataFusionError::NotImplemented(msg))
    }
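A rough sketch of how the default method could behave, assuming a synchronous sink() for brevity (the proposal uses async fn) and a cut-down DataSink with a single hypothetical write_rows method: tables that do not override sink() keep returning NotImplemented, while a writable table only has to hand back its sink. ReadOnlyTable, WritableTable and CountingSink are illustrative names, not existing DataFusion types.

```rust
use std::fmt;
use std::sync::Arc;

// Simplified stand-ins for DataFusion's error and Result types.
#[derive(Debug, PartialEq)]
pub enum DataFusionError {
    NotImplemented(String),
}

pub type Result<T> = std::result::Result<T, DataFusionError>;

/// Cut-down sink trait (hypothetical method `write_rows`).
pub trait DataSink: fmt::Debug + Send + Sync {
    fn write_rows(&self, rows: &[i64]) -> Result<u64>;
}

pub trait TableProvider {
    /// Get a sink to use to write to this table, if supported.
    /// By default the table does not support insertion.
    fn sink(&self) -> Result<Arc<dyn DataSink>> {
        let msg = "Insertion not implemented for this table".to_owned();
        Err(DataFusionError::NotImplemented(msg))
    }
}

/// Hypothetical sink that only counts the rows it receives.
#[derive(Debug)]
pub struct CountingSink;

impl DataSink for CountingSink {
    fn write_rows(&self, rows: &[i64]) -> Result<u64> {
        Ok(rows.len() as u64)
    }
}

/// Relies on the default `sink()`, so INSERT is rejected.
pub struct ReadOnlyTable;
impl TableProvider for ReadOnlyTable {}

/// Opts in to INSERT by returning its sink; no custom operator needed.
pub struct WritableTable;
impl TableProvider for WritableTable {
    fn sink(&self) -> Result<Arc<dyn DataSink>> {
        Ok(Arc::new(CountingSink))
    }
}
```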

With this change, I think almost all of the insert plans could share a common ExecutionPlan.
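One way such a shared operator could look, sketched with std scoped threads standing in for DataFusion's async per-partition execution: a hypothetical InsertExec holds an Arc<dyn DataSink>, hands each input partition to write_stream, and sums the returned row counts. Batch, BatchStream and CountingSink are simplified stand-ins, not DataFusion APIs.

```rust
use std::fmt;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Simplified stand-ins: a batch is a Vec of values, a stream is a boxed iterator.
pub type Batch = Vec<i64>;
pub type BatchStream = Box<dyn Iterator<Item = Batch> + Send>;

/// Synchronous stand-in for the proposed DataSink trait.
pub trait DataSink: fmt::Debug + Send + Sync {
    fn write_stream(&self, partition: usize, input: BatchStream) -> Result<u64, String>;
}

/// Hypothetical generic insert operator shared by every sink-backed table.
#[derive(Debug)]
pub struct InsertExec {
    sink: Arc<dyn DataSink>,
}

impl InsertExec {
    pub fn new(sink: Arc<dyn DataSink>) -> Self {
        Self { sink }
    }

    /// Writes each input partition on its own thread and returns the total
    /// number of rows written across all partitions.
    pub fn execute(&self, partitions: Vec<Vec<Batch>>) -> Result<u64, String> {
        thread::scope(|s| {
            let handles: Vec<_> = partitions
                .into_iter()
                .enumerate()
                .map(|(partition, batches)| {
                    let sink = Arc::clone(&self.sink);
                    s.spawn(move || {
                        let stream: BatchStream = Box::new(batches.into_iter());
                        sink.write_stream(partition, stream)
                    })
                })
                .collect();

            let mut total = 0u64;
            for handle in handles {
                // A panicked writer becomes an error; a sink error is propagated.
                let result = handle
                    .join()
                    .map_err(|_| "writer thread panicked".to_string())?;
                total += result?;
            }
            Ok::<u64, String>(total)
        })
    }
}

/// Hypothetical sink that only counts rows, used to exercise InsertExec.
#[derive(Debug, Default)]
pub struct CountingSink {
    pub rows: AtomicU64,
}

impl DataSink for CountingSink {
    fn write_stream(&self, _partition: usize, input: BatchStream) -> Result<u64, String> {
        let n: u64 = input.map(|batch| batch.len() as u64).sum();
        self.rows.fetch_add(n, Ordering::SeqCst);
        Ok(n)
    }
}
```

Because the operator depends only on the DataSink trait, a table provider in this model gets INSERT support by supplying a sink, without writing a physical operator of its own.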

Describe alternatives you've considered

Do nothing.


Labels: enhancement (New feature or request)
