Support proto serialization of InsertExec #7303

@thinkharderdev

Is your feature request related to a problem or challenge?

Currently, physical plans that include an `InsertExec` cannot be serialized to protobuf, and therefore cannot be used in Ballista.

Describe the solution you'd like

The simplest way to support this would be to extend `PhysicalExtensionCodec` so that it can serialize and deserialize a `dyn DataSink`, for example:

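```rust
use std::fmt::Debug;
use std::sync::Arc;
// DataFusion types (ExecutionPlan, DataSink, FunctionRegistry,
// DataFusionError, Result) are assumed to be in scope.

pub trait PhysicalExtensionCodec: Debug + Send + Sync {
    fn try_decode(
        &self,
        buf: &[u8],
        inputs: &[Arc<dyn ExecutionPlan>],
        registry: &dyn FunctionRegistry,
    ) -> Result<Arc<dyn ExecutionPlan>>;

    /// Decode a `DataSink` that the core serde logic does not know about.
    fn try_decode_data_sink(&self, _buf: &[u8]) -> Result<Arc<dyn DataSink>> {
        // Default impl for backwards compatibility with existing codecs
        Err(DataFusionError::NotImplemented(
            "PhysicalExtensionCodec::try_decode_data_sink not implemented".into(),
        ))
    }

    fn try_encode(&self, node: Arc<dyn ExecutionPlan>, buf: &mut Vec<u8>) -> Result<()>;

    /// Encode a `DataSink` that the core serde logic does not know about.
    fn try_encode_data_sink(&self, _sink: Arc<dyn DataSink>, _buf: &mut Vec<u8>) -> Result<()> {
        // Default impl for backwards compatibility with existing codecs
        Err(DataFusionError::NotImplemented(
            "PhysicalExtensionCodec::try_encode_data_sink not implemented".into(),
        ))
    }
}
```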
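
As a rough illustration of how a downstream project might use the two proposed hooks, the sketch below implements them for a custom sink. It is self-contained and does not depend on DataFusion: `DataSink`, `PhysicalExtensionCodec` and `Result` are simplified stand-ins, the `as_any` downcasting method is an assumption of the sketch, and `PathSink`, `MyCodec`, `DefaultCodec` and `round_trip` are hypothetical names. The wire format (raw UTF-8 path bytes) is purely illustrative.

```rust
use std::any::Any;
use std::fmt::Debug;
use std::sync::Arc;

// Simplified stand-ins; NOT the real DataFusion definitions.
type Result<T> = std::result::Result<T, String>;

trait DataSink: Debug + Send + Sync {
    // Assumed downcasting hook so a codec can recognise its own sinks.
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical user-defined sink that writes to a path.
#[derive(Debug)]
struct PathSink {
    path: String,
}

impl DataSink for PathSink {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Only the two proposed data-sink hooks are modelled here.
trait PhysicalExtensionCodec: Debug + Send + Sync {
    fn try_decode_data_sink(&self, _buf: &[u8]) -> Result<Arc<dyn DataSink>> {
        Err("try_decode_data_sink not implemented".to_string())
    }

    fn try_encode_data_sink(&self, _sink: Arc<dyn DataSink>, _buf: &mut Vec<u8>) -> Result<()> {
        Err("try_encode_data_sink not implemented".to_string())
    }
}

// A codec that keeps the backwards-compatible defaults.
#[derive(Debug)]
struct DefaultCodec;
impl PhysicalExtensionCodec for DefaultCodec {}

// A hypothetical codec that knows how to (de)serialize `PathSink`.
#[derive(Debug)]
struct MyCodec;

impl PhysicalExtensionCodec for MyCodec {
    fn try_decode_data_sink(&self, buf: &[u8]) -> Result<Arc<dyn DataSink>> {
        let path = String::from_utf8(buf.to_vec()).map_err(|e| e.to_string())?;
        let sink: Arc<dyn DataSink> = Arc::new(PathSink { path });
        Ok(sink)
    }

    fn try_encode_data_sink(&self, sink: Arc<dyn DataSink>, buf: &mut Vec<u8>) -> Result<()> {
        // Reject sinks this codec does not recognise.
        let path_sink = sink
            .as_any()
            .downcast_ref::<PathSink>()
            .ok_or_else(|| format!("unsupported sink: {sink:?}"))?;
        buf.extend_from_slice(path_sink.path.as_bytes());
        Ok(())
    }
}

// Encode a sink to bytes and decode it again with the same codec.
fn round_trip(codec: &dyn PhysicalExtensionCodec, sink: Arc<dyn DataSink>) -> Result<Arc<dyn DataSink>> {
    let mut buf = Vec::new();
    codec.try_encode_data_sink(sink, &mut buf)?;
    codec.try_decode_data_sink(&buf)
}

fn main() {
    let sink: Arc<dyn DataSink> = Arc::new(PathSink { path: "/tmp/out".to_string() });
    match round_trip(&MyCodec, sink) {
        Ok(decoded) => println!("decoded: {decoded:?}"),
        Err(e) => println!("error: {e}"),
    }
}
```

Because both hooks have default bodies, existing codecs such as `DefaultCodec` keep compiling and simply report an error when asked to handle a sink.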

With this design, the "standard" `DataSink` implementations would be handled directly by the main serde logic; if a given `Arc<dyn DataSink>` was not one of the standard cases, serialization would fall back to the extension codec.
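
The fallback from built-in sinks to the extension codec can be sketched as follows. Again the traits are simplified stand-ins rather than DataFusion's real definitions, the `as_any` downcasting method is assumed, and `MemorySink`, `CustomSink`, `CustomCodec`, `SinkProto` and `serialize_sink` are hypothetical. `SinkProto` only mimics a protobuf `oneof` with one built-in variant plus opaque extension bytes.

```rust
use std::any::Any;
use std::fmt::Debug;
use std::sync::Arc;

// Simplified stand-ins; NOT the real DataFusion definitions.
type Result<T> = std::result::Result<T, String>;

trait DataSink: Debug + Send + Sync {
    fn as_any(&self) -> &dyn Any;
}

// Hypothetical "standard" sink that the core serde logic knows about.
#[derive(Debug)]
struct MemorySink {
    table: String,
}
impl DataSink for MemorySink {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// Hypothetical sink that only an extension codec understands.
#[derive(Debug)]
struct CustomSink {
    id: u32,
}
impl DataSink for CustomSink {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

trait PhysicalExtensionCodec: Debug + Send + Sync {
    fn try_encode_data_sink(&self, _sink: Arc<dyn DataSink>, _buf: &mut Vec<u8>) -> Result<()> {
        Err("try_encode_data_sink not implemented".to_string())
    }
}

#[derive(Debug)]
struct CustomCodec;
impl PhysicalExtensionCodec for CustomCodec {
    fn try_encode_data_sink(&self, sink: Arc<dyn DataSink>, buf: &mut Vec<u8>) -> Result<()> {
        let custom = sink.as_any().downcast_ref::<CustomSink>().ok_or("unknown sink")?;
        buf.extend_from_slice(&custom.id.to_le_bytes());
        Ok(())
    }
}

// Mimics a protobuf `oneof`: a built-in sink or opaque extension bytes.
#[derive(Debug, PartialEq)]
enum SinkProto {
    Memory { table: String },
    Extension(Vec<u8>),
}

fn serialize_sink(sink: Arc<dyn DataSink>, codec: &dyn PhysicalExtensionCodec) -> Result<SinkProto> {
    // Standard sinks are handled directly by the core serde logic.
    if let Some(mem) = sink.as_any().downcast_ref::<MemorySink>() {
        return Ok(SinkProto::Memory { table: mem.table.clone() });
    }
    // Anything else falls back to the user-supplied extension codec.
    let mut buf = Vec::new();
    codec.try_encode_data_sink(sink, &mut buf)?;
    Ok(SinkProto::Extension(buf))
}

fn main() {
    let codec = CustomCodec;
    let mem: Arc<dyn DataSink> = Arc::new(MemorySink { table: "t".to_string() });
    let custom: Arc<dyn DataSink> = Arc::new(CustomSink { id: 7 });
    println!("{:?}", serialize_sink(mem, &codec));
    println!("{:?}", serialize_sink(custom, &codec));
}
```

Decoding would mirror this: built-in variants are rebuilt directly, while `Extension` bytes are handed to the codec's `try_decode_data_sink`.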

Alternatively, we could push serialization down into the `DataSink` trait itself:

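```rust
use async_trait::async_trait;
use std::fmt::Debug;
use std::sync::Arc;
// DataFusion types (DisplayAs, SendableRecordBatchStream, TaskContext,
// Result) are assumed to be in scope.

#[async_trait]
pub trait DataSink: DisplayAs + Debug + Send + Sync {
    // TODO add desired input ordering
    // How does this sink want its input ordered?

    /// Writes the data to the sink, returns the number of values written
    ///
    /// This method will be called exactly once during each DML
    /// statement. Thus prior to return, the sink should do any commit
    /// or rollback required.
    async fn write_all(
        &self,
        data: Vec<SendableRecordBatchStream>,
        context: &Arc<TaskContext>,
    ) -> Result<u64>;

    /// Encode `self` into the provided buffer
    fn encode(&self, buf: &mut Vec<u8>) -> Result<()>;

    /// Decode an instance of `Self` from a buffer.
    ///
    /// The `Self: Sized` bound is required so that `DataSink` can still be
    /// used as a trait object (`Arc<dyn DataSink>`).
    fn try_decode(buf: &[u8]) -> Result<Self>
    where
        Self: Sized;
}
```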
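
One consequence of this alternative is worth spelling out. A `try_decode` that returns `Self` cannot be called through an `Arc<dyn DataSink>`, and it needs a `where Self: Sized` bound for `DataSink` to remain usable as a trait object. The deserializer therefore still has to know which concrete type to decode, so some form of type tag plus registry would likely be needed. The std-only sketch below shows one way to do that; `type_tag`, `CsvFileSink`, `SinkRegistry`, `decode_as` and `encode_tagged` are hypothetical, and `write_all` is omitted.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

type Result<T> = std::result::Result<T, String>;

// Simplified stand-in for the proposed trait (`write_all` omitted).
trait DataSink: Debug + Send + Sync {
    /// Hypothetical stable tag identifying the concrete type on the wire.
    fn type_tag(&self) -> &'static str;
    /// Encode `self` into the provided buffer.
    fn encode(&self, buf: &mut Vec<u8>) -> Result<()>;
    /// Decode an instance of `Self`; `Self: Sized` keeps the trait object-safe.
    fn try_decode(buf: &[u8]) -> Result<Self>
    where
        Self: Sized;
}

// Hypothetical concrete sink.
#[derive(Debug)]
struct CsvFileSink {
    path: String,
}

impl DataSink for CsvFileSink {
    fn type_tag(&self) -> &'static str {
        "csv"
    }
    fn encode(&self, buf: &mut Vec<u8>) -> Result<()> {
        buf.extend_from_slice(self.path.as_bytes());
        Ok(())
    }
    fn try_decode(buf: &[u8]) -> Result<Self> {
        let path = String::from_utf8(buf.to_vec()).map_err(|e| e.to_string())?;
        Ok(CsvFileSink { path })
    }
}

type DecodeFn = fn(&[u8]) -> Result<Arc<dyn DataSink>>;

// Adapts the type-specific `T::try_decode` into a type-erased constructor.
fn decode_as<T: DataSink + 'static>(buf: &[u8]) -> Result<Arc<dyn DataSink>> {
    let sink: Arc<dyn DataSink> = Arc::new(T::try_decode(buf)?);
    Ok(sink)
}

// Maps wire tags to decoders; the caller must keep tags consistent with `type_tag`.
struct SinkRegistry {
    decoders: HashMap<&'static str, DecodeFn>,
}

impl SinkRegistry {
    fn new() -> Self {
        Self { decoders: HashMap::new() }
    }
    fn register<T: DataSink + 'static>(&mut self, tag: &'static str) {
        self.decoders.insert(tag, decode_as::<T>);
    }
    fn decode(&self, tag: &str, buf: &[u8]) -> Result<Arc<dyn DataSink>> {
        let decoder = self
            .decoders
            .get(tag)
            .ok_or_else(|| format!("no decoder registered for {tag}"))?;
        decoder(buf)
    }
}

// Encoding works through `dyn DataSink` and records the tag alongside the bytes.
fn encode_tagged(sink: &dyn DataSink) -> Result<(&'static str, Vec<u8>)> {
    let mut buf = Vec::new();
    sink.encode(&mut buf)?;
    Ok((sink.type_tag(), buf))
}

fn main() {
    let mut registry = SinkRegistry::new();
    registry.register::<CsvFileSink>("csv");
    let sink = CsvFileSink { path: "out.csv".to_string() };
    let (tag, bytes) = encode_tagged(&sink).unwrap();
    println!("{:?}", registry.decode(tag, &bytes));
}
```

In practice this registry plays much the same role as the extension codec in the first option, which is one argument for keeping serialization in `PhysicalExtensionCodec`.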

Describe alternatives you've considered

Do nothing.

Additional context

No response

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
