FileSource and DataSource traits require deep copies #14939

@alamb

Is your feature request related to a problem or challenge?

While working on various upgrade PRs and preparing for the DataFusion 46 release, I noticed something I would like to change before we release.

The FileSource and DataSource traits were introduced in the datasource refactor.

They have APIs to update the underlying source in a few ways, but the APIs require cloning. For example, FileSource looks like this:

/// Common behaviors that every file format needs to implement.
///
/// See initialization examples on `ParquetSource`, `CsvSource`
pub trait FileSource: Send + Sync {
...
    /// Initialize new type with batch size configuration
    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>;
...
}

Because with_batch_size takes &self and must return a new Arc<dyn FileSource>, the only way to implement it is to (deep) clone the object:

    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource> {
        let mut conf = self.clone();
        conf.batch_size = Some(batch_size);
        Arc::new(conf)
    }
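
To make the cost concrete, here is a minimal, self-contained sketch of the pattern above. It is not the DataFusion code: the trait is cut down to the one method under discussion, and `ExampleSource`, `column_names`, `batch_size()` and `buffer_addr()` are hypothetical names added only so the copy can be observed.

```rust
use std::sync::Arc;

/// Simplified stand-in for the real trait (assumption: only the method
/// discussed in this issue, plus two hypothetical inspection helpers).
pub trait FileSource: Send + Sync {
    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>;
    /// Hypothetical accessor, used here only to inspect the result.
    fn batch_size(&self) -> Option<usize>;
    /// Hypothetical helper: address of the heap buffer owned by this source.
    fn buffer_addr(&self) -> usize;
}

/// Hypothetical source that owns some heap-allocated state.
#[derive(Clone)]
pub struct ExampleSource {
    batch_size: Option<usize>,
    column_names: Vec<String>,
}

impl ExampleSource {
    pub fn new(column_names: Vec<String>) -> Self {
        Self { batch_size: None, column_names }
    }
}

impl FileSource for ExampleSource {
    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource> {
        // `&self` is a shared borrow, so nothing can be moved out of it:
        // every field, including the Vec, is copied just to set one value.
        let mut conf = self.clone();
        conf.batch_size = Some(batch_size);
        Arc::new(conf)
    }

    fn batch_size(&self) -> Option<usize> {
        self.batch_size
    }

    fn buffer_addr(&self) -> usize {
        self.column_names.as_ptr() as usize
    }
}
```

Calling `with_batch_size` on an `ExampleSource` leaves the original untouched and returns a second object whose `buffer_addr()` differs from the original's, which shows that the `Vec` was reallocated and copied rather than reused.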

Describe the solution you'd like

I would like to avoid having to deep clone the object each time a setting such as the batch size is updated.
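
One possible direction, offered only as a sketch and not as a proposal from this issue or a description of what DataFusion does, is to take the receiver as `Arc<Self>` so that a uniquely owned source can be updated without copying. All names below (`ArcFileSource`, `ExampleArcSource`, `batch_size()`, `buffer_addr()`) are hypothetical.

```rust
use std::sync::Arc;

/// Hypothetical variant of the trait: the receiver is `Arc<Self>`
/// rather than `&self`, so the implementation may take ownership.
pub trait ArcFileSource: Send + Sync {
    fn with_batch_size(self: Arc<Self>, batch_size: usize) -> Arc<dyn ArcFileSource>;
    /// Hypothetical accessor, used here only to inspect the result.
    fn batch_size(&self) -> Option<usize>;
    /// Hypothetical helper: address of the heap buffer owned by this source.
    fn buffer_addr(&self) -> usize;
}

/// Hypothetical source that owns some heap-allocated state.
#[derive(Clone)]
pub struct ExampleArcSource {
    batch_size: Option<usize>,
    column_names: Vec<String>,
}

impl ExampleArcSource {
    pub fn new(column_names: Vec<String>) -> Self {
        Self { batch_size: None, column_names }
    }
}

impl ArcFileSource for ExampleArcSource {
    fn with_batch_size(self: Arc<Self>, batch_size: usize) -> Arc<dyn ArcFileSource> {
        // If this is the only reference, move the value out of the Arc;
        // fall back to a clone only when the source is actually shared.
        let mut conf = Arc::try_unwrap(self).unwrap_or_else(|shared| (*shared).clone());
        conf.batch_size = Some(batch_size);
        Arc::new(conf)
    }

    fn batch_size(&self) -> Option<usize> {
        self.batch_size
    }

    fn buffer_addr(&self) -> usize {
        self.column_names.as_ptr() as usize
    }
}
```

With this shape, a source held by a single `Arc` keeps its existing `Vec` buffer across the update, while a source that is still referenced elsewhere is cloned as before, so existing holders never see the change. The trade-off is that callers must give up their `Arc` (or clone it first) to call the method.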

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement