Open
Labels: enhancement (New feature or request)
Description
Is your feature request related to a problem or challenge?
While working on various upgrade PRs and preparing for the DataFusion 46 release, I noticed something I would like to change before we release.
The FileSource and DataSource traits were introduced in the datasource refactor.
They have APIs to update the underlying source in a few ways, but the APIs require cloning. For example, FileSource looks like this:
    /// Common behaviors that every file format needs to implement.
    ///
    /// See initialization examples on `ParquetSource`, `CsvSource`
    pub trait FileSource: Send + Sync {
        ...
        /// Initialize new type with batch size configuration
        fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource>;
        ...
    }

The only way to implement `with_batch_size` is to (deep) clone the object:
    fn with_batch_size(&self, batch_size: usize) -> Arc<dyn FileSource> {
        let mut conf = self.clone();
        conf.batch_size = Some(batch_size);
        Arc::new(conf)
    }

(datafusion/datafusion/core/src/datasource/physical_plan/csv.rs, lines 584 to 588 in 1ae06a4)
Describe the solution you'd like
I would like to avoid having to deep clone the object just to update a setting such as the batch size.
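To make the request concrete, here is one possible direction, sketched with only the standard library. It is not the actual DataFusion API or a decided design: the `Arc<Self>` receiver, the `ExampleSource` type, the `batch_size()` accessor, and the `data_ptr` helper are all assumptions made for illustration. The idea is that taking ownership of the `Arc` lets the implementation use `Arc::make_mut`, which mutates in place when the source is uniquely owned and clones only when it is shared.

```rust
use std::sync::Arc;

/// Simplified stand-in for the trait in this issue. Hypothetical change:
/// the receiver is `Arc<Self>` instead of `&self`.
pub trait FileSource: Send + Sync {
    fn with_batch_size(self: Arc<Self>, batch_size: usize) -> Arc<dyn FileSource>;

    /// Hypothetical accessor, used here only to observe the result.
    fn batch_size(&self) -> Option<usize>;
}

/// Hypothetical source type standing in for something like `CsvSource`.
#[derive(Clone, Default)]
pub struct ExampleSource {
    batch_size: Option<usize>,
}

impl FileSource for ExampleSource {
    fn with_batch_size(mut self: Arc<Self>, batch_size: usize) -> Arc<dyn FileSource> {
        // Mutates in place when this is the only reference to the value;
        // clones the inner value only when the `Arc` is shared.
        Arc::make_mut(&mut self).batch_size = Some(batch_size);
        self
    }

    fn batch_size(&self) -> Option<usize> {
        self.batch_size
    }
}

/// Address of the allocation behind an `Arc<dyn FileSource>`,
/// used to check whether a new value was allocated.
pub fn data_ptr(source: &Arc<dyn FileSource>) -> *const () {
    Arc::as_ptr(source) as *const ()
}

fn main() {
    let source: Arc<dyn FileSource> = Arc::new(ExampleSource::default());
    let before = data_ptr(&source);

    // Uniquely owned: the update reuses the existing allocation.
    let source = source.with_batch_size(8192);
    assert_eq!(source.batch_size(), Some(8192));
    assert_eq!(before, data_ptr(&source));
}
```

The trade-off in this sketch is that callers must hand over their `Arc`; a caller that still needs the original has to `Arc::clone` it first, in which case the inner value is cloned just as it is today.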
Describe alternatives you've considered
No response
Additional context
No response