Is your feature request related to a problem or challenge? Please describe what you are trying to do.
The Arrow IPC format supports a custom_metadata field on the Message flatbuffer envelope (Message.fbs), allowing per-batch metadata separate from schema-level metadata. Currently, the Rust RecordBatch struct has no custom_metadata field and the IPC reader/writer ignore it.
PyArrow has supported this since v11.0.0 via write_batch(batch, custom_metadata=...) and read_next_batch_with_custom_metadata(). This means IPC files written by PyArrow with per-batch metadata lose that metadata when read by arrow-rs.
Describe the solution you'd like
- Add a custom_metadata: HashMap<String, String> field to RecordBatch with accessor methods (custom_metadata(), custom_metadata_mut(), with_custom_metadata(), into_parts_with_custom_metadata())
- IPC writer: serialize custom_metadata to the Message flatbuffer when writing record batches
- IPC reader: extract custom_metadata from the Message at all reader call sites (FileDecoder, StreamReader, StreamDecoder)
- arrow-flight: extract and propagate custom_metadata in flight_data_to_arrow_batch
- arrow-select: preserve custom_metadata through filter_record_batch and take_record_batch
- Preserve metadata through slice(), project(), normalize(), with_schema(), and remove_column()
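To make the proposed API shape concrete, here is a minimal sketch of the accessors listed above. It assumes a hypothetical stand-in type named Batch (not the real RecordBatch) whose columns are plain vectors; only the method names come from this proposal, and the bodies are illustrative rather than the actual arrow-rs implementation.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `RecordBatch`, reduced to what this proposal touches.
/// Columns are modeled as plain `Vec<i64>` for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    columns: Vec<Vec<i64>>,
    /// Per-batch metadata, kept separate from schema-level metadata.
    custom_metadata: HashMap<String, String>,
}

impl Batch {
    /// New batches start with empty metadata.
    pub fn new(columns: Vec<Vec<i64>>) -> Self {
        Self { columns, custom_metadata: HashMap::new() }
    }

    pub fn custom_metadata(&self) -> &HashMap<String, String> {
        &self.custom_metadata
    }

    pub fn custom_metadata_mut(&mut self) -> &mut HashMap<String, String> {
        &mut self.custom_metadata
    }

    /// Builder-style setter that replaces the metadata wholesale.
    pub fn with_custom_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.custom_metadata = metadata;
        self
    }

    /// Existing-style decomposition: the return shape stays as it was,
    /// so current callers keep compiling. Metadata is dropped here.
    pub fn into_parts(self) -> Vec<Vec<i64>> {
        self.columns
    }

    /// New decomposition that also hands back the per-batch metadata.
    pub fn into_parts_with_custom_metadata(self) -> (Vec<Vec<i64>>, HashMap<String, String>) {
        (self.columns, self.custom_metadata)
    }

    /// Single-batch operation: the metadata is carried over unchanged.
    /// Panics if `offset + length` exceeds the column length.
    pub fn slice(&self, offset: usize, length: usize) -> Self {
        let columns = self
            .columns
            .iter()
            .map(|c| c[offset..offset + length].to_vec())
            .collect();
        Self { columns, custom_metadata: self.custom_metadata.clone() }
    }
}
```

In this sketch, slicing a batch built with with_custom_metadata() yields a batch whose custom_metadata() compares equal to the original map, while into_parts() keeps its old return shape.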
Describe alternatives you've considered
- Storing per-batch metadata in schema-level metadata with a naming convention — this conflates two levels of metadata and doesn't match the IPC format's intent.
- An Option<HashMap<String, String>> instead of HashMap<String, String>: HashMap::new() is zero-allocation so the overhead is minimal, and Option complicates every accessor for little gain.
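The trade-off between the two representations can be illustrated with a short sketch using only the standard library. The helper names (empty_map_capacity, lookup_plain, lookup_optional) are hypothetical and exist only for this comparison.

```rust
use std::collections::HashMap;

/// An empty `HashMap` created with `new()` reports a capacity of 0:
/// nothing is allocated until the first insert.
pub fn empty_map_capacity() -> usize {
    let m: HashMap<String, String> = HashMap::new();
    m.capacity()
}

/// Lookup against the plain-map representation: one step.
pub fn lookup_plain<'a>(md: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    md.get(key).map(String::as_str)
}

/// Lookup against the `Option`-wrapped representation: every caller
/// has to unwrap the outer `Option` before touching the map.
pub fn lookup_optional<'a>(
    md: &'a Option<HashMap<String, String>>,
    key: &str,
) -> Option<&'a str> {
    md.as_ref().and_then(|m| m.get(key)).map(String::as_str)
}
```

Both lookups return the same result for a missing key, so the Option layer adds a second "absent" state (None versus an empty map) without adding information.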
Additional context
- HashMap::new() does not heap-allocate, so there is no performance concern for the default (empty metadata) case.
- The existing into_parts() signature is unchanged for backward compatibility; a new into_parts_with_custom_metadata() is added.
- Multi-batch merge operations (concat_batches, interleave_record_batch, BatchCoalescer) intentionally do not propagate per-batch metadata since the semantics are ambiguous when merging batches with different metadata.
- Reuses existing metadata_to_fb (convert.rs) for writing and the KV extraction pattern for reading.
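The propagation rule described above (single-batch operations keep the metadata, merges drop it) can be sketched as follows. The Batch type and the filter_sketch and concat_sketch functions are hypothetical simplifications, not the arrow-select or arrow-rs functions themselves.

```rust
use std::collections::HashMap;

pub type Metadata = HashMap<String, String>;

/// Hypothetical single-column batch used only for this illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    pub values: Vec<i64>,
    pub custom_metadata: Metadata,
}

/// One input batch, one output batch: the metadata has a single,
/// unambiguous source, so it is carried over.
pub fn filter_sketch(batch: &Batch, mask: &[bool]) -> Batch {
    let values: Vec<i64> = batch
        .values
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(v, _)| *v)
        .collect();
    Batch { values, custom_metadata: batch.custom_metadata.clone() }
}

/// Many input batches, one output batch: the inputs may disagree,
/// so the result starts with empty metadata instead of guessing.
pub fn concat_sketch(batches: &[Batch]) -> Batch {
    let values: Vec<i64> = batches
        .iter()
        .flat_map(|b| b.values.iter().copied())
        .collect();
    Batch { values, custom_metadata: Metadata::new() }
}
```

Callers that need metadata on a merged batch would attach it explicitly after the merge, which keeps the choice of merge policy out of the library.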