Support per-batch `custom_metadata` on `RecordBatch` (IPC Message field) #9444

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

The Arrow IPC format supports a `custom_metadata` field on the `Message` flatbuffer envelope ([Message.fbs](https://github.com/apache/arrow/blob/main/format/Message.fbs#L154)), allowing per-batch metadata separate from schema-level metadata. Currently, the Rust `RecordBatch` struct has no `custom_metadata` field and the IPC reader/writer ignore it.

PyArrow has supported this since v11.0.0 via `write_batch(batch, custom_metadata=...)` and `read_next_batch_with_custom_metadata()`. This means IPC files written by PyArrow with per-batch metadata lose that metadata when read by arrow-rs.

**Describe the solution you'd like**

1. Add a `custom_metadata: HashMap<String, String>` field to `RecordBatch` with accessor methods (`custom_metadata()`, `custom_metadata_mut()`, `with_custom_metadata()`, `into_parts_with_custom_metadata()`)
2. IPC writer: serialize `custom_metadata` to the `Message` flatbuffer when writing record batches
3. IPC reader: extract `custom_metadata` from the `Message` at all reader call sites (`FileDecoder`, `StreamReader`, `StreamDecoder`)
4. arrow-flight: extract and propagate `custom_metadata` in `flight_data_to_arrow_batch`
5. arrow-select: preserve `custom_metadata` through `filter_record_batch` and `take_record_batch`
6. Preserve metadata through `slice()`, `project()`, `normalize()`, `with_schema()`, and `remove_column()`
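To make items 1 and 6 concrete, here is a rough sketch of how the accessors could look. It is not arrow-rs code: `Batch` is a hypothetical stand-in whose columns are plain `Vec<i64>` so that the example compiles without the arrow crates, and the tuple returned by `into_parts()` only loosely mirrors the real `(schema, columns, row_count)` shape. Only the four accessor names come from the proposal above.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for arrow-rs `RecordBatch` (an assumption for
// illustration); columns are plain vectors instead of `ArrayRef`s.
#[derive(Debug, Clone, PartialEq)]
pub struct Batch {
    columns: Vec<Vec<i64>>,
    row_count: usize,
    // Per-batch metadata, kept separate from schema-level metadata.
    custom_metadata: HashMap<String, String>,
}

impl Batch {
    // New batches start with empty metadata.
    pub fn new(columns: Vec<Vec<i64>>) -> Self {
        let row_count = columns.first().map_or(0, |c| c.len());
        Self { columns, row_count, custom_metadata: HashMap::new() }
    }

    pub fn num_rows(&self) -> usize {
        self.row_count
    }

    pub fn custom_metadata(&self) -> &HashMap<String, String> {
        &self.custom_metadata
    }

    pub fn custom_metadata_mut(&mut self) -> &mut HashMap<String, String> {
        &mut self.custom_metadata
    }

    // Builder-style setter that replaces the whole map.
    pub fn with_custom_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.custom_metadata = metadata;
        self
    }

    // Existing-style destructuring: the metadata is dropped, so callers
    // written against this signature keep compiling unchanged.
    pub fn into_parts(self) -> (Vec<Vec<i64>>, usize) {
        (self.columns, self.row_count)
    }

    // New destructuring that also hands back the per-batch metadata.
    pub fn into_parts_with_custom_metadata(
        self,
    ) -> (Vec<Vec<i64>>, usize, HashMap<String, String>) {
        (self.columns, self.row_count, self.custom_metadata)
    }

    // Single-batch transform: the metadata is carried over as-is.
    // Panics if `offset + length` exceeds the row count.
    pub fn slice(&self, offset: usize, length: usize) -> Self {
        let columns = self
            .columns
            .iter()
            .map(|c| c[offset..offset + length].to_vec())
            .collect();
        Self { columns, row_count: length, custom_metadata: self.custom_metadata.clone() }
    }
}
```

The same carry-over as in `slice()` would apply to the other single-batch operations listed in item 6, since each of them has exactly one input batch whose metadata can be copied without ambiguity.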

**Describe alternatives you've considered**

- Storing per-batch metadata in schema-level metadata with a naming convention — this conflates two levels of metadata and doesn't match the IPC format's intent.
- An `Option<HashMap<String, String>>` instead of `HashMap<String, String>` — `HashMap::new()` is zero-allocation so the overhead is minimal, and `Option` complicates every accessor for little gain.
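As a small illustration of the second alternative, the sketch below compares a lookup against a plain map with a lookup against an `Option`-wrapped one, and reports the capacity of a freshly created map. The function names are hypothetical and exist only for this example.

```rust
use std::collections::HashMap;

// Capacity of a freshly created map. The standard library documents
// `HashMap::new()` as starting with capacity 0, so nothing is allocated
// until the first insert.
pub fn empty_metadata_capacity() -> usize {
    let metadata: HashMap<String, String> = HashMap::new();
    metadata.capacity()
}

// With a plain map, a lookup is a single call.
pub fn lookup_plain<'a>(metadata: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    metadata.get(key).map(String::as_str)
}

// With `Option<HashMap<..>>`, every access has to unwrap the outer
// `Option` first, and "absent" and "empty" become two separate states.
pub fn lookup_optional<'a>(
    metadata: &'a Option<HashMap<String, String>>,
    key: &str,
) -> Option<&'a str> {
    metadata.as_ref().and_then(|m| m.get(key)).map(String::as_str)
}
```

Both lookups return the same result for the same data; the `Option` variant only adds a layer that each accessor and call site has to handle.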

**Additional context**

- `HashMap::new()` does not heap-allocate, so there is no performance concern for the default (empty metadata) case.
- The existing `into_parts()` signature is unchanged for backward compatibility; a new `into_parts_with_custom_metadata()` is added.
- Multi-batch merge operations (`concat_batches`, `interleave_record_batch`, `BatchCoalescer`) intentionally do not propagate per-batch metadata since the semantics are ambiguous when merging batches with different metadata.
- The implementation can reuse the existing `metadata_to_fb` helper (`convert.rs`) for writing and the existing key-value extraction pattern for reading.
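The ambiguity mentioned for multi-batch merges can be shown with a toy example. Everything here is a hypothetical stand-in (`Batch`, `concat`, `conflicting_keys`), not the arrow-rs API: two batches carry different values for the same key, so there is no obviously correct value for the merged batch, and the sketch returns empty metadata instead of picking one.

```rust
use std::collections::HashMap;

type Metadata = HashMap<String, String>;

// Hypothetical stand-in for a record batch: one integer column plus
// per-batch metadata.
pub struct Batch {
    pub rows: Vec<i64>,
    pub custom_metadata: Metadata,
}

// Keys present in both maps whose values disagree, in sorted order.
pub fn conflicting_keys(a: &Metadata, b: &Metadata) -> Vec<String> {
    let mut keys: Vec<String> = a
        .iter()
        .filter(|(k, v)| b.get(*k).map_or(false, |other| other != *v))
        .map(|(k, _)| k.clone())
        .collect();
    keys.sort();
    keys
}

// Concatenates the rows of all inputs. Per-batch metadata is deliberately
// not propagated: the result starts with an empty map.
pub fn concat(batches: &[Batch]) -> Batch {
    Batch {
        rows: batches.iter().flat_map(|b| b.rows.iter().copied()).collect(),
        custom_metadata: Metadata::new(),
    }
}
```

A caller that does know how to combine the metadata (for example, keep the first batch's values) can still set it on the merged result explicitly.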
