feat: add custom_metadata support to RecordBatch with IPC read/write#9445

Open
rustyconover wants to merge 2 commits into apache:main from rustyconover:feat/recordbatch-custom-metadata

@rustyconover

Which issue does this PR close?

What changes are included in this PR?

Add per-batch custom_metadata to RecordBatch, matching the custom_metadata field on the IPC Message flatbuffer envelope. This allows attaching per-batch metadata separate from schema-level metadata, bringing parity with PyArrow's write_batch(custom_metadata=...) API (available since PyArrow v11.0.0).

Changes:

  • Add custom_metadata: HashMap<String, String> field to RecordBatch with custom_metadata(), custom_metadata_mut(), with_custom_metadata(), and into_parts_with_custom_metadata() accessors
  • IPC writer: serialize custom_metadata to Message flatbuffer
  • IPC reader: extract custom_metadata from Message at FileDecoder, StreamReader, and StreamDecoder call sites
  • arrow-flight: extract and propagate custom_metadata in flight_data_to_arrow_batch
  • arrow-select: preserve custom_metadata through filter_record_batch and take_record_batch
  • Metadata preserved through slice(), project(), normalize(), with_schema(), and remove_column()
  • PyArrow-generated test data for cross-language interop validation
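The accessor pattern in the list above can be illustrated with a minimal sketch. It uses a hypothetical stand-in type, `SketchBatch`, rather than the real arrow `RecordBatch`; only the method names `with_custom_metadata()`, `custom_metadata()`, and `custom_metadata_mut()` come from this PR, and their exact signatures here are assumptions. The `values` field, `num_rows()`, and the `tagged_slice()` helper are invented for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a record batch; NOT the real arrow `RecordBatch`.
#[derive(Debug, Clone, PartialEq)]
pub struct SketchBatch {
    /// A single column of values, standing in for the batch's arrays.
    values: Vec<i64>,
    /// Per-batch metadata, kept separate from any schema-level metadata.
    custom_metadata: HashMap<String, String>,
}

impl SketchBatch {
    /// Creates a batch with empty per-batch metadata.
    pub fn new(values: Vec<i64>) -> Self {
        Self { values, custom_metadata: HashMap::new() }
    }

    /// Builder-style setter; the name mirrors the PR, the signature is assumed.
    pub fn with_custom_metadata(mut self, metadata: HashMap<String, String>) -> Self {
        self.custom_metadata = metadata;
        self
    }

    /// Read-only view of the per-batch metadata.
    pub fn custom_metadata(&self) -> &HashMap<String, String> {
        &self.custom_metadata
    }

    /// Mutable access for in-place edits.
    pub fn custom_metadata_mut(&mut self) -> &mut HashMap<String, String> {
        &mut self.custom_metadata
    }

    /// Number of rows in the stand-in column.
    pub fn num_rows(&self) -> usize {
        self.values.len()
    }

    /// Slicing copies the metadata onto the result, as the PR describes for `slice()`.
    pub fn slice(&self, offset: usize, length: usize) -> Self {
        Self {
            values: self.values[offset..offset + length].to_vec(),
            custom_metadata: self.custom_metadata.clone(),
        }
    }
}

/// Hypothetical helper: tags a batch, edits the metadata in place, then slices it.
pub fn tagged_slice() -> SketchBatch {
    let metadata = HashMap::from([("request_id".to_string(), "42".to_string())]);
    let mut batch = SketchBatch::new(vec![1, 2, 3, 4]).with_custom_metadata(metadata);
    batch
        .custom_metadata_mut()
        .insert("source".to_string(), "sketch".to_string());
    batch.slice(1, 2)
}
```

In this sketch the sliced batch still carries both metadata entries, which is the behavior the PR lists for `slice()` and the other batch-level transformations.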

Are these changes tested?

Yes, the PR includes tests, along with PyArrow-generated test data for cross-language interop validation.

Are there any user-facing changes?

There are no breaking changes.

Written with AI assistance; all changes reviewed by the author.

@github-actions (bot) added the arrow (Changes to the arrow crate) and arrow-flight (Changes to the arrow-flight crate) labels Feb 20, 2026
@rustyconover rustyconover force-pushed the feat/recordbatch-custom-metadata branch from 1ebc16a to 31227eb Compare February 25, 2026 17:51
@rustyconover rustyconover force-pushed the feat/recordbatch-custom-metadata branch from 31227eb to 9390ec4 Compare April 26, 2026 20:07
rustyconover added a commit to Query-farm/vgi-rpc-rust that referenced this pull request Apr 26, 2026
…ustom_metadata

Pin arrow-rs to rustyconover/arrow-rs#feat/recordbatch-custom-metadata
(apache/arrow-rs#9445) and rewrite vgi-rpc/src/wire.rs as a thin wrapper
around arrow_ipc::reader::StreamReader / writer::StreamWriter. Per-batch
metadata now travels on RecordBatch directly via with_custom_metadata()
/ custom_metadata(); the Metadata alias becomes HashMap<String, String>
and the ReadBatch wrapper is gone. relax_nullability flips
with_skip_validation(true) on the inner reader since upstream validates
before our schema rewrap.

Also bundles in-progress conformance worker, http, and arrow_type
changes that were already pending on the branch.

Conformance: 723/723 across pipe/subprocess/http/unix/externalize.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor

@alamb alamb left a comment

Thanks @rustyconover -- I see the custom_metadata field on the IPC messages and that makes sense to expose somehow

I am less sure about adding a new field to RecordBatch -- mostly as I am not sure about the implications of doing so (though your point that an empty HashMap has no allocations is a good one)

I mostly am thinking about our experience in other libraries trying to handle custom metadata on Fields where many kernels / processing don't preserve the metadata and it has been quite tough

I fear the same thing would happen to this field -- basically that it would not be used by most libraries but they would all pay the size cost on every RecordBatch
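The tradeoff raised here, no allocation for an empty map but a fixed inline size on every batch, can be checked with a small standard-library sketch. The function names `metadata_field_size()` and `empty_map_capacity()` are hypothetical, and the exact byte count depends on the platform and Rust version, so the sketch reports it rather than assuming a value.

```rust
use std::collections::HashMap;
use std::mem::size_of;

/// Inline size in bytes of the map handle itself. A struct that embeds a
/// `HashMap<String, String>` field pays this whether or not the map has entries.
pub fn metadata_field_size() -> usize {
    size_of::<HashMap<String, String>>()
}

/// Capacity of a freshly created map. The standard library documents that
/// `HashMap::new()` starts at capacity 0 and does not allocate until the
/// first insertion.
pub fn empty_map_capacity() -> usize {
    let map: HashMap<String, String> = HashMap::new();
    map.capacity()
}
```

So an unused metadata field costs no heap allocation, but it does widen every value of the containing type by `metadata_field_size()` bytes.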

@alamb
Contributor

alamb commented Apr 28, 2026

I wonder if @tustvold @jhorstmann @viirya or @kylebarron have any thoughts on this matter (adding custom metadata to every RecordBatch)

@rustyconover
Author

Hi @alamb, I've patched arrow-go and arrow-js, and others have working patches for arrow-java. So it's mostly about connectivity for me.

@rustyconover
Author

rustyconover commented Apr 29, 2026

Hi @alamb, thanks for your review. I think the CI passed. Is there more you'd like me to do? Being a new contributor to arrow-rs I'm a bit unsure.

This is going to be the primary user to start: https://github.com/Query-farm/vgi-rpc-rust

@alamb
Contributor

alamb commented Apr 29, 2026

Thanks @rustyconover -- there isn't anything I think you need to do

What I think is needed next is some buy-in from other maintainers / stakeholders that changing RecordBatch is a reasonable thing to do given the tradeoffs

Development

Successfully merging this pull request may close these issues.

Support per-batch custom_metadata on RecordBatch (IPC Message field)

2 participants