Explore zero-copy batch coalescing during shuffle write #3836

@andygrove

Description

Currently, when writing shuffle data with many output partitions, each input batch gets split into many small per-partition batches (e.g. 8192 rows across 200 partitions ≈ 41 rows per partition). Before serialization, we use Arrow's BatchCoalescer to combine these small batches into larger ones to reduce per-batch IPC overhead.

However, BatchCoalescer allocates new arrays and copies row data to produce the coalesced batch. This is unnecessary work: the data is about to be serialized anyway.
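The cost of the current path can be sketched with plain byte buffers standing in for Arrow array data. This is an illustrative toy model, not the real arrow-rs `BatchCoalescer` or IPC writer API: coalescing allocates a fresh buffer and copies every small batch into it, and serialization then reads all of that data a second time.

```rust
/// Toy model of the current path: allocate a new buffer and copy every
/// small per-partition batch into it (the alloc + copy this issue targets).
fn coalesce(batches: &[Vec<u8>]) -> Vec<u8> {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    let mut out = Vec::with_capacity(total); // fresh allocation
    for b in batches {
        out.extend_from_slice(b); // full copy of every byte
    }
    out
}

/// Toy serializer: the real path emits an Arrow IPC message; here we just
/// frame the payload with a little-endian length prefix.
fn serialize(batch: &[u8]) -> Vec<u8> {
    let mut msg = (batch.len() as u32).to_le_bytes().to_vec();
    msg.extend_from_slice(batch); // second pass over the same bytes
    msg
}

fn main() {
    let small_batches = vec![vec![1u8, 2], vec![3, 4, 5], vec![6]];
    // Every byte is copied once into the coalesced batch, then read again
    // during serialization.
    let message = serialize(&coalesce(&small_batches));
    println!("{} bytes on the wire", message.len());
}
```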

Proposed Optimization

Explore writing multiple small batches as a single IPC record batch message by concatenating their Arrow buffers directly in the IPC writer, avoiding the intermediate copy. This would be a "coalesce-on-write" approach:

  • Instead of: small batches → BatchCoalescer (alloc + copy) → large batch → IPC serialize
  • Do: small batches → IPC serialize directly as one message (concatenate buffers)

This would eliminate one full data copy per batch in the shuffle write hot path.
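Under the same toy model (byte buffers in place of Arrow buffers, and a length prefix in place of the real IPC flatbuffer message header), a hypothetical `serialize_many` shows the coalesce-on-write shape: one framed message is produced directly from many small buffers, touching each byte only once and never materializing the combined batch. The real task is harder than this sketch suggests: offset buffers of variable-width arrays would need rebasing and validity bitmaps are bit-packed, so raw byte concatenation only applies to some buffer types.

```rust
/// Hypothetical coalesce-on-write: emit one framed message directly from
/// many small buffers, skipping the intermediate coalesced batch entirely.
fn serialize_many(batches: &[&[u8]]) -> Vec<u8> {
    let total: usize = batches.iter().map(|b| b.len()).sum();
    // Reserve the final message size up front: 4-byte length prefix + payload.
    let mut msg = Vec::with_capacity(4 + total);
    msg.extend_from_slice(&(total as u32).to_le_bytes());
    for b in batches {
        // Each small buffer is written once, straight into the output message,
        // rather than once into a coalesced batch and again during serialization.
        msg.extend_from_slice(b);
    }
    msg
}

fn main() {
    let parts: Vec<&[u8]> = vec![&[1, 2], &[3, 4, 5], &[6]];
    let msg = serialize_many(&parts);
    // Identical wire format to coalesce-then-serialize, with one fewer
    // full pass over the data.
    assert_eq!(msg, [6, 0, 0, 0, 1, 2, 3, 4, 5, 6]);
    println!("one message, no intermediate copy: {:?}", msg);
}
```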

Context

This applies to both the block-based and IPC stream shuffle formats. The BatchCoalescer is used in BufBatchWriter (block format) and in the IPC stream multi-partition write path.

Benchmark data (200 partitions, LZ4, 10M rows from TPC-H SF100 lineitem):

  • Block format: 2.64M rows/s, 609 MiB output
  • IPC stream format: 2.60M rows/s, 634 MiB output

Both paths use BatchCoalescer and would benefit from this optimization.
