refactor: reorganize shuffle crate module structure #3772

Merged
andygrove merged 3 commits into apache:main from andygrove:refactor-shuffle-crate
Mar 27, 2026
Member

@andygrove andygrove commented Mar 23, 2026

Which issue does this PR close?

N/A

Rationale for this change

The shuffle crate grew organically and ended up with related code scattered across modules. This PR groups similar things together to make the crate easier to navigate. It is also a step towards potentially having multiple implementations of some traits so that we can compare performance of different approaches with different workloads.

What changes are included in this PR?

  • Move CompressionCodec and ShuffleBlockWriter from codec.rs into writers/shuffle_block_writer.rs, inlining the codec enum alongside its primary consumer
  • Move Checksum from codec.rs into writers/checksum.rs to keep all write-path types together
  • Rename codec.rs → ipc.rs (now contains only read_ipc_compressed)
  • Rename writers/partition_writer.rs → writers/spill.rs to better reflect its spill-management responsibility
  • Extract SparkUnsafeObject trait and impl_primitive_accessors! macro from the overloaded spark_unsafe/row.rs into spark_unsafe/unsafe_object.rs
  • Extract ShufflePartitioner trait from partitioners/mod.rs into partitioners/traits.rs
  • Add concise rustdoc comments to all structs, enums, and traits that were missing them
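
For readers who want a picture of why pulling the trait into its own module helps with swapping implementations, here is a minimal sketch. It mirrors only the `partitioners/traits.rs` split, written as inline modules so it compiles as a single file. The trait method `partition_for`, the two partitioner structs, and the `assign` helper are hypothetical names invented for illustration; they are not the crate's real signatures.

```rust
// Hypothetical sketch: the real trait and its implementations differ.
pub mod partitioners {
    pub mod traits {
        /// Simplified stand-in for the trait extracted into `partitioners/traits.rs`.
        pub trait ShufflePartitioner {
            /// Returns the target partition index for a row hash (assumed method).
            fn partition_for(&self, hash: u64) -> usize;
        }
    }

    // Re-export so callers can import the trait from `partitioners` directly.
    pub use self::traits::ShufflePartitioner;

    /// Hypothetical implementation: hash modulo the partition count.
    pub struct ModuloPartitioner {
        pub num_partitions: usize,
    }

    impl ShufflePartitioner for ModuloPartitioner {
        fn partition_for(&self, hash: u64) -> usize {
            (hash % self.num_partitions as u64) as usize
        }
    }

    /// Hypothetical implementation: everything goes to partition 0.
    pub struct SinglePartitioner;

    impl ShufflePartitioner for SinglePartitioner {
        fn partition_for(&self, _hash: u64) -> usize {
            0
        }
    }
}

use partitioners::ShufflePartitioner;

/// Generic over the trait, so implementations can be swapped at the call site.
pub fn assign<P: ShufflePartitioner>(partitioner: &P, hashes: &[u64]) -> Vec<usize> {
    hashes.iter().map(|h| partitioner.partition_for(*h)).collect()
}
```

With the trait in its own module, a benchmark could call `assign` with either implementation and compare them without touching the trait definition.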

How are these changes tested?

Existing tests

@andygrove andygrove marked this pull request as ready for review March 23, 2026 14:19
@andygrove andygrove marked this pull request as draft March 23, 2026 14:20
Split and reorganize the shuffle crate for better cohesion:

- Move `CompressionCodec` and `ShuffleBlockWriter` from `codec.rs`
  into `writers/shuffle_block_writer.rs`; inline the codec enum
  alongside its primary consumer
- Move `Checksum` from `codec.rs` into `writers/checksum.rs` to
  keep all write-path types together
- Rename `codec.rs` → `ipc.rs` (now only contains `read_ipc_compressed`)
- Rename `writers/partition_writer.rs` → `writers/spill.rs` to better
  reflect its spill-management responsibility
- Extract `SparkUnsafeObject` trait and `impl_primitive_accessors!` macro
  from `spark_unsafe/row.rs` into `spark_unsafe/unsafe_object.rs`
- Extract `ShufflePartitioner` trait from `partitioners/mod.rs` into
  `partitioners/traits.rs`
- Add concise rustdoc comments to all structs, enums, and traits that
  were missing them
@andygrove andygrove force-pushed the refactor-shuffle-crate branch from 0bd4012 to 8f22ffe March 23, 2026 14:21
Make `unsafe_object` module public and update the bench import to use
the correct path for `SparkUnsafeObject`.
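
A rough sketch of why the module has to be public, assuming a layout like the one described above. The method `size_in_bytes`, the `DemoRow` type, and the `row_size` helper are hypothetical and exist only for this example. Rust makes a private module visible only to its parent module and that module's descendants, so code outside `spark_unsafe` (a bench, for instance) can name the trait by its new path only if `unsafe_object` is declared `pub`.

```rust
pub mod spark_unsafe {
    // `pub` is what allows the `use` at the bottom of this file to compile.
    pub mod unsafe_object {
        /// Simplified stand-in for the real trait (assumed method name).
        pub trait SparkUnsafeObject {
            fn size_in_bytes(&self) -> usize;
        }
    }

    pub mod row {
        use super::unsafe_object::SparkUnsafeObject;

        /// Hypothetical row type used only for this sketch.
        pub struct DemoRow {
            pub bytes: Vec<u8>,
        }

        impl SparkUnsafeObject for DemoRow {
            fn size_in_bytes(&self) -> usize {
                self.bytes.len()
            }
        }
    }
}

// Roughly the import a bench needs: the trait must be in scope to call its methods.
use spark_unsafe::row::DemoRow;
use spark_unsafe::unsafe_object::SparkUnsafeObject;

/// Calls a trait method from outside `spark_unsafe`.
pub fn row_size(row: &DemoRow) -> usize {
    row.size_in_bytes()
}
```

Dropping `pub` from `mod unsafe_object` in this sketch makes the second `use` fail with a privacy error, which is the kind of failure the commit above addresses.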
Contributor

@mbutrovich mbutrovich left a comment

Makes sense to me. Being able to swap in implementations cleanly will be nice to experiment with. Thanks @andygrove!

Contributor

@comphead comphead left a comment

Thanks @andygrove

@andygrove andygrove merged commit 4edd904 into apache:main Mar 27, 2026
119 checks passed
@andygrove andygrove deleted the refactor-shuffle-crate branch March 27, 2026 17:50