fix: use min instead of max when capping write buffer size to Int range by andygrove · Pull Request #3914 · apache/datafusion-comet

andygrove · 2026-04-08T20:27:00Z

Which issue does this PR close?

Closes #.

Rationale for this change

COMET_SHUFFLE_WRITE_BUFFER_SIZE is a Long (configured via bytesConf) but the protobuf ShuffleWriter.write_buffer_size field is int32. The code converts with .max(Int.MaxValue).toInt, which always evaluates to Int.MaxValue (~2GB) regardless of the actual configured value. The intent was to cap the value at Int.MaxValue, which requires .min(Int.MaxValue).
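The difference between the two conversions can be sketched in a few lines of Scala. This is an illustration only, not the code in CometNativeShuffleWriter.scala: the object name, the helper names `capWithMax` and `capWithMin`, and the 1 MiB sample value are all assumptions made for the example.

```scala
// Illustrative sketch; names and sample values are hypothetical, not Comet's.
object BufferSizeCapSketch {
  // Pre-fix expression: max returns the larger operand, so any value that
  // fits in an Int is replaced by Int.MaxValue.
  def capWithMax(configured: Long): Int = configured.max(Int.MaxValue).toInt

  // Post-fix expression: min keeps smaller values and caps larger ones.
  def capWithMin(configured: Long): Int = configured.min(Int.MaxValue).toInt

  def main(args: Array[String]): Unit = {
    val oneMiB = 1024L * 1024L
    println(capWithMax(oneMiB))                      // 2147483647
    println(capWithMin(oneMiB))                      // 1048576
    println(capWithMin(4L * 1024L * 1024L * 1024L))  // 2147483647
  }
}
```

With `min`, a configured 1 MiB survives the conversion unchanged, while a value beyond the Int range is capped to Int.MaxValue before `.toInt` is applied.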

The practical impact is limited: the write buffer is a Vec<u8> that grows as data is written and flushes when it exceeds the configured threshold. With a ~2GB threshold, the buffer effectively never flushes early; it accumulates all serialized IPC bytes for a partition until flush() is called at the end of processing. As a result, the spark.comet.exec.shuffle.writeBufferSize config is silently ignored, but this does not cause excessive memory allocation, because the buffer only grows to match the data actually written.
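The flush behavior described above can be modeled with a small Scala sketch. This is a simplified stand-in for the native buffer, not Comet's Rust implementation: the class `SketchWriteBuffer`, its fields, and the 1 KiB batch size are hypothetical and exist only to show how the threshold controls early flushes.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical model of a growable write buffer with a flush threshold.
final class SketchWriteBuffer(threshold: Int) {
  private val buf = ArrayBuffer.empty[Byte]
  var earlyFlushes: Int = 0

  // Append bytes, then flush early if the buffer has grown past the threshold.
  def write(bytes: Array[Byte]): Unit = {
    buf ++= bytes
    if (buf.length > threshold) {
      flush()
      earlyFlushes += 1
    }
  }

  // Stand-in for writing the buffered bytes out; here it only clears them.
  def flush(): Unit = buf.clear()

  def buffered: Int = buf.length
}

object SketchWriteBufferDemo {
  // Write 16 batches of 1 KiB each and return the resulting buffer state.
  def run(threshold: Int): SketchWriteBuffer = {
    val buffer = new SketchWriteBuffer(threshold)
    val batch = Array.fill[Byte](1024)(0)
    (1 to 16).foreach(_ => buffer.write(batch))
    buffer
  }
}
```

In this model, `SketchWriteBufferDemo.run(4096)` flushes early several times, whereas `SketchWriteBufferDemo.run(Int.MaxValue)` never does and simply holds the 16 KiB that was written until a final `flush()`. That matches the point made above: the oversized threshold disables early flushing but does not by itself allocate extra memory.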

What changes are included in this PR?

One-line fix: .max(Int.MaxValue) → .min(Int.MaxValue) in CometNativeShuffleWriter.scala.

How are these changes tested?

Existing tests. The fix is a straightforward logic correction with no behavioral change for values already within Int range (which is all practical values).

COMET_SHUFFLE_WRITE_BUFFER_SIZE is a Long (bytesConf) but the protobuf field is int32, so the value must be capped at Int.MaxValue. The code used .max(Int.MaxValue) which always returns Int.MaxValue (~2GB) regardless of the configured value. Should be .min(Int.MaxValue) to preserve smaller values while capping at the Int range.

kazuyukitanimura

Pending CI.

kazuyukitanimura approved these changes Apr 8, 2026

mbutrovich merged commit 967a81e into apache:main Apr 8, 2026
176 of 178 checks passed

andygrove deleted the fix-write-buffer-size-config branch April 8, 2026 23:12

andygrove mentioned this pull request Apr 13, 2026

fix: [branch-0.14] backport #3914 - use min instead of max when capping write buffer size to Int range #3936

Merged
