Skip to content

fix: use min instead of max when capping write buffer size to Int range#3914

Merged
mbutrovich merged 1 commit intoapache:mainfrom
andygrove:fix-write-buffer-size-config
Apr 8, 2026
Merged

fix: use min instead of max when capping write buffer size to Int range#3914
mbutrovich merged 1 commit intoapache:mainfrom
andygrove:fix-write-buffer-size-config

Conversation

@andygrove
Copy link
Copy Markdown
Member

@andygrove andygrove commented Apr 8, 2026

Which issue does this PR close?

Closes #.

Rationale for this change

COMET_SHUFFLE_WRITE_BUFFER_SIZE is a Long (configured via bytesConf) but the protobuf ShuffleWriter.write_buffer_size field is int32. The code converts with .max(Int.MaxValue).toInt, which always evaluates to Int.MaxValue (~2GB) regardless of the actual configured value. The intent was to cap the value at Int.MaxValue, which requires .min(Int.MaxValue).

The practical impact is limited: the write buffer is a Vec<u8> that grows organically and flushes when it exceeds the configured threshold. With a 2GB threshold, the buffer effectively never flushes early — it accumulates all serialized IPC bytes for a partition until flush() is called at the end of processing. This means the spark.comet.exec.shuffle.writeBufferSize config is silently ignored, but it does not cause excessive memory allocation since the buffer only grows to match the actual data written.

What changes are included in this PR?

One-line fix: .max(Int.MaxValue).min(Int.MaxValue) in CometNativeShuffleWriter.scala.

How are these changes tested?

Existing tests. The fix is a straightforward logic correction with no behavioral change for values already within Int range (which is all practical values).

COMET_SHUFFLE_WRITE_BUFFER_SIZE is a Long (bytesConf) but the protobuf
field is int32, so the value must be capped at Int.MaxValue. The code
used .max(Int.MaxValue) which always returns Int.MaxValue (~2GB)
regardless of the configured value. Should be .min(Int.MaxValue) to
preserve smaller values while capping at the Int range.
Copy link
Copy Markdown
Contributor

@kazuyukitanimura kazuyukitanimura left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pending ci

@mbutrovich mbutrovich merged commit 967a81e into apache:main Apr 8, 2026
176 of 178 checks passed
@andygrove andygrove deleted the fix-write-buffer-size-config branch April 8, 2026 23:12
andygrove added a commit to andygrove/datafusion-comet that referenced this pull request Apr 13, 2026
…ge (apache#3914)

COMET_SHUFFLE_WRITE_BUFFER_SIZE is a Long (bytesConf) but the protobuf
field is int32, so the value must be capped at Int.MaxValue. The code
used .max(Int.MaxValue) which always returns Int.MaxValue (~2GB)
regardless of the configured value. Should be .min(Int.MaxValue) to
preserve smaller values while capping at the Int range.
andygrove added a commit that referenced this pull request Apr 14, 2026
…ge (#3914) (#3936)

COMET_SHUFFLE_WRITE_BUFFER_SIZE is a Long (bytesConf) but the protobuf
field is int32, so the value must be capped at Int.MaxValue. The code
used .max(Int.MaxValue) which always returns Int.MaxValue (~2GB)
regardless of the configured value. Should be .min(Int.MaxValue) to
preserve smaller values while capping at the Int range.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants