fix: replace an inefficient loop in kafka internals #13162

Closed
dpcollins-google wants to merge 4 commits into apache:trunk from dpcollins-google:patch-1

Conversation

@dpcollins-google

Instead use Channels.newChannel to write in larger chunks.

Before:

    int pos = buffer.position();
    for (int i = pos; i < length + pos; i++)
        out.writeByte(buffer.get(i));

After:

    Channels.newChannel(out).write(buffer);
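As a standalone illustration (hypothetical class name, not Kafka code), the two approaches produce identical bytes, but the channel write advances the buffer's position while the absolute-index loop leaves it untouched:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.util.Arrays;

public class BufferWriteDemo {
    public static void main(String[] args) throws IOException {
        byte[] payload = "hello, kafka".getBytes(); // 12 bytes

        // Old approach: copy one byte at a time using absolute gets,
        // which do not move the buffer's position.
        ByteBuffer buf1 = ByteBuffer.wrap(payload);
        ByteArrayOutputStream bos1 = new ByteArrayOutputStream();
        DataOutputStream out1 = new DataOutputStream(bos1);
        int pos = buf1.position();
        int length = buf1.remaining();
        for (int i = pos; i < length + pos; i++)
            out1.writeByte(buf1.get(i));

        // New approach: hand the whole buffer to a channel in one call.
        // Note: this drains the buffer, advancing its position to the limit.
        ByteBuffer buf2 = ByteBuffer.wrap(payload);
        ByteArrayOutputStream bos2 = new ByteArrayOutputStream();
        Channels.newChannel(bos2).write(buf2);

        System.out.println(Arrays.equals(bos1.toByteArray(), bos2.toByteArray())); // true
        System.out.println(buf1.position() + " " + buf2.position()); // 0 12
    }
}
```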


  • Is this preserving the same behaviour, i.e. copying the range [pos, pos + length]?
  • This mutates the position of the buffer, whereas the current implementation does not.

Author


  1. Yes, per the docs of WritableByteChannel:

     Writes a sequence of bytes to this channel from the given buffer.
     An attempt is made to write up to r bytes to the channel, where r is the number of bytes remaining in the buffer, that is, src.remaining(), at the moment this method is invoked.

     Suppose that a byte sequence of length n is written, where 0 <= n <= r. This byte sequence will be transferred from the buffer starting at index p, where p is the buffer's position at the moment this method is invoked; the index of the last byte written will be p + n - 1. Upon return the buffer's position will be equal to p + n; its limit will not have changed.

  2. Good point, although I don't think this affects any of the 4 current callers (DefaultRecord.writeTo and LegacyRecord.writeTo, for writing key and value). I've added a defensive call to asReadOnlyBuffer.
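The defensive asReadOnlyBuffer call can be sketched as follows (hypothetical class name, not Kafka code): the read-only view shares the underlying bytes but has its own position and limit, so draining the view leaves the caller's buffer untouched.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;

public class ReadOnlyCopyDemo {
    public static void main(String[] args) throws IOException {
        ByteBuffer buffer = ByteBuffer.wrap("payload".getBytes()); // 7 bytes
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // The read-only view shares the bytes but has independent
        // position/limit, so the write drains the view, not the original.
        Channels.newChannel(out).write(buffer.asReadOnlyBuffer());

        System.out.println(buffer.position()); // 0: original position unchanged
        System.out.println(out.size());        // 7: all bytes were written
    }
}
```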


@Hangleton Jan 30, 2023


Thanks for the follow-up.

  1. This new implementation ignores the length argument provided to the method if the buffer is not backed by an array. What if length does not equal the number of remaining bytes on the buffer?

  2. Is there an actual optimization offered by calling write? The implementation for direct buffers uses a similar linear iteration. Do you have data showing performance improvements with this implementation?

Author


Per 1): This parameter is always buffer.remaining(); I've cleaned up the call sites and removed the parameter.

Per 2): Yes, it's substantial. The reason is that WritableByteChannelImpl writes in 8 KiB chunks when feasible, instead of 1-byte chunks: https://github.com/AdoptOpenJDK/openjdk-jdk8u/blob/2544d2a351eca1a3d62276f969dd2d95e4a4d2b6/jdk/src/share/classes/java/nio/channels/Channels.java#L442

Unfortunately I can't share benchmarks to demonstrate this, as they're from a production application and were collected using internal tooling.
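For context, a simplified sketch of the chunking the linked implementation performs (names here are mine, not the JDK's): the buffer is drained through a staging array of at most 8192 bytes, so the underlying stream sees one write call per chunk instead of one per byte.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;

public class ChunkedWriteSketch {
    private static final int TRANSFER_SIZE = 8192;

    // Simplified version of the chunking done by Channels.newChannel(OutputStream):
    // drain the buffer through a byte[] staging area, at most 8 KiB per write call.
    static int writeChunked(OutputStream out, ByteBuffer src) throws IOException {
        byte[] staging = new byte[TRANSFER_SIZE];
        int written = 0;
        while (src.hasRemaining()) {
            int chunk = Math.min(src.remaining(), TRANSFER_SIZE);
            src.get(staging, 0, chunk);   // one bulk copy out of the buffer
            out.write(staging, 0, chunk); // one stream call per chunk, not per byte
            written += chunk;
        }
        return written;
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer src = ByteBuffer.allocate(20_000); // 20,000 bytes -> 3 chunks
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        System.out.println(writeChunked(out, src)); // 20000
        System.out.println(out.size());             // 20000
    }
}
```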



Got it. Thanks.

Could you share the performance gains at a high level, without sharing details of the application?

Publicly available data showing the expected gains would further back up the PR. Has this been discussed on the dev mailing list already?

Member


Thanks for the PR. Can you please provide a bit more detail regarding the workload? I assume we're talking about the producer here. If so, can you please share whether compression was used, the compression algorithm (if applicable), and the average message size?

Author


This is producer code, with no compression (the data is pre-encrypted, so compression would be useless anyway), and the message size is 1-2 kB.

Author


ping @ijuma / @Hangleton, are there any blockers to getting this merged?



Apologies for the delay - would you have any JMH benchmark for this change? E.g. something like in #13312.

Member


Sorry for the delay. I was trying to understand how to show the improvement, but we seem to always pass a heap byte buffer. How can I reproduce this improvement?
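One rough way to probe this locally (a sketch with hypothetical names, not a JMH benchmark): compare the per-byte loop against the channel write on a direct buffer, where there is no backing array to fall back on. Timings go to stderr since they vary by machine; stdout only confirms both paths produce the same bytes.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.util.Arrays;

public class RoughCompare {
    public static void main(String[] args) throws IOException {
        // A direct buffer has no accessible backing array, so per-byte
        // access is the worst case for the old loop.
        ByteBuffer direct = ByteBuffer.allocateDirect(1 << 20); // 1 MiB
        while (direct.hasRemaining())
            direct.put((byte) (direct.position() & 0xFF));
        direct.flip();

        // Old path: per-byte loop over a duplicate view.
        ByteBuffer a = direct.duplicate();
        ByteArrayOutputStream bosA = new ByteArrayOutputStream();
        DataOutputStream outA = new DataOutputStream(bosA);
        long t0 = System.nanoTime();
        for (int i = a.position(); i < a.limit(); i++)
            outA.writeByte(a.get(i));
        long loopNanos = System.nanoTime() - t0;

        // New path: channel write over another duplicate view.
        ByteBuffer b = direct.duplicate();
        ByteArrayOutputStream bosB = new ByteArrayOutputStream();
        long t1 = System.nanoTime();
        Channels.newChannel(bosB).write(b);
        long channelNanos = System.nanoTime() - t1;

        System.out.println(Arrays.equals(bosA.toByteArray(), bosB.toByteArray()));
        System.err.println("loop ns=" + loopNanos + " channel ns=" + channelNanos);
    }
}
```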

@github-actions

This PR is being marked as stale since it has not had any activity in 90 days. If you would like to keep this PR alive, please ask a committer for review. If the PR has merge conflicts, please update it with the latest from trunk (or the appropriate release branch).
If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions bot added the stale (Stale PRs) label Jun 10, 2023
@github-actions bot removed the stale (Stale PRs) label Dec 27, 2024
@github-actions

This PR is being marked as stale since it has not had any activity in 90 days. If you
would like to keep this PR alive, please leave a comment asking for a review. If the PR has
merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out on the [mailing list](https://kafka.apache.org/contact).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 30 days, it will be automatically closed.

@github-actions bot added the stale (Stale PRs) label Mar 27, 2025
@github-actions

This PR has been closed since it has not had any activity in 120 days. If you feel like this
was a mistake, or you would like to continue working on it, please feel free to re-open the
PR and ask for a review.

@github-actions bot added the closed-stale (PRs that were closed due to inactivity) label Apr 26, 2025
@github-actions bot closed this Apr 26, 2025

Labels

closed-stale (PRs that were closed due to inactivity), stale (Stale PRs)


3 participants