-
Notifications
You must be signed in to change notification settings - Fork 3.7k
[improve][pip] PIP-389: Add Producer config compressMinMsgBodySize to improve compression performance #23526
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…compression performance
76ffe40 to
afff757
Compare
|
Please add the PIP number to the PR title as we usually do. |
|
@liangyepianzhou Please go ahead and close the vote thread. We don't need 3 binding votes for PIPs anymore, that was clarified in #23387. |
lhotari
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
… improve compression performance (apache#23526) Co-authored-by: xiangying <mengxiangying@xiaohongshu.com>
…on performance (#23525) PIP: #23526 ### Motivation The motivation of this PIP is to provide a way to improve the compression performance by skipping the compression of small messages. We want to add a new configuration compressMinMsgBodySize to the producer configuration. This configuration will allow the user to set the minimum size of the message body that will be compressed. If the message body size is less than the compressMinMsgBodySize, the message will not be compressed.
|
I have a question: Is the compression process before the messages batched or after? |
If batching is enabled, compression is done after the batch. The user may or may not enable batching. The batch may be large or small. So batch can't solve all problems. |
…batching modes (#24102) Co-authored-by: xiangying <mengxiangying@xiaohongshu.com> #23526 ## Motivation Fix inconsistent compression threshold behavior across batching modes. For single message sending, compression is enabled when the message size is greater than or equal to the threshold, while for batch sending, compression occurs when the size is greater than the threshold. ## Modifications 1. Standardize the criteria for determining the threshold to enable compression. 2. Optimize formatting. 3. Enhance testing.
…for compression configuration flexibility (#24164) PIP: #23526 #### Motivation Currently, the `ProducerBuilder` API lacks the ability to directly configure the `compressMinMsgBodySize` property during producer initialization. Users are forced to modify the internal `ProducerConfigurationData` post-creation, which: 1. **Violates encapsulation** by exposing internal configuration fields. 2. **Introduces inconsistency** with other compression settings (e.g., `compressionType`) that are already configurable via the builder. 3. **Creates usability hurdles**, requiring workarounds like: ```java Producer<byte[]> producer = client.newProducer().topic("test").create(); producer.conf.setCompressMinMsgBodySize(1024); // Insecure and error-prone ``` This change aligns the builder API with standard compression configuration patterns and improves developer ergonomics. #### Modifications 1. **API Enhancement**: - Added `compressMinMsgBodySize(int)` method to the `ProducerBuilder` interface. - Propagate the configured value to `ProducerConfigurationData` during producer initialization. 2. **Behavior Preservation**: - Default value remains unchanged if the method is not invoked. - Fully backward-compatible with existing code. 3. **Testing & Documentation**: - Added unit tests to validate compression threshold behavior via the new builder method. - Updated Javadoc and configuration examples to reflect the new API.
…for compression configuration flexibility (apache#24164) PIP: apache#23526 #### Motivation Currently, the `ProducerBuilder` API lacks the ability to directly configure the `compressMinMsgBodySize` property during producer initialization. Users are forced to modify the internal `ProducerConfigurationData` post-creation, which: 1. **Violates encapsulation** by exposing internal configuration fields. 2. **Introduces inconsistency** with other compression settings (e.g., `compressionType`) that are already configurable via the builder. 3. **Creates usability hurdles**, requiring workarounds like: ```java Producer<byte[]> producer = client.newProducer().topic("test").create(); producer.conf.setCompressMinMsgBodySize(1024); // Insecure and error-prone ``` This change aligns the builder API with standard compression configuration patterns and improves developer ergonomics. #### Modifications 1. **API Enhancement**: - Added `compressMinMsgBodySize(int)` method to the `ProducerBuilder` interface. - Propagate the configured value to `ProducerConfigurationData` during producer initialization. 2. **Behavior Preservation**: - Default value remains unchanged if the method is not invoked. - Fully backward-compatible with existing code. 3. **Testing & Documentation**: - Added unit tests to validate compression threshold behavior via the new builder method. - Updated Javadoc and configuration examples to reflect the new API.
…batching modes (apache#24102) Co-authored-by: xiangying <mengxiangying@xiaohongshu.com> apache#23526 ## Motivation Fix inconsistent compression threshold behavior across batching modes. For single message sending, compression is enabled when the message size is greater than or equal to the threshold, while for batch sending, compression occurs when the size is greater than the threshold. ## Modifications 1. Standardize the criteria for determining the threshold to enable compression. 2. Optimize formatting. 3. Enhance testing.
…for compression configuration flexibility (apache#24164) PIP: apache#23526 #### Motivation Currently, the `ProducerBuilder` API lacks the ability to directly configure the `compressMinMsgBodySize` property during producer initialization. Users are forced to modify the internal `ProducerConfigurationData` post-creation, which: 1. **Violates encapsulation** by exposing internal configuration fields. 2. **Introduces inconsistency** with other compression settings (e.g., `compressionType`) that are already configurable via the builder. 3. **Creates usability hurdles**, requiring workarounds like: ```java Producer<byte[]> producer = client.newProducer().topic("test").create(); producer.conf.setCompressMinMsgBodySize(1024); // Insecure and error-prone ``` This change aligns the builder API with standard compression configuration patterns and improves developer ergonomics. #### Modifications 1. **API Enhancement**: - Added `compressMinMsgBodySize(int)` method to the `ProducerBuilder` interface. - Propagate the configured value to `ProducerConfigurationData` during producer initialization. 2. **Behavior Preservation**: - Default value remains unchanged if the method is not invoked. - Fully backward-compatible with existing code. 3. **Testing & Documentation**: - Added unit tests to validate compression threshold behavior via the new builder method. - Updated Javadoc and configuration examples to reflect the new API.
Motivation
The motivation of this PIP is to provide a way to improve the compression performance by skipping the compression of small messages.
We want to add a new configuration
compressMinMsgBodySizeto the producer configuration.This configuration will allow the user to set the minimum size of the message body that will be compressed.
If the message body size is less than the
compressMinMsgBodySize, the message will not be compressed.Verifying this change
(Please pick either of the following options)
This change is a trivial rework / code cleanup without any test coverage.
(or)
This change is already covered by existing tests, such as (please describe tests).
(or)
This change added tests and can be verified as follows:
(example:)
Does this pull request potentially affect one of the following parts:
If the box was checked, please highlight the changes
Documentation
docdoc-requireddoc-not-neededdoc-completeMatching PR in forked repository
PR in forked repository: