@liangyepianzhou
Contributor

Motivation

This PIP aims to improve compression performance by skipping compression for small messages.
It adds a new producer configuration, `compressMinMsgBodySize`, which sets the minimum message body size that is eligible for compression.
If the message body is smaller than `compressMinMsgBodySize`, the message is sent uncompressed.
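
As a rough illustration of the behaviour described above, here is a minimal sketch of a size gate in front of a compressor. It is not the Pulsar client implementation: the class `CompressionThresholdSketch`, the `Encoded` holder, and the use of `java.util.zip.Deflater` are all assumptions made for the example, and the threshold value is arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.Deflater;

/** Illustrative only: hypothetical names, not the Pulsar client code. */
final class CompressionThresholdSketch {

    /** Hypothetical holder: the bytes to send and whether they were compressed. */
    static final class Encoded {
        final byte[] bytes;
        final boolean compressed;

        Encoded(byte[] bytes, boolean compressed) {
            this.bytes = bytes;
            this.compressed = compressed;
        }
    }

    /** Follows the PIP wording: bodies smaller than the threshold are not compressed. */
    static boolean shouldCompress(int bodySize, int compressMinMsgBodySize) {
        return bodySize >= compressMinMsgBodySize;
    }

    /** Returns the body untouched when it is below the threshold, otherwise a deflated copy. */
    static Encoded encode(byte[] body, int compressMinMsgBodySize) {
        if (!shouldCompress(body.length, compressMinMsgBodySize)) {
            return new Encoded(body, false);
        }
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(body);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return new Encoded(out.toByteArray(), true);
        } finally {
            deflater.end(); // release native resources held by the compressor
        }
    }

    public static void main(String[] args) {
        int threshold = 1024; // arbitrary value chosen for the example
        System.out.println("16-byte body compressed?   " + encode(new byte[16], threshold).compressed);
        System.out.println("8192-byte body compressed? " + encode(new byte[8192], threshold).compressed);
    }
}
```

The point of the gate is that a small body skips the compressor entirely, so the producer pays no CPU cost for messages where compression is unlikely to save much.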

@github-actions github-actions bot added the PIP and doc-required labels Oct 29, 2024
@lhotari
Member

lhotari commented Oct 30, 2024

Please add the PIP number to the PR title as we usually do.

@liangyepianzhou liangyepianzhou changed the title from "[improve][pip] Add Producer config compressMinMsgBodySize to improve compression performance" to "[improve][pip]PIP-389 Add Producer config compressMinMsgBodySize to improve compression performance" Oct 30, 2024
@lhotari lhotari changed the title from "[improve][pip]PIP-389 Add Producer config compressMinMsgBodySize to improve compression performance" to "[improve][pip]PIP-389: Add Producer config compressMinMsgBodySize to improve compression performance" Nov 27, 2024
@lhotari
Member

lhotari commented Nov 27, 2024

@liangyepianzhou Please go ahead and close the vote thread. We no longer need 3 binding votes for PIPs; that was clarified in #23387.

Member

@lhotari lhotari left a comment

LGTM

@lhotari lhotari changed the title from "[improve][pip]PIP-389: Add Producer config compressMinMsgBodySize to improve compression performance" to "[improve][pip] PIP-389: Add Producer config compressMinMsgBodySize to improve compression performance" Nov 27, 2024
@liangyepianzhou liangyepianzhou merged commit c50fa56 into master Nov 28, 2024
@liangyepianzhou liangyepianzhou deleted the pip/pip-389 branch November 28, 2024 11:33
hanmz pushed a commit to hanmz/pulsar that referenced this pull request Feb 12, 2025
… improve compression performance (apache#23526)

Co-authored-by: xiangying <mengxiangying@xiaohongshu.com>
liangyepianzhou added a commit that referenced this pull request Feb 17, 2025
…on performance (#23525)

PIP: #23526
@thetumbled
Member

I have a question: does compression happen before or after messages are batched?
If messages are compressed after they are batched, could the small-message problem be avoided?

@liangyepianzhou
Contributor Author

> I have a question: does compression happen before or after messages are batched? If messages are compressed after they are batched, could the small-message problem be avoided?

If batching is enabled, compression is applied after the batch is built. However, users may or may not enable batching, and a batch may itself be large or small, so batching alone can't solve the problem.

liangyepianzhou added a commit that referenced this pull request Mar 25, 2025
…batching modes (#24102)

Co-authored-by: xiangying <mengxiangying@xiaohongshu.com>
#23526
## Motivation
Fix inconsistent compression threshold behavior across batching modes. For single message sending, compression is enabled when the message size is greater than or equal to the threshold, while for batch sending, compression occurs when the size is greater than the threshold.

## Modifications
1. Standardize the criteria for determining the threshold to enable compression.
2. Optimize formatting.
3. Enhance testing.
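
One way to picture the standardization is a single predicate shared by the single-message and batch paths, so the boundary case cannot diverge again. The sketch below is hypothetical: the commit message does not say which comparison was kept, so the inclusive `>=` here is an assumption taken from the PIP wording ("less than the threshold is not compressed"), and the batch size is simplified to the sum of the message bodies.

```java
import java.util.List;

/** Illustrative only: hypothetical names, not the Pulsar client code. */
final class UnifiedThresholdSketch {

    /** Single source of truth for the threshold check (inclusive comparison is an assumption). */
    static boolean meetsThreshold(int payloadSize, int compressMinMsgBodySize) {
        return payloadSize >= compressMinMsgBodySize;
    }

    /** Non-batched path: decide from the size of the one message body. */
    static boolean compressSingle(byte[] body, int compressMinMsgBodySize) {
        return meetsThreshold(body.length, compressMinMsgBodySize);
    }

    /** Batched path: decide from the combined size (simplified here to the sum of the bodies). */
    static boolean compressBatch(List<byte[]> messages, int compressMinMsgBodySize) {
        int total = 0;
        for (byte[] message : messages) {
            total += message.length;
        }
        return meetsThreshold(total, compressMinMsgBodySize);
    }
}
```

Because both paths delegate to `meetsThreshold`, a payload whose size equals the threshold gets the same decision whether or not batching is enabled.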
liangyepianzhou added a commit that referenced this pull request Apr 9, 2025
…for compression configuration flexibility (#24164)

PIP: #23526

#### Motivation  
Currently, the `ProducerBuilder` API lacks the ability to directly configure the `compressMinMsgBodySize` property during producer initialization. Users are forced to modify the internal `ProducerConfigurationData` post-creation, which:  
1. **Violates encapsulation** by exposing internal configuration fields.  
2. **Introduces inconsistency** with other compression settings (e.g., `compressionType`) that are already configurable via the builder.  
3. **Creates usability hurdles**, requiring workarounds like:  
   ```java  
   Producer<byte[]> producer = client.newProducer().topic("test").create();  
   producer.conf.setCompressMinMsgBodySize(1024); // Insecure and error-prone  
   ```  
This change aligns the builder API with standard compression configuration patterns and improves developer ergonomics.  

#### Modifications  
1. **API Enhancement**:  
   - Added `compressMinMsgBodySize(int)` method to the `ProducerBuilder` interface.  
   - Propagate the configured value to `ProducerConfigurationData` during producer initialization.  
2. **Behavior Preservation**:  
   - Default value remains unchanged if the method is not invoked.  
   - Fully backward-compatible with existing code.  
3. **Testing & Documentation**:  
   - Added unit tests to validate compression threshold behavior via the new builder method.  
   - Updated Javadoc and configuration examples to reflect the new API.
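
The propagation described under "API Enhancement" and "Behavior Preservation" can be sketched with stand-in classes. Everything below is hypothetical and self-contained: `SketchProducerBuilder` and `SketchProducerConfig` only mimic the roles of `ProducerBuilder` and `ProducerConfigurationData`, and `PLACEHOLDER_DEFAULT` is an arbitrary value, not Pulsar's actual default.

```java
/** Illustrative only: stand-in classes, not the Pulsar client API. */
final class BuilderPropagationSketch {

    /** Arbitrary placeholder; the real default is not stated in this thread. */
    static final int PLACEHOLDER_DEFAULT = 4 * 1024;

    /** Stand-in for the internal configuration object. */
    static final class SketchProducerConfig {
        private int compressMinMsgBodySize = PLACEHOLDER_DEFAULT;

        int getCompressMinMsgBodySize() {
            return compressMinMsgBodySize;
        }

        void setCompressMinMsgBodySize(int compressMinMsgBodySize) {
            this.compressMinMsgBodySize = compressMinMsgBodySize;
        }
    }

    /** Stand-in for the builder: each setter writes through to the config and returns this. */
    static final class SketchProducerBuilder {
        private final SketchProducerConfig conf = new SketchProducerConfig();
        private String topic;

        SketchProducerBuilder topic(String topic) {
            this.topic = topic;
            return this;
        }

        SketchProducerBuilder compressMinMsgBodySize(int compressMinMsgBodySize) {
            conf.setCompressMinMsgBodySize(compressMinMsgBodySize);
            return this;
        }

        /** Returns the populated config; a real builder would create the producer here. */
        SketchProducerConfig build() {
            if (topic == null) {
                throw new IllegalStateException("topic must be set");
            }
            return conf;
        }
    }
}
```

With this shape, callers who never invoke `compressMinMsgBodySize(int)` keep the default, and callers who do never need to touch the configuration object directly.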
poorbarcode pushed a commit to poorbarcode/pulsar that referenced this pull request Apr 15, 2025
…for compression configuration flexibility (apache#24164)

PIP: apache#23526

walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
…batching modes (apache#24102)

Co-authored-by: xiangying <mengxiangying@xiaohongshu.com>
apache#23526
walkinggo pushed a commit to walkinggo/pulsar that referenced this pull request Oct 8, 2025
…for compression configuration flexibility (apache#24164)

PIP: apache#23526

Labels

PIP, doc-required

4 participants