
Conversation

@eolivelli (Contributor) commented Oct 28, 2022

Motivation

The broker can run out of memory when many reads are enqueued on the PersistentDispatcherMultipleConsumers dispatchMessagesThread (which is used when dispatcherDispatchMessagesInSubscriptionThread is set to true, the default value). The limit on the amount of memory retained by reads MUST also take into account the entries coming from the cache.

When dispatcherDispatchMessagesInSubscriptionThread is false (the behaviour of Pulsar 2.10) there is a kind of natural (but still unpredictable!) back-pressure mechanism, because the thread that receives the entries from BookKeeper or from the cache dispatches them immediately and synchronously to the consumer and then releases them.

Modifications

  • Add a new component (InflightReadsLimiter) that keeps track of the overall amount of memory retained by inflight reads (a simplified sketch of the idea follows this list)
  • Add a new configuration entry managedLedgerMaxReadsInFlightSizeInMB
  • The feature is disabled by default
  • Add new metrics to track the status of the broker
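A minimal sketch of the idea behind the InflightReadsLimiter, for illustration only: a broker-wide byte budget that the read path acquires before issuing a read (from BookKeeper or from the cache) and releases once the entries have been written to the consumer channel. The class name, the blocking behaviour and the MB-to-bytes conversion below are assumptions made for the sketch, not the API introduced by this PR.

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified sketch only: the real limiter may account for memory differently
// and does not necessarily block calling threads.
public class SimpleInflightReadsLimiter {
    private final long maxBytes;        // <= 0 means the limit is disabled
    private long availableBytes;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition released = lock.newCondition();

    public SimpleInflightReadsLimiter(long maxReadsInFlightSizeInMB) {
        this.maxBytes = maxReadsInFlightSizeInMB * 1024 * 1024;
        this.availableBytes = maxBytes;
    }

    // Reserve budget before issuing a read from BookKeeper or from the cache.
    public void acquire(long bytes) throws InterruptedException {
        if (maxBytes <= 0) {
            return;
        }
        lock.lock();
        try {
            while (availableBytes < bytes) {
                released.await();
            }
            availableBytes -= bytes;
        } finally {
            lock.unlock();
        }
    }

    // Give the budget back once the entries have been written to the consumer channel.
    public void release(long bytes) {
        if (maxBytes <= 0) {
            return;
        }
        lock.lock();
        try {
            availableBytes += bytes;
            released.signalAll();
        } finally {
            lock.unlock();
        }
    }
}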

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: eolivelli#19

Enrico Olivelli and others added 4 commits October 31, 2022 15:53
InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel)

@eolivelli force-pushed the impl/max-inflight-requests branch from 82ab48f to e1bdef6 on October 31, 2022 at 14:53
@codelipenghui (Contributor) commented:

> Broker can go out of memory due to many reads enqueued on the PersistentDispatcherMultipleConsumers dispatchMessagesThread

@eolivelli Does this only happen when the broker has many subscriptions? For a single subscription, we always trigger the next read-entries operation after sending messages to consumers, so there is only one active read-entries operation at a time. Is it possible to reproduce?

@eolivelli (Contributor, Author) commented:

@codelipenghui you are correct: the problem arises when a broker has many subscriptions on the same topic (and many topics).
There is no broker-level guardrail at the moment.
With this patch the memory used to handle outbound traffic is capped, and the limit is independent of the number of active subscriptions on the broker.
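For reference, the cap is controlled by the new broker setting mentioned above; the value below is only an example (the feature stays disabled unless it is explicitly configured):

# broker.conf (example value only)
managedLedgerMaxReadsInFlightSizeInMB=100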

@codecov-commenter commented Nov 10, 2022

Codecov Report

Merging #18245 (9cc5757) into master (b31c5a6) will increase coverage by 0.98%.
The diff coverage is 45.65%.

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #18245      +/-   ##
============================================
+ Coverage     46.98%   47.97%   +0.98%     
+ Complexity    10343     9370     -973     
============================================
  Files           692      613      -79     
  Lines         67766    58415    -9351     
  Branches       7259     6087    -1172     
============================================
- Hits          31842    28026    -3816     
+ Misses        32344    27377    -4967     
+ Partials       3580     3012     -568     
Flag Coverage Δ
unittests 47.97% <45.65%> (+0.98%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
...lsar/broker/service/RedeliveryTrackerDisabled.java 50.00% <ø> (ø)
...va/org/apache/pulsar/broker/service/ServerCnx.java 49.20% <ø> (+0.51%) ⬆️
...ersistentStreamingDispatcherMultipleConsumers.java 0.00% <0.00%> (ø)
.../java/org/apache/pulsar/client/impl/ClientCnx.java 30.16% <ø> (ø)
...a/org/apache/pulsar/client/impl/TableViewImpl.java 0.00% <0.00%> (ø)
...ar/client/impl/conf/ProducerConfigurationData.java 84.70% <ø> (-0.18%) ⬇️
...va/org/apache/pulsar/client/impl/ConsumerImpl.java 15.09% <12.50%> (+0.05%) ⬆️
...sistent/PersistentDispatcherMultipleConsumers.java 58.49% <66.66%> (+1.67%) ⬆️
...ache/pulsar/broker/ManagedLedgerClientFactory.java 62.16% <100.00%> (+1.05%) ⬆️
.../pulsar/broker/service/AbstractBaseDispatcher.java 60.81% <100.00%> (+2.89%) ⬆️
... and 134 more

@nicoloboschi (Contributor) left a comment:

LGTM

@codelipenghui (Contributor) left a comment:

Great change.
I have some minor comments.

public void readEntriesComplete(List<Entry> entries, Object ctx) {
    if (!entries.isEmpty()) {
        // use the size of the first entry as the estimate for subsequent reads
        long size = entries.get(0).getLength();
        estimatedEntrySize = size;
Contributor:

Can we use the avgMessagesPerEntry from the consumer?
RangeEntryCacheImpl is shared across all the topics. If the estimate were calculated at the topic level, we should be able to get a more precise estimated entry size.

Contributor Author:

Unfortunately we are in the "managed-ledger" module here, and in order to get a value from the Dispatcher/Consumer I would have to change many internal APIs.

If the size of the entries in the topic is similar across all entries, then I think this is a good estimate.
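As a purely hypothetical illustration of the trade-off discussed here (this code is not part of the PR): a managed-ledger level estimator could refine the estimate from the entry sizes it has already observed, without reaching into the Dispatcher/Consumer APIs.

// Hypothetical sketch, not part of this change: refine the per-read size estimate
// from previously observed entries instead of only the first entry of the last read.
class ObservedEntrySizeEstimator {
    private static final double ALPHA = 0.2;      // weight given to the newest sample
    private long estimatedEntrySize = 1024;       // initial guess, in bytes
    // Updates are assumed to come from a single thread (for example the read callback).

    void recordObservedEntrySize(long observedBytes) {
        // exponentially weighted moving average of the observed entry sizes
        estimatedEntrySize = (long) (ALPHA * observedBytes + (1 - ALPHA) * estimatedEntrySize);
    }

    long estimateReadSize(int numberOfEntries) {
        return estimatedEntrySize * numberOfEntries;
    }
}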

@codelipenghui (Contributor) commented:

/pulsarbot run-failure-checks

@eolivelli force-pushed the impl/max-inflight-requests branch from a843dff to 8efc1a1 on November 11, 2022 at 07:45
private long entryId;
ByteBuf data;

private Runnable onDeallocate;
Contributor:

This is not volatile, but it seems that deallocate() could be called from other threads?

Contributor Author:

This is the pattern we use for every recyclable object; the same comment would apply to all the other fields.
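For context, the recyclable-object pattern referred to here typically looks like the sketch below. It is illustrative only (based on Netty's Recycler, not the actual EntryImpl code): mutable fields such as onDeallocate are plain, non-volatile fields that are reset before the instance goes back to the pool, and visibility is expected to come from how the pooled object itself is handed between threads.

import io.netty.util.Recycler;

// Illustrative sketch of the recyclable-object pattern, not the Pulsar EntryImpl code.
final class RecyclableHolder {
    private static final Recycler<RecyclableHolder> RECYCLER = new Recycler<RecyclableHolder>() {
        @Override
        protected RecyclableHolder newObject(Handle<RecyclableHolder> handle) {
            return new RecyclableHolder(handle);
        }
    };

    private final Recycler.Handle<RecyclableHolder> handle;
    private Runnable onDeallocate;   // plain field, like the one flagged above

    private RecyclableHolder(Recycler.Handle<RecyclableHolder> handle) {
        this.handle = handle;
    }

    static RecyclableHolder create(Runnable onDeallocate) {
        RecyclableHolder holder = RECYCLER.get();
        holder.onDeallocate = onDeallocate;
        return holder;
    }

    void release() {
        if (onDeallocate != null) {
            onDeallocate.run();
        }
        onDeallocate = null;         // reset state before recycling
        handle.recycle(this);
    }
}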

@Jason918 merged commit 6fec66b into apache:master on Nov 11, 2022
@Technoboy- modified the milestones: 2.12.0, 2.11.0 on Nov 14, 2022
Technoboy- pushed a commit that referenced this pull request Nov 14, 2022
InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel) (#18245)

* InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel)

* Change error message

* checkstyle

* Fix license

* remove duplicate method after cherry-pick

* Rename onDeallocate
liangyepianzhou pushed a commit that referenced this pull request Sep 15, 2023
InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel) (#18245)

(cherry picked from commit 6fec66b)
liangyepianzhou added a commit to streamnative/pulsar-archived that referenced this pull request Sep 28, 2023
* InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel) (apache#18245)

(cherry picked from commit 6fec66b)
(cherry picked from commit 47c98e5)

* [fix][broker][branch-2.10] limit the memory used by reads end-to-end

(cherry picked from commit eeb80e1)

* remove gpg plugin

* remove release profile

* remove release plugin

* Revert "remove release plugin"

This reverts commit 20522ea.

* Revert "remove release profile"

This reverts commit 64627fd.

* Revert "remove gpg plugin"

This reverts commit 8054d59.

---------

Co-authored-by: Enrico Olivelli <eolivelli@apache.org>
mattisonchao pushed a commit to streamnative/pulsar-archived that referenced this pull request Dec 28, 2023
* InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel) (apache#18245)

narmathaansp pushed a commit to sajithsebastian/pulsar that referenced this pull request Feb 14, 2024
…che#5920)

* InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel) (apache#18245)

coderzc pushed a commit to coderzc/pulsar that referenced this pull request Aug 14, 2024
…che#5920)

* InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel) (apache#18245)

nodece pushed a commit to nodece/pulsar that referenced this pull request Sep 10, 2024
InflightReadsLimiter - limit the memory used by reads end-to-end (from storage/cache to the write to the consumer channel) (apache#18245)

(cherry picked from commit 6fec66b)
Signed-off-by: Zixuan Liu <nodeces@gmail.com>