Conversation

@gaozhangmin (Contributor) commented Dec 9, 2021

Motivation

See #13170.
Also, protocol handlers like KoP should make changes to add and decrease pendingBytes in BrokerService.

Modifications

Add a LongAdder totalPendingBytes to BrokerService to record the total pending message size.
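
A rough sketch of the shape of this change, just to make it concrete. Only the LongAdder field is what the PR describes; the method names addPendingBytes and getTotalPendingBytes are illustrative (completedSendOperation is the existing hook mentioned later in this discussion):

```java
import java.util.concurrent.atomic.LongAdder;

public class BrokerService {
    // Total in-flight publish bytes, shared by the Pulsar protocol (ServerCnx)
    // and protocol handlers such as KoP.
    private final LongAdder totalPendingBytes = new LongAdder();

    // Called when a publish request is received (by ServerCnx or a protocol handler).
    public void addPendingBytes(long msgSize) {
        totalPendingBytes.add(msgSize);
    }

    // Called when the corresponding send operation completes.
    public void completedSendOperation(long msgSize) {
        totalPendingBytes.add(-msgSize);
    }

    public long getTotalPendingBytes() {
        return totalPendingBytes.sum();
    }
}
```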

Documentation

Check the box below and label this PR (if you have committer privilege).

Need to update docs?

  • doc-required

    (If you need help on updating docs, create a doc issue)

  • no-need-doc

    (Please explain why)

  • doc

    (If this PR contains doc changes)

github-actions bot commented Dec 9, 2021

@gaozhangmin: Thanks for your contribution. For this PR, do we need to update docs?
(The PR template contains info about docs, which helps others know more about the changes. Can you provide doc-related info in this and future PR descriptions? Thanks)

github-actions bot commented Dec 9, 2021

@gaozhangmin: Thanks for providing doc info!

github-actions bot added the doc-not-needed label (Your PR changes do not impact docs) on Dec 9, 2021
@wangjialing218 (Contributor)

There could be a case where both Pulsar and Kafka producers are sending messages:

  1. totalPendingBytes exceeds maxMessagePublishBufferSizeInMB, so the Pulsar broker and KoP both stop reading from their producers.
  2. Pulsar has completed all of its pending messages, but totalPendingBytes still exceeds resumeThresholdPendingBytes because KoP may hold a lot of pending buffer, so Pulsar will not resume reading from the Pulsar producer.
  3. KoP completes some pending messages and totalPendingBytes drops below resumeThresholdPendingBytes, so KoP resumes reading from the Kafka producer. But the Pulsar broker may still not resume reading from the Pulsar producer, because completedSendOperation will never be called again since all Pulsar messages have already completed.

Could you please consider how to avoid this?

@gaozhangmin (Contributor, Author)

Why didn't the previous design have this problem? I'm a little confused. @wangjialing218

@gaozhangmin (Contributor, Author)

I see the reason now; I will try to fix this case.

@wangjialing218 (Contributor)

In the previous design, totalPendingBytes was only used by the broker to record the message bytes received from Pulsar producers, so the broker could resume reading from producers once totalPendingBytes dropped below resumeThresholdPendingBytes.
This PR makes totalPendingBytes shared with KoP, so this problem can happen.

@wangjialing218 (Contributor)

We do not have to call LongAdder.sum() every time we add or decrease pending bytes, since it is a relatively heavy CPU operation.
We could instead run sum() in a scheduled task, e.g. every 50 ms, and then notify ServerCnx in the broker and the protocol handlers to stop or resume reading via a listener, as sketched below.
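
A minimal sketch of that idea under the stated assumptions (the class and method names PublishBufferLimiter, PublishBufferListener, and checkPublishBuffer are hypothetical, not the PR's actual API; the 50 ms interval is the one suggested above):

```java
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

public class PublishBufferLimiter {

    /** Implemented by ServerCnx in the broker and by protocol handlers such as KoP. */
    public interface PublishBufferListener {
        void onPublishBufferChanged(boolean exceeded);
    }

    private final LongAdder totalPendingBytes = new LongAdder();
    private final CopyOnWriteArrayList<PublishBufferListener> listeners = new CopyOnWriteArrayList<>();
    private final long maxPendingBytes;
    private final long resumeThresholdPendingBytes;
    private volatile boolean throttled = false;

    public PublishBufferLimiter(long maxPendingBytes, long resumeThresholdPendingBytes,
                                ScheduledExecutorService scheduler) {
        this.maxPendingBytes = maxPendingBytes;
        this.resumeThresholdPendingBytes = resumeThresholdPendingBytes;
        // Evaluate the aggregated counter periodically instead of on every add/decrease.
        scheduler.scheduleAtFixedRate(this::checkPublishBuffer, 50, 50, TimeUnit.MILLISECONDS);
    }

    // Hot path: producers only touch the cheap per-cell LongAdder.add().
    public void addPendingBytes(long size) {
        totalPendingBytes.add(size);
    }

    public void completedSendOperation(long size) {
        totalPendingBytes.add(-size);
    }

    public void addListener(PublishBufferListener listener) {
        listeners.add(listener);
    }

    private void checkPublishBuffer() {
        long pending = totalPendingBytes.sum(); // the relatively heavy call, at most once per tick
        if (!throttled && pending > maxPendingBytes) {
            throttled = true;
            listeners.forEach(l -> l.onPublishBufferChanged(true));  // ask connections to stop reading
        } else if (throttled && pending < resumeThresholdPendingBytes) {
            throttled = false;
            listeners.forEach(l -> l.onPublishBufferChanged(false)); // ask connections to resume reading
        }
    }
}
```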

gaozhangmin force-pushed the maxMessagePublishBufferSizeInMB-protocolsHandler branch 3 times, most recently from 37a6eb3 to e2a9284 on December 14, 2021 12:57
gaozhangmin force-pushed the maxMessagePublishBufferSizeInMB-protocolsHandler branch 2 times, most recently from 0dbecd8 to 6b32a39 on December 15, 2021 11:47

@Jason918 (Contributor) left a comment

It would be better to describe the downsides of this solution in more detail.

```java
private Set<BrokerEntryMetadataInterceptor> brokerEntryMetadataInterceptors;
private Set<ManagedLedgerPayloadProcessor> brokerEntryPayloadProcessors;

private final CopyOnWriteArrayList<Consumer<PublishBufferEvent>>
```

I would prefer a better data structure than CopyOnWriteArrayList if we have a lot of producers.
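
Purely as an illustration of the reviewer's point (the class and types below are hypothetical, not from this PR): CopyOnWriteArrayList copies the whole backing array on every add or remove, so with many producers a concurrent map keyed by the owning connection keeps registration and removal cheap while iteration during the periodic check stays lock-free.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical alternative: key listeners by their owning connection so that
// register/unregister are O(1) instead of copying the whole array.
public class PublishBufferListeners<C> {
    private final Map<C, Consumer<Boolean>> listeners = new ConcurrentHashMap<>();

    public void register(C connection, Consumer<Boolean> listener) {
        listeners.put(connection, listener);
    }

    public void unregister(C connection) {
        listeners.remove(connection);
    }

    // Called from the periodic publish-buffer check; iteration is weakly consistent.
    public void notifyListeners(boolean exceeded) {
        listeners.values().forEach(l -> l.accept(exceeded));
    }
}
```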

gaozhangmin force-pushed the maxMessagePublishBufferSizeInMB-protocolsHandler branch from 6b32a39 to cacfda0 on December 16, 2021 08:06
@gaozhangmin (Contributor, Author)

/pulsarbot run-failure-checks

1 similar comment
@gaozhangmin (Contributor, Author)

/pulsarbot run-failure-checks

@hangc0276 (Contributor)

@gaozhangmin Thanks for your contribution.
Before Pulsar 2.8.0, the broker's publish throttling policy was similar to your implementation; its shortcoming was that the limit could suddenly be exceeded, leading to broker direct-memory OOM. Since Pulsar 2.8.0, we throttle per IO thread, which can limit the throughput of the Pulsar protocol but has no control over other protocols, like KoP.

IMO, it's better to throttle on the KoP side. Do you have any other ideas? @BewareMyPower

@BewareMyPower (Contributor)

@hangc0276 For now I don't have much context. AFAIK, this PR is an implementation of the idea from #12959. Could you answer hang's question? @wangjialing218

@wangjialing218 (Contributor) commented Dec 20, 2021

@hangc0276 You may have meant #7406, which introduced the per-IO-thread throttling policy.
This PR makes more effective use of memory and could also improve on the points mentioned in #7406, except this one:
-- There is a delay in detecting memory over-commit, due to the background task running periodically
Since idle connections will not call LongAdder.add, we only need to count the bytes from active connections in LongAdder.sum. We could run the detection more frequently to avoid OOM, e.g. every 10 ms.

IMO, it's better to throttle on the KoP side. Do you have any other ideas?

For this point, we could make sure the broker and KoP share the IO thread pool and apply the same per-IO-thread throttling policy on the KoP side. @BewareMyPower
I notice there is a configuration useSeparateThreadPoolForProtocolHandlers with default value true. I wonder if there is any disadvantage to setting it to false.

@BewareMyPower (Contributor)

I notice there is a configuration useSeparateThreadPoolForProtocolHandlers with default value true. I wonder if there is any disadvantage to setting it to false.

Setting it to false restores the previous behavior, in which potential deadlocks on the KoP side might also block the broker's IO threads.

codelipenghui modified the milestones: 2.10.0, 2.11.0 on Jan 21, 2022

Labels

area/broker, doc-not-needed (Your PR changes do not impact docs)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants