
Conversation

@eolivelli (Contributor) commented Jul 6, 2022

Motivation

When you use a Shared subscription, the Dispatcher (PersistentDispatcherMultipleConsumers) does out-of-order reads from the ManagedLedger. This happens very frequently in the consumerFlow method, which basically re-reads messages that have not been processed yet.

This happens because consumerFlow calls readMoreEntries(), which in turn finds that there are messages to be re-delivered because they have not yet been acknowledged by any consumer.

This triggers backward seeks in BlobStoreBackedReadHandleImpl. Even if the data is already loaded in memory, you usually have to parse it again, because the seek currently lands at the beginning of a block: the index keeps track of only some entries, not of every entry.
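
To illustrate the effect, here is a minimal, hypothetical sketch (not the actual index or BlobStoreBackedReadHandleImpl code) of a coarse-grained offset index: because only some entries are indexed, a backward seek can only land on the nearest indexed entry before the target, and everything from that offset up to the target must be parsed again.

    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical model of a coarse-grained offset index: only every
    // 1000th entry id is mapped to a byte offset inside the offloaded block.
    class CoarseOffsetIndex {
        private final TreeMap<Long, Long> indexedOffsets = new TreeMap<>();

        void record(long entryId, long dataOffset) {
            if (entryId % 1000 == 0) {
                indexedOffsets.put(entryId, dataOffset);
            }
        }

        // A backward seek for entryId can only jump to the nearest indexed
        // entry at or before it; all entries after that offset are re-parsed.
        long seekOffsetFor(long entryId) {
            Map.Entry<Long, Long> floor = indexedOffsets.floorEntry(entryId);
            return floor != null ? floor.getValue() : 0L;
        }
    }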

Modifications

Add an in-memory cache of the offset of each entry, so that when a backward seek is needed we seek exactly to the position where the entry can be found.

The trade-off is that building this TreeMap adds some memory cost.
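
To make the idea concrete, this is a minimal sketch of the approach with made-up names, not the exact code of this patch: remember the exact offset of every entry while reading forward, and consult that map before a backward seek, falling back to the block index only on a miss.

    import java.util.TreeMap;

    // Illustrative sketch of the per-entry offset cache (names are invented).
    class EntryOffsets {
        // entryId -> exact byte offset of the entry inside the offloaded object
        private final TreeMap<Long, Long> offsets = new TreeMap<>();

        // Called while reading forward, once the exact position of an entry is known.
        void remember(long entryId, long dataOffset) {
            offsets.put(entryId, dataOffset);
        }

        // Called before a backward seek: returns the exact offset if this entry
        // was seen before, or null so the caller falls back to the block index.
        Long exactOffsetOrNull(long entryId) {
            return offsets.get(entryId);
        }
    }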

Verifying this change

This change added tests.
I did some manual testing with GCP: this patch brings a 2x improvement in throughput of a Shared subscription (with pulsar-perf) with a single consumer.

@dave2wave did more extensive testing with complex scenarios using the OpenMessaging Benchmark and confirmed that this patch brings a big improvement for Shared subscriptions.

  • doc-not-needed

@eolivelli eolivelli added the type/enhancement, area/tieredstorage, and doc-not-needed labels Jul 6, 2022
@eolivelli eolivelli added this to the 2.11.0 milestone Jul 6, 2022
@eolivelli eolivelli requested review from hangc0276 and zymap July 6, 2022 09:01
@eolivelli eolivelli self-assigned this Jul 6, 2022
@dave2wave (Member) commented Jul 6, 2022

We found trouble with consumption of an offloaded backlog using a shared subscription (failover subscriptions perform well). Using the OMB we built a 64 GB backlog of 100-byte messages across 10 topics with 3 partitions each.

  1. Shared subscriptions were consumed at 3,000-10,000 messages per second.
  2. Failover subscriptions were consumed at 1,000,000 messages per second.

With this patch we see a big improvement and shared subscriptions are consumed like this:
[Screenshot: shared subscription consumption rate with the patch applied]

// never over to the last entry again.
if (!seeked) {
    inputStream.seek(index.getIndexEntryForEntry(nextExpectedId).getDataOffset());
    Long knownOffset = entryOffsets.get(nextExpectedId);
Contributor
Maybe we can add a function to package this logic, and have all the index.getIndexEntryForEntry(nextExpectedId).getDataOffset() call sites go through this function.

Other seek places also need to check the cache first.
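
As an illustration, a sketch of the kind of helper being suggested, reusing the identifiers visible in the diff above; the method name and exception signature are assumptions, and the fields entryOffsets and index are the ones already present in the surrounding class.

    // Hypothetical helper: check the per-entry offsets cache first and fall back
    // to the coarse block index, so every seek site goes through a single method.
    private long getOffsetToSeek(long entryId) throws IOException {
        Long knownOffset = entryOffsets.get(entryId); // exact offset, if this entry was seen before
        if (knownOffset != null) {
            return knownOffset;
        }
        // Fallback to the coarse index: this may point at the beginning of a block,
        // forcing the entries before entryId to be parsed again.
        return index.getIndexEntryForEntry(entryId).getDataOffset();
    }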

Contributor Author

Sounds good. In my testing it looked like only this point needs the cache, but it is better to use it everywhere.

@eolivelli eolivelli force-pushed the impl/offloaders-improve-shared branch from 5997cd1 to afea8e9 Compare July 7, 2022 15:26
@eolivelli eolivelli requested a review from hangc0276 July 7, 2022 15:37
// this Cache is accessed only by one thread
private final Cache<Long, Long> entryOffsets = CacheBuilder
        .newBuilder()
        .expireAfterAccess(10, TimeUnit.MINUTES)
Contributor
10 minutes may be too short; maybe 30 minutes to 1 hour? This cache can also be shared with other catch-up readers.

Contributor Author

I have changed it to 30 minutes and made it configurable via a system property.
I don't think we have to document the system property; I made it configurable only in case of problems.
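
For illustration only, the configurability could look roughly like the sketch below; the property name here is made up and is not necessarily the one used in the patch.

    import com.google.common.cache.Cache;
    import com.google.common.cache.CacheBuilder;
    import java.util.concurrent.TimeUnit;

    class OffsetsCacheSketch {
        // Hypothetical system property name; defaults to 30 minutes when not set.
        private static final long OFFSETS_CACHE_TTL_MINUTES =
                Long.getLong("pulsar.tieredstorage.offsetsCacheTtlMinutes", 30);

        private final Cache<Long, Long> entryOffsets = CacheBuilder.newBuilder()
                .expireAfterAccess(OFFSETS_CACHE_TTL_MINUTES, TimeUnit.MINUTES)
                .build();
    }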

@hangc0276 (Contributor) left a comment

Good job!

@hangc0276 (Contributor) commented
ping @zymap @horizonzy please help take a look, thanks.

@eolivelli eolivelli requested a review from hangc0276 July 8, 2022 10:24
@eolivelli eolivelli force-pushed the impl/offloaders-improve-shared branch 2 times, most recently from dba2dba to f0237a7 Compare July 8, 2022 11:55
@eolivelli eolivelli force-pushed the impl/offloaders-improve-shared branch from f0237a7 to 388912d Compare July 11, 2022 12:04
@horizonzy (Member) left a comment

LGTM.

@codelipenghui (Contributor) commented
@eolivelli Please rebase the master branch

@eolivelli (Contributor Author) commented
@Technoboy- @codelipenghui
I have merged with the latest master, but Test Group 1 still fails with:
[Screenshot: CI failure in Test Group 1]

@eolivelli eolivelli merged commit 99bca8b into apache:master Jul 13, 2022
gaozhangmin pushed a commit to gaozhangmin/pulsar that referenced this pull request Jul 14, 2022
wuxuanqicn pushed a commit to wuxuanqicn/pulsar that referenced this pull request Jul 14, 2022
dragonls pushed a commit to dragonls/pulsar that referenced this pull request Oct 21, 2022