Skip to content

[fix][broker] Use expireAfterAccess instead of expireAfterWrite for metadata cache#21095

Closed
codelipenghui wants to merge 1 commit intoapache:masterfrom
codelipenghui:penghui/metadata_cache_expiration
Closed

[fix][broker] Use expireAfterAccess instead of expireAfterWrite for metadata cache#21095
codelipenghui wants to merge 1 commit intoapache:masterfrom
codelipenghui:penghui/metadata_cache_expiration

Conversation

@codelipenghui
Copy link
Copy Markdown
Contributor

@codelipenghui codelipenghui commented Aug 30, 2023

Motivation

#14154 introduced a metadata cache expiration policy to avoid a potential memory leak in the Metadata caches. But it uses the expireAfterWrite to force expire the caches no matter if the entry has been actively accessed or not.

We also found another related regression from 2.9 to 2.10. The BookKeeper placement policy doesn't work as expected if the cache expires. The code itself also has the problem, but without the expireAfterWrite, it works great.

if (bookieMappingCache != null) {
CompletableFuture<Optional<BookiesRackConfiguration>> future =
bookieMappingCache.get(BookieRackAffinityMapping.BOOKIE_INFO_ROOT_PATH);
Optional<BookiesRackConfiguration> optRes = (future.isDone() && !future.isCompletedExceptionally())
? future.join() : Optional.empty();
if (!optRes.isPresent()) {
return blacklistedBookies;
}

For most cases, disabling the expireAfterWrite and enabling expireAfterAccess will resolve the issue because the new ledgers will happen more frequently than 10 mins if have some topics.

Modification

  • Disable expireAfterWrite by default
  • Enable expireAfterAccess by default

Does this pull request potentially affect one of the following parts:

If the box was checked, please highlight the changes

  • Dependencies (add or upgrade a dependency)
  • The public API
  • The schema
  • The default values of configurations
  • The threading model
  • The binary protocol
  • The REST endpoints
  • The admin CLI options
  • The metrics
  • Anything that affects deployment

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@codelipenghui codelipenghui added type/bug The PR fixed a bug or issue reported a bug area/broker release/3.0.2 release/2.11.3 release/2.10.6 release/3.1.1 category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost labels Aug 30, 2023
@codelipenghui codelipenghui added this to the 3.2.0 milestone Aug 30, 2023
@codelipenghui codelipenghui self-assigned this Aug 30, 2023
@github-actions github-actions Bot added the doc-not-needed Your PR changes do not impact docs label Aug 30, 2023
@codelipenghui
Copy link
Copy Markdown
Contributor Author

/pulsarbot run-failure-checks

Copy link
Copy Markdown
Member

@horizonzy horizonzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MetadataCacheConfig metadataCacheConfig = MetadataCacheConfig.builder()
.expireAfterWriteMillis(-1L)
.build();

We should also override the expireAfterAccess value in the LeaderElectionImpl

@codelipenghui
Copy link
Copy Markdown
Contributor Author

We should also override the expireAfterAccess value in the LeaderElectionImpl

Maybe we don't want to expire the load data?

@codelipenghui codelipenghui marked this pull request as draft August 31, 2023 00:23
@horizonzy
Copy link
Copy Markdown
Member

We should also override the expireAfterAccess value in the LeaderElectionImpl

Maybe we don't want to expire the load data?

If we don't want to expire the load data. After this PR, it will expire by expireAfterAccess(default value is not -1), so we should also override the expireAfterAccess using -1.

*/
@Builder.Default
private final long expireAfterWriteMillis = 2 * DEFAULT_CACHE_REFRESH_TIME_MILLIS;
private final long expireAfterWriteMillis = 0L;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change the default value to -1. Use -1 to disable the expire maybe better to understand.

@hangc0276
Copy link
Copy Markdown
Contributor

We should also override the expireAfterAccess value in the LeaderElectionImpl

Maybe we don't want to expire the load data?

If we don't want to expire the load data. After this PR, it will expire by expireAfterAccess(default value is not -1), so we should also override the expireAfterAccess using -1.

+1

hangc0276 pushed a commit that referenced this pull request Sep 5, 2023
### Modifications
When upgraded the pulsar version from 2.9.2 to 2.10.3, and the isolated group feature not work anymore.
Finally, we found the problem. In IsolatedBookieEnsemblePlacementPolicy, when it gets the bookie rack from the metadata store cache, uses future.isDone() to avoid sync operation. If the future is incomplete, return empty blacklists. 
The cache may expire due to the caffeine cache `getExpireAfterWriteMillis` config, if the cache expires, the future may be incomplete. (#21095 will correct the behavior)

In 2.9.2, it uses the sync to get data from the metadata store, we should also keep the behavior.
Technoboy- pushed a commit that referenced this pull request Sep 5, 2023
### Modifications
When upgraded the pulsar version from 2.9.2 to 2.10.3, and the isolated group feature not work anymore.
Finally, we found the problem. In IsolatedBookieEnsemblePlacementPolicy, when it gets the bookie rack from the metadata store cache, uses future.isDone() to avoid sync operation. If the future is incomplete, return empty blacklists. 
The cache may expire due to the caffeine cache `getExpireAfterWriteMillis` config, if the cache expires, the future may be incomplete. (#21095 will correct the behavior)

In 2.9.2, it uses the sync to get data from the metadata store, we should also keep the behavior.
Technoboy- pushed a commit that referenced this pull request Sep 11, 2023
When upgraded the pulsar version from 2.9.2 to 2.10.3, and the isolated group feature not work anymore.
Finally, we found the problem. In IsolatedBookieEnsemblePlacementPolicy, when it gets the bookie rack from the metadata store cache, uses future.isDone() to avoid sync operation. If the future is incomplete, return empty blacklists.
The cache may expire due to the caffeine cache `getExpireAfterWriteMillis` config, if the cache expires, the future may be incomplete. (#21095 will correct the behavior)

In 2.9.2, it uses the sync to get data from the metadata store, we should also keep the behavior.
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Oct 1, 2023

The pr had no activity for 30 days, mark with Stale label.

@github-actions github-actions Bot added the Stale label Oct 1, 2023
@codelipenghui
Copy link
Copy Markdown
Contributor Author

Close this PR since the expireAfterAccess method will cost more CPU resources for tracking the access.

@shibd shibd removed this from the 3.2.0 milestone Oct 22, 2023
@codelipenghui codelipenghui deleted the penghui/metadata_cache_expiration branch March 26, 2024 00:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/broker category/reliability The function does not work properly in certain specific environments or failures. e.g. data lost doc-not-needed Your PR changes do not impact docs ready-to-test Stale type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants