Skip to content

Conversation

@Demogorgon314
Copy link
Member

@Demogorgon314 Demogorgon314 commented Feb 15, 2022

Fixes #14110 #14096 #14219

Motivation

Currently, the metadata cache refresh has race condition.

The main reason is MetadataCacheImpl#refresh is not atomic operations.

// 1. Caffeine.asyncLoad already begin and reads an old value but is not done yet.
if (objCache.getIfPresent(path) != null) {
  objCache.synchronous().invalidate(path);
  // 2. Caffeine.asyncLoad done, the old value will store to cache.
  objCache.synchronous().refresh(path);
}

I added some log in MetadataCacheImpl in order to observe:

@Override
public CompletableFuture<Optional<CacheGetResult<T>>> asyncLoad(String key, Executor executor) {
  log.info("Async load for objCache {}", key);
  CompletableFuture<Optional<CacheGetResult<T>>> future = readValueFromStore(key);
  future.whenComplete((op, ex) -> {
    if (ex == null){
      op.ifPresent(tCacheGetResult -> log.info("Async load for objCache complete {} - {}", key, tCacheGetResult));
    }
  });
  return future;
}
---------------------
@Override
public void refresh(String path) {
  log.info("1. Refresh path {}", path);
  // Refresh object of path if only it is cached before.
  if (objCache.getIfPresent(path) != null) {
    objCache.synchronous().invalidate(path);
    objCache.synchronous().refresh(path);
    log.info("2. Refresh path complete {}:{}", path, get(path).join().get());
  } else {
    log.info("0. Skip Refresh path {}", path);
  }
}

Cache inconsistency, you can see the async load for objCache complete in mid of refresh :

2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552641536875 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=1), stat=Stat(path=/key-33552641536875, version=0, creationTimestamp=1644885186668, modificationTimestamp=1644885186668, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,669+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,669+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,670+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552641536875 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=1), stat=Stat(path=/key-33552641536875, version=0, creationTimestamp=1644885186668, modificationTimestamp=1644885186668, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552641536875:MetadataCacheTest.MyClass(a=a, b=1) {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552641536875 {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552641536875 {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552641536875 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=2), stat=Stat(path=/key-33552641536875, version=1, creationTimestamp=1644885186668, modificationTimestamp=1644885186669, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,671+0800 INFO  [metadata-store-2437-1] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552641536875:MetadataCacheTest.MyClass(a=a, b=2) {}

Correct log:

2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552632155000 {}
2022-02-15T08:33:06,659+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552632155000 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=1), stat=Stat(path=/key-33552632155000, version=0, creationTimestamp=1644885186659, modificationTimestamp=1644885186659, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [metadata-store-2436-1] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [metadata-store-2436-1] o.a.p.m.c.i.MetadataCacheImpl@279 - 0. Skip Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552632155000 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=2), stat=Stat(path=/key-33552632155000, version=1, creationTimestamp=1644885186659, modificationTimestamp=1644885186659, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552632155000:MetadataCacheTest.MyClass(a=a, b=2) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@272 - 1. Refresh path /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552632155000 {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@84 - Async load for objCache complete /key-33552632155000 - CacheGetResult(value=MetadataCacheTest.MyClass(a=a, b=2), stat=Stat(path=/key-33552632155000, version=1, creationTimestamp=1644885186659, modificationTimestamp=1644885186659, ephemeral=false, createdBySelf=true)) {}
2022-02-15T08:33:06,661+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@277 - 2. Refresh path complete /key-33552632155000:MetadataCacheTest.MyClass(a=a, b=2) {}
2022-02-15T08:33:06,662+0800 INFO  [main] o.a.p.m.c.i.MetadataCacheImpl@80 - Async load for objCache /key-33552635148375 {}

Modifications

Make metadata cache refresh method atomic.

Verifying this change

  • Make sure that the change passes the CI checks.

Documentation

Need to update docs?

  • no-need-doc

@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Feb 15, 2022
@merlimat merlimat added the type/bug The PR fixed a bug or issue reported a bug label Feb 15, 2022
@merlimat merlimat added this to the 2.10.0 milestone Feb 15, 2022
@Demogorgon314 Demogorgon314 marked this pull request as draft February 15, 2022 01:58
@codelipenghui codelipenghui added the release/blocker Indicate the PR or issue that should block the release until it gets resolved label Feb 15, 2022
@Demogorgon314 Demogorgon314 force-pushed the fix/make-metadata-cache-refresh-async branch from a71692d to ab6ef2b Compare February 15, 2022 08:16
@Demogorgon314 Demogorgon314 changed the title [metadata-store] Fix race condition by making metadata cache refresh async [metadata-store] Fix metadata cache inconsistency on doing refresh Feb 15, 2022
@Demogorgon314 Demogorgon314 marked this pull request as ready for review February 15, 2022 09:18
@lhotari lhotari merged commit 2e16b43 into apache:master Feb 15, 2022
@Demogorgon314 Demogorgon314 deleted the fix/make-metadata-cache-refresh-async branch February 15, 2022 11:22
@gaozhangmin
Copy link
Contributor

@Demogorgon314 I still get test error: https://github.com/apache/pulsar/runs/5231565908?check_suite_focus=true

@Demogorgon314
Copy link
Member Author

@Demogorgon314 I still get test error: https://github.com/apache/pulsar/runs/5231565908?check_suite_focus=true

I'll push a PR to fix it.

@Demogorgon314
Copy link
Member Author

@gaozhangmin PTAL #14354 : )

congbobo184 pushed a commit that referenced this pull request Nov 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cherry-picked/branch-2.9 Archived: 2.9 is end of life doc-not-needed Your PR changes do not impact docs release/blocker Indicate the PR or issue that should block the release until it gets resolved release/2.9.4 type/bug The PR fixed a bug or issue reported a bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Flaky-test: MetadataCacheTest.insertionDeletion

9 participants