Skip to content

Conversation

@poorbarcode
Copy link
Contributor

@poorbarcode poorbarcode commented Jul 29, 2024

Motivation

A deadlock occurs.

"metadata-store-13-1" #16 prio=5 os_prio=0 cpu=621052.68ms elapsed=249537.73s tid=0x00007f31415b8570 nid=0x6a waiting on condition  [0x00007f3118ffd000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.11/Native Method)
	- parking to wait for  <0x00000000e50e8200> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.11/LockSupport.java:211)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.11/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.11/ForkJoinPool.java:3465)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.11/ForkJoinPool.java:3436)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.11/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.get(java.base@17.0.11/CompletableFuture.java:2072)
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription.getStats(PersistentSubscription.java:1209)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$asyncGetStats$90(PersistentTopic.java:2391)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1043/0x00007f30cc9c1028.accept(Unknown Source)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:563)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:277)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.asyncGetStats(PersistentTopic.java:2389)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.getStats(PersistentTopic.java:2349)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$167(PersistentTopicsBase.java:1513)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2195/0x00007f30ccc3d408.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniApplyNow(java.base@17.0.11/CompletableFuture.java:684)
	at java.util.concurrent.CompletableFuture.uniApplyStage(java.base@17.0.11/CompletableFuture.java:662)
	at java.util.concurrent.CompletableFuture.thenApply(java.base@17.0.11/CompletableFuture.java:2168)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$168(PersistentTopicsBase.java:1512)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2194/0x00007f30ccc3d1c0.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniComposeStage(java.base@17.0.11/CompletableFuture.java:1187)
	at java.util.concurrent.CompletableFuture.thenCompose(java.base@17.0.11/CompletableFuture.java:2309)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$170(PersistentTopicsBase.java:1509)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2192/0x00007f30ccc3cd40.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture$UniAccept.tryFire(java.base@17.0.11/CompletableFuture.java:718)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.11/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.complete(java.base@17.0.11/CompletableFuture.java:2147)
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.handleGetResult(ZKMetadataStore.java:288)
	at org.apache.pulsar.metadata.impl.ZKMetadataStore.lambda$batchOperation$7(ZKMetadataStore.java:232)
	at org.apache.pulsar.metadata.impl.ZKMetadataStore$$Lambda$225/0x00007f30cc34b5d0.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@17.0.11/Executors.java:539)
	at java.util.concurrent.FutureTask.run(java.base@17.0.11/FutureTask.java:264)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@17.0.11/ScheduledThreadPoolExecutor.java:304)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.11/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.11/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.11/Thread.java:840)

"pulsar-load-manager-1-1" #51 prio=5 os_prio=0 cpu=5248.25ms elapsed=249536.68s tid=0x00007f314191a310 nid=0x8b waiting on condition  [0x00007f30c5ffe000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.11/Native Method)
	- parking to wait for  <0x00000000e50e8a28> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.11/LockSupport.java:211)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.11/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.11/ForkJoinPool.java:3465)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.11/ForkJoinPool.java:3436)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.11/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.join(java.base@17.0.11/CompletableFuture.java:2117)
	at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerImpl.writeBundleDataOnZooKeeper(ModularLoadManagerImpl.java:1170)
	at org.apache.pulsar.broker.loadbalance.impl.ModularLoadManagerWrapper.writeResourceQuotasToZooKeeper(ModularLoadManagerWrapper.java:118)
	at org.apache.pulsar.broker.loadbalance.LoadResourceQuotaUpdaterTask.run(LoadResourceQuotaUpdaterTask.java:39)
	at java.util.concurrent.Executors$RunnableAdapter.call(java.base@17.0.11/Executors.java:539)
	at java.util.concurrent.FutureTask.runAndReset(java.base@17.0.11/FutureTask.java:305)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@17.0.11/ScheduledThreadPoolExecutor.java:305)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.11/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.11/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.11/Thread.java:840)

"pulsar-web-42-3" #67 prio=5 os_prio=0 cpu=11269.15ms elapsed=249532.09s tid=0x00007f3141a5a580 nid=0x9a waiting on condition  [0x00007f30c50fd000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.11/Native Method)
	- parking to wait for  <0x00000000e4e87000> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.11/LockSupport.java:211)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.11/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.11/ForkJoinPool.java:3465)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.11/ForkJoinPool.java:3436)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.11/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.get(java.base@17.0.11/CompletableFuture.java:2072)
	at org.apache.pulsar.broker.service.persistent.PersistentSubscription.getStats(PersistentSubscription.java:1209)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.lambda$asyncGetStats$90(PersistentTopic.java:2391)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic$$Lambda$1043/0x00007f30cc9c1028.accept(Unknown Source)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap$Section.forEach(ConcurrentOpenHashMap.java:563)
	at org.apache.pulsar.common.util.collections.ConcurrentOpenHashMap.forEach(ConcurrentOpenHashMap.java:277)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.asyncGetStats(PersistentTopic.java:2389)
	at org.apache.pulsar.broker.service.persistent.PersistentTopic.getStats(PersistentTopic.java:2349)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$167(PersistentTopicsBase.java:1513)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2195/0x00007f30ccc3d408.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniApplyNow(java.base@17.0.11/CompletableFuture.java:684)
	at java.util.concurrent.CompletableFuture.uniApplyStage(java.base@17.0.11/CompletableFuture.java:662)
	at java.util.concurrent.CompletableFuture.thenApply(java.base@17.0.11/CompletableFuture.java:2168)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$168(PersistentTopicsBase.java:1512)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2194/0x00007f30ccc3d1c0.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniComposeStage(java.base@17.0.11/CompletableFuture.java:1187)
	at java.util.concurrent.CompletableFuture.thenCompose(java.base@17.0.11/CompletableFuture.java:2309)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.lambda$internalGetPartitionedStats$170(PersistentTopicsBase.java:1509)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase$$Lambda$2192/0x00007f30ccc3cd40.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniAcceptNow(java.base@17.0.11/CompletableFuture.java:757)
	at java.util.concurrent.CompletableFuture.uniAcceptStage(java.base@17.0.11/CompletableFuture.java:735)
	at java.util.concurrent.CompletableFuture.thenAccept(java.base@17.0.11/CompletableFuture.java:2182)
	at org.apache.pulsar.broker.admin.impl.PersistentTopicsBase.internalGetPartitionedStats(PersistentTopicsBase.java:1496)
	at org.apache.pulsar.broker.admin.v2.PersistentTopics.getPartitionedStats(PersistentTopics.java:1305)
	at jdk.internal.reflect.GeneratedMethodAccessor396.invoke(Unknown Source)
	at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@17.0.11/DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(java.base@17.0.11/Method.java:568)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:52)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$$Lambda$561/0x00007f30cc5b92e0.invoke(Unknown Source)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:124)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:167)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:159)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:79)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:475)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:397)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:81)
	at org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:255)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:248)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:244)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:292)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:274)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:244)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:265)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:234)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:680)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:394)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:346)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:366)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:319)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:205)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
	at org.eclipse.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
	at org.apache.pulsar.broker.web.ResponseHandlerFilter.doFilter(ResponseHandlerFilter.java:66)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.apache.pulsar.broker.web.AuthenticationFilter.doFilter(AuthenticationFilter.java:65)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.apache.pulsar.broker.intercept.BrokerInterceptor.onFilter(BrokerInterceptor.java:224)
	at org.apache.pulsar.broker.web.ProcessHandlerFilter.doFilter(ProcessHandlerFilter.java:46)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.apache.pulsar.broker.web.PreInterceptFilter.doFilter(PreInterceptFilter.java:73)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.eclipse.jetty.servlets.QoSFilter.doFilter(QoSFilter.java:202)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1624)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1594)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
	at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:181)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
	at org.eclipse.jetty.server.HttpChannel$$Lambda$811/0x00007f30cc7ab120.dispatch(Unknown Source)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@17.0.11/ThreadPoolExecutor.java:1136)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@17.0.11/ThreadPoolExecutor.java:635)
	at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
	at java.lang.Thread.run(java.base@17.0.11/Thread.java:840)

Modifications

Avoid blocking calls from Subscription.getStats

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

Matching PR in forked repository

PR in forked repository: x

@poorbarcode poorbarcode added this to the 3.4.0 milestone Jul 29, 2024
@poorbarcode poorbarcode self-assigned this Jul 29, 2024
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jul 29, 2024
@poorbarcode poorbarcode changed the title [fix] [broker] Fix deadlock when calling GetPartitionedStats [fix] [broker] Remove blocking calls from Subscription.getStats Jul 29, 2024
@poorbarcode poorbarcode force-pushed the fix_deadlock/get_partitioned_stats branch from 4612237 to 9da7cbd Compare July 29, 2024 02:01
@codelipenghui
Copy link
Contributor

@poorbarcode Do you have the stack information for another thread which hold the lock?

@poorbarcode
Copy link
Contributor Author

/pulsarbot rerun-failure-checks

@poorbarcode poorbarcode requested a review from nodece July 29, 2024 03:39
@poorbarcode poorbarcode requested a review from nodece July 29, 2024 03:44
@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2024

Codecov Report

❌ Patch coverage is 88.23529% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 73.44%. Comparing base (bbc6224) to head (d5449c4).
⚠️ Report is 1470 commits behind head on master.

Files with missing lines Patch % Lines
...ker/service/persistent/PersistentSubscription.java 73.33% 3 Missing and 1 partial ⚠️
...sar/broker/service/persistent/PersistentTopic.java 94.44% 2 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff              @@
##             master   #23088      +/-   ##
============================================
- Coverage     73.57%   73.44%   -0.13%     
- Complexity    32624    33516     +892     
============================================
  Files          1877     1917      +40     
  Lines        139502   144074    +4572     
  Branches      15299    15741     +442     
============================================
+ Hits         102638   105822    +3184     
- Misses        28908    30133    +1225     
- Partials       7956     8119     +163     
Flag Coverage Δ
inttests 27.79% <45.09%> (+3.20%) ⬆️
systests 24.78% <45.09%> (+0.45%) ⬆️
unittests 72.50% <88.23%> (-0.35%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
...sar/broker/service/persistent/PersistentTopic.java 79.26% <94.44%> (+0.80%) ⬆️
...ker/service/persistent/PersistentSubscription.java 77.10% <73.33%> (+0.41%) ⬆️

... and 508 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@nodece nodece requested a review from lhotari July 29, 2024 07:16
@nodece nodece merged commit e59cd05 into apache:master Jul 29, 2024
poorbarcode added a commit that referenced this pull request Jul 29, 2024
poorbarcode added a commit that referenced this pull request Jul 29, 2024
nikhil-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 30, 2024
srinath-ctds pushed a commit to datastax/pulsar that referenced this pull request Jul 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants