Skip to content

KAFKA-16905: Fixed potential thread block by admin API#16217

Merged
mumrah merged 2 commits intoapache:trunkfrom
apoorvmittal10:thread
Jun 6, 2024
Merged

KAFKA-16905: Fixed potential thread block by admin API#16217
mumrah merged 2 commits intoapache:trunkfrom
apoorvmittal10:thread

Conversation

@apoorvmittal10
Copy link
Copy Markdown
Contributor

The PR moves the blocking call of describeCluster to async. Though the issue coldn't be re-produced in Kafka intgeration tests but a thread block on describeCluster has been observed when using admin describeTopic API in kafka-rest integrations tests.

The threads blocks indefinetly

"kafka-admin-client-thread | adminclient-3" #114 daemon prio=5 os_prio=31 cpu=6.57ms elapsed=417.17s tid=0x00000001364fc200 nid=0x13403 waiting on condition  [0x00000002bb419000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@17.0.7/Native Method)
	- parking to wait for  <0x0000000773804828> (a java.util.concurrent.CompletableFuture$Signaller)
	at java.util.concurrent.locks.LockSupport.park(java.base@17.0.7/LockSupport.java:211)
	at java.util.concurrent.CompletableFuture$Signaller.block(java.base@17.0.7/CompletableFuture.java:1864)
	at java.util.concurrent.ForkJoinPool.unmanagedBlock(java.base@17.0.7/ForkJoinPool.java:3463)
	at java.util.concurrent.ForkJoinPool.managedBlock(java.base@17.0.7/ForkJoinPool.java:3434)
	at java.util.concurrent.CompletableFuture.waitingGet(java.base@17.0.7/CompletableFuture.java:1898)
	at java.util.concurrent.CompletableFuture.get(java.base@17.0.7/CompletableFuture.java:2072)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
	at org.apache.kafka.clients.admin.KafkaAdminClient.handleDescribeTopicsByNamesWithDescribeTopicPartitionsApi(KafkaAdminClient.java:2324)
	at org.apache.kafka.clients.admin.KafkaAdminClient.describeTopics(KafkaAdminClient.java:2122)
	at org.apache.kafka.clients.admin.Admin.describeTopics(Admin.java:311)
	at io.confluent.kafkarest.controllers.TopicManagerImpl.describeTopics(TopicManagerImpl.java:155)
	at io.confluent.kafkarest.controllers.TopicManagerImpl.lambda$listTopics$2(TopicManagerImpl.java:76)
	at io.confluent.kafkarest.controllers.TopicManagerImpl$$Lambda$1925/0x0000000800891448.apply(Unknown Source)
	at java.util.concurrent.CompletableFuture$UniCompose.tryFire(java.base@17.0.7/CompletableFuture.java:1150)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.7/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.complete(java.base@17.0.7/CompletableFuture.java:2147)
	at io.confluent.kafkarest.common.KafkaFutures.lambda$toCompletableFuture$0(KafkaFutures.java:45)
	at io.confluent.kafkarest.common.KafkaFutures$$Lambda$1909/0x0000000800897528.accept(Unknown Source)
	at org.apache.kafka.common.internals.KafkaFutureImpl.lambda$whenComplete$2(KafkaFutureImpl.java:107)
	at org.apache.kafka.common.internals.KafkaFutureImpl$$Lambda$1910/0x0000000800897750.accept(Unknown Source)
	at java.util.concurrent.CompletableFuture.uniWhenComplete(java.base@17.0.7/CompletableFuture.java:863)
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(java.base@17.0.7/CompletableFuture.java:841)
	at java.util.concurrent.CompletableFuture.postComplete(java.base@17.0.7/CompletableFuture.java:510)
	at java.util.concurrent.CompletableFuture.complete(java.base@17.0.7/CompletableFuture.java:2147)
	at org.apache.kafka.common.internals.KafkaCompletableFuture.kafkaComplete(KafkaCompletableFuture.java:39)
	at org.apache.kafka.common.internals.KafkaFutureImpl.complete(KafkaFutureImpl.java:122)
	at org.apache.kafka.clients.admin.KafkaAdminClient$4.handleResponse(KafkaAdminClient.java:2106)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.handleResponses(KafkaAdminClient.java:1370)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.processRequests(KafkaAdminClient.java:1523)
	at org.apache.kafka.clients.admin.KafkaAdminClient$AdminClientRunnable.run(KafkaAdminClient.java:1446)
	at java.lang.Thread.run(java.base@17.0.7/Thread.java:833)

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@apoorvmittal10 apoorvmittal10 marked this pull request as ready for review June 6, 2024 07:32
Copy link
Copy Markdown
Member

@AndrewJSchofield AndrewJSchofield left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR. Looks good to me.

@omkreddy
Copy link
Copy Markdown
Contributor

omkreddy commented Jun 6, 2024

there is checkstyle error

@apoorvmittal10
Copy link
Copy Markdown
Contributor Author

there is checkstyle error

My bad, fixed.

@mimaison
Copy link
Copy Markdown
Member

mimaison commented Jun 6, 2024

Can you open a Jira so it's easier for users to find? Thanks

@apoorvmittal10 apoorvmittal10 changed the title Fixed potential thread block by admin API KAFKA-16905: Fixed potential thread block by admin API Jun 6, 2024
@apoorvmittal10
Copy link
Copy Markdown
Contributor Author

Can you open a Jira so it's easier for users to find? Thanks

Done, https://issues.apache.org/jira/browse/KAFKA-16905

Copy link
Copy Markdown
Member

@lianetm lianetm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @apoorvmittal10, LGTM.

Copy link
Copy Markdown
Member

@mumrah mumrah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding and fix this @apoorvmittal10 👍

LGTM

@mumrah mumrah merged commit c01279b into apache:trunk Jun 6, 2024
@ableegoldman
Copy link
Copy Markdown
Member

Hey guys -- just a gentle reminder to please check the builds carefully before merging. This patch seems to have introduced an issue that caused several tests to hang indefinitely, as you can see in the PR build for this which aborted after 7+ hours

We already have a fix and are just waiting to confirm it fixes the issue so no harm done (I've definitely merged something that broke the build myself in the past). Just wanted to raise this since the 3.8 code freeze is fast approaching 🙂

showuon pushed a commit that referenced this pull request Jun 8, 2024
Fix to complete Future which was stuck due the exception.getCause() throws an error.

The fix in the #16217 unblocked blocking thread but exception in catch block from blocking get call was wrapped in ExecutionException which is not the case when moved to async workflow hence getCause is not needed.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
…eTopics (apache#16217)

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, David Arthur <mumrah@gmail.com>
TaiJuWu pushed a commit to TaiJuWu/kafka that referenced this pull request Jun 8, 2024
Fix to complete Future which was stuck due the exception.getCause() throws an error.

The fix in the apache#16217 unblocked blocking thread but exception in catch block from blocking get call was wrapped in ExecutionException which is not the case when moved to async workflow hence getCause is not needed.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
…eTopics (apache#16217)

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, David Arthur <mumrah@gmail.com>
gongxuanzhang pushed a commit to gongxuanzhang/kafka that referenced this pull request Jun 12, 2024
Fix to complete Future which was stuck due the exception.getCause() throws an error.

The fix in the apache#16217 unblocked blocking thread but exception in catch block from blocking get call was wrapped in ExecutionException which is not the case when moved to async workflow hence getCause is not needed.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
jlprat pushed a commit to jlprat/kafka that referenced this pull request Jul 15, 2024
…eTopics (apache#16217)

Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, David Arthur <mumrah@gmail.com>
jlprat pushed a commit to jlprat/kafka that referenced this pull request Jul 15, 2024
Fix to complete Future which was stuck due the exception.getCause() throws an error.

The fix in the apache#16217 unblocked blocking thread but exception in catch block from blocking get call was wrapped in ExecutionException which is not the case when moved to async workflow hence getCause is not needed.

Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants