Skip to content

KAFKA-14194: Fix NPE in Cluster.nodeIfOnline#12584

Merged
dajac merged 2 commits intoapache:trunkfrom
ndrwdn:KAFKA-14194-nodeIfOnline-npe
Sep 5, 2022
Merged

KAFKA-14194: Fix NPE in Cluster.nodeIfOnline#12584
dajac merged 2 commits intoapache:trunkfrom
ndrwdn:KAFKA-14194-nodeIfOnline-npe

Conversation

@ndrwdn
Copy link
Copy Markdown
Contributor

@ndrwdn ndrwdn commented Sep 1, 2022

When utilizing the rack-aware consumer configuration and rolling updates are being applied to the Kafka brokers the metadata updates can be in a transient state and a given topic-partition can be missing from the metadata. This seems to resolve itself after a bit of time but before it can resolve the Cluster.nodeIfOnline method throws an NPE. This PR checks to make sure that a given topic-partition has partition info available before using that partition info.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Copy link
Copy Markdown
Member

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ndrwdn Thanks for the patch. Good catch! I left a few comments for considerations.

public void testNodeIfOnlineNonExistentTopicPartition() {
Node node0 = new Node(0, "localhost", 9092);

MetadataResponse metadataResponse = RequestTestUtils.metadataUpdateWith("dummy", 2, Collections.emptyMap(), Collections.emptyMap(), _tp -> 99,
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use RequestTestUtils.metadataUpdateWith(2, Collections.emptyMap())? It seems to me that we would end up with the same result because of the empty maps.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out this test helper, I made the change.

new MetadataResponse.PartitionMetadata(error, partition, Optional.of(node0.id()), leaderEpoch,
Collections.singletonList(node0.id()), Collections.emptyList(),
Collections.emptyList()), ApiKeys.METADATA.latestVersion(), Collections.emptyMap());
metadata.updateWithCurrentRequestVersion(emptyMetadataResponse(), false, 0L);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this one really required?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it doesn't seem to be, removed.

Comment thread clients/src/test/java/org/apache/kafka/clients/MetadataTest.java
@dajac dajac changed the title KAFKA-14194 Fix NPE in Cluster.nodeIfOnline KAFKA-14194: Fix NPE in Cluster.nodeIfOnline Sep 2, 2022
@ndrwdn
Copy link
Copy Markdown
Contributor Author

ndrwdn commented Sep 2, 2022

@dajac Thank you for the feedback! I've incorporated the changes you suggested and ran the test case without my change to Cluster.nodeIfOnline to verify it still catches the bug. Please let me know if there is anything else you spot!

@ndrwdn ndrwdn requested a review from dajac September 2, 2022 16:42
Copy link
Copy Markdown
Member

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the patch.

@dajac dajac merged commit 0cbf74a into apache:trunk Sep 5, 2022
@ijuma
Copy link
Copy Markdown
Member

ijuma commented Sep 6, 2022

@dajac do we need to backport this?

dajac pushed a commit that referenced this pull request Sep 9, 2022
When utilizing the rack-aware consumer configuration and rolling updates are being applied to the Kafka brokers the metadata updates can be in a transient state and a given topic-partition can be missing from the metadata. This seems to resolve itself after a bit of time but before it can resolve the `Cluster.nodeIfOnline` method throws an NPE. This patch checks to make sure that a given topic-partition has partition info available before using that partition info.

Reviewers: David Jacot <djacot@confluent.io>
dajac pushed a commit that referenced this pull request Sep 9, 2022
When utilizing the rack-aware consumer configuration and rolling updates are being applied to the Kafka brokers the metadata updates can be in a transient state and a given topic-partition can be missing from the metadata. This seems to resolve itself after a bit of time but before it can resolve the `Cluster.nodeIfOnline` method throws an NPE. This patch checks to make sure that a given topic-partition has partition info available before using that partition info.

Reviewers: David Jacot <djacot@confluent.io>
@dajac
Copy link
Copy Markdown
Member

dajac commented Sep 9, 2022

@ijuma That makes sense. Backported it to 3.3 and 3.2. cc @jsancio

philipnee pushed a commit to confluentinc/kafka that referenced this pull request Sep 14, 2022
When utilizing the rack-aware consumer configuration and rolling updates are being applied to the Kafka brokers the metadata updates can be in a transient state and a given topic-partition can be missing from the metadata. This seems to resolve itself after a bit of time but before it can resolve the `Cluster.nodeIfOnline` method throws an NPE. This patch checks to make sure that a given topic-partition has partition info available before using that partition info.

Reviewers: David Jacot <djacot@confluent.io>
fmin added a commit to confluentinc/kafka that referenced this pull request Sep 14, 2022
…2-14-SEP-2022

* apache-kafka/3.2: (45 commits)
  MINOR: Bump version in upgrade guide to 3.2.3
  KAFKA-14208; Do not raise wakeup in consumer during asynchronous offset commits (apache#12626)
  KAFKA-14196; Do not continue fetching partitions awaiting auto-commit prior to revocation (apache#12603)
  MINOR: 3.2 branch version to 3.2.3-SNAPSHOT
  Bump version to 3.2.2
  Upgrade Netty and Jackson versions for CVE fixes [KAFKA-14044] (apache#12376)
  KAFKA-14194: Fix NPE in Cluster.nodeIfOnline (apache#12584)
  MINOR: Update LICENSE-binary
  MINOR: Align Scala version to 2.13.8
  MINOR: Bump version in upgrade guide to 3.2.2
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants