KAFKA-13111: Re-evaluate Fetch Sessions when using topic IDs by jolshan · Pull Request #11331 · apache/kafka

jolshan · 2021-09-16T18:18:48Z

With the changes for topic IDs, we have a different flow. When a broker receives a request, it uses a map to convert the topic IDs to topic names. If a topic ID is not found in the map, we return a top-level error and close the session. This decision was motivated by the difficulty of storing “unresolved” partitions in the session. In earlier iterations we stored an “unresolved” partition object in the cache, but it was somewhat hard to reason about and required extra logic to try to resolve the topic ID on each incremental request and add it to the session. It also required extra logic to forget the topic (either by topic ID, if the topic name was never known, or by topic name, if it was finally resolved by the time we wanted to remove it from the session).
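
The flow described above, where a single unknown topic ID fails the whole request, can be sketched roughly as follows. This is an illustration only, not the broker code: `TopicIdResolution`, `resolveAllOrFail`, and `UnknownTopicIdException` are hypothetical names, and `java.util.UUID` stands in for Kafka's own topic ID type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: resolve every topic ID in a request, or fail the request as a whole.
final class TopicIdResolution {

    // Stands in for the top-level error returned when any ID cannot be resolved.
    static final class UnknownTopicIdException extends RuntimeException {
        UnknownTopicIdException(UUID topicId) {
            super("Unknown topic ID: " + topicId);
        }
    }

    // topicNames plays the role of the ID-to-name map taken from the metadata cache.
    static List<String> resolveAllOrFail(List<UUID> requestedIds, Map<UUID, String> topicNames) {
        List<String> names = new ArrayList<>();
        for (UUID id : requestedIds) {
            String name = topicNames.get(id);
            if (name == null) {
                // One missing ID aborts the whole request, so every partition is delayed.
                throw new UnknownTopicIdException(id);
            }
            names.add(name);
        }
        return names;
    }
}
```

The all-or-nothing behavior is the point of the sketch: one unresolved ID prevents progress on every other partition in the same request.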

One helpful simplifying factor is that we only allow one type of request (uses topic ID or does not use topic ID) in the session. That means we can rely on a session continuing to have the same information. We don’t have to worry about converting topics only known by name to topic ID for a response and we won’t need to convert topics only known by ID to name for a response.

This PR introduces a change to store the "unresolved partitions" in the cached partition object. If a version 13+ request is sent with a topic ID that is unknown, a cached partition will be created with that fetch request data and a null topic name. On subsequent incremental requests, unresolved partitions may be resolved with the new IDs found in the metadata cache. When handling the request, getting all partitions will return a TopicIdPartition object that will be used to handle the request and build the response. Since we can rely on only one type of request (with IDs or without), the cached partitions map will have different keys depending on what fetch request version is being used.
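
A minimal sketch of the idea, assuming a simplified cached partition that only tracks the topic ID, the partition number, and a nullable name. `CachedPartitionSketch` and `maybeResolve` are hypothetical names chosen for illustration, and `java.util.UUID` stands in for Kafka's topic ID type; the real cached partition holds more fetch state.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch of a cached partition whose topic name may be unknown at first.
final class CachedPartitionSketch {
    final UUID topicId;
    final int partition;
    private String topicName; // null while the topic ID is unresolved

    CachedPartitionSketch(UUID topicId, String topicName, int partition) {
        this.topicId = topicId;
        this.topicName = topicName;
        this.partition = partition;
    }

    boolean isResolved() {
        return topicName != null;
    }

    String topicName() {
        return topicName;
    }

    // Fills in the name if it is still unknown and the map now contains the ID.
    // An existing name is never overwritten. Returns true if the partition is resolved afterwards.
    boolean maybeResolve(Map<UUID, String> topicNames) {
        if (topicName == null) {
            topicName = topicNames.get(topicId);
        }
        return topicName != null;
    }
}
```

Calling `maybeResolve` on each incremental request with the latest ID-to-name map leaves the partition unresolved until the metadata catches up, after which it behaves like any other cached partition.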

This PR involves changes both in FetchSessionHandler and FetchSession. Some major changes are outlined below.

FetchSessionHandler: Forgetting a topic and adding a new topic with the same name - We may have a case where there is a topic foo with ID 1 in the session. Upon a subsequent metadata update, we may have topic foo with ID 2. This means that topic foo has been deleted and recreated. When sending fetch requests version 13+ we will send a request to add foo ID 2 to the session and remove foo ID 1. Otherwise, we will fall back to the same behavior for versions 12 and below.
FetchSession: Resolving in Incremental Sessions - Incremental sessions contain two distinct sets of partitions: the partitions sent in the latest request (new, updated, or forgotten partitions) and the partitions already in the session. If we want to resolve unknown topic IDs we will need to handle both cases.
- Partitions in the request - These partitions are either new or updating/forgetting previous partitions in the session. The new partitions are trivial. We either have a resolved partition or create a partition that is unresolved. For the other cases, we need to be a bit more careful.
  - For updated partitions we have a few cases – keep in mind, we may not programmatically know if a partition is an update:
    1. partition in session is resolved, update is resolved: trivial 
    2. partition in session is unresolved, update is unresolved: in code, this is equivalent to the case above, so trivial as well 
    3. partition in session is unresolved, update is resolved: this means the partition in the session does not have a name, but the metadata cache now contains the name – to fix this we can check if there exists a cached partition with the given ID and update it both with the partition update and with the topic name. 
    4. partition in session is resolved, update is unresolved: this means the partition in the session has a name, but the update was unable to be resolved (i.e., the topic is deleted) – this is the odd case. We will look up the partition using the ID. We will find the old version with a name but will not replace the name. This will lead to an UNKNOWN_TOPIC_OR_PARTITION or INCONSISTENT_TOPIC_ID error, which will be handled with a metadata update. Likely a future request will forget the partition, and we will be able to do so by ID.
    5. Two partitions in the session have IDs, but they are different: only one topic ID should exist in the metadata at a time, so likely only one topic ID is in the fetch set. The other one should be in the toForget, so we will be able to remove this partition from the session. If for some reason we don't try to forget this partition, one of the partitions in the session will cause an inconsistent topic ID error and the metadata for this partition will be refreshed; this should result in the old ID being removed from the session. This should not happen if the FetchSessionHandler is correctly in sync.
  - For the forgotten partitions we have the same cases:
    1. partition in session is resolved, forgotten is resolved: trivial 
    2. partition in session is unresolved, forgotten is unresolved: in code, this is equivalent to the case above, so trivial as well 
    3. partition in session is unresolved, forgotten is resolved: this means the partition in the session does not have a name, but the metadata cache now contains the name – to fix this we can check if there exists a cached partition with the given ID and try to forget it before we check the resolved name case. 
    4. partition in session is resolved, forgotten is unresolved: this means the partition in the session has a name, but the forgotten partition was unable to be resolved (i.e., the topic is deleted). We will look up the partition using the ID. We will find the old version with a name and be able to delete it.
    5. both partitions in the session have IDs, but they are different: This should be the same case as described above. If we somehow do not have the ID in the session, no partition will be removed. This should not happen unless the FetchSessionHandler is out of sync.
- Partitions in the session - there may be some partitions in the session already that are unresolved. We can resolve them in forEachPartition using a method that checks if the partition is unresolved and tries to resolve it using a topicName map from the request. The partition will be resolved before the function using the cached partition is applied.
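
The update and forget cases above can be sketched with a session map keyed by topic ID and partition number. This is a simplified illustration under stated assumptions, not the FetchSession implementation: `IdKeyedSessionSketch`, `Entry`, `update`, and `forget` are hypothetical names, the entry tracks only a nullable name and a fetch offset, and `java.util.UUID` stands in for Kafka's topic ID type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical sketch: session partitions are found by topic ID, whether or not a name is known.
final class IdKeyedSessionSketch {

    static final class Entry {
        String topicName; // null while unresolved
        long fetchOffset;

        Entry(String topicName, long fetchOffset) {
            this.topicName = topicName;
            this.fetchOffset = fetchOffset;
        }
    }

    private final Map<String, Entry> partitions = new HashMap<>();

    private static String key(UUID topicId, int partition) {
        return topicId + "-" + partition;
    }

    // nameOrNull is the name resolved for this request, or null if the ID is currently unknown.
    void update(UUID topicId, int partition, String nameOrNull, long fetchOffset) {
        Entry existing = partitions.get(key(topicId, partition));
        if (existing == null) {
            // New partition: stored resolved or unresolved, depending on the request.
            partitions.put(key(topicId, partition), new Entry(nameOrNull, fetchOffset));
            return;
        }
        existing.fetchOffset = fetchOffset;
        if (existing.topicName == null) {
            // Update case 3: the session entry had no name and the update may supply one.
            existing.topicName = nameOrNull;
        }
        // Update case 4: the entry already has a name, so an unresolved update keeps the old name.
    }

    // Forget cases 1 to 4: removal is by ID, so the name is irrelevant.
    // Returns false when the ID is not in the session (case 5), in which case nothing is removed.
    boolean forget(UUID topicId, int partition) {
        return partitions.remove(key(topicId, partition)) != null;
    }

    Entry get(UUID topicId, int partition) {
        return partitions.get(key(topicId, partition));
    }
}
```

Keying on the ID is what makes most cases collapse into a single lookup: the name only matters when the response is built, not when the entry is found, updated, or removed.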

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

dajac

@jolshan I have left a few high-level comments/questions. I think that we are missing a few things on the fetcher side as well (e.g. topic ID errors must be handled differently).

ijuma · 2021-10-02T13:17:52Z

Thanks for the PR. A high-level question, what are we trying to optimize for here?

Requests that don't include topic ids
Requests that include topic ids
Both
Some kind of balance of both where we compromise a bit to keep the code maintainable

jolshan · 2021-10-04T20:15:01Z

@ijuma

> Thanks for the PR. A high-level question, what are we trying to optimize for here?
> Requests that don't include topic ids
> Requests that include topic ids
> Both
> Some kind of balance of both where we compromise a bit to keep the code maintainable

The goal of this PR is to gracefully handle the new topic case. Currently in Kafka, when we create a new topic, the LeaderAndIsr request is sent first, followed by the UpdateMetadata request. This means that we will often encounter transient "unknown_topic_or_partition" errors. In the new world of topic IDs, we will see these as "unknown topic ID" errors. The current logic returns a top-level error and delays all partitions. This is a regression from previous behavior, so this PR's goal is to return to the behavior where we store the unknown partition in the session until it can be resolved. See https://issues.apache.org/jira/browse/KAFKA-13111 for more information.
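
As a rough illustration of the difference, the sketch below assigns an error per partition instead of failing the whole request, so resolved partitions can still make progress. `PartitionLevelErrors`, `errorsFor`, and the two-value `Error` enum are hypothetical and only model the idea; they are not the broker's response-building code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: report unknown topic IDs per partition rather than at the top level.
final class PartitionLevelErrors {

    enum Error { NONE, UNKNOWN_TOPIC_ID }

    // The input maps a partition key to its topic name; a null name means the ID is unresolved.
    static Map<String, Error> errorsFor(Map<String, String> nameByPartitionKey) {
        Map<String, Error> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : nameByPartitionKey.entrySet()) {
            // Only the unresolved partition carries an error; the others are unaffected.
            result.put(e.getKey(), e.getValue() == null ? Error.UNKNOWN_TOPIC_ID : Error.NONE);
        }
        return result;
    }
}
```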

dajac

@jolshan Thanks for the update. I left some comments.

dajac

Thanks for the updates. Left a few more comments.

jolshan · 2021-11-10T17:07:04Z

System test results: http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2021-11-10--001.system-test-kafka-branch-builder--1636543025--jolshan--KAFKA-13111--165a106bf3/report.html

A previous run was all green, so will need to confirm the 3 failed tests are unrelated to this change.

jolshan · 2021-11-10T19:38:05Z

Looks like the topic id partition changes broke the build. I'll probably need to pull the latest version.

dajac · 2021-11-11T14:19:24Z

> System test results: http://confluent-kafka-branch-builder-system-test-results.s3-us-west-2.amazonaws.com/2021-11-10--001.system-test-kafka-branch-builder--1636543025--jolshan--KAFKA-13111--165a106bf3/report.html
>
> A previous run was all green, so will need to confirm the 3 failed tests are unrelated to this change.

@jolshan Have you been able to triage these failures?

dajac

LGTM! Thanks @jolshan for your effort on this one. We can merge the PR once the system tests status is clarified.

Reviewers: David Jacot <djacot@confluent.io>

dajac · 2021-11-15T09:06:16Z

System test failures are not related. Merged to trunk and to 3.1.

jolshan added 6 commits September 8, 2021 16:11

change to topicIdPartition

2179f0c

Handle unknown topic IDs for full/sessionless cases.

28acda9

Handling some incremental session things. Still need to handle partitions already in the session and inconsistent IDs.

8be7a96

fix inconsistent topic ID handling for sessionless fetch contexts

79319f3

finish up first draft of handling incremental partitions

9dcfa3f

Remove unnecessary error handling

643d7ff

dajac self-requested a review September 16, 2021 18:31

jolshan commented Sep 16, 2021

Comment thread clients/src/main/java/org/apache/kafka/common/requests/FetchRequest.java Outdated

jolshan commented Sep 16, 2021

Comment thread core/src/main/scala/kafka/server/FetchSession.scala Outdated

jolshan commented Sep 17, 2021

Comment thread core/src/main/scala/kafka/server/FetchSession.scala Outdated

jolshan commented Sep 17, 2021

Comment thread core/src/main/scala/kafka/server/ReplicaAlterLogDirsThread.scala

jolshan commented Sep 20, 2021

Comment thread core/src/main/scala/kafka/server/DelayedFetch.scala Outdated

jolshan commented Sep 20, 2021

Comment thread core/src/test/scala/unit/kafka/server/FetchRequestMaxBytesTest.scala

jolshan commented Sep 20, 2021

Comment thread core/src/test/scala/unit/kafka/server/FetchSessionTest.scala Outdated

jolshan added 3 commits September 20, 2021 14:22

cleanups

eb8a6ed

Merge branch 'trunk' of github.com:apache/kafka into KAFKA-13111

11deaa8

Change equality checks based on which version we are using

938d35e

dajac reviewed Sep 24, 2021

jolshan added 3 commits September 28, 2021 15:36

Merge branch 'trunk' of github.com:apache/kafka into KAFKA-13111

8ea8653

try to transform fetch session handler

815e8c0

Change FetchSessionHandler again to handle creating a new session right away.

4caa2dd

ijuma mentioned this pull request Oct 2, 2021

MINOR: TopicIdPartition improvements #11374

Merged

3 tasks

jolshan and others added 3 commits October 6, 2021 09:41

Move topic ID error to partition level, remove extra topic ID map parameter, other minor fixes

5e7b628

check version using nodeApiVersions

32c6297

KAFKA-13111; FetchSessionHandler WIP

a1de391

dajac reviewed Oct 12, 2021

Comment thread clients/src/main/java/org/apache/kafka/clients/FetchSessionHandler.java Outdated

more refactor

2708655

dajac reviewed Oct 13, 2021

Comment thread core/src/main/scala/kafka/server/KafkaApis.scala Outdated

dajac added 2 commits November 8, 2021 15:24

PartitionData should take into account topicId in both equals and hashCode methods

43d69a1

Add more tests to FetcherTest; Update logic in FetchSessionHandler; Fixes

594b549

jolshan commented Nov 8, 2021

Comment thread clients/src/main/java/org/apache/kafka/clients/FetchSessionHandler.java

jolshan commented Nov 8, 2021

Comment thread clients/src/main/java/org/apache/kafka/clients/FetchSessionHandler.java

jolshan commented Nov 8, 2021

Comment thread clients/src/main/java/org/apache/kafka/clients/FetchSessionHandler.java

Fix tests, remove outdated error comments from FetchRequest/Response

321ea1d

jolshan commented Nov 8, 2021

Comment thread clients/src/test/java/org/apache/kafka/clients/consumer/internals/FetcherTest.java

Add tests for AbstractFetcherThread and ReplicaFetcherThread

f84b6f9

dajac reviewed Nov 9, 2021

Add testResolveUnknownPartitions test

63bc476

jolshan commented Nov 10, 2021

Comment thread core/src/test/scala/unit/kafka/server/FetchSessionTest.scala

jolshan and others added 3 commits November 9, 2021 20:27

More cleanups and test fixes

1a5fa71

Merge branch 'trunk' of github.com:apache/kafka into KAFKA-13111

165a106

Small fix in testResolveUnknownPartitions

f8b1d14

dajac reviewed Nov 10, 2021

Comment thread core/src/test/scala/unit/kafka/server/KafkaApisTest.scala Outdated

dajac reviewed Nov 10, 2021

jolshan commented Nov 10, 2021

Comment thread core/src/test/scala/unit/kafka/server/FetchSessionTest.scala Outdated

jolshan and others added 5 commits November 10, 2021 15:27

Further test fixes

09fad0a

prepare for topicIdPartition refactor

eab5380

Merge branch 'trunk' of github.com:apache/kafka into KAFKA-13111

5048d03

fix build

ae81d04

fix build

d458fd1

dajac approved these changes Nov 11, 2021

dajac merged commit e8818e2 into apache:trunk Nov 15, 2021
