MINOR: retry upon missing source topic#20284
Conversation
lucasbru
left a comment
There was a problem hiding this comment.
Mostly looking good to me. I left a few comments to cleanup/simplify the code.
| private volatile KafkaFutureImpl<Uuid> producerInstanceIdFuture = new KafkaFutureImpl<>(); | ||
|
|
||
| // Missing source topic timeout tracking | ||
| private long firstMissingSourceTopicTime = -1L; |
There was a problem hiding this comment.
Maybe it would make things slighly more easy to read if we'd use
org.apache.kafka.common.utils.Timer for this?
There was a problem hiding this comment.
Also, can we rename this to a more generic topicsReadyTimer? I think we may want to reuse the timer to also time out when internal topics are not created in time.
| private volatile KafkaFutureImpl<Uuid> restoreConsumerInstanceIdFuture = new KafkaFutureImpl<>(); | ||
| private volatile KafkaFutureImpl<Uuid> producerInstanceIdFuture = new KafkaFutureImpl<>(); | ||
|
|
||
| // Missing source topic timeout tracking |
There was a problem hiding this comment.
If you describe a member, I'd use a javadoc comment. But this comment isn't adding anything on top of the variable name, so maybe we can drop it altogether?
| handleMissingSourceTopicsWithTimeout(missingTopicsDetail); | ||
| } else { | ||
| // Reset timeout tracking when no missing source topics are reported | ||
| firstMissingSourceTopicTime = -1L; |
There was a problem hiding this comment.
I think if you use org.apache.kafka.common.utils.Timer and call reset here, you don't need the inline comment.
|
|
||
| // Advance time beyond max.poll.interval.ms (default is 300000ms) to trigger timeout | ||
| mockTime.sleep(300001); | ||
|
|
There was a problem hiding this comment.
Suggestion: advance time less than 5 min and check if 2nd call throws exception and also check the log message (if easy) and then next step advancing time beyond 5 min as you did!
|
|
||
| // First call should not throw exception (within timeout) | ||
| thread.runOnceWithoutProcessingThreads(); | ||
|
|
There was a problem hiding this comment.
The same suggestion (below) here.
|
@RaidenE1 Thank you, the PR looks good to me now. I had a suggestion, but it’s not necessary to address since you’re already checking the condition in a later test. |
| log.error(errorMsg); | ||
|
|
||
| // Reset timer for next timeout cycle | ||
| topicsReadyTimer.updateAndReset(maxPollTimeMs); |
There was a problem hiding this comment.
Why do we need to update the timer? We throw MissingSourceTopicException below, and this should shut down the thread?
| final String errorMsg = status.statusDetail(); | ||
| log.error(errorMsg); | ||
| throw new MissingSourceTopicException(errorMsg); | ||
| throw new TopologyException(errorMsg); |
There was a problem hiding this comment.
It seems this case is newly added, but we did not add a new test for it?
…eleted (#20735) Integration tests expecting `MissingSourceTopicException` were timing out and taking 5+ minutes because [PR #20284](#20284) set the missing topics timeout to `max.poll.interval.ms` (default 300 seconds). This is inappropriate because: - When source topics don't exist, actual poll frequency is `poll.ms` (100ms), not `max.poll.interval.ms` - The exception should be raised much faster based on heartbeat frequency ## Solution Use `2 * heartbeatIntervalMs` (from broker via KIP-1071) as the timeout instead: - Ensures at least one heartbeat is sent before raising the exception - Falls back to `max.poll.interval.ms` for backward compatibility - Fixes slow integration tests ## Changes 1. Track `heartbeatIntervalMs` in `StreamsRebalanceData` and update it on each heartbeat response 2. Use `2 * heartbeatIntervalMs` as timeout in `StreamThread.handleMissingSourceTopicsWithTimeout()` 3. Update `HandlingSourceTopicDeletionIntegrationTest` to test both "classic" and "streams" protocols 4. Update `StreamsRebalanceDataTest`, `StreamsGroupHeartbeatRequestManagerTest` and `StreamThreadTest` to test `StreamsRebalanceData` and its behaviour Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…eleted (apache#20735) Integration tests expecting `MissingSourceTopicException` were timing out and taking 5+ minutes because [PR apache#20284](apache#20284) set the missing topics timeout to `max.poll.interval.ms` (default 300 seconds). This is inappropriate because: - When source topics don't exist, actual poll frequency is `poll.ms` (100ms), not `max.poll.interval.ms` - The exception should be raised much faster based on heartbeat frequency ## Solution Use `2 * heartbeatIntervalMs` (from broker via KIP-1071) as the timeout instead: - Ensures at least one heartbeat is sent before raising the exception - Falls back to `max.poll.interval.ms` for backward compatibility - Fixes slow integration tests ## Changes 1. Track `heartbeatIntervalMs` in `StreamsRebalanceData` and update it on each heartbeat response 2. Use `2 * heartbeatIntervalMs` as timeout in `StreamThread.handleMissingSourceTopicsWithTimeout()` 3. Update `HandlingSourceTopicDeletionIntegrationTest` to test both "classic" and "streams" protocols 4. Update `StreamsRebalanceDataTest`, `StreamsGroupHeartbeatRequestManagerTest` and `StreamThreadTest` to test `StreamsRebalanceData` and its behaviour Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…eleted (apache#20735) Integration tests expecting `MissingSourceTopicException` were timing out and taking 5+ minutes because [PR apache#20284](apache#20284) set the missing topics timeout to `max.poll.interval.ms` (default 300 seconds). This is inappropriate because: - When source topics don't exist, actual poll frequency is `poll.ms` (100ms), not `max.poll.interval.ms` - The exception should be raised much faster based on heartbeat frequency ## Solution Use `2 * heartbeatIntervalMs` (from broker via KIP-1071) as the timeout instead: - Ensures at least one heartbeat is sent before raising the exception - Falls back to `max.poll.interval.ms` for backward compatibility - Fixes slow integration tests ## Changes 1. Track `heartbeatIntervalMs` in `StreamsRebalanceData` and update it on each heartbeat response 2. Use `2 * heartbeatIntervalMs` as timeout in `StreamThread.handleMissingSourceTopicsWithTimeout()` 3. Update `HandlingSourceTopicDeletionIntegrationTest` to test both "classic" and "streams" protocols 4. Update `StreamsRebalanceDataTest`, `StreamsGroupHeartbeatRequestManagerTest` and `StreamThreadTest` to test `StreamsRebalanceData` and its behaviour Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…eleted (apache#20735) Integration tests expecting `MissingSourceTopicException` were timing out and taking 5+ minutes because [PR apache#20284](apache#20284) set the missing topics timeout to `max.poll.interval.ms` (default 300 seconds). This is inappropriate because: - When source topics don't exist, actual poll frequency is `poll.ms` (100ms), not `max.poll.interval.ms` - The exception should be raised much faster based on heartbeat frequency ## Solution Use `2 * heartbeatIntervalMs` (from broker via KIP-1071) as the timeout instead: - Ensures at least one heartbeat is sent before raising the exception - Falls back to `max.poll.interval.ms` for backward compatibility - Fixes slow integration tests ## Changes 1. Track `heartbeatIntervalMs` in `StreamsRebalanceData` and update it on each heartbeat response 2. Use `2 * heartbeatIntervalMs` as timeout in `StreamThread.handleMissingSourceTopicsWithTimeout()` 3. Update `HandlingSourceTopicDeletionIntegrationTest` to test both "classic" and "streams" protocols 4. Update `StreamsRebalanceDataTest`, `StreamsGroupHeartbeatRequestManagerTest` and `StreamThreadTest` to test `StreamsRebalanceData` and its behaviour Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…eleted (apache#20735) Integration tests expecting `MissingSourceTopicException` were timing out and taking 5+ minutes because [PR apache#20284](apache#20284) set the missing topics timeout to `max.poll.interval.ms` (default 300 seconds). This is inappropriate because: - When source topics don't exist, actual poll frequency is `poll.ms` (100ms), not `max.poll.interval.ms` - The exception should be raised much faster based on heartbeat frequency ## Solution Use `2 * heartbeatIntervalMs` (from broker via KIP-1071) as the timeout instead: - Ensures at least one heartbeat is sent before raising the exception - Falls back to `max.poll.interval.ms` for backward compatibility - Fixes slow integration tests ## Changes 1. Track `heartbeatIntervalMs` in `StreamsRebalanceData` and update it on each heartbeat response 2. Use `2 * heartbeatIntervalMs` as timeout in `StreamThread.handleMissingSourceTopicsWithTimeout()` 3. Update `HandlingSourceTopicDeletionIntegrationTest` to test both "classic" and "streams" protocols 4. Update `StreamsRebalanceDataTest`, `StreamsGroupHeartbeatRequestManagerTest` and `StreamThreadTest` to test `StreamsRebalanceData` and its behaviour Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Implements a timeout mechanism (using maxPollTimeMs) that waits for
missing source topics to be created before failing, instead of
immediately throwing exceptions in the new Streams protocol.
Additionally, throw TopologyException when partition count mismatch is
detected.
Reviewers: Lucas Brutschy lbrutschy@confluent.io, Alieh Saeedi
asaeedi@confluent.io, Matthias J. Sax matthias@confluent.io