KAFKA-8785: fix request timeout by waiting for metadata cache up-to-date#11681

Merged
showuon merged 3 commits intoapache:trunkfrom
showuon:KAFKA-8785
Apr 19, 2022

Conversation

@showuon
Member

@showuon showuon commented Jan 15, 2022

When initializing the adminClient, we first fetch the metadata of the brokers. Usually, we expect the brokers to already be up (and the metadata to be updated on the server side), so we can choose one node to connect to for the subsequent requests. But if we only fetch "partial" metadata of the brokers, and the "partial" brokers are then shut down, the adminClient will no longer work because there are no nodes to connect to.

So, the reason this test is flaky is a race condition at the beginning of the test, while the brokers are starting up and the adminClient is requesting the broker metadata. If the adminClient only got broker2/broker3, the test fails, because in these tests broker2 and broker3 are shut down to test leader election.

Fix this issue by explicitly waiting for the metadata cache to be up-to-date in waitForReadyBrokers, and creating the admin client after waitForReadyBrokers.

java.util.concurrent.ExecutionException: org.apache.kafka.common.errors.TimeoutException: Timed out waiting for a node assignment. Call: describeTopics
	at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:395)
	at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1999)
	at org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
	at kafka.utils.TestUtils$.$anonfun$waitForBrokersOutOfIsr$1(TestUtils.scala:1897)
	at kafka.utils.TestUtils$.waitForBrokersOutOfIsr(TestUtils.scala:992)
	at kafka.admin.LeaderElectionCommandTest10.testPathToJsonFile(LeaderElectionCommandTest10.scala:155)
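The fix, conceptually, is a bounded poll before the admin client is created: block until every broker's metadata cache reports the full cluster. A minimal Java sketch of that wait pattern, in the spirit of TestUtils.waitUntilTrue — the counter standing in for a broker's metadata cache and all names here are illustrative, not Kafka's actual API:

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

public class WaitForReadyBrokersSketch {
    /** Poll condition until it returns true or timeoutMs elapses. */
    static boolean waitUntilTrue(BooleanSupplier condition, long timeoutMs) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) return true;
            Thread.sleep(10);  // back off briefly between polls
        }
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        int expectedBrokers = 3;
        // Stand-in for a broker's metadata cache filling up asynchronously.
        AtomicInteger cachedBrokers = new AtomicInteger(0);
        Thread publisher = new Thread(() -> {
            for (int i = 0; i < expectedBrokers; i++) {
                try { Thread.sleep(50); } catch (InterruptedException ignored) { }
                cachedBrokers.incrementAndGet();
            }
        });
        publisher.start();

        // Only proceed (e.g. create the admin client) once the cache is complete.
        boolean ready = waitUntilTrue(() -> cachedBrokers.get() == expectedBrokers, 5_000);
        System.out.println("ready=" + ready);
        publisher.join();
    }
}
```

Without the wait, the test would race the publisher thread and could observe a partial broker list, which is exactly the flaky behavior described above.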

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@showuon showuon changed the title KAFKA-8785: fix flaky request timeout error by waiting for brokers up at the begining KAFKA-8785: fix request timeout by waiting for brokers up at the begining Jan 15, 2022
Member Author


side fix: the configOverrides should be passed into the clientProperties.

Comment on lines 407 to 416
Member Author


Support both the no-argument and the configOverrides-argument variants.

@showuon
Member Author

showuon commented Jan 15, 2022

@hachikuji @dajac , please take a look. Thanks.

@showuon
Member Author

showuon commented Jan 16, 2022

Ran the build twice and there are no failed tests from LeaderElectionCommandTest:

    Build / JDK 11 and Scala 2.13 / kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords()
    Build / JDK 11 and Scala 2.13 / kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords()
    Build / JDK 8 and Scala 2.12 / org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest.testReplication()
    Build / JDK 8 and Scala 2.12 / kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords()
    Build / JDK 8 and Scala 2.12 / kafka.api.PlaintextAdminIntegrationTest.testReplicaCanFetchFromLogStartOffsetAfterDeleteRecords()
    Build / ARM / kafka.server.ReplicaManagerTest.[1] usesTopicIds=true
    Build / JDK 17 and Scala 2.13 / kafka.api.PlaintextAdminIntegrationTest.testInvalidAlterPartitionReassignments()

Member


At L73, there is cluster.waitForReadyBrokers(), which waits until the cluster is ready. Would it make sense to move the creation of the admin client after that in all tests?

Contributor


+1 This makes sense to me.

Member Author


@hachikuji , it doesn't work to create the admin client after cluster.waitForReadyBrokers(), because waitForReadyBrokers only waits for all brokers to be registered and unfenced (ref: here), but the metadata cache in the broker might not have updated the cluster metadata yet. I think we need to explicitly wait for all brokers to be up via describeCluster, and then create the adminClient for testing.

Contributor

@hachikuji hachikuji Feb 17, 2022


Ah, I didn't see @showuon's comment below about metadata cache consistency. I guess the issue is that unfencing does not necessarily imply that the broker is caught up to the end of the metadata log. I know previously we were considering having the broker consume its own registration before it declared itself ready to unfence, but it looks like we haven't implemented that. Perhaps a short-term fix is to change KafkaClusterTestKit.waitForReadyBrokers to check metadata caches directly?

Member Author


Perhaps a short-term fix is to change KafkaClusterTestKit.waitForReadyBrokers to check metadata caches directly?

Sounds good. Let me give it a try. Thanks for the comment.

@dajac
Member

dajac commented Feb 5, 2022

@showuon Thanks for the PR. I left a suggestion. Overall, I wonder if the real issue is in the admin client. If we have all the brokers in the bootstrap list, it should be able to recover if there are no nodes. However, it seems that it does not try to re-resolve the bootstrap list in this case.

We could perhaps apply the suggestion that I've made to fix the flaky tests and file a jira if there is a real limitation in the admin client. What do you think?

@showuon
Member Author

showuon commented Feb 5, 2022

@dajac , thanks for the comment.

We could perhaps apply the suggestion that I've made to fix the flaky tests

Sure. I'll update the PR later.

file a jira if there is a real limitation in the admin client. What do you think?

Agree!

Thank you.

@showuon
Member Author

showuon commented Feb 6, 2022

@dajac , it doesn't work to create the admin client after cluster.waitForReadyBrokers(), because waitForReadyBrokers only waits for all brokers to be registered and unfenced (ref: here), but the metadata cache in the broker might not have updated the cluster metadata yet. I think we need to explicitly wait for all brokers to be up via describeCluster, and then create the adminClient for testing.

WDYT?

@showuon
Member Author

showuon commented Feb 8, 2022

@dajac , KAFKA-13653 is created for the improvement to proactively discover alive brokers from bootstrap server lists when all nodes are down. Thanks.

@hachikuji
Contributor

it doesn't work to create the admin client after cluster.waitForReadyBrokers(), because waitForReadyBrokers only waits for all brokers to be registered and unfenced (ref: here), but the metadata cache in the broker might not have updated the cluster metadata yet. I think we need to explicitly wait for all brokers to be up via describeCluster, and then create the adminClient for testing.

Hmm, I wonder if this is right. I checked the QuorumController logic and unfencing implies that the broker has caught up to the current end of the log. At least it implies that the offset reported in the heartbeat request is up to date. From what I can tell, there might still be a race between the time that this offset gets updated in BrokerMetadataListener and the time that the latest entries have been published to BrokerMetadataPublisher. Is that delay enough to explain the issue?
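The race described here can be reduced to two offsets advanced by different threads: the one the listener reports in heartbeats and the one the publisher has actually pushed into the metadata cache. A contrived Java illustration of why the advertised offset can run ahead of the published one — not Kafka code, all names are stand-ins for the sketch:

```java
import java.util.concurrent.atomic.AtomicLong;

public class ListenerPublisherRace {
    public static void main(String[] args) throws InterruptedException {
        AtomicLong listenerOffset = new AtomicLong(0);   // advanced first (reported in heartbeats)
        AtomicLong publishedOffset = new AtomicLong(0);  // advanced later (what the cache reflects)

        Thread broker = new Thread(() -> {
            for (long offset = 1; offset <= 100; offset++) {
                listenerOffset.set(offset);              // offset is advertised in the next heartbeat...
                try { Thread.sleep(1); } catch (InterruptedException ignored) { }
                publishedOffset.set(offset);             // ...but publishing to the cache lags behind
            }
        });
        broker.start();

        Thread.sleep(20);  // sample mid-flight
        long published = publishedOffset.get();
        long advertised = listenerOffset.get();
        broker.join();

        // The advertised offset can run ahead of what has been published to the cache,
        // so "unfenced" does not by itself imply the metadata cache is up to date.
        System.out.println("advertised >= published: " + (advertised >= published));
    }
}
```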

@showuon
Member Author

showuon commented Feb 17, 2022

@hachikuji , let me check it and let you know.

@showuon showuon changed the title KAFKA-8785: fix request timeout by waiting for brokers up at the begining KAFKA-8785: fix request timeout by waiting for metadata cache up-to-date Mar 1, 2022
@showuon
Member Author

showuon commented Mar 1, 2022

@hachikuji , I finally got time to investigate it. Short answer to your question: Yes, you're right!

From what I can tell, there might still be a race between the time that this offset gets updated in BrokerMetadataListener and the time that the latest entries have been published to BrokerMetadataPublisher. Is that delay enough to explain the issue?

Yes, when a broker is unfenced, it means it has caught up to the current end of the log, so basically the broker is ready. But the BrokerMetadataPublisher runs in another thread, which races with the main thread (in the test). That is, even when waitForReadyBrokers passes, the broker images might still be out of date.

Fixed now by waiting for the metadata cache to be up-to-date in waitForReadyBrokers. Thank you.

@showuon
Member Author

showuon commented Mar 1, 2022

Also updated the PR title and description. Thanks.

@hachikuji
Contributor

@showuon Thanks for the update. What do you think about modifying the heartbeating logic so that we do not advertise an offset to the controller until it has been published?

@showuon
Member Author

showuon commented Mar 14, 2022

What do you think about modifying the heartbeating logic so that we do not advertise an offset to the controller until it has been published?

@hachikuji , thanks for your suggestion. But sorry, I'm not very familiar with KRaft. As far as I can see, we wait until the broker has caught up with the latest metadata from the controller quorum (i.e. initialCatchUpFuture) before we unfence the broker. Then we send out the unfence heartbeat to the controller, and the controller checks wantFence and the high offset to see if this broker can be unfenced. Next, it commits the unfence metadata, waits for the broker to read the update, and lets the listener publish the update to the metadata cache. That's my understanding (correct me if I'm wrong).

So, if we don't advertise the offset to the controller, the controller won't unfence the broker because the metadata is not caught up, no metadata commit happens, and the broker listener won't publish the unfence info. I don't know how we can achieve it. Could you help explain it to me?

Thank you.

Another thing: do you think we could take this temporary solution (i.e. waiting for the metadata cache to be up-to-date in waitForReadyBrokers) to fix the tests first, and work on the better solution you suggested in another PR if it takes time? WDYT?

@dengziming
Member

if we don't advertise the offset to controller,

@showuon, here Jason means we postpone advertising the offset until after the publisher has finished publishing, which means the MetadataCache has been updated.

We can achieve it in a simpler way here:

  1. Add a method/field def highestMetadataOffset: Long in MetadataPublisher.
  2. Every time we call BrokerMetadataPublisher.publish(), set highestMetadataOffset to the newest offset; maybe we should add a logEndOffset parameter to BrokerMetadataPublisher.publish().
  3. In BrokerServer, change lifecycleManager.start(() => metadataListener.highestMetadataOffset) to lifecycleManager.start(() => metadataPublisher.highestMetadataOffset).

PTAL.
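The shape of that proposed change can be sketched in plain Java: swap the supplier the heartbeat path reads from the listener's offset to the publisher's. This is an illustrative model only; the real classes are Scala and the names below are stand-ins:

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

public class AdvertisePublishedOffsetSketch {
    // Stand-ins for BrokerMetadataListener / BrokerMetadataPublisher.
    static class Listener { final AtomicLong highestMetadataOffset = new AtomicLong(); }
    static class Publisher { final AtomicLong highestMetadataOffset = new AtomicLong(); }

    public static void main(String[] args) {
        Listener metadataListener = new Listener();
        Publisher metadataPublisher = new Publisher();

        // The listener has fetched up to offset 100, but only offset 90 has been
        // published to the broker's metadata cache so far.
        metadataListener.highestMetadataOffset.set(100);
        metadataPublisher.highestMetadataOffset.set(90);

        // Before: heartbeats advertise the listener's offset (can run ahead of the cache).
        LongSupplier before = metadataListener.highestMetadataOffset::get;
        // After: heartbeats advertise the publisher's offset, so being unfenced
        // would imply the metadata cache has caught up.
        LongSupplier after = metadataPublisher.highestMetadataOffset::get;

        System.out.println("before=" + before.getAsLong() + " after=" + after.getAsLong());
    }
}
```

The LongSupplier here plays the role of the callback passed to lifecycleManager.start() in the suggestion above.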

@showuon
Member Author

showuon commented Mar 25, 2022

@dengziming , cool! Thanks for the info. Are you interested in submitting another PR to address it? That would be helpful and would make the v3.2.0 release come sooner!

@dengziming
Member

@showuon Thank you, I will have a try.

@cadonna
Member

cadonna commented Apr 7, 2022

@showuon @dengziming @hachikuji @dajac What is the status of this PR? It would resolve a blocker for the 3.2.0 release. It would be great if we could merge it as soon as possible.

Member

@dengziming dengziming left a comment


I think this PR has addressed the specified problem; we can fix the remaining problem in another PR.
Even with @hachikuji 's suggestion, it still can't be perfectly solved, since we would run into a "chicken and egg" problem here: the QuorumController waits for the broker to publish the UnfenceBrokerRecord before unfencing it, while the broker waits for the QuorumController to unfence it and generate the UnfenceBrokerRecord. See #11951 (comment).

public void waitForReadyBrokers() throws ExecutionException, InterruptedException {
// We can choose any controller, not just the active controller.
// If we choose a standby controller, we will wait slightly longer.
ControllerServer controllerServer = controllers.values().iterator().next();
Member


It seems we no longer need to call controller.waitForReadyBrokers() if we wait for the metadataCache to catch up.

Member Author


@dengziming , yes, you're right, it would still work if we removed waitForReadyBrokers. But I think it's still useful to keep it: if a test fails with "metadata cache not updated in time", we otherwise can't know whether all the brokers were ready (registered and unfenced) or not. That is, with controller.waitForReadyBrokers(), the tests are easier to troubleshoot in the future. WDYT?

Member


Yeah, this makes sense.

Contributor

@hachikuji hachikuji left a comment


Thanks, LGTM

@showuon
Member Author

showuon commented Apr 19, 2022

Rebased onto the latest trunk; the failed tests are unrelated:

    Build / JDK 8 and Scala 2.12 / kafka.admin.TopicCommandIntegrationTest.testDescribeAtMinIsrPartitions(String).quorum=zk
    Build / JDK 8 and Scala 2.12 / kafka.admin.TopicCommandIntegrationTest.testAlterAssignment(String).quorum=kraft
    Build / JDK 8 and Scala 2.12 / kafka.server.KRaftClusterTest.testIncrementalAlterConfigs()
    Build / JDK 8 and Scala 2.12 / kafka.server.KRaftClusterTest.testSetLog4jConfigurations()
    Build / JDK 8 and Scala 2.12 / kafka.server.KRaftClusterTest.testLegacyAlterConfigs()
    Build / JDK 8 and Scala 2.12 / kafka.server.KRaftClusterTest.testCreateClusterAndPerformReassignment()
    Build / JDK 17 and Scala 2.13 / kafka.admin.TopicCommandIntegrationTest.testAlterAssignment(String).quorum=kraft
    Build / JDK 11 and Scala 2.13 / kafka.admin.TopicCommandIntegrationTest.testAlterAssignment(String).quorum=kraft
    Build / JDK 11 and Scala 2.13 / kafka.admin.TopicCommandIntegrationTest.testAlterAssignment(String).quorum=kraft

@showuon showuon merged commit 44906bd into apache:trunk Apr 19, 2022
philipnee added a commit to philipnee/kafka that referenced this pull request Apr 22, 2022